Professional Microsoft SQL Server 2014 Integration Services (Wrox Programmer to Programmer), 1st Edition (2014)

Chapter 5. Using Variables, Parameters, and Expressions

WHAT’S IN THIS CHAPTER?

· Reviewing variables, parameters, and expressions

· Using data types for variables and parameters

· Creating variables and parameters

· Expression syntax and usage

· Expression examples

WROX.COM DOWNLOADS FOR THIS CHAPTER

You can find the wrox.com code downloads for this chapter at www.wrox.com/go/prossis2014 on the Download Code tab.

If you have used SSIS packages for any involved ETL processes, you have inevitably encountered the need for dynamic capabilities. A dynamic package can reconfigure itself at runtime to do things like run certain steps conditionally, create a series of auto-generated filenames for export, or retrieve and set send-to addresses on an alert e-mail from a data table. Because dynamic changes are fairly common occurrences, developers and architects turn to expressions as they begin rolling out SSIS projects in their development shops.

This chapter attempts to provide a solid information base to get you up to speed on expressions. Here we will consolidate the common questions, answers, and best practices about expressions that we’ve heard and explained since the first release of SSIS. The good news is that expressions are easy to use and impressively powerful. As you read this chapter, you will not only gain an understanding about how expressions work but also gain some insight into how you can use variables and parameters to set up expressions on your current SSIS project.

DYNAMIC PACKAGE OBJECTS

SSIS includes multiple objects that can be used to create dynamic packages. Figure 5-1 provides a graphical description of this model in SSIS. SSIS can dynamically set a property in a task or component using an expression, which can be built using a set of building blocks, including variables, parameters, functions, literals, and more. When the package is run, the expression is evaluated and used in the property as each task or component is accessed.

FIGURE 5-1

Variable Overview

Variables are a key feature in the SSIS package development process. This object contains a value that can be hardcoded, dynamically set once, or modified multiple times throughout the execution of the package. Principally, variables provide a method for objects in a package to communicate with each other. Similar to their use in programming languages, variables are used in specific types of logic, such as iterating through a loop, concatenating multiple values together to create a file directory, or passing an array between objects.

Parameter Overview

Parameters were first introduced in SQL Server 2012. While similar to a variable in that a parameter can store information and be used in an expression, it has a few different properties and uses that you will want to understand. As demonstrated in Figure 5-2, parameters are set externally. The parameter can then be used in an expression to affect different properties in the package.

FIGURE 5-2

SSIS uses two types of parameters: project parameters and package parameters. Project parameters are created at the project level and can be used in all packages that are included in that project. On the other hand, package parameters are created at the package level and can be used only in that package. Project parameters are best used for values that are shared among packages, such as e-mail addresses for error messages. Package parameters are best used for values specific to that package, such as directory locations.

When using the project deployment model (discussed in depth in Chapter 22), parameters are the best choices to replace package configurations to create a dynamic and more flexible SSIS solution. Using the Required property of the parameter, you can also necessitate the caller of the package to pass in a value for the parameter. If you want to set the value of a property from outside the package, either required or not, parameters are the object to use. On the other hand, if you want to create or store values only within a package, variables are the object to use.

Expression Overview

Expressions are the key to understanding how to create dynamic packages in SSIS. One way to think about expressions is to compare them to the familiar model of a spreadsheet cell in a program like Microsoft Excel. A spreadsheet cell can hold a literal value, a reference to another cell on the spreadsheet, or functions ranging from simple to complex arrangements. In each instance, the result is a resolved value displayed in the cell. Figure 5-3 shows these same capabilities of the expression, which can hold literal values, identifiers available to the operation (references to variables or columns), or functions (built-in or user-defined). The difference in the SSIS world is that these values can be substituted directly into properties of the package model, providing powerful and dynamic workflow and operational functionalities.

FIGURE 5-3

Starting with SQL Server 2012, it is easy to see when an expression has been set on a property within an object. Expression adorners are special icons that are placed on top of the object icon if the object has an expression. This indicator makes it easier to understand why the package seems to be doing something behind the scenes that you weren’t expecting!

If you understand these visual analogies explaining how expressions, parameters, and variables fit into the SSIS picture, then you are almost ready to dig into the details of how to build an expression. First, however, it’s time to take a look at some of the details about data types, variables, and parameters that cause many of the issues for SSIS package developers.

UNDERSTANDING DATA TYPES

In SSIS, you must pay attention to data types, whether the data is coming from your Data Flow, is stored in variables, or is included in expressions. Failure to do so will cause a lot of frustration because the syntax checker will complain about incompatible data types when you are building expressions. If your Data Flow contains incompatible data types, your packages will raise either warnings or errors (if implicit conversions are made). This will happen even if the conversion is between Unicode and non-Unicode character sets. Comparison operations also are subject to either hard or soft errors during implicit conversion. Bad data type decisions can have a serious impact on performance. This seemingly simple topic causes significant grief for SSIS developers who haven’t yet learned the specifics of data types and how they are converted. The following sections provide a brief overview of how to resolve common data type conversion issues, beginning with a primer on SSIS data types.

SSIS Data Types

If you research the topic of “Integration Services Data Types” in Books Online, you’ll first notice that the data types are named much differently than similar types found in .NET or T-SQL. This nomenclature is troublesome for most new users. The following table provides a matrix between SSIS data types and a typical SQL Server set of data types. You’ll need this table to map between Data Flow columns and variable or parameters data types. The .NET managed types are important only if you are using Script component, CLR, or .NET-based coding to manipulate your Data Flows.

The following table is just for SQL Server. To do a similar analysis for your own data source, look at the mapping files that can be found in this directory: C:\Program Files\Microsoft SQL Server\120\DTS\MappingFiles\. If you’re familiar with OLE DB data types, you’ll understand these SSIS data type enumerations, because they are similar. However, there is more going on than just naming differences.

First, SSIS supports some data types that may not be familiar at all, nor are they applicable to SQL Server — namely, most of the unsigned integer types and a few of the date types. You’ll also notice the availability of the separate date-only (DT_DBDATE) and time-only (DT_DBTIME) types, which prior to SQL Server 2008 were available only for RDBMS databases like DB2 and ORACLE. With the introduction of similar data types in the SQL Server 2008 engine, they are also applicable in SSIS. Finally, notice the arrow “⇒” in the table, which indicates that these data types are converted to other SSIS data types in Data Flow operations that may be opportunities for performance enhancements.

SSIS DATA TYPE	SQL SERVER DATA TYPE	.NET MANAGED TYPE
DT_WSTR	nvarchar, nchar	System.String
DT_STR ⇒ DT_WSTR	varchar, char
DT_TEXT ⇒ DT_WSTR	text
DT_NTEXT ⇒ DT_WSTR	ntext, sql_variant, xml

DT_BYTES	binary, varbinary	Array of System.Byte
DT_IMAGE ⇒ DT_BYTES	timestamp, image

DT_DBTIMESTAMP	smalldatetime, datetime	System.DateTime
DT_DBTIMESTAMP2 ⇒ DT_DBTIMESTAMP	datetime
DT_DBDATE ⇒ DT_DBTIMESTAMP	date
DT_DATE ⇒ DT_DBTIMESTAMP
DT_FILETIME ⇒ DT_DBTIMESTAMP
DT_DBDATETIMESTAMPOFFSET	datetimeoffset

DT_DBTIME2	time	System.TimeSpan
DT_DBTIME ⇒ DT_DBTIME2

DT_NUMERIC	numeric	System.Decimal
DT_DECIMAL ⇒ DT_NUMERIC	decimal

DT_GUID	uniqueidentifier	System.Guid

DT_I1		System.SByte
DT_I2	smallint	System.Int16
DT_CY	smallmoney, money	System.Currency
DT_I4	int	System.Int32
DT_I8	bigint	System.Int64

DT_BOOL ⇒ DT_I4	bit	System.Boolean

DT_R4	real	System.Single
DT_R8	float	System.Double

DT_UI1	tinyint	System.Byte
DT_UI2	int	System.UInt16
DT_UI4	bigint	System.UInt32
DT_UI8	numeric	System.UInt64

Date and Time Type Support

SQL Server 2008 included new data types for separate date and time values and an additional time zone-based data type compliant with the ISO 8601 standard. SSIS has always had these data type enumerations for the other RDBMS sources, but as of SQL Server 2008, these can also be used for SQL Server as well, including DT_DBTIMESTAMP2 and DT_DBTIME2, added for more precision, and DT_DBTIMESTAMPOFFSET, added for the ISO DateTimeOffset SQL Server data type.

A common mistake made in SSIS packages is the improper selection of an SSIS date data type. For some reason, DT_DBDATE and DT_DATE are often used for date types in Data Flow components, but improper use of these types can result in overflow errors or the removal of the time element from the date values. SSIS data types provide a larger net for processing incoming values than you may have in your destination data source. It is your responsibility to manage the downcasting or conversion operations. Make sure you are familiar with the data type mappings in the mapping file for your data source and destination, and the specific conversion behavior of each type. A good start would be the date/time types, because there are many rules regarding their conversion, as evidenced by the large section about them in Books Online. You can find these conversion rules for date/time data types under the topic “Integration Services Data Types” found here: http://msdn.microsoft.com/en-us/library/ms141036(v=SQL.120).aspx.

How Wrong Data Types and Sizes Can Affect Performance

If you’ve been working with SSIS for a while, you know that it can use serious memory resources and sometimes be slower than you expect. That’s because the Data Flow components do most of their work in memory. This can be good because it eliminates the most time-consuming I/O operations. However, because SSIS uses memory buffers to accomplish this, the number of rows that can be loaded into a buffer is directly related to the width of the row. The narrower the row, the more rows that can be processed per buffer.

If you are defining the data types of a large input source, pick your data types carefully, so that you are not using the default 50 characters per column for a text file, or the suggested data types of the Connection Manager, when you do not need this extra safety cushion. Also, be aware that there are some trade-offs when selecting specific data types if they require any conversion as the data is being loaded into the buffers.

Data conversion is a fact of life, and you’ll have to pay for it somewhere in the ETL process. These general guidelines can give you a start:

· Convert only when necessary. Don’t convert columns from a data source that will be dropped from the data stream. Each conversion costs something.

· Convert to the closest type for your destination source using the mapping files. If a value is converted to a nonsupported data type, you’ll incur an additional conversion internal to SSIS to the mapped data type.

· Convert using the closest size and precision. Don’t import all columns as 50-character data columns if you are working with a fixed or reliable file format with columns that don’t require as much space.

· Evaluate the option to convert after the fact. Remember that SSIS is still an ETL tool and sometimes it is more efficient to stage the data and convert it using set-based methods.

The bottom line is that data type issues can be critical in high-volume scenarios, so plan with these guidelines in mind.

Unicode and Non-Unicode Conversion Issues

One aspect of ETL package development that you might not be used to is the default use of Unicode data types in SSIS packages. Not only is this the default import behavior, but all the string functions in SSIS expect Unicode strings as input. Unicode is a great choice if you’re unsure of the incoming data for handling data from import files with special characters, but if you’re not familiar with using this character set, it can be confusing at first. At the very least, using Unicode requires an additional step that is frequently missed, resulting in errors. For a typical demonstration, create a package that imports an Excel data source into a table defined with non-Unicode fields, or download the samples from www.wrox.com. Excel data is imported as Unicode by default, so the mapping step in the destination component complains that the data is not compatible, as shown in Figure 5-4.

FIGURE 5-4

NOTE You may experience some data being replaced by NULLs when importing Excel files using the Excel Connection Manager. This typically occurs when numeric and text data is stored within one column. One solution is to update the extended properties section of the connection string to look like this:

Extended Properties="EXCEL 14.0;HDR=YES;IMEX=1"

At first, you might assume that all you need to do is change the source data type to match the non-Unicode destination. Using the SQL conversion table as a guide, right click on the source, select the Show Advanced Editor option, and change the column type toDT_STR to match the destination SQL Server varchar data type. Now you’ll find that the same error from Figure 5-4 is occurring on both the source and the destination components. As discussed earlier in this section, SSIS requires purposeful conversion and casting operations. To complete the task, you need to add only a Data Conversion Transformation to convert the DT_WSTR and DT_R8 data types to DT_STR and DT_CY, respectively. The Data Conversion Transformation should look similar to Figure 5-5.

FIGURE 5-5

Notice in this Data Conversion Transformation that the data types and lengths are changed to truncate and convert the incoming string to match the destination source. Also, notice the Code Page setting that auto-defaults to 1252 for ANSI Latin 1. The Code Page setting varies according to the source of the Unicode data you are working with. If you are working with international data sources, you may need to change this to interpret incoming Unicode data correctly.

This type casting operation is a good, simple example of how SSIS packages handle data of differing types. However, within expressions it is not necessary to bring in the conversion component to cast between different types. You can simply use casting operators to change the data types within the expression.

Casting in SSIS Expressions

If you want to experience the developer’s equivalent of poking your eye out, forget to put a casting operator in your Data Flow expressions. SSIS is tightly tied to data types and requires casting, which simply defines the data type for a value or expression. If you forget to use casting or choose the wrong data type, the package may fail or cause errors when trying to insert that column into the final destination.

While you can run into some frustrating issues if you don’t do it, the need for casting is not always intuitive. For example, the result of any string function defaults to the Unicode string type. If you are attempting to store that value in a non-Unicode column, you need to cast. Conversely, if you are storing the value in a variable, you don’t need to cast. (That’s because the data types in variable definitions allow only Unicode; more about that later in the section “Defining Variables.”)

The good news is that casting is easy. In the expression language, this looks just like a .NET primitive cast. The new data type is provided in parentheses right next to the value to be converted. A simple example is casting a 2-byte signed integer to a 4-byte signed integer:

(DT_I4)32

Of course, not all the casting operators are this simple. Some require additional parameters when specific precision, lengths, or code pages have to be considered to perform the operation. These operators are listed in the following table:

CASTING OPERATOR	ADDITIONAL PARAMETERS
DT_STR(<<length>>, <<code_page>>)	length — Final string length code_page — Unicode character set
DT_WSTR(<<length>>)	length — Final string length
DT_NUMERIC(<<precision>>, <<scale>>)	precision — Max number of digits scale — Number of digits after decimal
DT_DECIMAL(<<scale>>)	scale — Number of digits after decimal
DT_BYTES(<<length>>)	length — Number of final bytes
DT_TEXT(<<code_page>>)	code_page — Unicode character set

Casting causes the most obvious trouble during comparison operations and logical expressions. Remember that all operands in comparison operations must evaluate to a compatible data type. The same rule applies to complex or compound logical expressions. In this case, the entire expression must return a consistent data type, which may require casting sections of the expression that may not readily appear to need casting. This is similar to the situation that you have in T-SQL programming when you attempt to use a number in a where clause for a numeric column, or when using a case statement that needs to return columns of different types. In the where predicate, both the condition and the column must be convertible into a compatible type. For the case statement, each column must be cast to the same variant data type. Later in the chapter, you’ll look at examples in which you need to pay attention to casting when using comparison and logical expressions.

A less obvious issue with casting can occur when data becomes truncated during casting. For example, casting Unicode double-byte data to non-Unicode data can result in lost characters. Significant digits can be lost in forced casting from unsigned to signed types or within types like 32-bit integers to 16-bit integers. These errors underscore the importance of wiring up the error outputs in the Data Flow components that have them. Before you look at that, however, look at the following section about variables and parameters and how they are used in dynamic SSIS package development.

USING VARIABLES AND PARAMETERS

Variables and parameters are the glue holding dynamic package development together. As discussed earlier, both of these objects are used to move values between package components. They are no different from variables in any programming environment. Variables in SSIS packages are scoped, or can be accessed, either within the package level or to a specific package component. Parameters can be scoped to either the package or project level.

Defining Variables

Variables can be created, deleted, renamed, and have their data types changed, as long as the package is in design mode. Once the package is validated and in runtime mode, the variable definition is locked; only the value of the variable can change. This is by design, so that the package is more declarative and type-safe. Creating a new variable is done through a designer that defines the scope of the variable depending upon how it is accessed. As mentioned earlier, variables can be scoped either to the package or to a specific component in the package. If the variable is scoped at a component level, only the component or its subcomponents have access to the variable. The following important tips can keep you out of trouble when dealing with variables:

· Variables are case sensitive. When you refer to a variable in a script task or an expression, pay attention to the case of the name. Different shops have their own rules, but typically variables are named using camel-case style.

· Variables can hide other variable values higher in the hierarchy. It is a good practice to not name variables similarly. This is a standard readability programming issue. If you have one variable outside a task and one inside the task, name them using identifiers like “inner” or “outer” to differentiate them.

A variable can be created by right-clicking the design surface of the package in which you need it. The Variables dialog enables you to create, edit, and delete variables. Figure 5-6 shows an example of two variables created within two scope levels: the Data Flow Task and the package.

FIGURE 5-6

However, the Variables window does not expose all the capabilities of the variables. By selecting a variable and pressing F4, you will see the Properties window for the SelectSQL variable, as shown in Figure 5-7.

FIGURE 5-7

The reason for displaying the Properties window for the SelectSQL variable is to point out the EvaluateAsExpression and Expression properties. The value of a variable either can be a literal value or can be defined dynamically. By setting the EvaluateAsExpression property to True, the variable takes on a dynamic quality that is defined by the expression provided in the Expression property. The SelectSQL variable is actually holding the result of a formula that concatenates the string value of the base select statement stored in theBaseSelect variable and a user-provided date parameter. The point often missed by beginning SSIS developers is that these variables can be used to store expressions that can be reused throughout the package. Rather than recreate the expression all over the package, you can create an expression in a variable and then plug it in where needed. This greatly improves package maintainability by centralizing the expression definition. You’ll see an example that shows how to create an expression-based variable later in this chapter.

As an alternative to using expressions, variables can also be set programmatically using the script tasks. Refer to Chapter 9 for examples describing how this is accomplished.

Defining Parameters

An exciting addition to the Integration Services family is the concept of parameters. Like variables, parameters store information, but they can be set in a variety of ways: package default, project default, or execution values. Using any method, the value stored in the parameter can be used in any expression to set a package property or variable. The parameter is initially created in the package or project.

To create a package parameter, select the Parameters tab in the design window and click the first icon on the toolbar. You can enter all the information shown in Figure 5-8. If you set the Required property, you force the setting of the parameter value to occur at runtime. If you set the Sensitive property, you tell Integration Services to encrypt the value in the catalog. To create a project parameter, right-click on the Project.params node in the Solution Explorer and select the Open option. This will open a similar view to the package parameters view.

FIGURE 5-8

Once you have created a package or project parameter, you can use it to set other values. Parameter names are case sensitive and are prefixed by a dollar sign and either “Project” or “Package” depending on its type. Keep in mind that unlike variables, parameters cannot be changed by an expression. You’ll walk through a package that uses a parameter to create expressions later in this chapter.

Variable and Parameter Data Types

You may have noticed that the data types available for variable definition are a little different from the SSIS variables that were discussed earlier in this chapter. For example, the value type for string variable storage is String instead of DT_WSTR or DT_STR. Admittedly, this is confusing. Why does SSIS use what looks like a generalized managed type in the variable definition and yet a more specific set of data types in the Data Flows? The answer lies in the implementation of variables within the SSIS engine. Variables can be set from outside of the package, so they are implemented in SSIS as COM variants. This enables the SSIS engine to use some late binding to resolve to the variable value within the package. However, note that this variant data type is not available anywhere within your control as an SSIS programmer. Variants are only an internal implementation in SSIS. Use the following table to help map the variable data types to SSIS Data Flow data types:

VARIABLE DATA TYPE	SSIS DATA TYPE	DESCRIPTION
Boolean	DT_BOOL	Boolean value. Either True or False. Be careful setting these data types in code because the expression language and .NET languages define these differently.
Byte	DT_UI1	A 1-byte unsigned int. (Note this is not a byte array.)
Char	DT_UI2	A single character
DateTime	DT_DBTIMESTAMP	A date-time structure that accommodates year, month, hour, minute, second, and fractional seconds
DBNull	N/A	A declarative NULL value
Double	DT_R8	A double-precision, floating-point value
Int16	DT_I2	A 2-byte signed integer
Int32	DT_I4	A 4-byte signed integer
Int64	DT_I8	An 8-byte signed integer
Object	N/A	An object reference. Typically used to store data sets or large object structures
SByte	DT_I1	A 1-byte, signed integer
Single	DT_R4	A single-precision, floating-point value
String	DT_WSTR	Unicode string value
UInt32	DT_UI4	A 4-byte unsigned integer
UInt64	DT_UI8	An 8-byte unsigned integer

For most of the data types, there is ample representation. Typically, the only significant issues with variable data types are related to the date/time and string data types. The only options are the higher capacity data types. This is not a big deal from a storage perspective, because variable declaration is rather finite. You won’t have too many variables defined in a package. If a package requires a string data type, note in the preceding table that the default data type for strings is the Unicode version; if you put values into a variable of the string data type, you need to convert for non-Unicode values.

This seems like a lot of preliminary information to go over before diving into creating an expression, but with a basic understanding of these core concepts, you will avoid most of the typical issues that SSIS developers encounter. Now you can use this knowledge to dive into the expression language and some sample uses of expressions in SSIS.

WORKING WITH EXPRESSIONS

The language used to build expressions can be a bit disorienting at first. If you started out as a programmer, you will be comfortable switching between Visual Basic, C#, and T-SQL. The key to being proficient in building expressions is understanding that the syntax of this new scripting language is a combination of all these different languages.

C#-Like? Close, but Not Completely

Why not write the expression language in T-SQL or a .NET-compliant language? The answer is mostly related to marketing: expressions should reflect the multiplatform capability to operating on more than just SQL Server databases. Remember that expressions can be used on data from other RDBMS sources, like Oracle, DB2, and even data from XML files. However, the technical explanation is that the SSIS and SQL Server core engines are written in native code, so any extension of the expression language to use .NET functions would incur the performance impact of loading the CLR and the memory management systems. The expression language without .NET integration can be optimized for the custom memory management required for pumping large row sets through Data Flow operations. As the SSIS product matures, you’ll see the SSIS team add more expression enhancements to expand on the existing functions. Meanwhile, let’s look at some of the pitfalls of using the expression language.

The expression language is marketed as having a heavily C#-like syntax, and for the most part that is true. However, you can’t just put on your C# hat and start working, because some peculiarities are mixed into the scripting language. The language is heavily C#-like when it comes to using logical and comparison operators, but it leans toward a Visual Basic flavor and sometimes a little T-SQL for functions. For example, notice that the following common operators are undeniably from a C# lineage:

EXPRESSION OPERATOR	DESCRIPTION
\|\|	Logical OR operation
&&	Logical AND operation
==	Comparison of two expressions to determine equivalency
!=	Comparison of two expressions to determine inequality
? :	Conditional operator

The conditional operator may be new to you, but it is especially important for creating compound expressions. In earlier releases of SSIS, the availability of this operator wasn’t readily intuitive. If you aren’t used to this C-style ternary operator, it is equivalent to similar IF..THEN..ELSE.. or IIF(<Condition>, <True Action>, <False Action>) constructs.

The following functions look more like Visual Basic script or T-SQL language functions than C#:

EXPRESSION FUNCTION	DESCRIPTION	C# EQUIVALENT
POWER()	Raise numeric to a power	Pow()
LOWER()	Convert to lowercase	ToLower()
GETDATE()	Return current date	Now()

This makes things interesting because you can’t just plug in a C# function without ensuring that there isn’t an SSIS expression function to perform the same operation that is named differently. However, if you make this type of mistake, don’t worry. Either the expression turns red, or you’ll immediately get a descriptive error instructing you that the function is not recognized upon attempting to save. A quick look in Books Online can help resolve these types of function syntax differences.

In some instances, the function you are looking for can be drastically different and cause some frustration. For example, if you are used to coding in C#, it may not be intuitive to look for the GETDATE() function to return the current date. The GETDATE() function is typically something one would expect from a T-SQL language construct. Thankfully, it performs like a T-SQL function should to return the current date. This is not always the case. Some functions look like T-SQL functions but behave differently:

EXPRESSION FUNCTION	DESCRIPTION	DIFFERENCE
DATEPART()	Parses date part from a date	Requires quotes around the date part
ISNULL()	Tests an expression for NULL	Doesn’t allow a default value

This departure from the T-SQL standard can leave you scratching your head when the expression doesn’t compile. The biggest complaint about this function is that you have to use composite DATEPART() functions to get to any date part other than month, day, or year. This is a common task for naming files for archiving. Nor does the ISNULL() function work like the T-SQL function. It returns either true or false to test a value for existence of NULL. You can’t substitute a default value as you would in T-SQL.

These slight variations in the expression language between full-scale implementations of T-SQL, C#, or Visual Basic syntaxes do cause some initial confusion and frustration, but these differences are minor in the grand scheme of things. Later in this chapter, you’ll find a list of expressions that you can cut and paste to emulate many of the functions that are not immediately available in the expression language.

The Expression Builder

Several locations in the SSIS development environment allow the creation of an expression. Whether you are in the Variables window or within any property expression editor, ultimately the expression is created within a user interface called the Expression Builder. This user interface maintains easy references to both system- and user-based variables and provides access to expression functions and operators. The most important feature of the Expression Builder is the capability it provides to test an expression — that is, to see the evaluated value — by clicking the Evaluate Expression button. This is especially helpful as you learn the syntax of the expression language. By dragging and dropping variables and operators onto the expression workspace, you can see how to format expressions properly. Inside Data Flow components, typically a specific expression builder includes additional elements related to the Data Flow. In Figure 5-9, you can see that the user interface for the Derived Column Transformation includes a folder named Columns to allow expressions to be built with data from the Data Flow.

FIGURE 5-9

The only downside in the Data Flow component versions of the Expression Builder is that you don’t have the option to see the results of evaluating the expression to determine whether you coded the expression properly. The reason is because you can’t see the data from the Data Flow, because this information is not available without running the package.

This brings up a point about maintainability. If you have an involved expression that can be realized independently from data from the data stream, you should build the expression outside of the Data Flow component and simply plug it in as a variable. However, in some cases you have no choice but to build the expression at the Data Flow component level. If so, one of the best practices that we recommend is to create one variable at the package level called MyExpressionTest. This variable gives you a quick jumping off point to build and test expressions to ensure that the syntax is coded correctly. Simply access the Variables property window and click the ellipsis beside the expression property, and the Expression Builder pops up. Use this technique to experiment with some of the basic syntax of the expression language in the next section.

Syntax Basics

Building an expression in SSIS requires an understanding of the syntax details of the expression language. Each of the following sections dives into an aspect of the expression syntax and explores the typical issues encountered with it, including their resolution.

Equivalence Operator

This binary operator, which is used to compare two values, seems to create some problems for SSIS developers who are not used to using the double equal sign syntax (==). Forgetting to use the double equal sign in a comparison operation can produce head-scratching results. For example, consider a precedence operation that tests a variable value to determine whether the value is equal to True, but the expression is written with a single equal sign. Imagine that the variable is set by a previous script task that checks whether a file is available to process.

@[User::MyBooleanValue] = True

The expression is evaluated, and @MyBooleanValue is assigned the value of True. This overwrites any previous value for the variable. The precedence constraint succeeds, the value is true, and the tasks continue to run with a green light. If you aren’t used to using the double equal sign syntax, this will come back to bite you, which is why we have discussed this operator by itself at the front of the syntax section.

String Concatenation

There are many uses for building strings within an expression. Strings are built to represent a SQL statement that can be executed against a database, to provide information in the body of an e-mail message, or to build file paths for file processing. Building strings is a core task that you have to be able to do for any development effort. In SSIS the concatenation operator is the plus (+) sign. Here is an example that you can quickly put together in the Expression Builder and test:

"The Server [" + LOWER( @[System::MachineName]) + "] is running this package"

This returns the following string:

The Server [myserver] is running this package

If you need to build a string for a file path, use the concatenation operator to build the fully qualified path with the addition of an escape character to add the backslashes. Later in this chapter, the section “String Literals” covers all the common escape characters that you’ll need for string building. A file path expression would look like this:

"c:\\mysourcefiles\\" + @myFolder + "\\" + @myFile

Note that strings are built using double quotes (""), not single quotes ('') as you might see in T-SQL; it’s important to ensure that the strings are all Unicode or all non-Unicode. A previous limitation of 4,000 characters for an expression has been removed from Integration Services. Feel free to make strings as long as you desire!

Line Continuation

There are two reasons to use line continuation characters in SSIS expressions. One is to make the expression easier to troubleshoot later, and the other is to format output for e-mail or diagnostic use. Unfortunately, the expression language does not support the use of comments, but you can use the hard returns to help the expression look more organized. In the Expression Builder, simply press the Enter key to have your expression displayed with the carriage-return-line-feed character sequence. This formatting is maintained even after you save the expression. To format output of the expression language, use the C-like escape character \n. Here’s an example of using it with a simple expression:

"My Line breaks here\nAnd then here\n; )"

This returns the following string:

My Line breaks here

And then here

; )

Note that it is not necessary to show the expression in code form in one line. An expression can be written on multiple lines to clarify viewing of it at design time. The output would remain the same.

Literals

Literals are hard coded information that you must provide when building expressions. SSIS expressions have three types of literals: numeric, string, and Boolean.

Numeric Literals

A numeric literal is simply a fixed number. Typically, a number is assigned to a variable or used in an expression. Numeric literals in SSIS have the same behavior that they have in C# or Java — you can’t just implicitly define numeric literals. Well, that’s not completely true; SSIS does interpret numeric values with a few default rules, but the point is that the rules are probably not what you might expect. A value of 12 would be interpreted as the default data type of DT_UI4, or the 4-byte unsigned integer. This might be what you want, but if the value were changed to 3000000000 during the evaluation process, an error similar to this will be generated:

The literal "3000000000" is too large to fit into type DT_UI4. The magnitude of the

literal overflows the type.

SSIS operates on literals using logic similar to the underlying .NET CLR. Numeric literals are checked to see if they contain a decimal point. If they do not, the literal is cast using the unsigned integer DT_UI4 data type. If there is a decimal point, the literal is cast as aDT_NUMERIC. To override these rules, you must append a suffix to the numeric literal. The suffix enables a declarative way to define the literal. The following are examples of suffixes that can be used on numeric literals:

SUFFIX	DESCRIPTION	EXAMPLE
L or l	Indicates that the numeric literal should be interpreted as the long version of either the DT_I8 or DT_R8 value types depending upon whether a decimal is present	3000000000L ⇒ DT_I8 3.14159265L ⇒ DT_R8
U or u	Indicates that the numeric literal should represent the unsigned data type	3000000000UL ⇒ DT_UI8
F or f	Indicates that the numeric literal represents a float value	100.55f ⇒ DT_R4
E or e	Indicates that the numeric literal represents scientific notation Note: Expects at least one digit scientific notation followed by float or long suffix	6.626 × 10 -34 J/s ⇒ 6.626E-34F ⇒ DT_R8 6.626E won’t work. If you don’t have a digit, then format as follows: 6.626E+0L or 6.626E+0f

Knowing these suffixes and rules, the previous example can be altered to 3000000000L, and the expression can be validated.

String Literals

When building strings, there are times when you need to supply special characters in them. For example, PostgreSQL database sources require the use of quoted column and table names. The key here is to understand the escape sequences that are understood by the expression syntax parser. The escape sequence for the double quote symbol is \". A sample expression-generated SQL statement might look like this:

"Select \"myData\" from \"owner\".\"myTable\""

The preceding expression would generate the following string:

Select "myData" from "owner"."myTable"

Other common literals that you may need are listed in this table:

SUFFIX	DESCRIPTION	EXAMPLE
\n	New Line or Carriage Feed Line Return	"Print this on one line\nThis on another" Print this on one line This on another
\t	Tab character	"Print\twith\ttab\tseparation" Print with tab separation
\"	Double-quotation mark character	"\"Hey! "\" "Hey! "
\	Backslash	"c:\myfile.txt" c:\myfile.txt

A few other string escape sequences are supported, but the elements in this table list those most frequently used. The backslash escape sequences come in handy when building file and path strings. The double quote escape sequence is more often used to interact with data sources that require quoted identifiers. This escape sequence is also used in combination with the remaining new line and tab characters to format strings for logging or other reporting purposes.

Boolean Literals

The Boolean literals of True and False don’t have to be capitalized, nor are they case sensitive. Boolean expressions are shortcut versions of the logical operators. To drive certain package functionality conditionally based on whether the package is running in an offline mode, you could write an expression in a variable using an artificial on or off type switch mechanism, as shown here:

@[System::OfflineMode]==True ? 1 : 0 (Not Recommended)

The idea is to use the results of this operation to determine whether a precedence constraint should operate. The precedence operator would retest the expression to determine whether the value was 1 or 0. However, this is an awfully long way to do something. It’s much easier to just create an expression that looks like this:

@[System::OfflineMode]==False

Then all you have to do is plug the expression into the Precedence Editor, as shown in Figure 5-10.

FIGURE 5-10

Note that using the literal is recommended over using any numeric values for evaluation. Programming any expression to evaluate numeric versions of Boolean values is dangerous and should not be a part of your SSIS techniques.

Referencing Variables

Referencing variables is easy using the Expression Builder. Drag and drop variables onto the Expression Builder to format the variable into the expression properly. As shown in the following example, notice that the format of the variable automatically dropped into the expression is preceded with an @ symbol, followed by the namespace, a C++-like scope resolution operator, and then the variable name:

@[namespace::variablename]

Technically, if the variable is not repeated in multiple namespaces and there are no special characters (including spaces) in the variable name, you could get away with referring to the variable using a short identifier like @variablename or just the variable name. However, this type of lazy variable referencing can get you into trouble later. We recommend that you stick with the fully qualified way of referencing variables in all SSIS expressions.

Referencing Parameters

Referencing parameters is just as simple in the Expression Builder. When you drag and drop the parameter name, the value is automatically preceded with an @ symbol, followed by square brackets containing a dollar sign, the namespace of the package or project, the C++-like scope resolution operator, and the parameter name:

@[$namespace::parametername]

Typically, developers can run into trouble with variable and parameter references in the Precedence Constraint Editor (refer to Figure 5-10). That’s probably because there is no Expression Builder to help build the expression, so it must be manually entered. This is where the tip of creating the dummy variable MyExpressionTester comes in handy. You can create an expression within this dummy variable Expression Builder and then simply cut and paste the value into the Precedence Constraint Editor.

Referencing Columns

Columns can be referenced in expressions, but only within components in a Data Flow task. This makes sense. Creating a global expression to reference a value in a Data Flow is the equivalent of trying to use a single variable to capture the value of a set-based processing operation. Even a variable expression defined at the same level or scope of a Data Flow task should not be able to reference a single column in the Data Flow under constant change. However, from within specific components like the Derived Column Transformation, the Expression Builder can reference a column because operations occur at the row level. Expressions within a data component can access column identifiers to allow point-and-click building of expressions. There are a couple things to remember when referencing columns in expressions:

· Data Flow column names must follow the SSIS standards for special characters.

· Column names must be uniquely named or qualified within a Data Flow.

A common issue with building expressions referencing columns in a Data Flow has less to do with the expression language than the names of the columns themselves. This is particularly true when dealing with Microsoft Excel or Access data, where columns can use nonstandard naming conventions. SSIS requires that the columns being used in an expression begin with either a valid Unicode letter or an underscore (_). With the exception of bracket characters, any other special characters require qualification of the column in order to be used within an expression.

Brackets ([ and ]) are the designators used by SSIS to qualify a column name. Qualification of column names is required if the name contains special characters — including spaces. Because bracket characters are column name qualifiers, any column with brackets in the name must be renamed to use an expression. This doesn’t require changing the column name in the originating source. Column names also must be qualified when two or more columns in a Data Flow have the same name, in order to avoid ambiguous references. The following are examples of columns that need qualification:

COLUMN NAME	QUALIFIED COLUMN NAME	DESCRIPTION
My Column	[My Column]	Column names can’t contain spaces.
File#	[File#]	Column names can’t contain special characters.
@DomainName	[@DomainName]
Enrolled?	[Enrolled?]
Source1 ID	[Source1].[ID]	Column names can’t have the same name within a Data Flow.
Source2 ID	[Source2].[ID]

Another way to refer to columns, unique to SSIS, is by lineage number. A lineage number is something that SSIS assigns to each input and output as it is added to a transformation component in a package. The lineage number is quickly replaced by the real column name when the expression is syntax compiled. To find the lineage number for a column, look at any advanced editor dialog and find the column in the input column properties under LineageID. Keep in mind that as you add columns, the lineage numbers may change, so they should not be used for manipulation purposes, only for troubleshooting.

Boolean Expressions

Boolean expressions evaluate to either true or false. In their simplest implementation, precedence constraints use Booleans expressions as gatekeepers to determine whether or not an operation should occur. Within Data Flow operations, Boolean expressions are typically employed in the Conditional Split Transformation to determine whether a row in a Data Flow should be directed to another output.

For example, a Boolean expression to determine whether a Control Flow step would run only on Friday would require code to parse the day of the week from the current date and compare it to the sixth day, as shown here:

DATEPART( "dw", GETDATE() ) == 6

This is a useful Boolean expression for end of the week activities. To control tasks that run on the first day of the month, use an expression like this:

DATEPART ("dd", GETDATE() ) == 1

This expression validates as true only when the first day of the month occurs. Boolean expressions don’t have to be singular. Compound expressions can be built to test a variety of conditions. Here is an example in which three conditions must all evaluate to true in order for the expression to return a true value:

BatchAmount == DepositAmount && @Not_Previously_Deposited == True &&

BatchAmount > 0.00

The @Not_Previously_Deposited argument in this expression is a variable; the other arguments represent columns in a Data Flow. Of course, an expression can just as easily evaluate alternate conditions, like this:

(BatchAmount > 0.00 || BatchAmount < 0.00) && @Not_Previously_Deposited == True

In this case, the BatchAmount must not be equal to 0.00. An alternative way to express the same thing is to use the inequality operator:

BatchAmount != 0.00 && @Not_Previously_Deposited == True

Don’t be tripped up by these simple examples. They were defined for packages in which the data had known column data types, so there was no need to take extra precautions with casting conversions. If you are dealing with data from less reliable data sources, however, or you know that two columns have different data types, then take casting precautions with your expression formulas, such as in this expression:

(DT_CY)BatchAmount == (DT_CY)DepositAmount && @Not_Previously_Deposited ==

True && (DT_CY)BatchAmount > (DT_CY)0.00

The Boolean expression examples here are generally the style of expression that are used to enable dynamic SSIS package operations. We have not covered the conditional, date/time, and string-based Boolean expressions, which are in the following sections. String expression development requires a little more information about how to handle a NULL or missing value, which is covered next. You can see some examples of these Boolean expressions put to work at the end of this chapter.

Dealing with NULLs

In SSIS, variables can’t be set to NULL. Instead, each variable data type maintains a default value in the absence of a value. For strings, the default value is an empty string, rather than the default of NULL that you might be used to in database development. However, Data Flow components can most certainly contain NULL values. This creates problems when variables are intermixed within Data Flow components. This mixture occurs either within a Script Task or within an expression.

However, if a value in the Data Flow needs to be set to NULL or even tested for a NULL value, this is another matter altogether and can be accomplished rather easily with the ISNULL() expression function and the NULL (type) casting functions. Just understand that variables are going to behave a little differently.

NULLs and Variables

The reason you can’t set variables to NULL values is related to the COM object variant implementation of variables in the SSIS engine. Regardless of the technical issue, if you are testing a variable for the absence of a value, you have to decide ahead of time what value you are going to use to represent the equivalent of a NULL value, so that you can test for it accurately. For example, the DateTime variable data type defaults to 12/30/1899 12:00:00 a.m. if you purposely set it to NULL. You can test this out yourself by creating a DateTimevariable and setting it equal to an expression defined using the casting function NULL(DT_DBTIMESTAMP).

It helps to get a handle on the default values for the SSIS variable data types. You can find them in this table:

VARIABLE DATA TYPE	DEFAULT VALUE
Boolean	False
Byte	0
Char	0
DateTime	12/30/1899
DBNull	(Can’t test in an expression)
Double	0
Int16	0
Int32	0
Int64	0
Object	(Can’t test in an expression)
SByte	0
Single	0
String	"" (empty string)
UInt32	0
UInt64	0

Using this table of default values, the following expression could be used in a precedence operation after testing for the absence of a value in a string variable MyNullStringVar:

@[User::MyNullStringVar]==""

If the value of the user variable is an empty string, the expression evaluates to a True value and the step is executed.

A frequent logic error that SSIS developers make is to use a variable to set a value from an expression that will be used within a multiple instance looping structure. If the value is not reset in a way that enables clean retesting, the value of the variable will remain the same for the life of the package. No error will be raised, but the package may not perform multiple iterations as expected. Make sure a variable is reset to enable retesting if the test will be performed multiple times. This may require additional variables to cache intermediate results.

NULLs in Data Flow

Using the NULL function in Data Flow Transformations is a different matter because values in a Data Flow can actually be NULL. Here you can use the expression function to test for NULL values in the data stream. Trouble usually stems from a misunderstanding of either how the ISNULL() function works or what to do after a NULL value is found. First, the ISNULL() expression function tests the expression in the parentheses for the value of NULL. It does not make a substitution if a NULL value is found, as the same-named function does in T-SQL. To emulate the T-SQL function ISNULL(), build an SSIS expression in a Data Flow, as shown here:

IsNull(DATA_COLUMN) ? YOUR_DEFAULT_VALUE : DATA_COLUMN

If instead you want to set a column to NULL based on some attribute of the data in the incoming data stream, the logical structure is similar. First, provide the testing expression followed by the actions to take if the test is true or false. Here is a function that sets a data column in a Data Flow to NULL if the first character starts with “A”:

SUBSTRING([MyColumn], 1, 1)=="A" ? NULL(DT_WSTR, 255) : [MyColumn]

A typical issue that occurs when handling NULLs doesn’t actually have anything to do with NULL values themselves but rather with string expressions. When creating data streams to punch back into RDBMS data destinations, you will often want to send back a column with NULL values when a test on the data can’t be completed. The logic is to either send the column data back or replace the column data with a NULL value. For most data types, this works by sending the results of the NULL function for the data type desired. For some reason, this works differently when you want to save non-Unicode data with a NULL value. You’d expect the following expression to work, but it doesn’t:

SUBSTRING([MyColumn] , 1, 1)=="A" ?

NULL(DT_STR, 255, 1252) : [MyColumn] (This doesn't work in SSIS)

The preceding example won’t work because of how SSIS handles NULL values for the non-Unicode string type as parameters. The only way to fix this is to cast the NULL function as follows:

SUBSTRING([MyColumn] , 1, 1)=="A" ?

(DT_STR, 255, 1252)NULL(DT_STR, 255, 1252) : [MyColumn]

This section should have clarified the common issues you are likely to encounter when dealing with NULL values, especially as they relate to strings. However, there are still some tricks to learn about dealing with strings, which we cover next.

String Functions

Handling strings in SSIS expressions is different from dealing with string data in SQL Server. The previous section discussed some of the differences with handling NULL values. You also have to pay attention to the Unicode and non-Unicode strings. If a package is moving data between multiple Unicode string sources, you have to pay attention to the code pages between the strings. If you are comparing strings, you also have to pay attention to string padding, trimming, and issues with data truncations. Handling strings is a little involved, but you really only need to remember a few things.

Expression functions return Unicode string results. If you are writing an expression to return the uppercase version of a varchar-type column of data, the result will be a Unicode column with all capital letters. The string function Upper() returns a Unicode string. In fact, SSIS sets all string operations to return a Unicode string. For example, note the date expression in the Derived Column Transformation in Figure 5-11.

FIGURE 5-11

Here you are just adding a string column that includes the concatenation of a date value. The function is using a DatePart() function whose results are cast to a non-Unicode string, but the default data type chosen in the editor is a Unicode string data type. This can be overridden, of course, but it is something to watch for as you develop packages. On the one hand, if the data type is reverted to non-Unicode, then the string has to be converted for each further operation. On the other hand, if the value is left as a Unicode string and the result is persisted in a non-Unicode format, then at some point it has to be converted to a non-Unicode value. The rule of thumb that usually works out is to leave the strings converted as Unicode and then convert back to non-Unicode if required during persistence. Of course, this depends on whether there is a concern about using Unicode data.

Comparing strings requires that you have two strings of the same padding length and case. The comparison is case and padding sensitive. Expressions should use the concatenation operator (+) to get the strings into the same padding style. Typically, this is done when putting together date strings with an expected type of padding, like this:

RIGHT("0" + @Day, 2) + "/" + RIGHT("0" + @Month, 2) + "/" +

RIGHT("00" + @Year, 2)

This type of zero padding ensures that the values in both variables are in the same format for comparison purposes. By padding both sides of the comparison, you ensure the proper equality check:

RIGHT("0" + @Day, 2) + "/"

+ RIGHT("0" + @Month, 2) + "/"

+ RIGHT("00" + @Year, 2)

== RIGHT("0" + @FileDay, 2) + "/"

+ RIGHT("0" + @FileMonth, 2) + "/"

+ RIGHT("00" + @FileYear, 2)

A similar type of padding operation can be used to fill in spaces between two values:

SUBSTRING(@Val1 + " ", 1, 5) + SUBSTRING(@Val2 + " ", 1, 5) +

SUBSTRING(@Val3 + " ", 1, 5)

Typically, space padding is used for formatting output, but it could be used for comparisons. More often than not, spaces are removed from strings for comparison purposes. To remove spaces from strings in expressions, use the trim functions: LTrim(), RTrim(), andTrim(). These functions are self-explanatory, and they enable comparisons for strings that have leading and trailing spaces. For example, comparing the strings “Canterbury” and “Canterbury” return a false unless the expression is written like this:

Trim("Canterbury") == Trim("Canterbury ")

This expression returns true because the significant spaces are declaratively removed. Be careful with these extra spaces in string expressions as well. Spaces are counted in all string functions, which can result in extra character counts for extra spaces when using theLEN() function and can affect carefully counted SUBSTRING() functions that do not expect leading and trailing spaces. If these issues are of importance, employ a Derived Column Transformation to trim these columns early in the Data Flow process.

Conditional Expressions

You use the conditional expression operator to build logical evaluation expressions in the format of an IF..THEN logical structure:

Boolean_expression ? expression_if_true : expression_if_false

The first part of the operator requires a Boolean expression that is tested for a true or false return value. If the Boolean expression returns true, then the first expression after the ternary operator (?) will be evaluated and returned as the final result of the conditional expression. If the Boolean expression returns false, then the expression after the separation operator (:) will be evaluated and returned. Both expressions, as operands, must adhere to one of the following data type rules:

· Both operands must be numeric data types that can be implicitly converted.

· Both operands must be string data types of either Unicode or non-Unicode. Each operand can evaluate to separate types — except for the issue of setting explicit NULL values. In that case, the NULL value for DT_STR non-Unicode NULL values must be cast.

· Both operands must be date data types. If more than one data type is represented in the operands, the result is a DT_DBTIMESTAMP data type.

· Both operands for a text data type must have the same code pages.

If any of these rules are broken, or the compiler detects incompatible data types, you will have to supply explicit casting operators on one or both of the operands to cause the condition expression to evaluate. This is more of an issue as the conditional expression is compounded and nested. A typical troubleshooting issue is seeing an incompatible data type message resulting from a comparison deep in a compound conditional expression. This can be the result of a column that has changed to an incompatible data type, or a literal that has been provided without a suffix consistent with the rest of the expression. The best way to test the expression is to copy it into Notepad and test each piece of the expression until the offending portion is located.

Casting issues can also create false positives. You can see casting truncation in the following example of a Boolean expression comparing the datetimestampoffset and a date value:

(DT_DBDATE) "2014-01-31 20:34:52.123 -3:30" == (DT_DBDATE)"2014-01-31"

Casting converts the expression (DT_DBDATE) "2014-01-31 20:34:52.123-3:30" to "2014-01-31", causing the entire expression to evaluate to true. Date and time conversions are one example of casting issues, but they can occur on any data type that allows forced conversion.

Date and Time Functions

Date and time functions tend to cause confusion for new SSIS developers. In most instances, the different syntax is causing the difficulty. As mentioned earlier, the DatePart() function is a perfect example of this. T-SQL programmers need to double quote the date part portion of the function, or they will see an error similar to this:

The expression contains unrecognized token "dd". If "dd" is a variable then it

should be expressed as "@dd". The specific token is not valid. If the token is

intended to be a variable name, it should be prefixed with the @ symbol.

The fix is simple: put double quotation marks around the date part. A properly formatted DatePart() expression should look like this:

DATEPART( "dd", GETDATE() )

Note that this expression returns the value of the day of the month — for example, 31 if the date is January 31, 2014. A common mistake is to expect this to be the day of the week. You can accomplish that task by changing the date part in the expression like this:

DATEPART( "dw", GETDATE() )

These are just minor adjustments to the SSIS expression language, but they can create some frustration. Another example can be found when attempting to reference the date values in an expression. If you’re used to MS Access date literals, you may be tempted to use something like this:

"SELECT * FROM myTable WHERE myDate >= " + #01/31/2014# (DOESN'T WORK IN SSIS)

That won’t work in SSIS; the # signs are used for a different purpose. If the string is going to be interpreted by SQL Server, just use the single quote around the date:

"SELECT * FROM MYTABLE WHERE MYDATE >= '" + "01/31/2014" + "'"

If the string is just going to be printed, the single quotes aren’t needed. Alternatively, to plug in a date value from a variable, the expression would look like this:

"SELECT * FROM MYTABLE WHERE MYDATE >= '" +

(DT_WSTR, 255)@[System::ContainerStartTime] + "'"

Notice that the value of the date variable must be cast to match the default Unicode data type for all expression strings of DT_WSTR. The problem with simply casting a date to a string is the fact that you get the entire date, which doesn’t translate into what you may want to use as a query parameter. This is clearer if the preceding expression is resolved:

SELECT * FROM MYTABLE WHERE MYDATE >= "02/22/2014 2:28:40 PM'

If your goal is truly to see results only from after 2:28:40 p.m., then this query will run as expected. If items from earlier in the day are also expected, then you need to do some work to parse out the values from the variable value. If the intent is just to return rows for the date that the package is running, it is much easier to create the expression like this (with your proper date style, of course):

"SELECT * FROM MYTABLE WHERE MYDATE >= CONVERT(nvarchar(10), getdate(), 101)"

This method allows SQL Server to do the work of substituting the current date from the server into the query predicate. However, if you need to parse a string from a date value in an expression, take apart one of the following formulas in this section to save you a bit of time:

DESCRIPTION	EXPRESSION
Convert filename with embedded date into the date-time type format: MM/dd/yyyy HH:mm:ss.	SUBSTRING(@[User::FileName],5,2) + "/" +
	SUBSTRING(@[User::FileName],7,2) + "/" +
	SUBSTRING(@[User::FileName],1,4) + "" +
	SUBSTRING(@[User::FileName],9,2) + ":" +
	SUBSTRING(@[User::FileName],11,2) + ":" +
Filename format: yyyyMMddHHmmss	SUBSTRING(@[User::FileName],13,2)
Convert a date-time variable to a filename format of: yyyyMMddHHmmss	(DT_WSTR,4) YEAR(GETDATE()) + RIGHT("0" +
	(DT_WSTR,2) MONTH(GETDATE()), 2) + RIGHT("0" +
	(DT_WSTR,2) DAY( GETDATE()), 2) + RIGHT("0" +
	(DT_WSTR,2) DATEPART("hh", GETDATE()), 2) + RIGHT("0" +
	(DT_WSTR,2) DATEPART("mi", GETDATE()), 2) + RIGHT("0" +
	(DT_WSTR,2) DATEPART("ss", GETDATE()), 2)

This section covered most of the major syntactical issues that new users are likely to encounter with the expression language. The issues that have caused SSIS programmers the most trouble should not be a problem for you. Now you are ready to create some expressions and walk through the process of inserting them into SSIS packages to put them to work.

Using Expressions in SSIS Packages

Creating an expression requires understanding the syntax of the SSIS expression language. As discussed in the previous section, this expression language is part C#, part Visual Basic script, and sometimes some flavor of T-SQL mixed in. Once you can code in the expression language, you are ready to put the expressions to work. This section demonstrates how expressions can be used in SSIS package development, with some typical examples that you can use in your package development tasks.

You can download the packages used as examples in the following sections in their entirety by going to www.wrox.com/go/prossis2014.

Using Variables and Parameters as Expressions

Earlier in this chapter, you learned how to use expressions in variables. A good example of practical usage is to handle the task of processing files in a directory. This task should be familiar to everyone. A directory must be polled for files of a specific extension. If a file is found, it is processed and then copied into a final archival storage directory. An easy way to do this is to hardcode the source and destination paths along with the file extension into the Foreach Loop Container and File System Task. However, if you need to use a failover file server, you have to go through the package and change all these settings manually. It is much easier to use parameters that enable these properties to be set and then use expressions to create the destination directory and filenames. That way, when the server changes, only the parameter needs to change. The basic steps for such an SSIS package can be gleaned from Figure 5-12.

FIGURE 5-12

One of the things to notice in the expression of the BankFileDestinationFile is the namespace named UserExp. While there is an indicator on the variable icon to indicate whether it is an expression, it may behoove you to make the purpose of the variable even clearer using the Namespace column, which provides a nice way to separate variables. In this case, the namespace UserExp indicates that the variable is a user expression-based variable. The namespace UserVar indicates that the variable is defined by the user.

For this package, the Foreach Loop Container Collection tab is set by an expression to retrieve the folder (or directory) from the variable BankFileSourcePath. This variable is statically defined either from configuration processes or manually by an administrator. This tells the Foreach Loop where to start looking for files. To enumerate files of a specific extension, an expression sets the FileSpec property to the value of the variable BankFileExtension, which is also a static variable. Nothing very complicated here except that the properties of the Foreach Loop are set by expressions, rather than of hardcoded values. The container looks like what is shown in Figure 5-13.

FIGURE 5-13

Notice that the Foreach Loop Container is retrieving the filename only. This value is stored in the variable BankFileName. This isn’t shown in Figure 5-13, but it would be shown in the Variable Mappings tab. With the raw filename, no extension, and no path, some variables set up as expressions can be used to create a destination file that is named using the current date. First, you need a destination location. The source folder is known, so use this folder as a starting point to create a subfolder called “archive” by creating a variable named BankFileDestinationFolder that has the property EvaluateAsExpression set to True and defined by this expression:

@[UserVar::BankFileSourcePath] + "\\archive"

You need the escape sequence to properly build a string path. Now build a variable named BankFileDestinationFile that will use this BankFileDestinationFolder value along with a date-based expression to put together a unique destination filename. The expression would look like this:

@[UserExp::BankFileDestinationFolder] + "\\" + (DT_WSTR,4) YEAR(GETDATE())

+ RIGHT("0" + (DT_WSTR,2) MONTH(GETDATE()), 2)

+ RIGHT("0" + (DT_WSTR,2) DAY( GETDATE()), 2)

+ RIGHT("0" + (DT_WSTR,2) DATEPART("hh", GETDATE()), 2)

+ RIGHT("0" + (DT_WSTR,2) DATEPART("mi", GETDATE()), 2)

+ RIGHT("0" + (DT_WSTR,2) DATEPART("ss", GETDATE()), 2)

+ @[UserVar::BankFileExtension]

When evaluated, the expression results in a destination filename that looks like c:\BankFileSource\Archive\20140101154006.txt when the bank file destination folder is c:\BankFileSource\Archive. By using variables that evaluate to the value of an expression, combined with information set statically from administrator and environmental variables like the current date and time, you can create packages with dynamic capabilities.

Another best practice is to use expression-based variables to define common logic that you’ll use throughout your SSIS package. If in the preceding example you wanted to use this date-based string in other places within your package, you could define the date portion of that expression in a separate variable called DateTimeExpression. Then the expression for the BankFileDestinationFolder variable could be simplified to look like this:

@[UserExp::BankFileDestinationFolder] + "\\" + @[UserExp::DateTimeExpression] +

@[UserVar::BankFileExtension]

The power in separating logic like this is that an expression need not be buried in multiple places within an SSIS package. This makes maintenance for SSIS packages much easier and more manageable.

Using Expressions in Connection Manager Properties

Another simple example of using expressions is to dynamically change or set properties of an SSIS component. One of the common uses of this technique is to create a dynamic connection that enables packages to be altered by external or internal means. In this example, assume a scenario in which all logins are duplicated across environments. This means you need to change only the server name to make connections to other servers.

To start, create a variable named SourceServerNamedInstance that can be used to store the server name to which the package should connect for source data. Then create any connection manager in the Connection Managers section of the SSIS package and press F4, or right-click and select Properties to get to the Properties window for the connection object. The Properties window should look like Figure 5-14.

FIGURE 5-14

The secret here is the Expressions Collection property. If you click this property, an ellipsis will be displayed. Clicking that button will bring up the Property Expressions Editor shown in Figure 5-15, where you can see the properties that can be set to an expression, and ultimately do so in the Expression Builder.

FIGURE 5-15

This example completes the demonstration by setting the property of the ServerName to the expression that is simply the value of the SourceServerNamedInstance variable. Here you affected only one property in the connection string, but this is not the only option. The entire connection string, as you may have noticed in the Property drop down, can be set by a string-based expression. This same technique can be used to set any connection property in the Data Flow components as well to dynamically alter the flat file and MS Excel-based connections. A common use is to set the connection source for a Data Flow component to a variable-based incoming filename.

Using Expressions in Control Flow Tasks

A common example of using expressions in Control Flow Tasks is to create SQL statements dynamically that are run by the Execute SQL Task. The Execute SQL Task has a property called SQLStatement that can be set to a file connection, a variable, or direct input. Instead of creating parameterized SQL statements that are subject to error and OLE provider interpretation, you can try building the SQL statement using an expression and putting the whole SQL statement into the SQLStatement property. This section walks through an example like this using a DELETE statement that should run at the start of a package to delete any data from a staging table that has the same RunJobID (a theoretical identifier for a unique SSIS data load).

To start, create one variable for the DELETE statement that doesn’t include the dynamic portion of the SQL statement. A variable named DeleteSQL of type String would be set to a value of the string:

DELETE FROM tblStaging WHERE RunJobId =

Create another variable named DeleteSQL_RunJobId with the data type of Int32 to hold the value of a variable RunJobId. This value could be set elsewhere in the SSIS package.

In the Execute SQL Task, bring up the editor and ensure that SQLSourceType is set to DirectInput. You could also set this value to use a variable if you built the SQL statement in its entirety within an expression-based variable. In this example, you’ll build the expression in the Execute SQL Task. To get there, click the Expressions tab of the editor to view the Expressions collections property. Click the ellipses to access the Property Expressions Editor and build the SQL statement using the two variables that you defined. Make sure you use the casting operator to build the string like this:

@[UserVar::DeleteSQL] + (DT_WSTR, 8) @[UserVar::DeleteSQL_RunJobId]

The completed Execute SQL Task Property Expressions Editor will look like Figure 5-16.

FIGURE 5-16

When the package runs, the expression will combine the values from both the variables and construct a complete SQL statement that will be inserted into the property SqlStatementSource for the Execute SQL Task. This technique is more modular and works more consistently across the different OLE DB providers for dynamic query formation and execution than hardcoding the SQL statement. With this method it is possible to later define and then reconstruct your core SQL using initialization configurations. It is also a neat technique to show off your command of expressions and variables.

Using Expressions in Control Flow Precedence

Controlling the flow of SSIS packages is one of the key strengths of dynamic package development. Between each Control Flow Task is a precedence constraint that can have expressions attached to it for evaluation purposes. You can visually identify the precedence constraint as the arrow that connects two Control Flow Tasks. During runtime, as one Control Flow Task completes, the precedence constraint is evaluated to determine if the flow can continue to the next task. One common scenario is a single package that may have two separate sequence operations that need to occur based on some external factor. For example, on even days one set of tasks runs, and on odd days another separate set of tasks runs. A visual example of this type of package logic is shown in Figure 5-17.

FIGURE 5-17

This is an easy task to perform by using an expression in a precedence constraint. To set this up, define a Boolean variable as an expression called GateKeeperSequence. Make sure the variable is in the namespace UserExp to indicate that this variable is an expression-based variable. Set the expression to this formula:

DATEPART( "dd", GetDate() ) % 2

This expression takes the current day of the month and uses the modulus operator to leave the remainder as a result. Use this value to test in the precedence constraint to determine which sequence to run in the package. The sequence on even days should be run if the GateKeeperSequence returns 0 as a result, indicating that the current day of the month is evenly divisible by two. Right-click the precedence constraint and select Edit to access the editor, and set it up to look like Figure 5-18.

FIGURE 5-18

The expression @[UserExp::GateKeeperSequence]==0 is a Boolean expression that tests the results of the first expression to determine whether the value is equal to zero. The second sequence should execute only if the current day is an odd day. The second precedence constraint needs an expression that looks like this:

@[UserExp::GateKeeperSequence]!=0

By factoring the first expression into a separate expression-based variable, you can reuse the same expression in both precedence constraints. This improves the readability and maintenance of your SSIS packages. With this example, you can see how a package can have sections that are conditionally executed. This same technique can also be employed to run Data Flow Tasks or other Control Flow Tasks conditionally using Boolean expressions. Refer back to the section “Boolean Expressions” if you need to review some other examples.

Using the Expression Task

Recently introduced in SQL Server 2012, the Expression Task enables you to set the value of variables in the Control Flow. If you are thinking, “But I could do that before!” you are absolutely correct. In previous editions of Integration Services, you could change a variable’s value by using the EvaluateAsExpression property or by using a Script Task, as previously described in this chapter.

The beauty of the Expression Task is that it specifically calls out when in the package this change occurs. This means that you can make a similar change multiple times, if you placed an Expression Task in a Foreach Loop Container, or you can make it easier to see that the variable you are using is changed. The following example shows how to use an Expression Task to make an iterator variable that can count the number of times you loop through a section of the package.

Begin by using and editing an Expression Task from the SSIS Toolbox. The Expression Builder that appears limits you to variables and parameters because you are in the Control Flow designer. You assign a value to a variable by using the equals sign, so the final formula will look like this:

@[User::Iterator] = @[User::Iterator] + 1

A completed version of the property window is shown in Figure 5-19.

FIGURE 5-19

Using Expressions in the Data Flow

Although you can set properties on some of the Data Flow components, a typical use of an expression in a Data Flow is to alter a WHERE clause on a source component. In this example, you’ll alter the SQL query in a source component using a supplied date as a variable to pull out address information from the AdventureWorks database. Then you’ll use an expression to build a derived column that can be used for address labels.

First, set up these variables at the Data Flow scope level by selecting the Data Flow Task before creating the variable:

VARIABLE NAME	DATA TYPE	NAMESPACE	DESCRIPTION
BaseSelect	String	UserVar	Contains base Select statement
SelectSQL_UserDateParm	DateTime	UserVar	Contains supplied date parm
SelectSQL	String	UserExp	Derived SQL to execute
SelectSQL_ExpDateParm	String	UserExp	Safe Date Expression

Notice that the namespaces for the BaseSelect and SelectSQL_UserDateParm variables are using a namespace UserVar. As described previously, you use them because it makes it clear which variables are expression-based and which are not. Provide the following values for these variables:

VARIABLE NAME	VALUE
BaseSelect	SELECT AddressLine1, AddressLine2
	City, StateProvinceCode, PostalCode
	FROM Person.Address adr
	INNER JOIN Person.StateProvince stp
	ON adr.StateProvinceID = stp.StateProvinceID
	WHERE adr.ModifiedDate >=
SelectSQL_UserDateParm	1/12/2000

Note that you need to put the value from the BaseSelect variable into one continuous line to get it all into the variable. Make sure the entire string is in the variable value before continuing.

The remaining variables need to be set up as expression-based variables. At this point, you should be proficient at this. Set the EvaluateAsExpression property to True and prepare to add the expressions to each. Ultimately, you need a SQL string that contains the date from the SelectSQL_UserDateParm, but using dates in strings by just casting the date to a string can produce potentially unreliable results — especially if you are given the string in one culture and you are querying data stored in another collation. This is why the extra expression variable SelectSQL_ExpDateParm exists. This safe date expression looks like this:

(DT_WSTR, 4) DATEPART("yyyy", @[UserVar::SelectSQL_UserDateParm]) + "-" +

(DT_WSTR, 2) DATEPART("mm", @[UserVar::SelectSQL_UserDateParm]) + "-" +

(DT_WSTR, 2) DATEPART("dd", @[UserVar::SelectSQL_UserDateParm]) + " "+

(DT_WSTR, 2) DATEPART("hh", @[UserVar::SelectSQL_UserDateParm]) + ":" +

(DT_WSTR, 2) DATEPART("mi", @[UserVar::SelectSQL_UserDateParm]) + ":" +

(DT_WSTR, 2) DATEPART("ss", @[UserVar::SelectSQL_UserDateParm])

The expression parses out all the pieces of the date and creates an ISO-formatted date in a string format that can now be appended to the base SELECT SQL string. This is done in the last expression-based variable SelectSQL. The expression looks like this:

@[UserVar::BaseSelect] + "'" + @[UserExp::SelectSQL_ExpDateParm] + "'"

With all the pieces to create the SQL statement in place, all you need to do is apply the expression in a data source component. Drop an OLE DB Source component connected to the AdventureWorks database on the Data Flow surface, and set the Data access mode to retrieve the data as “SQL command from variable.” Set the variable name to the SelectSQL variable. The OLE DB Source Editor should look like Figure 5-20.

FIGURE 5-20

Click the Preview button to look at the data pulled with the current value of the variable SelectSQL_UserDateParm. Change the value and check whether the data changes as expected. Now the OLE DB source will contain the same columns, but the predicate can be easily and safely changed with a date parameter that is safe across cultures.

The final task is to create a one-column output that combines the address fields. Add a Derived Column Transformation to the Data Flow, and a new column of type WSTR, length of 2000, named FullAddress. This column will need an expression that combines the columns of the address to build a one-column output. Remember that we are dealing with Data Flow data here, so it is possible to realize a NULL value in the data stream. If you simply concatenate every column and a NULL value exists anywhere, the entire string will evaluate to NULL. Furthermore, you don’t want addresses that have blank lines in the body, so you only want to add a newline character conditionally after addresses that aren’t NULL. Because the data tables involved can only contain NULL values in the two address fields, the final expression looks like this:

(ISNULL(AddressLine1) ? "" : AddressLine1 + "\n") +

(ISNULL(AddressLine2) ? "" : AddressLine2 + "\n") +

City + ", "+ StateProvinceCode + " " + PostalCode

The Derived Column Transformation should look similar to Figure 5-21.

FIGURE 5-21

Running this example will create the one-output column of a combined address field that can be dynamically configured by a date parameter with a conditional Address2 line, depending upon whether the data exists. Using expressions to solve problems like this makes SSIS development seem almost too easy.

SUMMARY

The aim of this chapter was to fill any gaps in your understanding of expressions. Clearly, this feature is powerful; it enables dynamic package development in an efficient way, getting you out of the code and into productive solutions. However, expressions can be frustrating if you don’t pay attention to the data types and whether you are working with data in variables or in the Data Flow. This chapter has described the “gotchas” that typically plague SSIS developers, along with their solutions, so that you don’t have to experience them. Along the way, we consolidated the common questions, answers, and best practices we’ve learned about using expressions, making them available to you in one chapter.

We discussed how you can set variables programmatically and use scripting tasks and components to further the SSIS dynamic package capabilities. We also covered how to use variables and parameters in expressions to create dynamic properties. There are still scenarios in which expressions don’t fit the bill, and for these scripting tasks can be used to save the day. Stay tuned for Chapter 9, where you explore scripting tasks both in the control and Data Flow roles, and expand your SSIS capabilities. The next chapter will switch over to discuss the Control Flow, specifically containers.