SSIS Crib Notes - Professional Microsoft SQL Server 2014 Integration Services (Wrox Programmer to Programmer), 1st Edition (2014)

Professional Microsoft SQL Server 2014 Integration Services (Wrox Programmer to Programmer), 1st Edition (2014)

Appendix A. SSIS Crib Notes

In this appendix you find a list of the most commonly used expressions, tasks, and transforms in SSIS with a description of when to use them. Reference these tables when you have a package to build in SSIS, and you are not sure which SSIS component to use to perform the needed actions.

WHEN TO USE CONTROL FLOW TASKS

TASKS

WHEN TO USE

Data Flow Task

Use this task when you need to pass data from a source to a destination. The source and destination can be a flat file, an OLE DB Connection, or any other connections supported in the connection manager.

Execute Package Task

Use this task when you need to call another package from within a package. The package performing the call is the parent package. The called package is the child package. Information can be passed from the parent package to the child package with configurations.

Execute Process Task

Use this task to call an executable. The executable can be a batch file or an application. This task can call applications to perform functions on the files in SSIS, such as compressing a file. This task is commonly used to call third-party programs like compression or FTP tools.

Execute SQL Task

Use this task to perform any T-SQL operation. The SQL can be saved directly in the task, in a file, or in a variable. This task is commonly used to call stored procedures.

File System Task

Use this task to manipulate files. This task can move, rename, copy, and delete files and directories. You can also change the attributes of a file. A common use is archiving files after loading them.

FTP Task

Use this task to send or receive a file via the FTP protocol. You must have a valid FTP connection to perform this task. This task is commonly used to receive files from an FTP host for loading in a database.

Message Queue Task

Use this task to send or receive messages to a message queue. You must have a valid MSMQ connection to perform this task.

Script Task

Use this task to perform complex tasks that are not available in SSIS. This task allows you to leverage the .NET Framework to perform just about any task. Checking for the existence of a file is common use of this task.

Send Mail Task

Use this task to send e-mail via SMTP. You must have a valid SMTP server connection to use this task. You can use this task to send notification of the package information to recipients. You can also send files via the attachments on the e-mail.

Web Service Task

Use this task to call a web service. You need a valid web service URL to perform this task.

XML Task

Use this task to perform XML functions. This task can perform common XML tasks such as Diff, used to compare two XML files and find the differences.

WHEN TO USE DATA FLOW TRANSFORMS

TRANSFORMS

WHEN TO USE

Aggregate

Use this transform to perform grouping and summing of data. This is similar to the Group By function in T-SQL.

Audit

Use this transform to add a column to a Data Flow with package information. You can add items like the package name and user name as a new column in the Data Flow.

Conditional Split

Use this transform to divide data into different paths based on a Boolean expression. You can use all the paths from the split or ignore some outputs. This transform equates to a CASE statement in T-SQL.

Copy Column

Use this transform to create a new column in the Data Flow that is an exact copy of another column.

Data Conversion

Use this transform to convert data from one data type to another. For example, you can change Unicode to non-Unicode or change a string to an integer.

Derived Column

This is the most important transform. Use this transform to create or replace a column in the Data Flow with a column created by an expression. You can combine columns or use functions like getdate to create new data.

Export Column

Use this transform to send a column in a Data Flow to a file. The data types can be DT_TEXT, DT_NTEXT, and DT_IMAGE.

Fuzzy Grouping

Use this transform to group data together based on a percentage match. In this transform the data does not have to be an exact match to be grouped together. You can control the percentage of matching needed to group the data.

Fuzzy Lookup

Use this transform to find matching data in a table. The data does not have to match exactly. You can control the percentage of matching needed to group the data.

Import Column

Use this transform to import data from files into rows in a data set.

Lookup

Use this transform to compare data in a Data Flow to a table. This will find exact matches in the date and give you a match and no-match output from the transform. This transform equates to an INNER JOIN statement in T-SQL.

Merge

Use this transform to combine two sets of data in a way similar to a Union All. This transform requires both inputs to be sorted.

Merge Join

Use this transform to combine two sets of data in a way similar to a left outer join. This transform requires both inputs to be sorted.

Multicast

Use this transform to clone the data set and send it to different locations. This transform does not alter the data.

OLE DB Command

Use this transform to send T-SQL commands to a database. This can be used to insert data into a table using the T-SQL Insert command.

Percentage Sampling

Use this transform to select a percentage of the rows in a Data Flow. The rows are randomly selected. You can set a seed to select the same rows on every execution of the transform. The unselected rows will follow a different path in the Data Flow.

Pivot

Use this transform to convert normalized data to denormalized data. This transform changes the rows into columns.

Row Count

Use this transform to write the row count in a Data Flow to a variable.

Row Sampling

Use this transform to select a number of rows in the Data Flow. The number of rows is set in the transform. The unselected rows will follow a different path in the Data Flow.

Script Component

Use this transform to perform complex transforms that are not available in SSIS. This transform allows you to leverage the .NET Framework to perform just about any transform.

Slowly Changing Dimension

Use this transform to create a dimension load for a data warehouse. This is a wizard that will walk you through all the decision making in setting up a dimensional load.

Sort

Use this transform to order the data by a column or more than one column. This is similar to an “order by” command in T-SQL.

Term Extraction

Use this transform to find words in a Data Flow and create an output with the words listed and a score.

Term Lookup

Use this transform to compare to data in a Data Flow and determine if a word exists in the data.

Union All

Use this transform to combine two sets of data on top of each other. This is similar to the Union command in T-SQL.

Unpivot

Use this transform to convert denormalized data to normalized data. This transform changes the columns into rows.

COMMON EXPRESSIONS AND SCRIPTS

PROBLEM

QUICK SOLUTION

Loop over a list of files and load each one.

Tasks Required: Foreach Loop, Data Flow Task

Solution: Configure the Foreach Loop to loop over any particular directory of files. The loop should be configured to output to a given variable. Map the given variable to a connection manager by using expressions.

More of this can be found in Chapter 6.

Conditionally executing tasks

Solution: Double-click the precedence constraint and set the Evaluation property to Expression and Constraint. Type the condition that you want to evaluate in the Expression box.

Move and rename the file at the same time.

Tasks Required: File System Task

Solution: Set the File System Task to rename the file and point to the directory you’d like to move it to. This enables you to rename and move the file in the same step.

More on this can be found in Chapter 3.

Loop over an array of data in a table and perform a set of tasks for each row.

Tasks Required: Execute SQL Task, Foreach Loop

Solution: Use an Execute SQL Task to load the array and send the data into an object variable. Loop over the variable in a Foreach Loop by using an ADO Enumerator.

More of this can be found in Chapter 6.

Perform an incremental load of data.

Tasks Required: Two Execute SQL Tasks, Data Flow Task

Solution: Have the first Execute SQL Task retrieve a date from a control table of when the target table was last loaded and place that into a variable. In the Data Flow Task, create a date range on your query using the variable. Then, update the control table using a second Execute SQL Task to specify when the table was last updated. You can also use the Change Data Capture (CDC) components built into SQL Server and SSIS to simplify this if you have Enterprise Edition of SQL Server installed.

More on this can be found in Chapter 11.

Perform a conditional update and insert.

Components Required: Data Flow Task, Conditional Split, Lookup Transform or Merge Join, OLE DB Command Transform

Solution: Use the Lookup Transform or Merge Join to determine if the row exists on the destination and ignore a failed match. If the row yields blank on the key, then you know the row should be inserted into target (by a Conditional Split). Otherwise, the row is a duplicate or an update. Determine if the row is an update by comparing the source value to the target value in the Conditional Split. The update can be done by an OLE DB Command Transform or by loading the data into a staging table.

More of this can be found in Chapter 13.

Create a file name with today’s date.

Expression on the Flat File or File Connection Manager:
"C:\Projects\MyExtract" + (DT_WSTR, 30) (DT_DBDATE)GETDATE() + ".csv"

Results in:
C:\Projects\MyExtract2014-03-20.csv

Use a two-digit date. For example, retrieve a month in two-digit form (03 for March instead of 3).

RIGHT("0"+(DT_WSTR,4)MONTH(Getdate()),2)

Results in:
03 (if the month is March)

Multiple condition if statement. In this example, the statement determines that if the ColumnName column is blank or null, it will be set to unknown. To make a Logical AND condition, use && instead of the || operator.

ISNULL(ColumnName) || TRIM(ColumnName)== "" ? "Unknown" : ColumnName

Return the first five characters from a zip code.

Derived Column Transform in the Data Flow:
SUBSTRING(ZipCodePlus4,1,5)

Remove a given character from a string (example shows how to remove dashes from a Social Security number).

Derived Column Transform in the Data Flow:
REPLACE(SocialSecurityNumber, "-","")

Uppercase data

Derived Column Transform in the Data Flow:
UPPER(ColumnName)

Replace NULL with another value.

Derived Column Transform in the Data Flow:
ISNULL(ColumnName) ? "New Value": ColumnName

Replace blanks with NULL values.

Derived Column Transform in the Data Flow:
TRIM(ColumnName) == "" ? (DT_STR,4,1252)NULL(DT_STR,4,1252) : ColumnName

Remove any non-numeric data from a column.

Script Transform in the Data Flow Task with the code as follows:
Imports System.Text.RegularExpressions
Public Overrides Sub
Input0_ProcessInputRow(ByVal Row As Input0Buffer)
If Row.ColumnName_IsNull = False Or Row.ColumnName = ""
Then
Dim pattern As String =
String.Empty
Dim r As Regex = Nothing
pattern = "[^0-9]"
r = New Regex(pattern,
RegexOptions.Compiled)
Row.ColumnName =
Regex.Replace(Row.ColumnName, pattern, ""
End If
End Sub

Convert text to proper case (first letter in each word uppercase).

Script Transform with the line of partial code as follows (note that this code should go on one line):
Row.OutputName = StrConv(Row.InputName, VbStrConv.ProperCase)