OData Source - SQL Server Integration Services Design Patterns, Second Edition (2014)

SQL Server Integration Services Design Patterns, Second Edition (2014)

Chapter 12. OData Source

The Open Data (OData) protocol is a data access protocol that provides a standardized way to create and consume data APIs via the web. The technology builds upon common protocols and methodologies, such as HTTP, REST, AtomPub, and JSON. The initial versions were defined by Microsoft as open standard, and OData has since been picked up by an OASIS committee with sponsors from many industry giants, such as IBM, Red Hat, and SAP AG. OASIS approved the latest version of the standard (Version 4) in February 2014.

Microsoft has embraced OData as the primary way to expose data APIs for many of its products. SharePoint, Project Online, and many Azure services expose OData APIs for retrieving data, and performing CRUD operations. WCF Data Services (previously known as ADO.NET Data Services) provides a quick and easy way for anyone to developer their own OData services based on an Entity Data Model (EDM) and the ADO.NET Entity Framework. Given the proliferation of the technology, it made sense for Microsoft to add support for OData within SSIS.

This chapter describes the newly added OData Source for SSIS.

Image Note The OData Source is not included with the SQL Server 2014 or SQL Server 2012 installation package and must be downloaded separately. The SQL Server 2012 version is available as a download from the Microsoft Download Center: www.microsoft.com/en-us/download/details.aspx?id=42280. The SQL Server 2014 version is part of the SQL Server 2014 feature pack, which can be found at www.microsoft.com/en-us/download/details.aspx?id=42295.

Understanding the OData Protocol

The OData protocol is a fairly extensive protocol and exposes a wide range of capabilities. This section describes the key features of the OData protocol that an SSIS developer should be aware of. If you are interested in learning more about the protocol, the full specification and documentation is available from the www.odata.org web site.

Image Note The SSIS OData Source for SQL Server 2012 and 2014 supports version 3 of the OData protocol, and not version 4. More information about the OData protocol can be found at www.odata.org/.

There are two main features of OData from an SSIS perspective: the metadata document and the data you get back from accessing an OData resource.

The metadata document is an EDM description of the OData service that describes the entity sets exposed by the service. The SSIS OData Source uses the metadata document to figure out the schema of the source data. This document is located at a standard location for each service—http://<base_url>/$metadata. Table 12-1 shows a mapping of OData and EDM concepts.

Table 12-1. OData v3 Concepts Mapped to Relational Terms

OData

Relational

Comments

Entity set

Table or view

Entity sets (also known as collections or feeds) define the objects you’ll be receiving in SSIS. When using the Collection mode, the OData Source provides a drop-down box allowing you to select one of the entity sets defined by the service.

Entity

Row of data

Defines the schema for the rows of data within an entity set.

Operation

Stored procedure or function

The OData Source UI does not expose a way to select operations, but it is possible to invoke an operation by specifying its resource path.

Navigation property

Foreign key relationship

Navigation properties define relationships between entities. The OData Source does not support these directly, but you can access subentities by entering their resource path.

At runtime, SSIS retrieves data from the OData service using standard HTTP URLs. That means you can see the same data being returned to SSIS by entering the URL into a web browser, which can be very useful for debugging purposes. The OData Source has two modes: when accessing the service using the Collection mode, SSIS will automatically generate the resource URL for you; when using the Resource Path mode, you will supply the full URL.

Data Type Mappings

The OData Source will attempt to map the fields of complex types defined within the service’s metadata document to SSIS data flow types. When configuring the OData Source, the fields in the entity or resource collection or entity set you reference will be added as external metadata columns for the component. Table 12-2 shows the EDM to SSIS data type mappings. Given the flexibility of the Entity Data Model, you may find that certain complex types will not work with the SSIS OData Source.

Table 12-2. EDM Data Type Mapping to SSIS Types

EDM Type

CLR Type

SSIS Type

Edm.Binary

byte[]

DT_BYTES

Edm.Boolean

bool

DT_BOOL

Edm.DateTime

DateTime

DT_DBTIMESTAMP

Edm.DateTimeOffset

DateTimeOffset

DT_DBTIMESTAMPOFFSET

Edm.Decimal

decimal

DT_NUMERIC

Edm.Double

double

DT_R8

Edm.Guid

Guid

DT_GUID

Edm.Int16

Int16

DT_I2

Edm.Int32

Int32

DT_I4

Edm.Int64

Int64

DT_I8

Edm.SByte

sbyte

DT_I1

Edm.Single

float

DT_R4

Edm.String

string

DT_WSTR

Edm.Time

TimeSpan

DT_DBTIME2

Query Options

OData supports multiple query options that can be included with a request. Query options start with a dollar sign ($) and are added to the query string portion of the URL, following the resource path. These options are typically used to pass filters to the service or to limit the number of rows or columns in the result set. Some OData services may not support all of the available query options, so sometimes you will need to experiment. Table 12-3 shows some of the more commonly used query options you will use when working with SSIS. You can use the query options in SSIS by specifying them in the Query Options box in the OData Source Editor, or by directly setting the Query property value on the component.

Table 12-3. OData Query Options

Option

Description

$select

Comma-separated list used to specify which columns should be returned by the service. Equivalent to the SELECT statement in T-SQL.

$filter

Restricts the result set to rows that match the specified filter expression. Equivalent to the WHERE clause in T-SQL.

$orderby

Specifies the order in which results are returned from the service. Equivalent to the ORDER BY clause in T-SQL.

$top

Limits the number of rows returned from the service. Equivalent to the TOP expression in T-SQL.

$skip

Indicates that the service should ignore the first N rows of the result set, where N is specified by the $skip argument.

$format

This option allows you to override the default format of the response. Valid options are json, atom, and xml. SSIS will try to use the json format by default, since it is the most concise and tends to offer the best performance (especially when the service supports JSON with minimal metadata—also known as JSON light).

Configuring the OData Connection Manager

The OData Connection Manager is where you specify the service document URL (i.e., the root URL of the OData service) and authentication information. The connection manager supports three authentication types: Windows, Basic (username and password), and Microsoft Online Services Authentication. You can configure Windows and Basic authentication settings on the main OData Connection Manager Editor page (shown in Figure 12-1). Microsoft Online Services Authentication is meant to be used with SharePoint Online and requires some additional configuration. The process is described in the following section.

9781484200834_Fig12-01.jpg

Figure 12-1. The OData Connection Manager Editor

Enabling Microsoft Online Services Authentication

SharePoint provides OData endpoints for the objects that it stores, such as its lists and documents, which can now be accessed in SSIS through the OData Source. Although you can configure different types of authentication for the on-premises versions of SharePoint, SharePoint Online (and other Microsoft services that are a part of Office 365, like Project Online) uses a special type of authentication. The OData Connection Manager supports these online services, but you may need to install an additional SharePoint SDK component to make it work.

You can follow these steps to connect to SharePoint Online:

1. Click the All button in the OData Connection Manager Editor (shown in Figure 12-1) to bring up the full property page.

2. Under the Security group, change the Integrated Security property from SSPI to False.

3. If the Microsoft Online Services Authentication property stays disabled, then you will need to install the SharePoint Service 2013 Client Components SDK from the Microsoft Download Center (search for the SDK, or follow this link:http://www.microsoft.com/en-us/download/details.aspx?id=35585). After installing the SDK, save the SSIS package and restart SQL Server Data Tools. You should now be able to change the Microsoft Online Services Authentication property to True (as shown in Figure 12-2).

9781484200834_Fig12-02.jpg

Figure 12-2. The OData Connection Manager Editor showing all properties

4. Enter the user ID and password you use when connecting to SharePoint Online. You may also need to provide a domain value, depending on how your site administrator has configured your Active Directory integration with Office 365.

5. You should now be able to successfully connect to your SharePoint Online site. Enter the Service Document URL (e.g., https://<mysite>.sharepoint.com/_vti_bin/ListData.svc) and click the Test Connection button.

Image Note More information about the OData endpoints offered by SharePoint Online can be found in Books Online at http://msdn.microsoft.com/en-us/library/ff521587.aspx.

Configuring the Source Component

You can bring up the OData Source Editor by double-clicking on the component in the data flow. The editor is similar to what you’d expect from other SSIS data flow components and is shown in Figure 12-3.

9781484200834_Fig12-03.jpg

Figure 12-3. The OData Source Editor

A description of the fields you can configure on the Connection page can be found in Table 12-4. The table lists the field name as it is described in the editor UI, as well as the matching property name when setting the value through the Advanced Editor (shown later in Figure 12-6) or the Properties window.

Table 12-4. OData Source Fields

Field

Property

Description

OData Connection Manager

N/A

The connection manager you will use for this Source component. This property must be set before you can set the other values. Clicking the New button will bring up a dialog that lets you create a new connection manager.

Use Collection or Resource Path

UseResourcePath

This field determines whether you are accessing a specific collection from the OData service or providing your own resource path. This is set to Collection by default. More information about these access modes can be found later in this section.

Collection

CollectionName

This box lets you select a specific collection from the OData feed.

Resource Path

ResourcePath

When using the Resource Path access mode, this field lets you enter a full path of the specific OData resource you want to access.

Query Options

Query

This field lets you append different query options, such as $top or $filter, to your request. Table 12-3 lists the set of query options you’d commonly include in a request. Note that the values should be URL encoded—the OData Source does not automatically encode these values for you.

Feed Url

N/A

This is a read-only field that shows the URL that the OData Source will use to access the server. It is built based on the values you set for the other fields in this dialog. This field can be useful for debugging purposes.

There are two main ways to access the OData service using the OData Source: by selecting a collection, or by configuring a resource path.

Accessing the service using a collection is the default behavior of the OData Source. You will typically use this option when you want to read one of the top level entity sets exposed by the service. The Collection field will give you a drop-down list of the available entity sets, similar to how the OLE DB Source lists the set of available tables and views.

You would use the resource path when you are doing more than just accessing a single entity set. For example, you want to run an action or operation defined by the service, or you want to access a subentity through a navigation property. In these cases, you would provide the full URL in the Resource Path field. Note that you are expected to apply any URL encoding to the value yourself— the component does not automatically do this for you. Also note that the OData Source will not do any special validation for these values—it will simply try accessing the URL at runtime and rely on the service’s $metadata document to determine how to interpret the results. Using the Preview button on the OData Source Editor dialog (shown in Figure 12-4) is very helpful in these cases because it will let you quickly validate whether the request will succeed and that you are getting the data you want.

9781484200834_Fig12-04.jpg

Figure 12-4. The Preview dialog from the OData Source Editor

After configuring the collection or resource path value, you can select the columns you want to retrieve for your data flow by clicking on the Columns page (shown in Figure 12-5). This dialog uses the standard SSIS external column dialog and should be familiar to SSIS developers. Note that unselected columns on this page won’t be added to your data flow, but they will still be pulled down by the request to the service. If you are only selecting a few columns from a given feed, you might want to limit the columns upfront using the $select query option. More information about query options can be found in Table 12-3.

9781484200834_Fig12-05.jpg

Figure 12-5. The OData Source Editor’s Columns page

Overriding Data Types

The OData Source allows you to manually override data types on output columns for certain types. This comes in handy when you are working with OData services that don’t specify max length values for String and Binary fields. SSIS will map to DT_NTEXT and DT_IMAGE types (respectively), which can have negative implications to performance. You can change the default data type for a column by using the Advanced Editor (as shown in Figure 12-6).

9781484200834_Fig12-06.jpg

Figure 12-6. The Advanced Editor for the OData Source

Here’s how to modify the default data types:

1. Add an OData Source to your data flow and configure its connection manager and the data you want to access.

2. Click OK to save the settings.

3. Right-click on the OData Source component on the data flow’s design surface and select Show Advanced Editor.

4. Select the Input and Output Properties tab.

5. Expand the Output folder.

6. Expand the Output Columns folder.

7. Select the column you want to modify.

8. Select the DataType property.

9. Change the value to the new SSIS data type and the Length value, if appropriate, for the type.

You will only be able to change the DataType property for certain types. Table 12-5 contains a list of data types that are interchangeable. Changing the value for an unsupported data type will result in an error message (COMException 0xC020837D).

Table 12-5. Data Types That Can Be Changed

Data Type

Compatible Type

DT_NTEXT

DT_WSTR

DT_IMAGE

DT_BYTES

DT_DBTIMESTAMP2

DT_DBTIMESTAMP

Conclusion

This chapter described the new OData Source component in SSIS. OData has seen a steady increase in adoption over the past few years, and the OData Source provides an out-of-the-box solution for reading data from a variety of sources, such as SharePoint, and the Windows Azure Storage APIs.