Microsoft Business Intelligence Tools for Excel Analysts (2014)
PART I: Leveraging Excel for Business Intelligence
Chapter 4: Loading External Data into Power Pivot
In This Chapter
· Importing from relational databases
· Importing from flat files
· Loading data from other data sources
· Refreshing and managing external data connections
In Chapter 3, you loaded the data already contained within the workbook in which you're working. But as you discover in this chapter, you’re not limited to using only the data that already exists in your Excel workbook.
Power Pivot has the ability to reach outside the workbook and import data found in external data sources. Indeed, what makes Power Pivot so powerful is its ability to consolidate data from disparate data sources and build relationships among them. This means you can theoretically create a Power Pivot Data Model that contains some data from an SQL Server table, some data from a Microsoft Access database, and even data from a one-off text file.
In this chapter, we show you how to import external data into your Power Pivot Data Models.
Loading Data from Relational Databases
Relational databases are one of the more common data sources used by Excel analysts. It’s not difficult to find an analyst who frequently uses data from Microsoft Access, SQL Server, or Oracle databases. In this section, we walk through the steps for loading data from external database systems.
Loading data from SQL Server
SQL Server databases are some of the most commonly used for the storing of enterprise-level data. Most SQL Server databases are managed and maintained by the IT department. To connect to an SQL Server database, you have to work with your IT department to obtain read access to the database you’re trying to pull from.
After you have access to the database, follow these steps to load the data:
1. Open the Power Pivot window and click the From Other Sources button on the Home tab.
2. In the Table Import Wizard, shown in Figure 4-1, select the Microsoft SQL Server option and then click Next.
Figure 4-1: Select Microsoft SQL Server to start loading data.
The Table Import Wizard asks for the information it needs to connect to your database (see Figure 4-2).
3. Fill in the following fields. When you're done, click Next.
· Friendly Connection Name: This field allows you to specify your own name for the external source. Enter a name that is descriptive and easy to read.
· Server Name: This is the name of the server that contains the database you're trying to connect to. You get this from your IT department when you're given access.
Figure 4-2: Provide the basic information needed to connect to the target database.
· Log on to the Server: These are your login credentials. Depending on how your IT department gives you access, you select either Use Windows Authentication or Use SQL Server Authentication. Windows Authentication essentially means that the server recognizes you by your windows login. SQL Server Authentication means that the IT department created a distinct username and password for you. If you're using SQL Server Authentication, you need to provide a username and password.
· Save My Password: Select the Save My Password check box if you want your username and password to be stored in the workbook. This allows your connections to remain refreshable when being used by other people. There are obviously security issues with this option, because anyone can view the connection properties and see your username and password. You should only use this option if your IT department set you up with an application account — that is, an account created specifically to be used by multiple people.
· Database Name: Every SQL Server can contain multiple databases. Enter the name of the database you are connecting to. You get this from your IT department when you are given access.
4. In the Choose How to Import the Data screen, shown in Figure 4-3, choose whether to select from a list of tables and views, or write your own custom query using SQL syntax. Click Next when done.
Figure 4-3: Choose to select from a list of tables and views.
The latter requires an advanced technique that we cover in Chapter 11. In most cases, you choose the option to select from a list of tables.
The Table Import Wizard reads the database and shows you a list of all available tables and views (see Figure 4-4). The tables have an icon that looks like a grid, while views have an icon that looks like a box on top of another box.
Figure 4-4: The Table Import Wizard offers a list of tables and views.
5. Select the tables and views you want to import by checking the box.
In Figure 4-4, the FactInternetSales table is selected. The Friendly Name column allows you to enter a new name that is used to reference the table in Power Pivot.
6. Click the Select Related Tables button.
Power Pivot scans for and automatically selects any other tables that have a relationship with the table(s) you’ve already selected. This is a handy feature to have when sourcing large databases with dozens of tables.
Importing tables versus importing views
Views are query objects that are built to extract subsets of data from database tables based on certain predefined conditions. Views are typically created by someone familiar with the database as a kind of canned reporting mechanism that outputs a ready-to-use dataset.
There are pros and cons to importing tables versus views.
Tables come with the benefit of defined relationships. When you import tables, Power Pivot can recognize the relationships among the tables and automatically duplicate those relationships in the Data Model. Tables are also more transparent, allowing you to see all the raw unfiltered data. However, when you import tables, you have to have some level of understanding of the database schema and how the values within the tables are utilized in the context of your organization’s business rules. In addition, importing a table imports all the columns and records, whether you need them or not. In order to keep the size of your Power Pivot Data Model manageable, this often forces you to take the extra step of explicitly filtering out the columns you don’t need.
Views are often cleaner datasets because they are already optimized to include only the columns and data that are necessary. In addition, you don’t need to have an intimate knowledge of the database schema. Someone with that knowledge has already done the work for you; joined the correct tables, applied the appropriate business rules, optimized output, and so on. What you lose with views, however, is the ability for Power Pivot to automatically recognize and build relationships within the Data Model. Also, if you don’t have the rights to open the views in design mode, you lose transparency because you won’t be able to see exactly what the view is doing to come up with its final output.
It’s generally considered a best practice to use views rather than tables whenever possible. They not only provide you with cleaner, more user-friendly data, but they can help streamline your Power Pivot Data Model by limiting the amount of data you import. That said, using tables is by no means frowned upon, and is often the only option due to the lack of database rights or availability of predefined views. You may even find yourself importing both tables and views from the same database.
Remember that importing a table imports all the columns and records for that table. This can impact the size and performance of your Power Pivot Data Model. You may find you need only a handful of the columns from the tables you import. In those cases, you can use the Preview & Filter button.
7. Click the table name to highlight it and then click the Preview & Filter button.
The Table Import Wizard displays the preview screen shown in Figure 4-5, where you see all the columns available in the table, with a sampling of rows.
Figure 4-5: The Preview & Filter screen allows you to filter out columns you don’t need.
8. Choose the columns that you want to import in the table.
Each column header has a check box next to it, indicating that the column will be imported with the table. Deselecting the check box tells Power Pivot to not include that column in the Data Model.
You can also filter certain records. Click the drop-down arrow for any of the columns and choose the criteria to filter unwanted records, as shown in Figure 4-6. This works just like the standard filtering in Excel. You can select and deselect the data items in the filtered list, or if there are too many choices, you can apply broader criteria by choosing Date Filters (if you’re filtering a textual column, this is Text Filters).
9. After you’re done selecting your data and applying any needed filters, click the Finish button in the Table Import Wizard.
The import log shown in Figure 4-7 shows the progress of the import and summarizes the import actions taken after completion.
Figure 4-6: Use the drop-down arrows in each column to filter out unneeded records.
Figure 4-7: The last screen of the Table Import Wizard shows you the progress of your import actions.
10. Open the Power Pivot window and click the Diagram View button on the Home tab.
The final step in loading data from SQL Server is to review and create any needed relationships. Power Pivot displays the diagram screen (see Figure 4-8), where you can view and edit relationships as needed. Chapter 3 covers relationships.
Don’t panic if you feel like you’ve botched the column and record filtering on your imported Power Pivot table. Select the worrisome table in the Power Pivot window and open the Edit Table Properties dialog box (choose Design → Table Properties). Notice this dialog box is basically the same Preview & Filter screen in the Import Table Wizard (refer to Figure 4-5). From here, you can select columns you originally filtered, edit record filters, clear filters, or even use a different table or view.
Figure 4-8: Be sure to review and create any needed relationships.
Loading data from Microsoft Access databases
Because Microsoft Access has traditionally been available with the Microsoft Office suite of applications, Access databases have long been used by organizations to store and manage mission-critical departmental data. Walk into any organization, and you will likely find several Access databases that contain useful data.
Unlike SQL Server databases, Microsoft Access databases are typically found on local desktops and directories. This means you can typically import data from Access without the help of your IT department. Follow these steps to do so:
1. Open the Power Pivot window and click the From Other Sources button on the Home tab.
2. In the Table Import Wizard, shown in Figure 4-9, select the Microsoft Access option and click the Next button.
Figure 4-9: Select Microsoft Access to import data from Access.
3. Fill in the information the wizard needs to connect to your database (see Figure 4-10).
In this screen, you need to provide
· Friendly Connection Name: This field allows you to specify your own name for the external source. Enter a name that is descriptive and easy to read.
· Database Name: Enter the full path of your target Access database. You can click the Browse button to search for and select the database you want to pull from.
· Log on to the Database: Most Access databases aren’t password-protected. But if you're connecting to one that does require a username and password, enter your login credentials.
· Save My Password: Select the Save My Password check box if you want your username and password to be stored in the workbook. This allows your connections to remain refreshable when being used by other people. Keep in mind that anyone can view the connection properties and see your username and password.
Because Access databases are essentially desktop files (.mdb or .accdb), they are susceptible to being moved, renamed, or deleted. Be aware that the connections in your workbook are hard-coded, so if you do move, rename, or delete your Access database, you will no longer be able to connect to it.
Figure 4-10: Provide the basic information needed to connect to the target database.
4. Click the Next button to continue with the Table Import Wizard.
From this point, the process is virtually identical to importing SQL Server data (refer to the previous section).
Loading data from other relational database systems
Whether your data lives in Oracle, Dbase, or MySQL, you can load data from virtually any relational database system. As long as you have the appropriate database drivers installed, you have a way to connect Power Pivot to your data.
Open the Power Pivot window and click the From Other Sources button on the Home tab to open the Table Import Wizard dialog box shown in Figure 4-11. Then select the appropriate relational database system that you're using. For example, if you need to import data from Oracle, select Oracle. If you need to import data from Sybase, select Sybase.
Figure 4-11: Activate the Table Import Wizard and select your target relational database system.
Connecting to any of these relational systems takes you through roughly the same steps you take when importing SQL Server data. You may see some alternate dialog boxes based on the needs of the database system you select.
Understandably, Microsoft cannot possibly create a named connection option for every database system out there, so you may not find your database system listed. In that case, select the Others (OLEDB/ODBC) option. Selecting this option opens the Table Import Wizard, starting with a screen asking you to enter or paste the connection string for your database system (see Figure 4-12). Ask your IT department for this connection string if you don't have it.
Figure 4-12: Enter the connection string for your database system.
If you’re having trouble finding the correct syntax for your connection string, follow these steps to create the string:
1. Click Build to open the Data Link Properties dialog box shown in Figure 4-13.
Figure 4-13: Use the Data Link Properties to configure a custom connection string to your relational database system.
2. On the Provider tab, select the appropriate driver for your database system.
3. Enter all the information needed on the Connection, Advanced, and All tabs.
4. When you’re done, click OK to get back to the Table Import Wizard, where you see the connection string input box populated with the connection string needed to connect to your database system (see Figure 4-14).
Figure 4-14: Use the options on the Data Link Properties dialog box to automatically build the syntax for your connection string.
From this point, the process is virtually identical to importing SQL Server data.
To connect to any database system, you must have that system’s drivers installed on your PC. SQL Server and Access are Microsoft, whose drivers are virtually guaranteed to be installed on most machines you’ll encounter. The drivers for other database systems, however, need to be installed. This is typically done by the IT department, either at the time the machine is loaded with corporate software or upon demand. If you don’t see the needed drivers for your database system, contact your IT department.
Loading Data from Flat Files
The term flat file refers to a file that contains some form of tabular data without any sort of structural hierarchy or relationship between records. The most common types of flat files are Excel files and text files. A ton of important data is maintained in flat files. In this section, you discover how to import these flat file data sources into the Power Pivot Data Model.
Loading data from external Excel files
If your source data is contained within the same workbook as your data model, you can create a linked table (see Chapter 3). Linked tables have a distinct advantage over other types of imported data in that they immediately respond to changes in the source data within the workbook. If you change the data in one of the tables in the workbook, the linked table automatically changes within the Power Pivot Data Model. The real-time interactivity you get with linked tables is nice to have.
The drawback to linked tables is that the source data must be kept in the same workbook as the Power Pivot Data Model. This isn’t always possible. You’ll encounter plenty of scenarios where you need to incorporate Excel data into your analysis, but that data lives in another workbook. In those cases, you can use Power Pivot’s Table Import Wizard to connect to external Excel files. Follow these steps:
1. Open the Power Pivot window and click the From Other Sources button on the Home tab.
2. In the Table Import Wizard dialog box shown in Figure 4-15, select the Excel File option and click the Next button.
Figure 4-15: Activate the Table Import Wizard and select Excel File.
3. Fill in the information the wizard needs to connect to your target workbook. Click Next when you're done.
In this screen (see Figure 4-16), you need to provide
· Friendly Connection Name: This field allows you to specify your own name for the external source. Enter a name that is descriptive and easy to read.
· Excel File Path: Enter the full path of your target Excel workbook. You can click the Browse button to search for and select the workbook you want to pull from.
· Use First Row as Column Headers: In most cases, your Excel data will have column headers. Select the Use First Row as Column Headers check box to make sure your column headers are recognized as headers when imported.
Figure 4-16: Provide the basic information needed to connect to the target workbook.
4. In the Select Tables and Views screen (see Figure 4-17), select the worksheets you want to import.
The Friendly Name column allows you to enter a new name that will be used to reference the table in Power Pivot.
When reading from external Excel files, Power Pivot cannot identify individual table objects. As a result, you can only select entire worksheets in the Table Import Wizard (shown in Figure 4-17). Keeping this in mind, make sure you import worksheets that contain one single range of data.
5. (Optional) Click the Preview & Filter button if you need to filter unwanted columns and records.
6. Click Finish to complete the import process.
As always, be sure to review and create relationships to any other tables you’ve loaded into the Power Pivot Data Model.
Figure 4-17: Select the worksheets you want to import.
Be aware that loading external Excel data does not give you the same interactivity you get with linked tables. Just as with importing database tables, the data you bring from an external Excel file is simply a snapshot. You need to refresh the data connection to see any new data that may have been added to the external Excel file (see the section “Refreshing and Managing External Data Connections” later in this chapter).
Loading data from text files
Text files are another type of flat file used to distribute data. These files are commonly outputs from legacy systems and Web sites. Excel has always been able to consume text files. With Power Pivot, you can go further and integrate them with other data sources. Follow these steps to do so:
1. Open the Power Pivot window and click the From Other Sources button on the Home tab.
2. In the Table Import Wizard dialog box shown in Figure 4-18, select the Text File option and click the Next button.
The Table Import Wizard asks for the information it needs to connect to the target text file (see Figure 4-19).
Figure 4-18: Activate the Table Import Wizard and select Text File.
Figure 4-19: Provide the basic information needed to connect to the target text file.
3. Fill in the following information and when you're finished, click Next.
· Friendly Connection Name: This field allows you to specify your own name for the external source. Enter a name that is descriptive and easy to read.
· File Path: Enter the full path of your target text file. You can click the Browse button to search for and select the file you want to pull from.
· Column Separator: Select the character used to separate the columns in the text file. Before you can do this, you need to know how the columns in your text file are delimited. For example, a comma-delimited file has commas separating the columns. A tab-delimited file has tabs separating the columns. Click the drop-down arrow next to the Comma Separator field to see choices for the more common delimiters: Tab, Comma, Semicolon, Space, Colon, and Vertical bar.
· Use First Row as Column Headers: If your text file contains header rows, be sure to select the Use First Row as Column Headers check box to make sure the column headers are recognized as headers when imported.
After you click Next, you get an immediate preview of the data in the text file.
4. Filter any unwanted columns by removing the check next to the column names. You can also click the drop-down arrows next to each column to apply any record filters.
5. Click the Finish button to start the import process.
Upon completion, the data from your text file will be part of the Power Pivot Data Model. As always, be sure to review and create relationships to any other tables you’ve loaded into Power Pivot.
Anyone who’s worked with text files in Excel knows they are notorious for importing numbers that look like numbers, but are really coded as text. In standard Excel, Text to Columns fixes these kinds of issues. This can be a problem in Power Pivot, too. When importing text files, you’ll want to take the extra step of verifying that all columns have been imported with the correct data formatting. You can use the formatting tools found on the Power Pivot window’s Home tab to format any column in the Data Model.
Loading data from the clipboard
Power Pivot includes an interesting option for loading data straight from the clipboard — that is, pasting data you’ve copied from some other place. This option is meant to be used as a one-off technique to quickly get useful information into the Power Pivot Data Model.
As you consider this option, keep in mind that there is no real data source. It’s just you manually copying and pasting. There is no way to refresh the data or to trace back where you actually copied the data from.
Imagine you received a Word document (similar to the one shown in Figure 4-20). You like the nifty table of holidays within the document. Here's how to import the data:
·
Figure 4-20: You can copy data straight out of Microsoft Word.
1. Copy the table; then go to the Power Pivot window and click the Paste button on the Home tab.
The Paste Preview dialog box opens as shown in Figure 4-21, where you can review what exactly will be pasted.
Figure 4-21: The Paste Preview dialog box gives you a chance to see what you’re pasting.
2. Specify the name that is used to reference the table in Power Pivot and specify if the first row is a header.
3. Click OK to import the pasted data into Power Pivot.
At this point, you can adjust the data formatting and create the needed relationships.
Loading Data from Other Data Sources
We’ve covered the data sources that are most important to a majority of Excel analysts. Still, there are a few more data sources that Power Pivot is able to connect to and load data from. We touch on some of these data sources later in this book, but others are outside the scope of this book.
Although you're probably not likely to use the following data sources, it’s worth knowing they exist and are available if you need them:
· Microsoft SQL Azure: SQL Azure is a cloud-based relational database service some companies use as an inexpensive way to get the benefits of SQL Server without taking on the full cost of hardware, software, and IT staff. Power Pivot can load data from SQL Azure in much the same way as the other relational databases we discuss in this chapter.
· Microsoft SQL Parallel Data Warehouse: SQL Parallel Data Warehouse (SQL PDW) is an appliance that partitions very large data tables into separate servers and manages query processing among them. SQL PDW is used to provide scalability and performance for big data analytics. From a Power Pivot perspective, it’s no different from connecting to any other relational database.
· Microsoft Analysis Services: This selection refers to Microsoft’s Analysis Services Online Analytical Processing product. We take a closer look at Analysis Services and using Analysis Services in Power Pivot in Chapter 13.
· Report: The curiously named report data source refers to SQL Server Reporting Services reports. In a basic sense, Reporting Services is a BI tool used to create stylized PDF-style reports from SQL Server data. In Chapter 12, you get a feel for Reporting Services as it relates to Microsoft’s suite of BI tools. In the context of Power Pivot, a Reporting Services report can be used as a Data Feed Service, providing a refreshable connection to the underlying SQL Server data.
· From Windows Azure Marketplace: Windows Azure Marketplace is an OData (Open Data Protocol) service that provides both free and paid data sources. If you register for a free Azure Marketplace account, you get instant access to governmental data, industrial market data, consumer data, and much more. You can enhance your Power Pivot analyses by loading the data from the Azure Marketplace using this connection type.
· Suggested Related Data: This data source reviews the content of your Power Pivot Data Model and, based on its findings, suggests Azure Marketplace data you may be interested in.
· Other Feeds: The Other Feeds data source allows you to import data from OData Web services into Power Pivot. OData connections are facilitated by XML Atom files. Point the OData connection to the URL of the .atomsvcs file, and you essentially have a connection to the published Web service.
Refreshing and Managing External Data Connections
When you load data from an external data source into Power Pivot, you essentially create a static snapshot of that data source at the time of creation. Power Pivot uses that static snapshot in its internal Data Model.
The external data source may change and grow as new records are added. However, Power Pivot is still using its snapshot so it can’t incorporate any of the changes in your date source until you take another snapshot.
The action of updating the Power Pivot Data Model by taking another snapshot of your data source is called refreshing your data. You can refresh manually, or you can set up an automatic refresh.
Manually refreshing your Power Pivot data
On the Home tab of the Power Pivot window, click the drop-down arrow on the Refresh button and choose one of these options:
· Refresh: Use the Refresh option to refresh the Power Pivot table that’s currently active. For example, if you are on the Dim_Products tab in Power Pivot, clicking Refresh reaches out to the external SQL Server and requests an update for just the Dim_Products table. This works nicely when you need to strategically refresh only certain data sources.
· Refresh All: Use the Refresh All option to refresh all the tables in the Power Pivot Data Model.
Setting up automatic refreshing
You can configure your data sources to automatically pull the latest data and refresh Power Pivot. Here's how:
1. Click the Data tab on the Ribbon and click the Connections button.
The Workbook Connections dialog box opens.
2. Select the data connection you want to work with and then click the Properties button.
3. With the Connection Properties dialog box open, click the Usage tab.
You’ll find an option to refresh the chosen data connection every X minutes and an option to refresh the data connection when the Excel work is opened (see Figure 4-22).
Figure 4-22: The Connection Properties dialog box lets you configure the chosen data connection to refresh automatically.
4. Choose a refresh rate:
· Refresh Every X Minutes: Refresh the chosen data connection every specified number of minutes. This refreshes all tables associated with that connection.
· Refresh Data When Opening the File: Automatically refresh the chosen data connection when the workbook is opened. This refreshes all tables associated with that connection as soon as the workbook is opened.
5. Click OK when you're done.
Preventing Refresh All
In addition to the Refresh All command on the Ribbon, there are actually two more places you can refresh your data in Excel 2013: the Ribbon's Data tab and the PivotTable's Analyze tab.
Clicking any Refresh All button anywhere in Excel essentially completely reloads Power Pivot, refreshes all PivotTables, and updates all workbook data connections. If your Power Pivot Data Model imports millions of lines of data from an external data source, you may want to avoid the processing hit taken when clicking Refresh All.
In the Connection Properties dialog box (refer to Figure 4-22), deselect the Refresh This Connection on Refresh All check box.
Editing your data connection
There may be times when you need to edit your source data connection after you’ve already created it. Unlike refreshing, where you simply take another snapshot of the same data source, editing the source data connection allows you to go back and reconfigure the connection itself. Here are a few reasons you may need to edit your data connection:
· The location or server or data source file has changed.
· The name of the server or data source file has changed.
· You need to edit your login credentials or authentication mode.
· You need to add tables you left out during initial import.
In the Power Pivot window, click the Home tab and click the Existing Connections button. The Existing Connections dialog box shown in Figure 4-23 opens. Your Power Pivot connections are under the Power Pivot Data Connections subheading. Select the data connection that you need to edit.
Figure 4-23: Use the Existing Connections dialog box to reconfigure your Power Pivot source data connections.
After you select your target data connection, click either the Edit or Open button, depending on what you need to change:
· Edit: Lets you reconfigure the server address, file path, and authentication settings.
· Open: Lets you import a new table from the existing connection. This is handy if you inadvertently missed a table during the initial loading of data.