Microsoft Business Intelligence Tools for Excel Analysts (2014)
PART II: Leveraging SQL for Business Intelligence
Chapter 8: Essential SQL Server Concepts
Chapter 9: Introduction to SQL
Chapter 10: Creating and Managing SQL Scripts
Chapter 11: Calling Views and Stored Procedures from Excel
Chapter 12: Understanding Reporting Services
Chapter 13: Browsing Analysis Services OLAP Cubes with Excel
Chapter 14: Using the Data Mining Add-In for Microsoft Office
Chapter 8: Essential SQL Server Concepts
In This Chapter
· Understanding SQL Server
· Setting up databases
· Working with tables and views
· Understanding SQL Server security
· Importing and exporting data
SQL Server is an enterprise-grade database platform designed to handle large datasets and intensive data processing. Microsoft has positioned SQL Server at the center of its BI Stack, enabling it with powerful features and add-ons specifically designed for business intelligence and analytics. It is critical for you to be familiar with SQL Server if you deal with datasets that are too large for standard desktop tools to handle. The data stored inside SQL Server can be integrated in a variety of front-end tools such as Excel, Access, and advanced reporting and dashboarding tools.
SQL Server Components
SQL Server is an extensive platform with dozens of components. In this section we focus on two of those components:
· SQL Server Relational Database Engine
· SQL Server Management Studio
SQL Server Relational Database Engine
The SQL Server Relational Database Engine is a stand-alone service that can be installed on a personal computer or a dedicated server or cluster of servers. The specifications of the computer hardware required to run this service depend on the performance requirements. As your dataset size and your data processing requirements grow, your hardware requirements grow accordingly.
Start your development on your desktop and migrate to a larger machine for your production environment.
There are many benefits to having a dedicated environment for running the SQL Server Relational Database Engine, including:
· Controlling access to the server
· Protecting your data
· Allowing concurrent access
· Improving performance
· Making administration easier
SQL Server Management Studio
After the SQL Server Relational Database Engine component is installed, it stays mostly behind the scenes and interactions to that component occur via SQL Server Management Studio.
As shown in Figure 8-1, SQL Server Management Studio is a client tool that can be installed on your computer to enable you to perform several functions on the database service, including:
· Connecting to a database service
· Creating and managing databases
· Configuring security
· Creating tables, stored procedures, and views
· Developing SQL queries
· Creating indexes and performance tuning
· Importing and exporting data
SQL Server Management Studio allows you to interact with your SQL Server database. With Management Studio, you can manage tables, create queries, pass commands to the database, and receive returned query results.
Figure 8-1: SQL Server Management Studio.
There are two views in SQL Server Management Studio that you will primarily use:
· The Object Explorer allows you to browse objects that have been created on the service, such as databases, tables, and other objects.
· The Query window allows you to write and execute SQL. The Query window returns data and messages from the service based on the SQL that you execute.
Structured Query Language (SQL) is the primary language for interacting with your SQL Server service (we cover the basics of SQL in Chapter 9). You can use SQL to perform administrative functions on the service, create objects, or query and manipulate data. Performing all these tasks via SQL gets pretty challenging. Luckily, you do not have to memorize all the SQL functions to be able to use SQL Server. Microsoft has built user interfaces for performing most of the functions that you may need to build your data structure, administer your databases, and even perform some data analysis.
You do need to learn SQL basics to get the most out of your installation. The good news is that the learning curve for SQL is not very steep.
Connecting to a Database Service
Microsoft has made it very easy to connect to an SQL Server Relational Database Engine service. Just follow these steps:
1. Open SQL Server Management Studio.
2. Click Connect in the Object Explorer window.
3. Select Database Engine from the Server Type drop-down menu.
The Connect to Server dialog box shown in Figure 8-2 opens.
Figure 8-2: Connecting to an SQL Server Relational Database Engine service.
4. Choose the appropriate server name and authentication method.
If you don’t know the authentication method to use, contact the administrator.
5. Click Connect.
Now that you have connected to the service, you can expand the databases folders in the Object Explorer window to see if there are any databases created on the service. By default there aren’t any.
SQL Server Security
SQL Server security is quite extensive and complex to master. The full scope of security capabilities are beyond the scope of this book, but we’ll tell you how to get up and running with your analytics system. At a very high level, there are three levels of access that you should be concerned with: Server, Database, and Database Objects.
Server access security has two aspects: login and server roles (see Figure 8-3).
Figure 8-3: Logins and server roles.
Login roles are the named users or groups that are allowed to access to the server. Groups are beneficial, instead of individual logins, because group logins allow you to manage everyone in that group in your organization’s Active Directory. This keeps security centralized as these groups may have access to other applications with related functions.
Add a login by right-clicking the Logins folder and choosing New Login. Work with your corporate security administrators to make sure you're following your organization’s security policies and procedures.
The sa login is the administrative account, which has full access rights to the server. This login is defined during the installation. If you're the one installing SQL Server, make sure to assign a secure password to this account, otherwise you compromise the security of your system.
Server roles provide overarching permissions to take server-level actions, such as creating a new database, and managing server backups and linked objects.
Every login in SQL Server must have at least one associated server role. You can set the login’s roles by right-clicking each login in the Logins folder and selecting Properties. In the Login Properties dialog box, click Server Roles to see the available server roles (see Figure 8-4).
Figure 8-4: Three primary server roles.
A login can be assigned to any number of the available server roles. Each role has a defined set of permissions governing the actions that can be taken on the server. The most commonly used roles are Dbcreator, Public, and Sysadmin:
· Dbcreator: Is able to create new databases on the server and fully control and administer those databases.
· Public: Is allowed access to the server but is not given any server-level capabilities. Typically logins under this role have specific access at the database level.
· Sysadmin: Has full control over the server. You are typically assigned this role if you installed SQL Server yourself or you were given that level of access by another sysadmin.
After your server-level security has been defined, there is another layer of access control at the database level. You can configure and control this access level in the Security folder of your database in the Object Explorer window. There are two primary aspects to SQL Server’s database-level security that are of concern to you: users and database roles (see Figure 8-5).
Figure 8-5: Users and database roles.
Add a database user by right-clicking the Users folder in the Object Explorer window and selecting New User. The Database User - New window opens, where you can add a new database user. Every database user must have a server login. To simplify security admin, give the database user the same name as the server login.
After naming the user and assigning the proper server login, you have to define the user’s database access level. There are two primary approaching to defining access level:
· Database Role: Gives the user general database access, making it easier for you to administer security because you do not have to specifically declare access for each object in your database.
· Database Object Access: Allows you fine control over your database security. This is discussed in detail in the next section.
There are three primary database roles that are important to you (see Figure 8-6):
· Db_datareader: Is able to read data from all the database tables that are created by other users in the database. Use this role if you want simple security access and you do not have sensitive data.
· Db_datawriter: Is allowed to edit (insert, update, and delete) data in all the database tables created and populated by other users. This access is typically given to your team members who would be helping with data analysis, and not to the general public. If you give everyone this level of access, you risk losing control of your data and not being able to guarantee the integrity of your system.
· Db_owner: Has full control over the database and all the objects within it. You should avoid giving this level of access to other users unless they are trusted members of your team with functions that require it.
Figure 8-6: Three database roles.
Database users do not have to be assigned any database roles. If you want granular control over a user’s access, grant him specific control over the various database objects.
Database object access
SQL Server allows for complete control over your security. If you do not want to give general access at the database level, you can control access on an object-by-object level for each of your users. You can do so on the Securables tab in the Database User window (see Figure 8-7).
Figure 8-7: Database user securables.
1. Click the target database you're working with. From Object Explorer, click Security and then Users. Then right-click the target user and choose Properties.
2. Click the Search button.
The Add Objects dialog box opens, as shown in Figure 8-8.
Figure 8-8: Add Objects options.
3. Select the All Objects of the Types option and then click OK.
The Select Object Types window opens, with a large number of database objects to choose from (see Figure 8-9).
Figure 8-9: Select Object Types window.
4. Select the Tables and Views object types and click OK.
In most situations, those are the two object types you grant access to.
Now in the Database User window, the Securables and Permissions areas are populated with objects and permission options, respectively (see Figure 8-10).
Figure 8-10: Database User window with populated securables.
When you click a specific object in the Securables area (refer to Figure 8-10), you see the specific permissions that you can control for that object in the Permissions area. For tables and views, you should be aware of four primary permission types:
· Delete: Allows the user to remove data from a table.
· Insert: Allows the user access to append data to the table, either directly with SQL or by using an application.
· Select: Gives the user access to read the data by writing SQL against the table or via a reporting or dashboarding tool.
· Update: Is a combination of a Delete and an Insert on existing data in a table.
Typically you allow Select access to most clients. Use Insert, Delete, and Update for team members assisting you with creating analytics. The one exception to this rule is a lookup table that you want clients to maintain. Lookup tables typically contain business rules or attributes that clients control; for example, a lookup table to maintain unit cost.
Working with Databases
Creating a database is typically done by your system administrator. If you have to perform this function yourself, you need to know some basics about how SQL Server works on a server or your personal computer. The first thing you need to think about is disk configuration. The recommended best practice for SQL Server installation is to segregate the various types of files and databases into separate drives. This allows you to better control the performance and administration of your server. There are four functional areas you need to be concerned with:
· Data files: As the name implies, these files store the data loaded in your user tables. The size of these files depends on the amount of data and type of data you store in your tables. The larger the number of rows and columns, the larger the files.
· Log files: These files are used by SQL Server to guarantee data integrity while SQL Server performs the work necessary when SQL statements are submitted by the users. Their size depends on the database setup (we discuss this in a later section) and the transactions submitted.
· Temp DB: This database is used by SQL Server as a temporary work space for processing user SQL statements. Its use may be explicit by the user or implicit by the SQL Server optimizer. The files of this database can be very large on a server with a high level of activity.
· Backup files: It is recommended that you back up your SQL Server database on a regular basis to allow you to recover data in case it becomes corrupted. The size of these files depends on the databases being backed up, the frequency of backup, and the number of backups saved.
If you're using a server for your installation, your system administrator should configure SQL Server to automatically point the different files to the appropriate drive when you create a new database. Otherwise, you have to do that configuration yourself.
Creating a database
To create a new database, right-click the Databases folder in the Object Explorer window and select New Database. The New Database window opens, allowing you to define the database (see Figure 8-11).
Figure 8-11: New Database window.
There are several options and configurations available to you when creating a database:
· Database name: Give careful thought to selecting your database name. You can change it; however, it might impact queries and applications that reference that name.
· Owner: In general, leave this field selected to Default. By default, SQL Server assigns all objects in the database to DBO. For more complex databases, you might consider setting multiple owners in your database to organize objects under them.
· Database files: Refer to the previous section for best practices on setting up these files.
· Options: Clicking Options in the Select a Page pane. That takes you to the Options tab in the New Database window (see Figure 8-12). Select Simple from the Recovery Model drop-down list, as shown in Figure 8-12. This option keeps your log file under control.
Figure 8-12: Select Simple from the Recovery Model drop-down list.
SQL Server database maintenance involves several aspects, including backup, index rebuilding, and statistics updates. The full scope of database maintenance is traditionally handled by the system administrator. If you're managing the system yourself, it's helpful to know how to back up a database. Follow these steps for the easiest way to back up your database:
1. Right-click your database, and choose Tasks → Back Up.
The Back Up Database window opens, which allows you to select the options for the backup you want to perform (see Figure 8-13).
Figure 8-13: The Back Up Database window.
2. In the Name box, enter an appropriate name for the backup.
Usually it is best to include the date and time as part of the name you give to your backup. This allows you to easily recognize when the backup was taken.
3. Select a location for the backup.
If your system has multiple drives, make sure you select the drive that is designated for backup.
4. Click the Options page on the left margin of the Back Up Database window and then select the Overwrite All Existing Backup Sets option (see Figure 8-14).
5. Click OK.
Backing up databases allows you to recover data in case data becomes corrupted. Data corruption could occur due to a human error or in the very unlikely event of SQL Server system corruption. To restore a database, right-click the database, choose Tasks → Restore → Database. Follow the instructions to restore the database from the appropriate backup — which is usually the last backup that included valid data.
Figure 8-14: Configure backup options.
Working with Tables and Views
Tables represent the core of your analytics system. This is where all your data is stored to be used for generating reports and value-adding analysis. Views, as the name implies, don’t hold any data, but instead reflect views of the data contained in the underlying tables. Views give you the ability to join several tables together into one dataset.
There are several design considerations that should be taken into account when building tables and views:
· Data model: The data model refers to the organization of your data elements into tables and columns. The approach you should take for the design depends on the functional area of the database. Typically, analytics data models follow what is referred to as a denormalized model, which focuses on making reporting and analysis efficient as opposed to optimizing data inserts and updates.
· Data size: It is important to set the data types of sizes of your columns properly. Each column should be set according to the data you expect to load in it, such as integers or characters. For characters and decimals, you need to specify the size of the data.
Be aware of the number of rows you're loading into each table. Managing hundreds of millions of rows requires a lot more care than managing one or two million.
· Data relationships: If you're storing data in multiple tables, you need to carefully plan how to join the tables together. It is recommended that you design your joins to be based on columns that are integers or smaller.
Avoid joining on character-based columns, as it may lead to slower performance.
· Data update: A big consideration in your system is your approach to data updates. As you acquire new data from source systems or third parties, you need to update that data into your system. That may involve appending existing data or doing a complete reload, depending on the strategy you've chosen. Following an append strategy is more complicated, but it leads to faster update performance and helps you avoid changing historical results.
· Reporting and analysis requirements: This is the main driver behind your analytics system and should be your starting point of the design. Everything you do in your system should be based on the outcomes you are seeking. Spend adequate time thinking through those requirements and make sure your system can meet them.
Creating a table
To create a table in SQL Server, follow these steps:
1. Right-click the Tables folder in the Object Explorer window and choose New Table.
The table view appears in the Query window, as shown in Figure 8-15.
2. Enter your column name, choose a data type, and specify whether you want to allow nulls in your column.
Nulls in SQL Server are handled in a special way. You cannot join on null columns; if you're looking for nulls, use the Is Null clause in your query.
3. Click the X in the top-right corner of the Query window.
A message box opens asking if you want to save the table.
4. Click Yes, enter the table name, and your table is created.
Figure 8-15: A table view.
Figure 8-15 shows a table called sample that includes two columns. To see the data and structure of the table, right-click its name in the Object Explorer window and choose Select Top 1000 Rows. A Query window appears (see Figure 8-16).
The top of the New Query window includes the SQL query that returns the top 1,000 rows in the sample table. No data is loaded in the table, so the bottom part of the window shows the table columns but returns no data rows.
Figure 8-16: A sample table.
Creating a view
To create a view in SQL Server, follow these steps:
1. Right-click the Views folder in the Object Explorer window and select New View.
The Add Tables dialog box opens, as shown in Figure 8-17, so you can add tables to your view.
We're using the AdventureWorksDW database for the purposes of these steps.
2. Select the tables you want to add to your view and click the Add button.
In this case we selected the DimCustomer, DimDate, DimGeography, and FactInternetSales tables. SQL Server automatically brings the selected tables in the relationship window and shows the resulting SQL below it.
3. Click the columns that you want to add to your view.
Figure 8-17: View design window.
The columns are automatically added to the SQL statement (see Figure 8-18).
4. After you have finished selecting all the columns and tables you want to add, click the Save icon on the menu bar.
5. Enter the view name.
Make sure to follow a standard naming convention to make it easier to maintain the system. Here are some common conventions for naming objects in SQL server:
· Limit the name to approximately 50 characters (the shorter the name, the better).
· Avoid using spaces in the name. Instead, use camel case (using uppercase for the first letter of each word, as in SalesAmount).
· Avoid using numbers and underscores in the name.
Figure 8-18: A completed view.
So far, you've created the objects you need in your system. In Chapter 9, you discover how to harness the full power of SQL Server with SQL. However, you can get a taste of SQL by right-clicking the view that you created, hovering your mouse over Script Views As, and then choosing Create To → New Query Editor Window. A New Query Editor window opens, as shown in Figure 8-19.
Management Studio automatically creates the syntax needed to replicate the creation of this view. Although you may not fully understand the syntax, it’s relatively easy to see what the code is doing. As you become more familiar with SQL, you can leverage Management Studio’s ability to create scripts as a starting point for when you want to create your own custom SQL scripts.
Figure 8-19: The script view output.
Data Importing and Exporting
Most analytics systems include some sort of external data. That data could be in the form of flat file feeds from source systems, or it may be a replicated copy of a source system database. You should carefully consider the data integration needs of your analytics system and work with your system administrators to secure your data feeds.
An important and useful function of SQL Server is its ability to support data integration through SQL Server Integration Services (SSIS). SSIS is a sophisticated and market-leading Extract Transform and Load (ETL) tool. SSIS includes advanced features that allow you to manage complex data integration requirements. In this section we cover some simple functionality to enable you to easily import and export data from and to your SQL Server database. This is a simplified SSIS utility that is accessible from the Microsoft SQL Server Management Studio and includes the ability to connect to multiple types of sources and targets.
Importing and exporting data are functionally the same, with the only difference being the source and target. Taking data out of your database to another target is called exporting and bringing data in is called importing.
To import data into your SQL Server database follow these steps:
1. Right-click the database in the Object Explorer window, hover your mouse over Tasks, and select Import Data.
The SQL Server Import and Export Wizard starts.
2. Click Next on the first screen and the Choose a Data Source screen opens, as shown in Figure 8-20.
You can choose a variety of sources, including flat files, Excel files, ODBC, SQL Server, or other relational databases. We're copying a table from the AdventureWorksDW SQL Server database.
Figure 8-20: Choosing a data source.
3. Choose Microsoft OLE DB Provider for SQL Server as your data source and select the server name and source database. Click Next.
The wizard presents a new screen where you can choose a destination.
4. Select the right target database, and click Next.
The Specify Table or Query screen opens, as shown in Figure 8-21. You have two options. You can copy data from other tables or views, or you can write your own query to specify the dataset to transfer over. Use the latter when you want to copy a few columns that are spread over multiple tables in the source system.
Figure 8-21: Specify Table Copy or Query.
5. Select the Copy Data from One or More Tables or Views option and click Next.
The Select Source Tables Views screen opens, with a list of tables and views that exist in your source database (see Figure 8-22).
6. Select one or more of those objects and click Next.
7. Choose whether to run the data transfer immediately or save the transfer steps as a package that can be run on demand. Click Next.
Figure 8-22: Select Source Tables and Views.
8. Click the Finish button.
If you chose to run the data transfer immediately, Management Studio starts the data transfer and displays a progress indicator showing the number of records transferred. If you chose to save the transfer steps as a package, Management Studio creates the package file.