Microsoft Business Intelligence Tools for Excel Analysts (2014)
PART II: Leveraging SQL for Business Intelligence
Chapter 10: Creating and Managing SQL Scripts
In This Chapter
· Design concepts
· Working with SQL scripts
· Indexing and performance considerations
· SQL solutions to common analytics problems
Analytics systems must procure data from various sources, often in formats that are not necessarily suited for analytics. These data feeds have to be integrated and manipulated to make the output suitable for analytics consumption. In this chapter we go over key concepts and tools to guide you in building an analytics system that is scalable and maintainable.
Design Concepts
Building a successful analytics system requires careful design that stems from experience and lessons learned from prior mistakes. Luckily the industry has now a couple of decades of experience in building these systems. This had led to established best practices to help you avoid the mistakes of the past. In this section we summarize these best practices and give you a foundation to start off on the right foot.
The concepts in this section become important as your system complexity and data volumes grow. When you start your design, you have to assess the scope of your system and decide the level of design sophistication that is necessary to meet your needs.
Stay organized
The most common mistake beginners make when building an analytics system is not being organized. Preparing your data for analytics should be divided into multiple stages. Each stage performs a special purpose in the data preparation process leading up to the final output.
To illustrate the concept, consider the example of a restaurant. You would not put the kitchen equipment where the food preparation occurs in the dining room where customers eat, would you? You want your customers to enjoy their meal without the clutter of the kitchen operations. Inside the kitchen you would separate the area by function to improve efficiency. The same concept is true for your analytics system.
You want to build an analytics system that is easy to reuse. Typically that involves building the data structure and updating the data within it on a regular basis. The data inside that structure can be used to build reoccurring reports or to run ad hoc queries. Once built, these systems are not completely static in terms of their structure and must continuously evolve to keep up with the business changes.
We recommend separating the tables and scripts into various stages and delineating those stages using a specific naming convention. For a simple system, organize your system in three stages:
· Data extraction: This stage includes the tables, files, and scripts that are involved in extracting data from the various source systems. Keep this stage simple and light on business logic. For most of your extract, you can rely on a change data capture (CDC) date that outlines the changed data in your system. Otherwise you always have to pull all source data upon every extract, which makes your system slow and degrades performance over time as data sizes grow.
· Data preparation: This stage includes the tables and scripts that are involved in preparing the data to be inserted in the final output. Most of the system’s business logic is in this stage. The final step in this stage is to prepare the incremental data set to be loaded into each table in the final output. So, you should have a mirror table on each Final Output table. Think of this as plating the dishes and setting them out to be picked up by the waiters and delivered to the customers. It may seem like a redundant step to stage your data in a mirror table to your final destination, but there are good reasons for doing so, including data validation, data integrity, consistency, and reusability.
· Data delivery: This stage includes the tables and scripts that are involved in the final delivery of the data in the analytics system. The delivery steps typically involve updating existing records and inserting new ones. If you completely reload your system every time, you would obviously not have any records to update. We recommend you follow an incremental update strategy to maintain system scalability.
Complex systems require additional categories and phases and more sophisticated processes compared to what we have just outlined. As you feel more comfortable with your analytics system, you can expand on this foundation.
Following a logical organization when building your system can make a big difference in the long run and help keep it maintainable and efficient.
Move data in one direction
The data coming through your analytics system should move from the source systems on down the various phases that you have outlined (see Figure 10-1).
Figure 10-1: Move data in one direction.
Following this principle allows you to ensure data integrity every time you run your script and makes maintenance and troubleshooting easier.
You should also set up your script to run all the components of every phase together. This keeps your design simple and allows you to avoid building tables with incomplete data. Think of each circle in Figure 10-1 as a gateway or checkpoint. The data stops after each of those points and waits for the rest of the data points in that phase to come in. After that is done, you move on to the next phase.
Divide data according to metrics and attributes
When you first build an analytics system, you may be tempted to organize all your data into one large table. At first glance, this may seem like the way to go, but as you get deeper into the process you’ll find that this approach has two main limitations:
· Bad performance: A large table approach leads to bad performance. This is due to several reasons, including unnecessarily repeating data, having to redundantly update that data when changes occur, and increasing the size of data on disk. Performance problems may not show up on day one, but they will creep up as your data volumes increase.
· Lack of flexibility: Advanced analytics often require multiple metrics that are not always at the same granularity; for example, order count versus order item count. When you lump all the metrics together, you limit your analytics capabilities because you are artificially fitting metrics into one structure.
A more optimal approach is to organize your metrics and attributes separately according to granularity and relationships. Here are two high-level guidelines to follow:
· Separate attributes from metrics. Attributes are generally strings such as names or descriptions. There is no need for them to be lumped in with metrics. You can put them in their own tables and reference a key relationship to the table that contains your metrics. Figure 10-2 contains a flat reporting table. It’s a single table that has the product information, the customer details, and the orders. Figure 10-3 organizes the data into three tables according to relationships. Consider that the Product Name has a misspelling in it and needs to be corrected. You’d have to update every row in the flat reporting table for the orders that have that product as opposed to one row in the Product table. The impact on performance is massive between the two operations.
Figure 10-2: A flat reporting table.
Figure 10-3: Tables organized according to metrics and attributes.
· Organize metrics according to granularity. Not all metrics have the same granularity. When you’re working through your design, consider the various metrics you need in your analytics and take the time to organize them properly. An order can have many order items. If you go the route of a single reporting table (refer to Figure 10-2), you have two choices: Expand the grain of the table down to Order Item, or keep the table as is and add Order Item Count to it. Both options have disadvantages. The first option forces you to allocate Order Count among all the order items, and the second option forces you to lose the ability to see the order items details. The recommended approach to this problem is to break out order item into its own table, as shown in Figure 10-4.
Consider data volumes up front
A common pitfall when designing analytics systems is ignoring future data growth from the start. You must consider the long term when designing your system, because it will typically be around for many years. The primary principles to consider when handling large data volumes are
· Follow an incremental update strategy: If you want your system to perform well, avoid redundant work. If you run the system today and update all the data up to this point, you don't want to repeat that step the next day if you can avoid it. Updating the system incrementally involves only appending new data that came in since your last update. If done properly, this should speed up data loads and allow you to handle increased data volumes during a short processing window.
Figure 10-4: Adding a Customer Order Item table allows for the addition of granularity to the data.
· Properly index your tables: A key consideration in analytics is query performance. You may have reports that are hitting your system live. You do not want those reports to perform poorly, because that would lead to a bad user experience. Indexing is the most effective tool you have in SQL for speeding up queries running against large tables. We review indexing concepts in a later section of this chapter.
· Keep your tables narrow: SQL Server (and relational databases in general) store each row in your table in one location on disk. Therefore, the larger the row, the longer it takes for SQL Server to read or update it. For this reason, keep the tables that have many rows in them narrow. This enables SQL Server to read and write data from these tables much faster and leads to better system performance.
Consider full data reload requirements
Another important factor to consider in your design is the frequency and need for full reloads. Full reloads are not common, but they do occur. Your objective is to minimize them, but if you have a business or technical requirement to support a full reload, your design needs to able to handle such a request. Two primary concerns are related to this:
· Full reload performance: You can design a system that performs relatively quickly when you’re doing incremental updates, but you may find that it’s slow when it comes to full reloads. This may be an issue if you have a limited window to perform full reloads. There are several factors to consider when performing a full reload, but the primary one is indexing. Loading a table with multiple indexes is slow in a relational database. The most optimal approach is to drop and re-create your indexes before and after a full reload.
· Source system data changes: The key question here is, will your system produce the same data after a full reload? That may or may not be the case, depending on the behavior of your source system. Keep in mind that every source system has some data that is overwritten. You have to check your data elements and identify those attributes that cannot be overwritten and avoid doing full reloads on those attributes.
It is not uncommon for analytics systems to have two different scripts — one for incremental load and one for full reload. Take these issues into consideration in your design to avoid painful surprises.
Set up logging and data validation
To make your analytics system robust, you must add some logging and validation capabilities to it. We go over some simple approaches related to these concepts in this section. Think of this as an auditing and quality assurance function.
· Logging: The purpose of logging is to keep track of your system runs. There are several components to logging in your analytics system. Keep a simple table of your system runs and add initial and modified dates on your tables. Logging can help you look at your system and understand:
· When a run occurred
· How long it took
· What it updated
· Validation: A key factor in the success of your analytics system is reliability. If you have reports going to various users, including top management, the last thing you want to do is to send them bad numbers. It is in your best interest to do everything you can to increase the reliability of your output. Adding validation steps can help you catch mistakes before they go out for general consumption. Validation involves adding queries to check for the reasonableness of your output. That could take several forms, including:
· Comparing a value against a running average
· Checking a summary of update counts manually or automatically
· Trapping and flagging bad data (data that falls outside a certain range or that violates a certain rule)
Working with SQL Scripts
SQL scripts are automated procedures you develop to tell SQL Server to perform changes to your system. They can be saved in files or stored on the server in the form of stored procedures. These scripts can then be executed on demand or according to a schedule.
We give you an overview of SQL scripting in this section, including outlining scripting patterns that are most commonly used in analytics systems.
The data structure we use in this section involves three SQL Server databases — Orders, Billing, and RevenueAnalytics. Orders and Billing are two source system databases, and RevenueAnalytics is the analytics system. The tables from the source system include Customer, Order, and Product from the Orders system and Payment from Billing, as shown in Figure 10-5. Here is how the three tables work together:
· Customer is related to Order via CustomerKey and Product via ProductKey.
· Customer and Product are the parents of Order.
· The keys to the three tables from the Orders system are also transferred over the ReoccurringPayment table in the Billing system.
Figure 10-5: Source systems data model.
This example is based on a business that sells products with reoccurring monthly billing — one order results in multiple billings for each customer. The desired output of this system is to produce analytics on reoccurring revenue, including:
· Pivoting the data by OrderDate and showing a crosstab of those dates by payment period (defined as the days’ difference between OrderDate and PaymentDate divided by 30).
· Showing a running cumulative payment amount.
Looking at these requirements, you can outline a simple model for the Data Delivery phase that allows you to produce the required outputs. The model is outlined in Figure 10-6.
Deliver.PaymentPeriod and Deliver.OrderDate do not have load and modify dates because they are static tables loaded one time from Excel or a flat file.
Data extraction scripting
The data extraction phase should be kept simple and pull in data changes only. All objects in this section should be stored under the same schema in the database. The name of this schema is Extract. Figure 10-7 outlines the tables.
Figure 10-6: Data Delivery tables.
Figure 10-7: Extract tables.
The structure of this phase mirrors the source system almost exactly. The only exception is Extract.ReoccurringPayment. The OrderDate is needed because you need to use that column to produce the two analytics requirements outlined in the example definition.
Listing 10-1 shows the data extraction script. It starts by defining a From and To incremental window of time for updating your system. This window must be maintained in your logging table and updated at the end of each run. Notice that the target table is truncated prior to insert.
You can make this step more robust by adding a unique constraint on the Extract.Customer table if you are concerned about data integrity.
Listing 10-1: Data Extraction
Declare @FromDateTime as datetime,
@ToDateTime as datetime
--You must set the @FromDateTime and @ToDateTime from the logging table
Truncate Table Extract.Customer
Insert Into Extract.Customer
Select CustomerKey, CustomerFirstName, CustomerLastName, CustomerEmail
From Orders.dbo.Customer
Where ModifyDate Between @FromDateTime and @ToDateTime
Listing 10-2 shows a more complicated extract script for ReoccuringPayment. In order to produce the required analytics related to the table in this example, you need the OrderDate and the full payment history for an order. To accomplish this, you make two passes against the ReoccurringPayment table — first to get all the orders that had new or changed payments, and second to grab all of those orders payments.
Listing 10-2: A Complex Data Extraction
Declare @FromDateTime as datetime,
@ToDateTime as datetime
--You must set the @FromDateTime and @ToDateTime from the logging table
Truncate Table Extract.ReoccurringPayment
Insert Into Extract.ReoccurringPayment
Select p1.ProductKey, p1.OrderKey, o.OrderDate, o.CustomerKey,
p1.PaymentDate, p1.PaymentAmount
From Orders.dbo.Order o
Join Billing.dbo.ReoccurringPayment p1
On o.OrderKey=p.OrderKey
Join (
Select Distinct OrderKey
Join Billing.dbo.ReoccurringPayment
Where ModifyDate Between @FromDateTime and @ToDateTime
) p2
On p1.OrderKey=p2.OrderKey
This example makes the extract script more complex in order to handle the required business rules. An alternative approach would be to keep the extract layer simpler, but expand the prepare layer or add a new layer to the system to keep a permanent copy of the source data that you can use to perform the required calculations. In this case, that approach works well because the source systems are replicated to the analytics server and there is no risk of overloading the transactional system.
Data preparation scripting
The objective of the data preparation phase is to set up all the scripts that are necessary to perform the core business functionality of the system and land the desired data into staging tables ready for final merge into the Data Delivery phase tables. The objects in this phase are organized in the Prepare database schema. The tables required mimic the Data Delivery tables, but may also include some intermediate or temporary tables depending on what is necessary to get all the transformations done. Not all the tables in the Data Delivery phase require a step in this phase. Target tables that look very similar to the Extract tables can skip this step or views can be used to put in a logical placeholder for those objects. Figure 10-8 shows the model for this phase. (The Prepare.Customer and Prepare.Product are not included because the target is exactly like the Extract tables; they're for informational purposes only.)
Figure 10-8: Prepare tables.
There is no typical script for this step because the structure depends on the work required. There are two special business rules required for this example — calculating the PaymentPeriod and and the CumulativePaymentAmount. Listing 10-3 shows the PaymentPeriod calculation. The CumulativePaymentAmount is illustrated in a later section in this chapter.
Listing 10-3: Data Preparation
Truncate Table Prepare.ReoccurringPayment
Insert Into Prepare.ReoccurringPayment
Select ProductKey,
CustomerKey,
OrderDate,
Ceiling(DateDiff(dd, OrderDate, PaymentDate)/30.0) PaymentPeriodKey
PaymentAmount,
0 CumulativePaymentAmount --placeholder
From Extract.ReoccurringPayment
The PaymentPeriodKey in this example is calculated using a simple algorithm. The algorithm uses DateDiff to calculate the number of days between the OrderDate and the PaymentDate, and divides that difference by 30 (30.0 is used to make SQL Server output a decimal). The Ceiling function is added to force SQL Server to round up. The resulting output is an integer that represents the period number where a period is 30 days.
Data delivery scripting
In general, during data delivery you follow two common script patterns — append and update, or load an empty table.
Append and update. For an incremental update, there is a specific scripting pattern to follow, as shown in Listing 10-4. Always run your updates before adding new data. If you add new data to your target table first, you end up updating that table again. The pattern involves matching the source and target tables on the business key of the data and checking to see if any attributes have changed for those records that matched. Data is updated for any changes from the source. The second step is to look for all records that do not exist in the target and load them from the source.
Listing 10-4: Append and Update Data
Update t
Set CustomerFirstName=s.CustomerFirstName,
CustomerLastName=s.CustomerLastName,
CustomerEmail=s.CustomerEmail,
ModifyDate=GetDate()
From Deliver.Customer t
Join Prepare.Customer s
On t.CustomerKey=s.CustomerKey
Where CustomerFirstName<>s.CustomerFirstName
Or CustomerLastName<>s.CustomerLastName
Or CustomerEmail<>s.CustomerEmail
Insert Into Deliver.Customer
Select s.CustomerKey,
s.CustomerFirstName,
s.CustomerLastName,
s.CustomerEmail,
LoadDate=GetDate(),
ModifyDate=GetDate()
From Prepare.Customer s
Left Join Deliver.Customer t
On t.CustomerKey=s.CustomerKey
Where t.CustomerKey Is Null
You can use a Merge statement to accomplish an append and update to your data, as well.
Load an empty table. You might perform a full data load into a target table, as shown in Listing 10-5. That could occur upon first load or a reload. In those cases, you need to follow a specific scripting pattern to make sure you're performing an efficient load. It involves truncating the table, dropping any indexes, inserting new data, and reloading the indexes.
Listing 10-5: Fulling Loading Data
Truncate Table Deliver.Customer
--Check if index exists
If Exists (Select *
From sys.indexes
Where Object_id = Object_Id(N'Deliver.Customer')
And name = N'Indexname')
--Drop index
Drop Index IndexName on Deliver.Customer
Insert Into Deliver.Customer
Select s.CustomerKey,
s.CustomerFirstName,
s.CustomerLastName,
s.CustomerEmail,
LoadDate=GetDate(),
ModifyDate=GetDate()
From Prepare.Customer s
--Reload indexes
--Check if index exists
If Exists (Select *
From sys.indexes
Where Object_id = Object_Id(N'Deliver.Customer')
And name = N'Indexname')
--Create index
Create clustered Index IndexName on Deliver.Customer
(
CustomerKey
)
Error handling
Error handling is an important part of your system. Without this function you may end up with bad data loaded into your target tables. There are two primary reasons for error handling:
· Trapping unexpected errors when they occur and raising an alert to users
· Handling errors raised purposefully in case specified conditions occur
The primary mechanism for handling errors in SQL Server is using the Try and Catch syntax, as shown in Listing 10-6.
Without a Try and Catch statement, SQL Server keeps executing any additional steps in your script until all of them are completed.
Listing 10-6: Try and Catch for Error Handling
BEGIN TRY
--Raise error if you are trying to trap a condition or skip this line if you are not
RAISERROR('Customer Error',16,1)
END TRY
/****** Error Handling ******/
BEGIN CATCH
DECLARE @ErrorMessage NVARCHAR(4000);
DECLARE @ErrorSeverity INT;
DECLARE @ErrorState INT;
SELECT @ErrorMessage = ERROR_MESSAGE(),
@ErrorSeverity = ERROR_SEVERITY(),
@ErrorState = ERROR_STATE()
RAISERROR
(
@ErrorMessage,
@ErrorSeverity,
@ErrorState
)
END CATCH
Beyond Try and Catch, always design your scripts to be restartable to avoid corrupting your system in case it was accidentally run more than once or to allow repopulation of incremental data in case of source system data corruption.
Creating and altering stored procedures
Stored procedures are SQL scripts that are stored in SQL Server. The benefit of using them for loading your analytics system is mostly for organization and ease of maintenance. Once built, you can call stored procedures from other SQL Server components, such as SQL Server Agent Jobs (for scheduling purposes) or by other procedures or applications. To create a stored procedure, follow these steps:
1. Open SQL Server Management Studio and expand the Programmability folder in your database.
2. Right-click the Stored Procedures subfolder and choose New Stored Procedure.
A Query window opens with a default stored procedure structure, as shown in Figure 10-9.
3. Insert your SQL scripts in between the Begin and End blocks and change the information in between <> according to your needs.
4. When done, click Execute to create the stored procedure.
Figure 10-9: Stored Procedures query.
You can see it in the Object Explorer window by refreshing the Stored Procedures folder.
You can alter a procedure by right-clicking the procedure and selecting Modify. Make the changes you need and then click Execute.
Indexing and Performance Considerations
Analytics systems performance has two aspects: load performance and query performance. Your objective when building the system should be to optimize both. We delve into the factors involved in achieving that goal in this section.
Understanding index types
Indexing is a very important factor in improving database performance. Indexing helps the database get to the data faster when you're retrieving data in your query. The types of indexes in SQL Server that are important for an analytics system are
· Clustered: This type of index sorts the data in a table according to the key specified in the index. It is advisable to use a clustered index on your tables and to keep that index as small as possible. Single column integers work best for these types of indexes. You can only have one clustered index in a table.
· Nonclustered: This type of index builds a pointer to the clustered index of the table if one exists. You can have multiple nonclustered indexes in a table, and you should build these for columns that are used in query joins and search criteria.
· Unique: A unique index is a constraint that you can place on a column or a group of columns to force SQL Server to allow only unique values or combinations of values in that index.
You should assess the queries in your system and add necessary indexes on your table to optimize performance.
Creating an index
To create an index, follow these steps:
1. Expand the table you're adding the index to in SQL Server Management Studio. Right-click the Indexes folder and choose New Index.
A New Index window opens, as shown in Figure 10-10.
Figure 10-10: Adding a new index.
2. Add a name in the Index Name box, choose the type from the Index Type drop-down list, and specify if the index is unique by selecting the Unique check box.
3. Click the Add button and select the column or columns that should be part of the index.
4. When finished, click OK.
SQL Server creates your new index on the table. Creating indexes may take some time on large tables.
Dropping an index
You can view your indexes by expanding the Indexes folder in SQL Server Management Studio. If you want to delete an index, right-click it and choose Delete. The Delete Object window opens, as shown in Figure 10-11.
Figure 10-11: Deleting an index.
Make sure the window lists the index you want to delete, and then click OK.
Additional tips and tricks
Several other factors are involved in performance optimization:
· Temp DB usage: As the name indicates, Temp DB is a temporary database that the SQL Server database engine uses for performing database operations. It's important to understand the role this database plays in queries and SQL scripts, because it impacts your system performance. For example, if you have a large query with a Union operator, SQL Server may create a temp table for that operation. If the Union is used in an Insert operation, you can easily achieve the same result by breaking yourInsert statement into two.
· Break up complex queries: Avoid building very complex queries in one SQL statement. Generally, the SQL Server optimizer is quite advanced and can handle a large number of joins in a query; however, it helps to break things down into multiple steps if you notice slow performance. Find the statements that are performing badly and experiment by breaking them into several scripts.
· Avoid multiple passes: Avoid updating the same table over and over in the same script. For example, if a table has several columns that need to be updated, a common approach beginners take is to write five update statements for each column. Following this approach means you're deleting and inserting the data in the same table five times (because each update is effectively a delete and an insert). When you have situations like these, take a step back and consider a more streamlined design.
SQL Solutions to Common Analytics Problems
As you start diving into using SQL for building analytics, you'll find a lot of requests and questions have a similar pattern. Business users often want to spin the data in similar ways to slice and dice the output and tease out answers that can result in better business decisions. We outline some of the more common questions in this section and describe how you can answer them using SQL.
Creating an Active Members Report
The value of a company often depends on the number of active members it has. Take the example of a social media site. Analysts always talk about monthly active members on the site and the growth, reduction, or flattening out of that figure.
Producing an Active Members Report may seem like an easy undertaking on the surface, but that is not the case. Things get even more complex if you want that report to show a running count of active members over time. To produce this type of report, it's necessary to effectively take a count of active members every day and store that number somewhere. Luckily there is a solution to this problem using SQL.
For the purposes of this example, consider the table shown in Figure 10-12. It represents a table of all the members in the system with the MemberKey along with the StartDate and CancelDate. A Cancelled flag also indicates if the member is active.
Figure 10-12: Members table.
If you think about the data in the Members table, you can deduce that you don't have an entry for each member for each day. Therefore, your query needs to simulate that action. To accomplish that, you have to add a CalendarDate table that has an entry for every day that you want to produce the report for (see Figure 10-13). This table can be created manually in Excel and copied into SQL Server.
Figure 10-13: CalendarDate table.
When you have the tables set up, you can write the query to produce the Daily Active Member Report (shown in Listing 10-7).
Listing 10-7: A Daily Active Member Report
Select c.CalendarDate, Count(*) DailyActiveMembersCount
From Member m
Join CalendarDate c
On c.CalendarDate between m.StartDate
and Case
When m.Cancelled=1
Then m.CancelDate
Else '12/31/9999'
End
Group By c.CalendarDate
Order By c.CalendarDate
The query in Listing 10-7 is a Between Join query that creates additional rows for every row in the CalendarDate table.
Creating a Cumulative Amount Report
Another common analytics request is creating a Cumulative Amount Report. If you take the case of a billing system, for example, the revenue generated from each customer is usually stored at the billing transaction level. However, in many cases, analytics requirements may force you to turn that amount into a running cumulative amount over the lifetime of the customer. Take the example of the reoccurring payment table shown in Figure 10-14.
Figure 10-14: Reoccurring Payment table.
Going from PaymentAmount to CumulativePaymentAmount over the PaymentPeriodKey requires making several passes at the table to aggregate PaymentAmount over every PaymentPeriodKey in sequential order.
The query for solving this problem is a Self Join back into the ReoccurringPayment on the keys of the table; see Listing 10-8.
Listing 10-8: A Cumulative Amount Report
Select r1.ProductKey,
r1.CustomerKey,
r2.PaymentPeriodKey,
Sum(r1.PaymentAmount) CumulativePaymentAmount
From ReoccurringPayment r1
Join ReoccurringPayment r2
On r1.ProductKey = r2.ProductKey
And r1.CustomerKey = r2.CustomerKey
Where r1.PaymentPeriodKey<=r2.PaymentPeriodKey
Group By r1.ProductKey, r1.CustomerKey, r2.PaymentPeriodKey
Order By r1.ProductKey, r1.CustomerKey, r2.PaymentPeriodKey
The query in Listing 10-8 is a less than or equal to query that simulates running through every combination of ProductKey and CustomerKey, and carries the amount from one PaymentPeriodKey to the next to create the cumulative report.
Creating a Top Performers Report
You may need to create a Top Performers Report related to customers or salespeople. For example, who are your top 10 customers in terms of revenue? Answering this question is relatively easy with SQL. A Group By query works.
If you take the ReoccurringPayment table, you can produce the Top Performers Report using the query shown in Listing 10-9.
Listing 10-9: A Top Performers Report
Select Top 10 CustomerKey,
Sum(PaymentAmount) TotalPaymentAmount
From ReoccurringPayment
Group By CustomerKey
Order By Sum(PaymentAmount) Desc
Creating an Exception List Report
An example of an Exception List Report could be: Who are your customers with a total revenue amount greater than $1,000? You use the ReoccurringPayment table. The SQL query to produce this report is shown in Listing 10-10.
Listing 10-10: A Exception List Report
Select CustomerKey,
Sum(PaymentAmount) TotalPaymentAmount
From ReoccurringPayment
Group By CustomerKey
Having Sum(PaymentAmount)>1000
Order By CustomerKey