Building Basic Dimensions and Cubes - Expert Cube Development with SSAS Multidimensional Models (2014)


Chapter 2. Building Basic Dimensions and Cubes

Having prepared our relational source data, we're now ready to start designing a cube and some dimensions. This chapter covers the steps you need to go through in order to create simple dimensions and cubes, and although you may be confident that you know how to do this already, we encourage you to read through this chapter nonetheless. You may be familiar with the overall process, but some of the detailed recommendations that we make may be new to you, and they could save you a lot of time and effort later on in your project.

In this chapter, we'll be taking a look at the following topics:

· Creating Data Sources and Data Source Views

· Creating dimensions, setting up user hierarchies, and configuring attribute relationships

· Creating a simple cube

· Deployment and processing

From a methodology point of view, this chapter represents the creation of the first draft of your cube. In subsequent chapters, we'll look at how you tackle the more advanced modeling problems. However, there are a lot of advantages in taking an iterative and incremental approach to cube development. If you're unfamiliar with your data or Analysis Services, it will allow you to get something up and running quickly so that you can be sure that it works properly.

It will also allow you to show your end users something quickly, so that they can make sure it meets their expectations and check it over for problems. This doesn't mean you shouldn't concentrate on getting what you build at this stage right the first time—far from it—but it is easier to get the details correct when you're concentrating on a small subset of the solution you hope to build eventually. We therefore recommend that you pick the most important fact table in your data warehouse plus a few of the more straightforward dimensions that join to it, one of which should be the Time dimension, and open up SQL Server Data Tools (SSDT).


In the previous versions of Analysis Services, SQL Server Data Tools was known as BI Development Studio (BIDS). SSDT and BIDS are just different names for the BI project templates inside Visual Studio.

Multidimensional and Tabular models

As mentioned in the introduction, this book only concerns itself with Analysis Services Multidimensional models. Since Analysis Services 2012, however, there has been a second type of Analysis Services model: the Tabular model. Analysis Services Tabular models do much the same thing as Analysis Services Multidimensional models—an end user would probably not be able to tell if they were querying one type or the other—but the development experience and the underlying technology of the two types of models are very different.

The development experience for Analysis Services Tabular models is very similar to that of Power Pivot: the basic concepts are relational, you load data into tables and you create relationships between tables; there's a big contrast between this and the world of dimensions, attributes, measure groups, and cubes that we'll encounter in this book. The approach the Tabular model takes makes it very easy to create simple models, but arguably makes it harder to deal with complex requirements: for example, the ability to handle many-to-many relationships is built into Analysis Services Multidimensional, but needs to be coded into measure definitions in Analysis Services Tabular.

Analysis Services Tabular also uses a different way of storing its data to Analysis Services Multidimensional. Tabular models store their data in an in-memory, column store database and this can mean that certain operations (such as distinct counts) are much faster. That said, for most data volumes, both versions of Analysis Services perform adequately and any performance differences are likely to be negligible.

Even when installing Analysis Services, you need to be aware that there are two types of models: an instance of Analysis Services can run either in Multidimensional mode or Tabular mode, and you can't change from one to the other after installation. What's more, you should have a thorough understanding of the strengths and weaknesses of both types of models before you start your project and choose carefully between them. There is no easy way to convert a solution developed for Analysis Services Multidimensional to Tabular, or vice versa. A blog post that provides a good overview of the two different models, and when you should use each, is available online.

Choosing an edition of Analysis Services

Before we start developing with Analysis Services, we need a clear idea of which edition of Analysis Services we're going to be developing for. There are three choices: Standard Edition, which is the cheapest but is missing some features; and BI Edition and Enterprise Edition, which are more expensive, but feature-complete. The only difference between BI Edition and Enterprise Edition, as far as Analysis Services Multidimensional goes, is that BI Edition is licensed on a server and Client Access License (CAL) model, whereas Enterprise Edition is licensed on a per-core model. The licensing cost is likely to be the major factor in the decision about which edition to choose. If money is no object, then you should use Enterprise Edition or BI Edition.

If money is an issue, then you'll just have to live with the limitations of Standard Edition. Of course, if we install Analysis Services on a server that already has SQL Server installed, then there are no extra license costs involved, but as we'll see in Chapter 11, Monitoring Cube Performance and Usage, we have to be careful that they don't compete for resources. The SQL Server 2012 Licensing Guide gives a detailed breakdown of which features are available in each edition, and can be downloaded from the Microsoft website.

Don't worry about having to use the Standard Edition though. Some of the features it lacks can be recreated with a little bit of extra work. The key features in Enterprise Edition and BI Edition are in the area of performance for very large or complex cubes, and you can go a long way with Standard Edition before you really need to use these features. The Deployment Server Edition project property, which is described next, will help you make sure you only use the features available in the edition of your choice.

Setting up a new Analysis Services project

The first step towards creating a new cube is to create a new Analysis Services project in SSDT. Immediately after doing this, we strongly recommend putting your new project into source control. It's easy to forget to do this, or not bother, because building a cube doesn't seem like a traditional development project, but you'll be glad that you did it when you receive your first request to rollback a change to a complex MDX calculation.

As you're probably aware, there are two ways of working with Analysis Services projects in SSDT:

· Project mode: This is where you work with a local Visual Studio project and deploy to your Analysis Services server only when you're happy with all the changes you've made

· Online mode: This is where you edit your Analysis Services database live on the server and commit changes every time you click on the Save button

You'll only be able to use source control software effectively if you work in the project mode. Therefore, it's a good idea to resist the temptation to work in online mode unless you're only making temporary changes to your cube, even though online mode often seems to be the most convenient way of working.

With all new Analysis Services projects, there are a few useful project properties that can be set. You can set project properties by right-clicking on the Project node in the Solution Explorer pane and selecting Properties, as shown in the following screenshot:

(Screenshot: the project Properties dialog)

Here is a list of properties you may want to change at the project level, and the values you can set for them:

· Build: The following properties can be changed on the Build tab:

· Deployment Server Edition: If you plan to deploy Standard Edition in production, but you're using Developer Edition in development, you will want to set this property to Standard. This will make SSDT raise errors when you build a project if you accidentally use features that aren't available in the Standard Edition.

· Deployment: The following properties can be changed under the Deployment tab:

· Processing Option: This property allows you to process your database automatically whenever you deploy a project. The Default option will perform a Process Default, but in many cases, when you deploy a change that doesn't need any cube processing at all, a Process Default can still waste 10 or 20 seconds, and if you've made more substantial changes to an object you will still want to control when processing takes place. Setting this property to Do Not Process instead will stop all automatic processing. This means that you have to remember to manually process any objects yourself if you make changes to them, but it will save you time in the long run by preventing a lot of unintentional processing.

· Server: This contains the name of the server you're deploying to and defaults to localhost. If you're not developing on a local Analysis Services instance, then you'll need to change this anyway. Even if you are, it's a good idea to enter the name of the target server, rather than use localhost, in case anyone wants to work on the project on another machine.

· Database: This contains the name of the database that you're deploying to. It defaults to the name of the Visual Studio project. Of course, you can change it if you want your project and database to have different names.

We'll be looking again at how Visual Studio project configurations can help you when you need to deploy your project to your production environment in Chapter 10, Going in Production.


It is a good idea to install BIDS Helper, an award-winning, free, community-developed tool that adds a lot of useful functionality to SSDT. At the time of writing, it can be downloaded from the project's page on CodePlex. We'll be referring to some of its features later on in this chapter.

Creating data sources

Once we've created a new project and configured it appropriately, the next step is to create a data source object. Even though you can create multiple data sources in a project, you probably shouldn't. If you've read the previous chapter, then you'll know that we recommend that all of the data needed for your cube should already be present in a single data mart.

You are then faced with the choice of which OLE DB provider to use, since there are often several different options for any given relational database. For SQL Server data sources, you have the option of using the SqlClient .NET data provider, the Microsoft OLE DB Provider for SQL Server, or the SQL Server Native Client (often referred to as SNAC). You should always choose the SQL Server Native Client since it offers the best performance. For Oracle data sources, the choice is more complicated: even though Oracle is a supported data source for Analysis Services, there is a long list of bugs and issues. Some are addressed in a white paper available from Microsoft, but if you do run into problems, the best approach is to try using Microsoft's Oracle OLE DB Provider, Oracle's own OLE DB provider, the .NET Provider for Oracle, or any of the third-party OLE DB providers on the market to see which one works. Access, DB2, Teradata, and Sybase are the other officially supported relational data sources, and if you need to load data from another source, you can always use SQL Server Integration Services (SSIS) to push data into the cube by using the Dimension Processing and Partition Processing destinations in a Data Flow.


Remember to install the same version of any OLE DB provider you're using on all of your development, test, and production machines. Also, while SSDT is a 32-bit application and needs a 32-bit version of the driver to connect to a relational database, if your Analysis Services instance is 64-bit, it will need the 64-bit version of the same driver to process cubes successfully.

Analysis Services must also be given permission to access the data source, and how it does so depends on the type of data source you're using and how its security is set up. If you're using Windows authentication to connect to SQL Server, as Microsoft recommends you to, then you should set up a new Windows domain account specifically for Analysis Services, and then use the SQL Server Configuration Manager tool to set the Analysis Services service to run under that account. You should then give that account any permissions it needs in SQL Server on the tables and views you'll be using. Most of the time, Read permissions will be sufficient. However, some tasks, such as creating Writeback fact tables, will need more. You'll notice on the Impersonation Information tab in the Data Source Designer dialog in SSDT that there are some other options for use with Windows authentication, such as the ability to enter the username and password of a specific user. However, we recommend that you use the Use Service Account option so that Analysis Services tries to connect to the relational database under the account you've created.

If you need to connect to your data source using a username and a password (for example, when you're using SQL Server authentication or Oracle), then Analysis Services will keep all sensitive information, such as passwords, in an encrypted format on the server after deployment. If you try to script the data source object out, you'll find that the password is not returned, and since opening an Analysis Services project in online mode essentially involves scripting out the entire database, you'll find yourself continually re-entering the password in your data source whenever you want to reprocess anything when working this way. This is another good reason to use project mode rather than online mode for development and to use Windows authentication where possible.

Creating Data Source Views

In an ideal world, if you've followed all of our recommendations so far, then you need to do very little work in your project's Data Source View (DSV)—nothing more than selecting the views representing the dimension and fact tables and setting up any joins between the tables that weren't detected automatically. Of course, in the real world, you have to compromise your design sometimes and that's where a lot of the functionality available in Data Source Views comes in useful.

When you first create a new DSV, the easiest thing to do is to go through all of the steps of the wizard, but not to select any tables yet. You can then set some useful properties on the DSV, which will make the process of adding new tables and relationships much easier. In order to find them, right-click on some blank space in the diagram pane and click on Properties. They are:

· RetrieveRelationships: By default, this is set to True, which means that SSDT will add relationships between tables based on various criteria. It will always look for foreign key relationships between tables and add those. Depending on the value of the NameMatchingCriteria property, it may also use other criteria as well.

· SchemaRestriction: This property allows you to enter a comma-delimited list of schema names to restrict the list of tables that appear in the Add/Remove Tables dialog. This is very useful if your data warehouse contains a large number of tables and you use schemas to separate them into logical groups.

· NameMatchingCriteria: If the RetrieveRelationships property is set to True, then SSDT can also try to guess relationships between tables by looking at column names. By default, the NameMatchingCriteria property is set to None, which means this won't happen, but there are three other possible settings for this property that translate to three different ways it can perform this matching:

· By looking for identical column names in the source and destination tables (for example, FactTable.CustomerID to Customer.CustomerID)

· By matching column names to table names (for example, FactTable.Customer to Customer.CustomerID)

· By matching column names to a combination of column and table names (for example, FactTable.CustomerID to Customer.ID)

This is extremely useful if the tables you're using don't actually contain foreign key relationships, or if you've used views on top of your dimension and fact tables in the way we suggested in the previous chapter. You'll also see an extra step in the New Data Source View wizard allowing you to set these options if no foreign keys are found in the Data Source you're using.

Now you can go ahead and right-click on the DSV design area, select the Add/Remove Tables option, and select any tables or views you need to use. It might be a good idea not to select everything you need initially, but to select just one fact table and a few dimension tables so you can check the relationships and arrange the tables clearly, and then add more. It's all too easy to end up with a DSV that looks like a plate of spaghetti and is completely unreadable. Even though you don't actually need to add every single relationship at this stage in order to build a cube, we recommend that you do so, as the effort will pay off later when SSDT uses these relationships to automatically populate properties such as dimension-to-measure group relationships.


Creating multiple diagrams within the DSV, maybe one for every fact table, will also help you organize your tables more effectively. The Arrange Tables right-click menu option is also invaluable.

Named queries and named calculations allow you to add the equivalent of views and derived columns to your DSV, and this functionality was added to help cube developers who needed to manipulate data in the relational database, but didn't have the appropriate permissions to do so. However, if you have the choice between, say, altering a table and an SSIS package to fix a modeling problem or creating a named query, then we recommend that you always choose the former—only do work in the DSV if you have no other choice. As we've already said several times, it makes much more sense to keep all of your ETL work in your ETL tool, and your relational modeling work in the relational database where it can be shared, managed, and tuned more effectively. Resist the temptation to be lazy and don't just hack something in the DSV! One of the reasons why we advocate the use of views on top of dimensions and fact tables is that they are as easy to alter as named queries and much easier to tune.


If you make changes in your relational data source, those changes won't be reflected in your DSV until you click on the Refresh Data Source View button or choose Refresh on the right-click menu.

The SQL that Analysis Services generates during processing is influenced heavily by what goes on in the DSV and many processing performance problems are the result of cube designers taking the easy option early on in development.


Problems with TinyInt

Unfortunately, there's a bug in Analysis Services that causes a problem in the DSV when you use key columns of the tinyint type. Since Analysis Services doesn't support this type natively, the DSV attempts to convert it to something else—a System.Byte for foreign keys to dimensions on fact tables and System.Int32 for primary keys on dimension tables which have Identity set to True. This in turn means you can no longer create joins between your fact table and dimension table. To work around this, you need to create a named query on top of your dimension table containing an expression that explicitly casts your tinyint column to a tinyint (for example, using an expression such as cast(mytinyintcol as tinyint)), which will make the DSV show the column as System.Byte. It sounds crazy, but for some reason it works.
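As a sketch, the workaround looks like the following named query; the table and column names here are illustrative assumptions, not taken from the book:

```sql
-- Hypothetical named query over a dimension table whose surrogate key is
-- tinyint. Casting the column to the type it already has forces the DSV
-- to treat it as System.Byte, so it can be joined to the fact table again.
SELECT
    CAST(ProductKey AS tinyint) AS ProductKey,
    ProductName,
    ProductCategory
FROM dbo.DimProduct
```

The redundant-looking CAST is the whole point: it is only there to change the .NET type the DSV infers for the column.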

Designing simple dimensions

Next, let's build some dimensions. As this is one of the more complicated steps in the cube design process, it's a topic we'll return to again in the future chapters when we need to deal with more advanced modeling scenarios. Right now, we'll concentrate on the fundamentals of dimension design.

Using the New Dimension wizard

Running the New Dimension wizard will give you the first draft of your dimension, something you'll then be able to tweak and tidy up in the Dimension Editor afterwards. The first question you'll be asked, in the Select Creation Method step, is how you want to create the new dimension, and there are effectively the following two choices:

· Create the dimension from an existing table or view in your data source (the Use an existing table option)

· Have SSDT create a dimension automatically for you and optionally fill it with data (the other three options)

The Select Creation Method step of the New Dimension wizard is shown in the following screenshot:

(Screenshot: the Select Creation Method step of the New Dimension wizard)

In keeping with our line of argument that all relational modeling work should be done outside SSDT, we recommend that you use the Use an existing table option. If you're trying to build a cube before you have prepared any dimension tables or indeed done any dimensional modeling, you're probably running ahead of yourself and you need to go back and think about your data modeling in more detail.

In the next step of the wizard, you have to choose the main table for your dimension—the table or view that contains the lowest level of granularity of data for your dimension, and which joins directly to a fact table in your DSV. You also need to define the key attribute for the dimension, the attribute that represents the lowest level of granularity in the dimension and to which all other attributes have a one-to-many relationship. For example, on a Product dimension, the key attribute could represent the Stock Keeping Unit (SKU), and on a Customer dimension, the key attribute could represent the customer itself. To define the key attribute, as with any attribute, you have to specify two extremely important properties:

· Key: This is the column or collection of columns in your dimension table that uniquely identifies each individual member on the attribute, and for the key attribute, this is usually the dimension's surrogate key column. This will set the value of the KeyColumns property of the attribute.

· Name: This is the column in your dimension table that contains the name or description that the end user expects to see on screen when they browse the dimension. This will set the NameColumn property of the attribute, as shown in the following screenshot. The name may or may not uniquely identify each member on the attribute. For example, on your Customer dimension there may be many members on the key attribute—many individual customers—with the name John Smith. Member uniqueness is determined by the value of the key property of the attribute, but the end user may well be happy seeing multiple members with the same name, so long as they can distinguish between them using other dimension attributes such as Address.

(Screenshot: setting the Key and Name properties in the New Dimension wizard)

Click on Next and, if you are building your dimension from a set of snowflaked tables, you get to the Select Related Tables step, where you have the option of selecting other tables you want to use to build this dimension. After this, you move onto the Select Dimension Attributes step, where you have the option of creating other new attributes on the dimension by checking any columns that you'd like to turn into attributes. You can also rename the attribute, uncheck the Enable Browsing option (which controls the value of the AttributeHierarchyEnabled property, as shown in the following screenshot), and set the Attribute Type—a property that has a bewildering array of possible values, most of which are pointless, and which can be ignored unless you're designing a Time or an Account dimension.

(Screenshot: the Select Dimension Attributes step of the New Dimension wizard)

Now that we are into the final step of the wizard, all there is to do is to confirm the name of the dimension and click on Finish.


A note about naming

At various points in the New Dimension wizard, you have the chance to rename the dimension and attributes from whatever the wizard has guessed at as the default. It's a good idea to take naming seriously even at this early stage and discuss with your end users what the names of dimensions and attributes should be. Changing object names later in the development cycle can break any calculations or queries that you've already defined. Also, when you create an object for the first time, its Object ID property is set to its name, and even though you can subsequently change the name, the ID (which you'll see if you script the object out to XMLA, for example) can never change, which can be confusing if you ever need to work directly with XMLA code. Object names should also be as end-user friendly as possible, as they're going to be the names that appear in the reports the end users want to build. The corollary of this is that object names are unlikely to be in a format that anyone with database design experience would choose. You might be happy with a dimension called DimProduct or an attribute called Usr_Addr_FrstLn, but when the CEO gets his sales report, he's not going to want to see these names. Think of designing a cube as designing a user interface for data access by non-technical business people.

Using the Dimension Editor

Once you've completed the wizard, you'll arrive in the Dimension Editor for your new dimension.

Adding new attributes

It's likely that you'll want to add some more attributes to your dimension, and to do so, you can simply drag columns from the tables displayed in the Data Source View pane on the right-hand side of the screen, into the Attributes pane on the left-hand side. Once again, there are a couple of important properties you'll want to set on each of your attributes once you've created them:

· KeyColumns, NameColumn: These are the column or columns that represent the key and the name of this attribute. It's common for non-key attributes to be based on just one column that represents both the key and the name. For example, a Year attribute on a Time dimension might have values such as 2001 and 2002. In this situation, it's sufficient to set the KeyColumns property and not set the NameColumn property.

Analysis Services will display the key of the member as its name. For attributes with a very large number of members, you should always try to use a column of the smallest possible numeric data type as the key to an attribute and set the name separately. Using strings as keys can have a performance overhead, as can using composite keys (that is, when you use more than one column in the KeyColumns collection).

· AttributeHierarchyEnabled: This property controls whether a hierarchy is built for this attribute, that is, whether the user can actually see this attribute when they browse the cube and use it in their queries. By default, it is True. If you set it to False, the attribute still exists, but is only visible in the dimension as a property of another attribute. A good example of when you would do this would be if you were building a Customer dimension. You'd never want to be able to see a hierarchy containing phone numbers and e-mail addresses. However, if you were looking at a specific customer, it might be good to see this information as a property of that customer. Setting this property to False also has the benefit of reducing the amount of time needed to process the dimension.

· AttributeHierarchyOptimizedState: For attributes that have their AttributeHierarchyEnabled property set to True, this property controls whether indexes are built for the resulting attribute hierarchy. Setting this property to False can improve dimension processing times, since building an index for an attribute can take some time. However, the penalty you pay in terms of query performance can be quite significant, so you should only consider setting this to False for attributes that are rarely used in queries and that are large enough to have an impact on dimension processing times.

· OrderBy, OrderByAttribute: The order in which members appear in an attribute hierarchy can be important. For example, you would expect days of the week to appear in the order of Sunday, Monday, Tuesday, and so on rather than in alphabetical order. The OrderBy property allows you to order the members on an attribute either alphabetically by name, by the value of the key of the attribute, by the value of the name, or by the key of another attribute that has a relationship with the attribute being ordered (which attribute is used is controlled by the value of the OrderByAttribute property). It is only possible to sort in an ascending order. As a result, you may end up creating new numeric columns in your dimension table to use as the key of an attribute in order to ensure the type of sorting you want.

· AttributeHierarchyOrdered: In some situations, where the order of members on a very large attribute hierarchy doesn't matter much, disabling sorting by setting the AttributeHierarchyOrdered property to False can save a significant amount of time during dimension processing.

· IsAggregatable, DefaultMember: For attributes that have their AttributeHierarchyEnabled property set to True, the IsAggregatable property controls whether an attribute's hierarchy contains an All Member as the root of the hierarchy. In general, you should not set this property to False, even for something like a Year attribute on a Time dimension where you might think it makes no sense to display an aggregated value, because it can have all kinds of confusing consequences for the user. Without an All Member, another member on the hierarchy will be automatically selected every time a query is run, and the user may not know which member this is. You can control which member is selected automatically by setting the DefaultMember property, for example, making the default selection on the Year attribute the last year on the hierarchy. However, in our experience it's better to let the user make a selection themselves. The only time you should consider setting IsAggregatable to False is when DefaultMember is used and you have members in an attribute that should never be aggregated together. For instance, in a Budget Scenario attribute, you would get meaningless values if you show the value of actuals and several different budget scenarios aggregated together. Usually, this situation arises when you only have one attribute on the dimension with AttributeHierarchyEnabled set to True.
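To make some of these property settings concrete, here is a sketch of a dimension view that supports them; all table and column names are illustrative assumptions, not taken from the book. A small numeric surrogate key serves the KeyColumns property, a separate description column serves NameColumn, and an extra numeric column exists purely so that a Month attribute can be ordered by key rather than alphabetically by name:

```sql
-- Illustrative dimension view over an assumed dbo.DimDate table.
-- MonthKey is a smallint used only as the attribute key, so that setting
-- OrderBy = Key sorts January..December instead of April..September;
-- MonthName is used as the NameColumn of the Month attribute.
CREATE VIEW dbo.DimDateForCube AS
SELECT
    DateKey,                                          -- numeric surrogate key -> KeyColumns
    CONVERT(varchar(10), FullDate, 103) AS DateName,  -- dd/mm/yyyy display    -> NameColumn
    CAST(MONTH(FullDate) AS smallint)   AS MonthKey,  -- sort key for Month
    DATENAME(month, FullDate)           AS MonthName  -- display name for Month
FROM dbo.DimDate;
```

Keeping this logic in a view, rather than in a DSV named query, follows the advice given earlier in the chapter: it can be shared, managed, and tuned in the relational database.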


The 4 GB string store limit

If your dimension contains attributes with several million members, you need to be aware of a limitation with the default format that Analysis Services uses to store string values. If a single attribute needs to store more than 4 GB of string data, processing will fail unless you have changed the StringStoreCompatibilityLevel property on the dimension from the default value of 1050 to the value 1100. More detail on this issue and the StringStoreCompatibilityLevel property can be found in the Analysis Services documentation.

Even if you don't run into this error when you are in development, you should set this property if you think you have an attribute that may grow over time to exceed this limit. A detailed explanation of how to calculate the amount of string storage space needed for an attribute is available online.

Configuring a Time dimension

Building a Time dimension is really just the same as building a regular dimension, although there are a few extra properties that need to be set. Going back to the Select Creation Method step of the Dimension wizard, you'll see that several of the available options are specifically for building Time dimensions, but we still recommend that you do not use them and construct a Time dimension yourself; they're only useful for proof-of-concepts. Getting SSDT to build and populate a Time dimension table in your data source can be useful as a quick way to get started on building your own dimension. However, it's very likely that you'll want to make changes to what the wizard creates, not just because it uses a DateTime column as a primary key rather than an integer. The option to generate a Time dimension on the server—so that there is no relational data source needed—suffers from the same problems, but is even more inflexible since there's no way to customize things such as member name formats beyond the options the wizard gives you. There are a lot of resources on the Web that can help you with building your own Time dimension table.
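As a starting point, a hand-rolled Time dimension table can be as simple as the following T-SQL sketch; the table and column names are illustrative assumptions, and it uses an integer key in YYYYMMDD form rather than the DateTime primary key the wizard would generate:

```sql
-- Minimal hand-built Time dimension table (names are illustrative).
CREATE TABLE dbo.DimDate (
    DateKey      int         NOT NULL PRIMARY KEY,  -- e.g. 20140315
    FullDate     date        NOT NULL,
    DayNumber    int         NOT NULL,
    MonthNumber  int         NOT NULL,              -- key for the Month attribute
    MonthName    varchar(20) NOT NULL,              -- name for the Month attribute
    CalendarYear int         NOT NULL
);

-- Populate one row per day for a ten-year range.
DECLARE @d date = '2010-01-01';
WHILE @d <= '2019-12-31'
BEGIN
    INSERT INTO dbo.DimDate
    SELECT CAST(CONVERT(char(8), @d, 112) AS int),  -- style 112 = YYYYMMDD
           @d,
           DAY(@d),
           MONTH(@d),
           DATENAME(month, @d),
           YEAR(@d);
    SET @d = DATEADD(day, 1, @d);
END;
```

In practice, you would extend this with whatever fiscal calendars, holiday flags, and display formats your users need—which is exactly the flexibility the wizard-generated table lacks.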

The important thing to do with a Time dimension is to tell Analysis Services that it is in fact a Time dimension, so you can access some of the special functionality that Analysis Services has in this area. You can do this by setting the dimension's Type property to Time and then setting each attribute's Type property to the nearest approximation of what it represents, as shown in the following screenshot. So, for a Year attribute you would set Type to Years, a Month attribute to Months, and so on.

Configuring a Time dimension

The benefits of doing this are:

· Certain MDX functions such as YTD() are time aware, and will not need you to specify certain parameters if you have a dimension marked as type Time

· Semi-additive measures, that is, those with their AggregationFunction property set to AverageOfChildren, FirstChild, LastChild, FirstNonEmpty, or LastNonEmpty, will perform their special types of aggregation on the first Time dimension related to a measure group
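
As a sketch of the first benefit, here's what a year-to-date calculation looks like once the dimension is marked as type Time. Because Analysis Services knows which hierarchy represents time, YTD() needs no explicit member argument; the cube, dimension, hierarchy, and measure names used here are illustrative:

```mdx
-- A running year-to-date total; YTD() infers the current member
-- of the Time-typed Date dimension automatically
WITH MEMBER [Measures].[Sales YTD] AS
    SUM(YTD(), [Measures].[Sales Amount])
SELECT
    [Measures].[Sales YTD] ON COLUMNS,
    [Date].[Calendar].[Month].MEMBERS ON ROWS
FROM [Sales]
```

If the dimension's Type property were left at Regular, the same query would need YTD() to be replaced with an explicit PeriodsToDate() call naming the Year level.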

Creating user hierarchies

User hierarchies, the multilevel hierarchies that you create in the central Hierarchies pane in the Dimension Editor, can best be described as being something like views on top of attribute hierarchies. They allow you to take two or more attribute hierarchies and combine them into a more complex drillpath. For example, you could take your Year, Month, and Date attributes and combine them into a single hierarchy with Year, Month, and Date levels, so that the user could then easily drill from a year down to see all the months in that year and from a month down to all the dates in that month.

When Analysis Services 2005 was first released and developers started coming to grips with the concept of attribute hierarchies and user hierarchies, one question that came up quite frequently was, "Should I expose attribute hierarchies or user hierarchies to my users?". As usual, the answer is that it depends. There are pros and cons to each and you'll probably end up using a mixture of both types of hierarchy. User hierarchies are more user-friendly because drilling down is just a matter of a double-click. On the other hand, rigidly defined drillpaths might be too limiting, and users might prefer the flexibility of being able to arrange attribute hierarchies whichever way they want. Certainly there are some things you can't do with user hierarchies, such as putting different levels of the same hierarchy on different axes in a query. For you as a developer, user hierarchies have some benefits. They make writing the MDX for certain types of calculation easier (members in a multilevel user hierarchy have meaningful "parents" and "children"), and "natural" user hierarchies, which we'll talk about next, are materialized on disk and so offer query performance benefits and also make the aggregation design process much easier.
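
To make the point about meaningful "parents" concrete, here is a sketch in MDX, assuming a Year-Quarter-Month-Date user hierarchy called Calendar on a Date dimension (all object and member names are illustrative). In the multilevel hierarchy, a month's parent is its quarter; in the standalone Month attribute hierarchy, its parent is only the All member:

```mdx
-- Returns the quarter containing January 2001 when run against
-- the multilevel Calendar user hierarchy
SELECT
    [Date].[Calendar].[Month].[January 2001].Parent ON COLUMNS
FROM [Sales]

-- The same member in the Month attribute hierarchy has no useful
-- parent, just the All member:
-- SELECT [Date].[Month].[January 2001].Parent ON COLUMNS FROM [Sales]
```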

If you do create a user hierarchy, you should definitely set the Visible property of its constituent attribute hierarchies to False. If you have too many hierarchies available for a user to choose from in a dimension, you run the risk of confusing the user, and having the same members appear more than once in attribute hierarchies and user hierarchies will make matters worse. Simplicity is a virtue when it comes to dimension design.

Configuring attribute relationships

Attribute relationships allow you to model one-to-many relationships between attributes. For example, you might have Year, Month, and Date attributes on your Time dimension, and you know that there are one-to-many relationships between Year and Month, and between Month and Date, so you should build the attribute relationships between these three attributes to reflect this. Why? The short answer is performance. Setting attribute relationships optimally can make a very big difference to query performance, and the importance of this cannot be overstated—they drastically improve Analysis Services' ability to build efficient indexes and make use of aggregations. However, attribute relationships do not have anything to do with how the dimension gets displayed to the end user, so don't get them confused with user hierarchies.

It's very likely that there will be many different ways to model attribute relationships between the same attributes in a dimension, so the important thing is to find the best way of modeling the relationships from a performance point of view. When you first build a dimension in SSDT, you will get a set of attribute relationships that, while correct, are not necessarily optimal. This default set of relationships can be described as looking a bit like a bush. Every non-key attribute has a relationship defined either with the key attribute, or with the attribute built on the primary key of the source table in a snowflaked dimension. You can visualize these relationships on the Attribute Relationships tab of the Dimension Editor, as shown in the following screenshot:

Configuring attribute relationships

If you have BIDS Helper installed, right-clicking on the dimension in the Solution Explorer window and selecting Visualize Attribute Lattice will display something like what's shown in the following screenshot. The built-in functionality, in our opinion, sacrifices clarity in favor of saving space, and the BIDS Helper visualization is easier to understand. BIDS Helper displays each attribute as a separate node in its relationship diagram, whereas on the Attribute Relationships tab, multiple attributes are sometimes displayed in the same node.

Configuring attribute relationships

You can see that in our example using Year, Quarter, Month, and Date attributes on a Time dimension, Analysis Services knows that a year is made up of many dates, a quarter is made up of many dates, and a month is made up of many dates, but not that years are made up of quarters or that quarters are made up of months. Changing the attribute relationships to reflect these facts gives you something resembling a long chain, and in general the more long chains you see in your attribute relationships the better. As Analysis Services understands transitive relationships, there's no need to define a relationship between Year and Date, for example, because it can see that there is an indirect relationship via Quarter and Month. In fact, defining such redundant relationships can be a bad thing from a performance point of view.

Here's what the optimized set of relationships looks like in the Dimension Editor:

Configuring attribute relationships

Here's what the same set of relationships looks like in BIDS Helper:

Configuring attribute relationships

You will also notice a change in how some user hierarchies are displayed once you've optimized your attribute relationships in this way. User hierarchies that have one-to-many relationships defined between the attributes that make up each level are called natural user hierarchies. User hierarchies that don't are called unnatural user hierarchies and display an amber warning triangle in their top left-hand corner. This warning triangle does not signify that the hierarchy is broken or incorrectly configured, just that you should check your attribute relationships to see whether they can be optimized. You may actually want to build an unnatural user hierarchy, and it may not be possible to set up the attribute relationships to make it into a natural user hierarchy; as long as you are aware that unnatural user hierarchies may not perform as well as natural user hierarchies, you should not be put off by the warning.

Configuring attribute relationships

It is very important to understand your data before you set up any attribute relationships, because if you set them up incorrectly, then it could result in Analysis Services returning incorrect data. The Dimension Health Check functionality in BIDS Helper, available when you right-click on a dimension in the Solution Explorer, checks whether the attribute relationships you've defined actually reflect the data in your dimension table. Here's an example of a common mistake that is made: your Year attribute has members 2001, 2002, 2003, and 2004; your Quarter attribute has four members from Quarter 1 to Quarter 4. Is there a one-to-many relationship between Year and Quarter? The answer is in fact no, despite what common sense tells you, because, for example, a quarter called Quarter 1 appears in every Year. What you can and should do here is modify your data so that you can build an attribute relationship: a quarter called Quarter 1 2001 only exists in the year 2001. There are two ways to do this:

· By modifying the data in your dimension table (the best option)

· By using a composite key in the KeyColumns property of the Quarter attribute that is made up of both the keys for the quarter and the year

You can read a more detailed discussion of these techniques at the end of the following article. You should not make changes like this at the expense of the users' requirements, but in most cases, users will not mind minor modifications such as these and will appreciate the increased query performance.

The RelationshipType property of an attribute relationship indicates whether the relationship between any two members on two related attributes is ever likely to change or not. For example, you would hope that the date January 1st 2001 would always appear under the month January 2001 in your dimension. If it moved to the month March 2002, something would be very wrong with your source data. In this case, you should set the RelationshipType property of the Month-Date attribute relationship to Rigid. On the other hand, it may be true that the relationship between a Customer attribute and a City attribute may change over time if a customer changes his/her residence. In that case, the RelationshipType property should be set to Flexible. Setting RelationshipType to Rigid where possible increases the chances that aggregations will not need to be rebuilt when a dimension undergoes a ProcessUpdate, something that will be discussed in more detail in Chapter 10, Going in Production. Our advice is never to set the RelationshipType property to Rigid unless you are absolutely sure that no UPDATE or DELETE statement is ever run on the underlying dimension table. If a rigid relationship does change, dimension processing will fail.

Finally, attribute relationships also have a second function, and that is to act as member properties. If a Month has many dates, you can think of the relationship in reverse as a Date having a property which is its month. A better example might be a Customer having an Address, and if the Address attribute has its AttributeHierarchyEnabled property set to False then it may only be visible as a property of Customer. Every attribute relationship is visible by default as a member property in most client tools. If you want to hide the member property, you can set the relationship's Visible property to False.
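
A member property defined by an attribute relationship can be retrieved in MDX with the Properties() function. Here is a minimal sketch, assuming a Customer dimension with a Customer attribute and an Address attribute relationship (all names are illustrative; the string passed to Properties() must match the attribute relationship's name):

```mdx
-- Expose the Address member property as a measure so that it
-- appears alongside each customer in the result set
WITH MEMBER [Measures].[Customer Address] AS
    [Customer].[Customer].CurrentMember.Properties("Address")
SELECT
    [Measures].[Customer Address] ON COLUMNS,
    [Customer].[Customer].[Customer].MEMBERS ON ROWS
FROM [Sales]
```

This works even when the Address attribute has AttributeHierarchyEnabled set to False, which is exactly the scenario described above.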

Building a simple cube

With some dimensions built, the next step is to run the cube wizard to create the cube itself. Remember that at this stage, all we want to do is build a very simple cube so that we can test-drive the data, so we're not going to do anything other than run the wizard. You'll be doing a lot of work in the Cube Editor in the next stage of development, but if you've set up the DSV in the way we recommend, then you'll find that when you've finished running the wizard, you will have something that you can deploy, process and browse immediately with no changes required.

Using the New Cube wizard

On the Select Creation Method step of the wizard, as with the same step of the New Dimension wizard, choose the Use an existing table option—the Create an Empty Cube and the Generate Tables in the Data Source options can be ignored for now. The former is useful in more advanced scenarios, but regarding the latter, we'll repeat what we said earlier: you should model your data properly in the data warehouse before you start building anything in Analysis Services. On the Select Measure Group Tables step, just select the fact table you chose earlier as the basis for the simple cube you want to build. Then in the Select Measures step, select one or two columns from that fact table that represent commonly used measures, which can be aggregated by summation. On the Select Existing Dimensions step, select all the dimensions that you've just built that join to the fact table you've chosen. Don't bother creating any new dimensions on the Select New Dimensions step. Finally, on the Completing the Wizard step, enter the name of the cube and click on Finish. As was the case when creating dimensions, it's worth putting some thought into the name of your cube. Try to make the name reflect the data the cube contains, make sure the name is meaningful to the end users, and keep it short, as you'll probably have to type this name hundreds of times over the lifetime of the project.

Project deployment

With the wizard complete, go to the Build menu in the main Visual Studio menu bar and select Deploy <MyProjectName>. Deployment is actually a two-stage process:

· First the project is built. You can think of this as being similar to compiling some .NET code, except instead of an executable or a dll, the end result is four files containing the XMLA representation of the objects in your project, and information on how the project should be deployed. You can find these files in the bin directory of your Visual Studio project directory.

· Then the project is deployed. This takes the XMLA created in the previous step, wraps it in an XMLA Alter command and then executes that command against your Analysis Services server. Executing this command either creates a new Analysis Services database if one did not exist before, or updates the existing database with any changes you've made.

It's quite common for deployment to fail as a result of you making a mistake somewhere in your cube or dimension designs. If this happens, you should see all of the errors you need to correct in the Error List window. When you go to the appropriate editor to correct the error, you should also see a red squiggly line underneath whichever object it is that needs fixing as with the Month attribute in the following screenshot:

Project deployment


Warnings and blue squiggly lines

Even if you don't see any red squiggly lines anywhere and your project deploys successfully, it's likely that you'll see warnings in the Error List window and blue squiggly lines in various places in SSDT (see for example, the Date dimension node in the preceding screenshot). These design warnings will occur where you have built something that works, but does not reflect best practice. A few of the warnings concern quite trivial things, but in general, it's a good idea to pay attention to them. If you want to though, you can dismiss warnings by right-clicking on them in the Error List window and clicking on Dismiss, while leaving a comment explaining why you've done this for future reference. You can also manage which warnings appear and see which ones have been dismissed by right-clicking on the project in the Solution Explorer, clicking on Edit Database to open the Database Editor, and going to the Warnings tab.

Database processing

Assuming you've set the Processing Option project property to Do Not Process, then the result of deployment will be a new Analysis Services database on the server containing a cube and several dimensions. You won't be able to browse the cube yet though, as all that has been created are empty structures which need to be loaded with data from your data warehouse. In order to load the data, you have to process the objects, and you can do this easily from within SSDT by going to the Database menu and clicking on Process. Once you've done this, the Process Database dialog appears. Leave the Process Options column showing Process Full, click on Run, and everything in the database should be processed.

As an Analysis Services database is nothing more than a collection of objects, processing a database involves nothing more than processing all of the cubes and dimensions in the database. In turn, cubes are made up of measure groups, which are made up of partitions, and dimensions are made up of attributes, and all of these objects have their own discrete processing operations. As a result, processing a database can kick off a lot of individual processing jobs and by default, Analysis Services will try to do a lot of this processing in parallel to reduce the overall time taken. A Process Full will always drop any data currently in an object and reload its data from scratch. It's the most time-consuming form of processing and there are a lot of other options here that we can use to reduce processing time, which we'll discuss in Chapter 10, Going in Production.

You can watch all of these processing jobs executing in the Process Progress window that appears after you click on Run in the Process Database dialog. You'll see each object that is being processed listed, as well as the time processing started and ended. If you expand each object, you'll eventually find the SQL queries run to retrieve data from the data warehouse. If you select any node and click on the View Details button, you'll see a new window appear, containing all of the text for the node, and this is very important when it comes to viewing long SQL queries or messages, as shown in the following screenshot.

As you can't do anything else in SSDT while processing is taking place, it can be a good idea to just do a Deploy from there and then start processing separately in SQL Server Management Studio. This way, you can carry on developing while processing is taking place.

Database processing

Processing errors are unfortunately as common as deployment errors, and they are caused by four basic types of problems:

· Changes in the underlying relational schema mean that Analysis Services contains invalid references. For example, you might build a measure from a column in a fact table that is subsequently renamed. You'd only find this out if you refreshed your DSV or at processing time when Analysis Services generated SQL that used the old name.

· Key errors: You can think of Analysis Services as performing an inner join between tables in a snowflaked dimension, or between dimension tables and fact tables, although it very rarely does so in the SQL it generates. For example, if it finds a dimension key value in a fact table that doesn't exist in the dimension table, by default, it will raise an error and the processing will fail. As mentioned in Chapter 1, Designing the Data Warehouse for Analysis Services, you should try to ensure this never happens in your ETL, even if some accidents are inevitable. You can configure processing so that key errors are ignored, or unknown keys are assigned to a special unknown member on a dimension, by clicking on the Change Settings button in the Process dialog to show the Change Settings dialog and going to the Dimension Key Errors tab. However, we do not recommend you do this except as insurance against these accidents. You'll get a lot more flexibility with a custom solution. That said, at this stage of the cube development process, it can be useful to ignore errors just to get the cube to process so you can show it to your users.

· Processing objects in the wrong order: For example, if you process a dimension, then update the underlying relational data so that there are new rows in the dimension table and new rows related to them in the fact table, and then process your cube without reprocessing the dimension again first, you'll again run into key errors.

· MDX Script errors: It's very often the case that when you make structural changes to a cube or a dimension, these changes break MDX calculations on the cube. For example, you might rename a dimension, but still have MDX code where the old dimension name is referenced. When a cube is processed, the last thing that happens is that all of the code on the MDX Script is executed and if it now contains syntax errors, the whole processing operation will fail. This is extremely frustrating when it happens, and it can lead to a lot of wasted time. If you are making a lot of structural changes, it can be a good idea to comment out the whole of the MDX Script before you do any processing, and only uncomment it when you have successfully processed your cube.
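
The MDX Script supports block comments, so commenting out the whole script while you make structural changes is a one-step job. A sketch, with an illustrative calculation (the member and measure names are not from this cube):

```mdx
/* Temporarily disabled while the Date dimension is restructured;
   uncomment once the cube processes successfully again.

CREATE MEMBER CURRENTCUBE.[Measures].[Sales YTD] AS
    SUM(YTD(), [Measures].[Sales Amount]),
    FORMAT_STRING = "Currency";
*/
```

The CALCULATE statement at the top of the script should be left uncommented, otherwise no measure values will be aggregated at all when you browse the cube.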


With processing complete, you can take a look at your cube for the first time, either in the Browser tab of the Cube Editor or in your client tool of choice. Now is a good time to reflect on what we've seen of the cube development process so far. We've created a very basic cube from a single fact table and a few dimensions rather than attempting to build something more complex. This has allowed us to get a feel for our data and have something to show our users quickly so that they can check if we're on the right track. We then built a single Data Source and Data Source View. Since we spent time getting our data modeling right earlier, there was very little to do here other than connect to our data warehouse and select the tables or views we want to work with. Next, we built a few of the less complex dimensions we need, configuring attribute relationships and creating user hierarchies as necessary. Finally, we ran the New Cube wizard to build our basic cube, then deployed and processed it so that it can be queried.

In the next chapter, we'll go on to look at how we can deal with more complex problems when designing dimensions.