Training Business Data Stewards - Data Stewardship (2014)

CHAPTER 5 Training Business Data Stewards

The outline of a training guide for new Data Stewards is presented, along with explanations of items that are not otherwise included in this book.

Keywords

training; data quality; Data Stewards; skills; roles; responsibilities

Introduction

Data Stewardship does not just happen. It takes people and support for the function. Data Stewards are much more effective if they are trained to carry out their responsibilities and tasks. Just as importantly, sustaining Data Stewardship over time requires overall support for the role. When new Data Stewards get brought on board, they need training, and existing Data Stewards need to receive ongoing reinforcement. In addition, new business functions can get their Data Stewards quickly up to speed if the organization prepares itself to take a comprehensive approach to (and has robust curricula for) Data Stewardship.

It is a best practice to create a repeatable course that can be given both to the original Business Data Stewards and to any replacements who come along. This course is an introduction to, and overview of, the responsibilities of the Data Stewards.

In addition, it is usually advantageous to create several smaller courses that can be given just before the Business Data Stewards take on a new responsibility. These shorter courses focus on particular skills and responsibilities. For example, it doesn’t make much sense to train the Business Data Stewards to perform their data profiling responsibilities if you aren’t actually going to profile data until six months or a year later.

Don’t skimp on the effort put into training the Business Data Stewards. They are key players in Data Governance, and if they are not well trained, the results will generally be of lower quality, take longer, and be less consistent.

Note

Keep in mind that different Data Stewards have different skills based on their experience and background. As a result, some Data Stewards will need training on topics that others do not. For example, if you have Data Stewards who are practiced at performing the analysis for data profiling, you can work with them to develop courses to educate the less experienced stewards. Taking the approach of leveraging the knowledge of some Business Data Stewards to train the rest leads to an increase in the overall level of Data Stewardship performance.

One very important thing to keep in mind when training the Business Data Stewards is to not waste the training opportunity. The common causes of training failure are:

- Teaching the wrong skills. Only teach the skills they need, and don’t waste your time training them for things they won’t use. For example, if the Business Data Stewards are not going to run the data profiling tool, then it is pointless to teach them how to do so. On the other hand, they are likely to analyze the results coming out of the tool and make recommendations for corrections and process improvements, so those skills should be taught (see Figure 5.1). Of course, ensuring that only the right skills are taught means that you must understand what those skills actually are, and this book should make that clear.

FIGURE 5.1 Teach only the skills indicated as belonging to the Business Data Stewards. They don’t need to be trained on the skills in Information Technology (IT) or the Project Management Office (PMO).

- Teaching at the wrong skill level. Business Data Stewards are heavily involved with improving information quality. However, the theoretical concepts of data quality (e.g., the theory behind improved data quality as discussed in the Introduction to the book Journey to Data Quality by Yang W. Lee et al., MIT Press Books, 2006) are not terribly important. Instead, the Business Data Stewards must understand the life cycle of their data, the information chains of data within their own organization, and the techniques for conducting root-cause analysis and error-proofing new systems. Thus, the focus of the training should be on the practical “doing” of information quality improvement and other Business Data Stewardship activities.

- Teaching at the wrong time. “Just-in-time” training (training very close to the time someone will apply these concepts) should be the goal of any training program. This is especially true if the skills being taught aren’t at all familiar to the Business Data Stewards. For example, on a previous project when we taught the new project managers what was needed to include Data Stewardship as part of their project plan, those project managers who didn’t build a project plan until several months after the training had already forgotten about us and did not include Data Stewardship deliverables, resources, and funding in the plan. We dealt with this by conducting an overall high-level training as part of the project manager on-boarding, with a refresher for new project managers (until they got used to including us) during the planning phases of any new project. That is, we used a combination of an introduction to get them familiar with the idea and “just-in-time” reinforcement.

- Teaching the wrong audience. This one seems pretty obvious, but again, if you don’t have a clear idea of what duties various participants will be performing, you will waste your time and theirs teaching them skills they will never use. For example, it makes little sense to teach the information producers how to do data profiling, while it makes lots of sense to teach the information producers how to safeguard quality when entering data.

- Addressing the wrong objective. You need to gauge what your audience is ready for and teach at that level. For example, at first you might need to provide lots of background, essentially “selling” them on the ideas, and convincing them of the value (why). But later, you’ll need to focus on what and how—the ends that need to be achieved and the methods for achieving them. Teaching an audience the why when they are ready for how will just bore them and convince them that the whole thing is a waste of time.

To sum up, for training to be successful, you need to train the right audience in the right skills at the right skill level at the right time and for the right reasons (to achieve the right objectives). In the case of Business Data Stewards, they must immediately learn what Data Governance and Data Stewardship are all about, why they are crucial to the effort, what value they add, what their key early responsibilities are, and which processes and procedures they will either apply immediately or participate in creating in the early stages of the effort. The training should focus heavily on the practicalities of getting the work done and documented. It should also focus on the responsibilities that every steward will be called upon to perform right away; namely, defining key terms and managing them on an ongoing basis, managing data issues, and working together as a cohesive group.

Curricula for Training Business Data Stewards

There are a lot of topics that can be covered when training Business Data Stewards. Though they are all discussed here, you will want to tailor your training to avoid the pitfalls discussed in the last section, such as training too early. As discussed previously, you will probably want to break up the lessons into multiple parts and deliver them as needed.

Basic Principles

The basic principles around Data Stewardship are a good place to start the training. Much of the information in the following lesson headings is presented elsewhere in this book or can be extracted from presentations held at major conferences. Of course, you will need to customize the information for your own use and your own company. For example, if poor data quality is hampering your company’s ability to do business and maintain competitive advantage, you will want to focus on that when you discuss why Data Governance and Data Stewardship are important. On the other hand, if data quality is not perceived as a problem (and believe it or not, there are many companies where that is true) but there is confusion about definitions, then the focus should go there instead.

The lesson headings for basic principles are:

- What is Data Governance?

- Why is Data Governance important?

- What is Data Stewardship?

- The importance of Data Stewardship and what happens without it.

- Where Data Stewardship fits in with the overall Data Governance initiative.

- The detailed structure (operating model) of the overall Data Governance organization.

- The detailed structure of the Data Stewardship Council.

- The types of Data Stewards: Business, Technical, Project, and Domain (and Operational if you choose to use them).

- Why these particular stewards were selected, and who selected them.

- Major roles and responsibilities of Business Data Stewards:

  - Metadata: definitions, derivations, data quality rules, creation and usage rules.

  - Stewardship and ownership: what they mean and levels of decision making.

  - Data quality: what it means and establishing data quality levels in context.

- Major roles and responsibilities of Technical Data Stewards:

  - The overall role of IT in Data Stewardship and tools.

  - Technical explanations of how and why programs work the way they do.

  - Information about physical database structures and ETL.

  - Interpretation of production code.

- Meetings, web support, and logistics.

- General principles of information management.

Metadata for Key Business Data Elements

One of the most critical things Business Data Stewards need to do is establish the key business data elements and create an enterprise-wide accepted definition for each data element. Just as important (for calculated data elements) is a standardized derivation rule so that the data element is always calculated the same way. As you can imagine, having a standardized derivation rule eliminates a great deal of confusion and the effort spent trying to reconcile different reports that use the data element. Finally, the data quality, creation, and usage rules must be defined. See Chapter 6 for more information on how to create robust metadata.

Creating, gathering, and documenting metadata is a crucial part of Business Data Steward training, as creating rigorous metadata is not something that most data analysts are familiar with. Further, it is important to have a standardized set of guidelines for performing the work and evaluating the quality of the result.
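To make the parts of this metadata concrete, the following is a minimal sketch of how one key business data element might be recorded. The class name, the field names, the steward title, and the "Loss Ratio" example are all illustrative assumptions, not a structure prescribed by this book or by any particular business glossary tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class BusinessDataElement:
    """Hypothetical metadata record for one key business data element."""
    name: str
    definition: str
    owning_steward: str
    # Only calculated data elements carry a derivation rule.
    derivation: Optional[Callable[[dict], float]] = None
    derivation_text: str = ""
    quality_rules: List[str] = field(default_factory=list)
    creation_rules: List[str] = field(default_factory=list)
    usage_rules: List[str] = field(default_factory=list)

    def missing_metadata(self) -> List[str]:
        """Name the metadata parts that have not been filled in yet."""
        gaps = []
        if not self.definition.strip():
            gaps.append("definition")
        if not self.quality_rules:
            gaps.append("data quality rules")
        if not self.creation_rules:
            gaps.append("creation rules")
        if not self.usage_rules:
            gaps.append("usage rules")
        return gaps


# Illustrative element: one shared derivation rule, so every report
# calculates the value the same way.
loss_ratio = BusinessDataElement(
    name="Loss Ratio",
    definition="Incurred losses divided by earned premium for a period.",
    owning_steward="Actuarial Business Data Steward",
    derivation=lambda row: row["incurred_losses"] / row["earned_premium"],
    derivation_text="incurred_losses / earned_premium",
    quality_rules=["earned_premium must be greater than zero"],
)

print(loss_ratio.derivation({"incurred_losses": 60.0, "earned_premium": 100.0}))
print(loss_ratio.missing_metadata())
```

A check like `missing_metadata` is one possible way to apply a standardized guideline when evaluating the quality of the result: an element is not considered complete until every part of its metadata is present.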

Uses of Data

It is important to teach Business Data Stewards how to determine how data is used throughout the enterprise. Decisions about the data affect source systems, ETL (Extract, Transform, and Load), data stores, and reporting. These impacts need to be understood. For example, a decision to standardize on a single set of valid values (e.g., for marital status) could require modifications to source systems, or at least a set of conversion rules that will need to be implemented in ETL before the data is loaded into a data warehouse.
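The marital status example can be sketched as a small set of conversion rules. The source system names, the source codes, and the standard values below are hypothetical; a real implementation would take them from the valid values the stewards agree on.

```python
from typing import Optional

# Hypothetical enterprise-wide standard set of valid values.
STANDARD_MARITAL_STATUS = {"SINGLE", "MARRIED", "DIVORCED", "WIDOWED", "UNKNOWN"}

# Hypothetical conversion rules, one mapping per source system.
CONVERSION_RULES = {
    "policy_admin": {"S": "SINGLE", "M": "MARRIED", "D": "DIVORCED", "W": "WIDOWED"},
    "claims": {"1": "SINGLE", "2": "MARRIED", "3": "DIVORCED", "4": "WIDOWED"},
}


def standardize_marital_status(source_system: str, raw_value: Optional[str]) -> str:
    """Convert a source-system code to the standard value during ETL.

    Values with no conversion rule are mapped to UNKNOWN instead of
    being loaded as-is, so the warehouse only ever holds standard values.
    """
    rules = CONVERSION_RULES.get(source_system, {})
    key = raw_value.strip().upper() if raw_value else ""
    return rules.get(key, "UNKNOWN")


print(standardize_marital_status("policy_admin", " m "))
print(standardize_marital_status("claims", "3"))
print(standardize_marital_status("claims", "9"))
```

Keeping the rules in one table per source system makes the impact of a stewardship decision visible: changing the standard set means reviewing every mapping that points at the changed value.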

Data is everywhere in the enterprise and affects all aspects of an organization. Data also crosses organizational boundaries and is easily reproduced and repurposed, making it more difficult to manage than other organizational assets. Given that situation, it becomes a challenge to narrow down what data to review, manage, improve, and monitor. There are a variety of models for understanding how data is used across the organization and its information chains. Understanding the relationship between information producers and consumers is one way, and SIPOC (Supplier–Input–Process–Output–Customer) is another. A combination of both is probably necessary in most organizations if you hope to make real progress. Put another way, the starting point for managing and governing data needs to be documented knowledge and understanding of how data exists and moves through an organization. Once the overall information chain is understood, both the producer and consumer view and SIPOC can be applied to the individual links in the chain.

Information Producers and Consumers

Information producers are those who input or import the data; information consumers are those who use the data for analysis and to run the business. There is often a disconnect between these two groups, which causes the information producers to produce or collect information that is either insufficient or of too low a quality (or both) for the needs of the information consumers. This disconnect is often the result of not treating data as an enterprise-wide asset. The key lesson to be taught here is that the specifications given to the information producers need to take into account the requirements of the information consumers, even when those consumers are not part of the business unit that is collecting the data. For example, in one large insurance company, the information producers who collected the data needed to create a new homeowner’s insurance policy did not collect the owner’s birth date because the policy wasn’t priced using the age of the homeowner. Without the birth date, it wasn’t possible to reliably identify the customer as part of the master customer effort, which had a large impact on a variety of other enterprise efforts. However, when asked to collect the birth date, the information producers refused, because doing so meant taking extra time, and they were paid based on policy throughput. Data Governance had to get involved to get both their attitude and their compensation requirements adjusted.

Note

The disconnect between information producers and information consumers is sometimes referred to as the “Silk Road Problem.” Long ago, the Chinese produced silk, but didn’t know who bought it, or what their requirements were. The Europeans bought the silk, but didn’t know where it came from or how to ask for changes in color or weave. Only the Persians, who actually moved the silk from China to Europe, were aware of both ends of the transaction. The problem with the Silk Road was that the Chinese did not know how their silk was being used, and thus could not produce it the way the customer wanted. The Europeans had the opposite problem—they knew what they wanted, but did not know how (or who) to ask for it. By connecting the needs of the consumer to the producer, the Persians ensured that the Chinese could sell more silk, the Europeans could buy more silk, and the Persians could make more money transporting more product. This story also shows that when the incentives are correct, the producers will provide what the consumers need!

Using SIPOC to Understand Data Use

Another way to look at the flow of information is via SIPOC (Supplier–Input–Process–Output–Customer), as shown in Figure 5.2. This set of terms comes from the process world, primarily Six Sigma. SIPOC can serve as a tool for understanding the uses of data in the enterprise, in much the same way as a data flow diagram. By understanding where the data comes from (S), what it is used for (C), and what is done to the data on its trip from supplier to customer (P), you can:

- Understand the requirements that the customer has for the data.

- Understand the rules governing how the data is provided.

- Determine the gap between what is required and what is provided.

- Trace the root cause of data quality failures.

- Create requirements for modifying the processes that move the data.

FIGURE 5.2 SIPOC illustrates the flow of information through an enterprise in steps.

The SIPOC principle can be applied at many different levels of detail. At a high level, for example, claim data is used by actuarial to assess risk. At a detailed level, a rule for calculating a data element results in an unexpected number because of a condition that was not anticipated.

In this model, each step in the information chain is broken down into components that supply information as input to a process, and the output from the process is supplied to the customer. The customer may be the supplier to the next link in the process, and so on. By analyzing the flow of the data in this way, the needs of all the customers in the chain are understood, and the supply of data as well as the output of processes can be analyzed to ensure that they meet all the enterprise’s needs. Examples of high-level and low-level SIPOC are shown in Figure 5.2.
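The chain of SIPOC links described above can be sketched as a simple data structure. This is an illustration only: the class, the function names, and the two links (loosely modeled on the homeowner's policy example earlier in the chapter) are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class SipocLink:
    """One hypothetical link in an information chain."""
    supplier: str
    inputs: FrozenSet[str]    # data elements the supplier provides
    process: str
    outputs: FrozenSet[str]   # data elements the process delivers
    customer: str
    required: FrozenSet[str]  # data elements the customer needs

    def gap(self) -> FrozenSet[str]:
        """Elements the customer requires but the process does not deliver."""
        return self.required - self.outputs


def is_connected(chain: List[SipocLink]) -> bool:
    """True when each link's customer is the supplier of the next link."""
    return all(a.customer == b.supplier for a, b in zip(chain, chain[1:]))


def chain_gaps(chain: List[SipocLink]) -> Dict[str, List[str]]:
    """Map each customer with an unmet requirement to its missing elements."""
    return {link.customer: sorted(link.gap()) for link in chain if link.gap()}


intake = SipocLink(
    supplier="Policy applicant",
    inputs=frozenset({"name", "address"}),
    process="Policy intake",
    outputs=frozenset({"name", "address", "policy_number"}),
    customer="Policy administration",
    required=frozenset({"name", "address", "policy_number"}),
)
matching = SipocLink(
    supplier="Policy administration",
    inputs=frozenset({"name", "address", "policy_number"}),
    process="Customer matching",
    outputs=frozenset({"name", "address", "policy_number"}),
    customer="Master customer team",
    required=frozenset({"name", "address", "birth_date"}),
)

chain = [intake, matching]
print(is_connected(chain))
print(chain_gaps(chain))
```

Here the gap surfaces at the last customer, but its root cause lies in the first link, where the element is never collected. Walking back along the chain from the gap is one way to trace a data quality failure to its source.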

Introduction to Data Stewardship Processes

Data Stewardship is all about having well-defined and repeatable processes. Working with well-defined and repeatable processes—which may be foreign to many of the stewards—must be introduced and constantly reinforced during the training. The processes include using an issue log (see Chapter 6) and using workflows with approval steps and time limits for those steps. The stewards must be made to understand that these processes and their workflows will lead to efficient management of the data and a higher-quality product than ad-hoc methods.
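As an illustration of an issue log whose workflow steps carry approvers and time limits, consider the minimal sketch below. The class names, roles, step names, and dates are hypothetical and are not taken from the issue log described in Chapter 6.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class WorkflowStep:
    """One approval step with a time limit (hypothetical structure)."""
    name: str
    approver_role: str
    time_limit_days: int
    started: date
    completed: Optional[date] = None

    def is_overdue(self, today: date) -> bool:
        """An open step is overdue once its time limit has passed."""
        if self.completed is not None:
            return False
        return today > self.started + timedelta(days=self.time_limit_days)


@dataclass
class Issue:
    """One entry in a hypothetical issue log."""
    issue_id: int
    description: str
    raised_by: str
    steps: List[WorkflowStep] = field(default_factory=list)

    def overdue_steps(self, today: date) -> List[str]:
        return [step.name for step in self.steps if step.is_overdue(today)]


issue = Issue(
    issue_id=1,
    description="Birth date is not collected for homeowner policies.",
    raised_by="Master customer team",
    steps=[
        WorkflowStep("Steward review", "Business Data Steward", 5,
                     started=date(2024, 1, 2), completed=date(2024, 1, 4)),
        WorkflowStep("Council approval", "Data Stewardship Council", 10,
                     started=date(2024, 1, 4)),
    ],
)

print(issue.overdue_steps(today=date(2024, 1, 20)))
```

Recording the time limit on each step, instead of on the issue as a whole, shows where in the workflow an issue is stalled and which role needs a reminder.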

The initial training should include some examples of basic processes, such as:

- Defining and updating key business data elements

- Opening and working issues with business processes

- Defining and remediating data quality issues (see Figure 5.3)

FIGURE 5.3 A procedure for handling data quality issues. The horizontal “swim lanes” show the various actors (e.g., the Enterprise Data Steward) who take part in the procedure, and the rectangles specify the individual steps, connected by flow arrows.

- Defining and executing Data Stewardship procedures (see Figure 5.4)

FIGURE 5.4 A procedure for defining and using Data Stewardship procedures.

- Working with other stewards to reach a solution on data problems

- Providing input for Master Data Management (see Chapter 7)

- Supplying information security classifications (see Chapter 7)

Tools of the Trade

The training must also focus on how to use the Data Stewardship toolset, including:

- The Data Stewardship website. As will be discussed in Chapter 6, this is a key artifact that ties other tools together and serves as a reference for everything that involves Data Governance and Data Stewardship.

- The Data Governance Wiki. Data Governance and Data Stewardship have many terms that may be unfamiliar to the general population. As these definitions are worked out, they need to be published in the Wiki. Business Data Stewards should know how to look up items, provide links to their coworkers, and either directly update the Wiki or make a request to have an update made (depending on how you set up the update procedure).

- Business glossary. The business glossary is a key deliverable of Data Stewardship. This tool holds the business metadata documentation, such as the list of key business data elements, definitions, derivations, owning Business Data Steward, and all logical/business rules. In addition, some tools allow for semantic classification of the terms, logical lists of valid values, and more. Business Data Stewards should know how to look up items, provide links to their coworkers, and either directly update the glossary or make a request to have an update made. Ideally, Business Data Stewards should be able to update their own terms (with an audit trail), but not all tools have the access security necessary to limit their access to just the appropriate data elements.

- Metadata repository. Whereas the business glossary handles the logical/business metadata, the metadata repository handles the physical metadata, such as database and file structures, lineage and impact analysis (based on ETL), and the connection between the physical data and the business data elements stored in the business glossary. Unlike with the other tools, Business Data Stewards are not expected to make updates to the metadata repository, but should learn how to use it to look at databases and understand where data elements come from (lineage).
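To show what a lineage lookup answers, here is a minimal sketch that walks a table of ETL mappings back to the original source columns. The column names and the mapping table are hypothetical; a commercial metadata repository derives this information from the ETL itself and presents it through its own interface.

```python
from typing import Dict, List, Set

# Hypothetical lineage: each target column maps to the columns that feed it.
LINEAGE: Dict[str, List[str]] = {
    "dw.policy_fact.loss_ratio": [
        "staging.claims.incurred_losses",
        "staging.policy.earned_premium",
    ],
    "staging.claims.incurred_losses": [
        "claims_sys.claim.paid_amount",
        "claims_sys.claim.reserve_amount",
    ],
    "staging.policy.earned_premium": ["policy_admin.policy.premium"],
}


def upstream_sources(column: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return the columns at the start of the chain that feed `column`."""
    sources: Set[str] = set()
    seen: Set[str] = set()
    stack = [column]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = lineage.get(current, [])
        # A column with no recorded parents is an original source.
        if not parents and current != column:
            sources.add(current)
        stack.extend(parents)
    return sources


print(sorted(upstream_sources("dw.policy_fact.loss_ratio", LINEAGE)))
```

Reading the mapping in the opposite direction (from a source column to everything it feeds) gives impact analysis.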

The Role of IT in Managing Tools

After reading through the list of technical tools needed to properly support Data Governance and Data Stewardship, you may be wondering where the support for these tools will come from. IT has an important support role for commercial tools, such as a metadata repository and probably the business glossary. Servers need to be purchased and installed; the software licensed, installed, and maintained; and issues with underlying databases/repositories dealt with in a timely manner. In addition, some of the more complex metadata repositories require considerable expertise to customize, as well as needing batch jobs set up and run periodically to update the metadata.

All of these tasks are typically handled by IT, which will need to assign and train one or more resources to support the Data Stewardship effort. Note that these tasks are not the same as what is expected from Technical Data Stewards; instead, some of the tasks are for developers, others for system maintenance, and others for database administrators. As you consider adding tools to support Data Stewardship, make sure to engage IT early to enable them to estimate, budget, and properly staff to provide production support. Keep in mind that Data Stewardship tools must also be considered a necessary part of the enterprise applications they make useful. For example, a data warehouse is not usable if there is no data dictionary that allows the end users of the data warehouse to know what data elements they are looking at, what they mean, how they are derived, and so on.

Note

Whereas it is possible to build “homegrown” tools for the Data Stewardship website, the Wiki, and the business glossary, it is rare to do so with a metadata repository. That is due to the complexity of the metamodel and the complex functionality underlying the tool. Instead, it is almost always necessary to license a commercially available tool. In many cases, it is possible to license a combined business glossary/metadata repository. This arrangement has the potential advantage of having two tools that are already connected and able to exchange metadata. Unfortunately, not every vendor connects up their toolset this way, so “buyer beware.”

Training for Data Quality Improvement

Improving data quality provides much of the driving force (and the visible results) for Data Governance and Data Stewardship. At some point, therefore, the Business Data Stewards need to be trained in how they can play an important role in a data quality improvement effort. Key training points include:

- Framework for data quality:

  - How the organization defines quality data

  - What data quality rules are, and how to define them

  - Detection and documentation of data quality issues

  - How to do root-cause analysis

  - How business process improvement increases the quality of data

  - Data cleansing, and when it is appropriate

  - Ongoing logging/measurement of data quality levels

- Principles of data profiling and the Business Data Stewards’ role in analyzing the results:

  - Viewing, investigating, and rendering decisions on the profiling results (what is a problem and what isn’t).

  - Analyzing the quality of the data for both newly governed data and existing governed data:

    – For newly governed data, Data Stewards need to establish data quality criteria based on the data usage and do a level of analysis to determine whether the data meets those quality criteria.

    – For existing governed data, Data Stewards need to plan and execute ongoing data measurement and analysis to ensure that the quality of the data does not diminish. If the data usage changes (and thus the data quality needs change), the data should be handled as newly governed data, as in the previous bullet.

Note

Data profiling in this context means not only running a profiling tool and analyzing the results, but also performing ongoing data analysis to confirm that the quality of the data continues to meet the data quality goals. These goals can change over time as the uses of the data change, causing data that was once considered to be of good quality (because it met the then-current needs) to now be considered of insufficient quality.
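Ongoing measurement against a data quality goal can be sketched in a few lines. The rule, the sample records, and the 95% target are assumptions chosen for illustration; in practice the stewards set the target based on how the data is used.

```python
from datetime import date


def birth_date_is_valid(record, today):
    """Hypothetical data quality rule: birth date is present and not in the future."""
    value = record.get("birth_date")
    return value is not None and value <= today


def measure_quality(records, rule):
    """Return the fraction of records that satisfy the rule, or None if there are no records."""
    if not records:
        return None
    passed = sum(1 for record in records if rule(record))
    return passed / len(records)


TODAY = date(2024, 1, 1)
TARGET = 0.95  # assumed data quality goal for this usage

records = [
    {"policy": "P1", "birth_date": date(1980, 5, 17)},
    {"policy": "P2", "birth_date": None},
    {"policy": "P3", "birth_date": date(1975, 11, 2)},
    {"policy": "P4", "birth_date": date(1990, 1, 30)},
]

score = measure_quality(records, lambda r: birth_date_is_valid(r, TODAY))
meets_target = score >= TARGET
print(f"{score:.0%} valid, meets target: {meets_target}")
```

If the uses of the data change, only `TARGET` or the rule needs to change. Rerunning the same measurement then shows whether data that used to be good enough still is.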

Summary

If Business Data Stewards are well trained and learn the skills they need just before using them, the overall Data Stewardship effort will be both more effective and more efficient. In addition, as major efforts like data quality improvement get underway, the stewards need to be trained on how to participate in those efforts as well.

Good training for Data Stewards is the same as good training for any other subject. It must include clear goals, materials at the appropriate level, exercises for reinforcement, tests for comprehension, and the opportunity to apply what has been learned. Appendix B shows outlines for two training plans: training Technical Data Stewards and training project managers. Note how much of the material is repeated between the plans, including instructing the trainees on what Data Governance is and why it is important to them.