Data Stewardship (2014)
CHAPTER 10 Summing It All Up
Provides a summary of the main learnings from the earlier chapters in this book.
Data Stewardship; Data Governance; metrics; maturity; master data management; roles; identity resolution; responsibilities; accountability
We have come almost to the end of this book on Data Stewardship. With the information you have learned here, and using this book as a reference going forward, you are now well equipped to set up and run a Data Stewardship program, or to participate actively in one. Keep in mind, however, that every company is different, every culture is different, and thus every Data Stewardship implementation will be different. The information presented here, while based on wide experience, is not meant as a straitjacket. If you find that something isn’t working for you, do not try to force-fit the recommendations of this book to your company. Instead, understand the concepts and then adapt them to your own program. For example, some companies are very “flat” (they don’t have many levels of management). In such a company, it may be advisable to have only a two-level Data Governance organization, rather than the three levels discussed in this book. Just realize that however many levels you have, someone must be performing the responsibilities listed in Chapter 3.
We have covered a lot of ground, and this chapter is meant to provide a quick guide to the concepts covered in this book. This chapter will also summarize important points, to serve as a reminder of what was covered in detail.
In the Beginning: Data Governance and Data Stewardship
In Chapter 1 we began the journey by discussing what Data Governance and Data Stewardship are, the deliverables, the roles and responsibilities, and the organizational structure needed.
Data Governance provides the means and organization to manage a corporation’s data assets. A Data Governance program establishes roles and responsibilities for the participants in the program. These participants include executives, Data Governors, Data Governance Program Office (DGPO) personnel, and many types of Data Stewards.
Data Stewards work together to get most of the day-to-day work of Data Governance accomplished. Data Stewardship is the operational aspect of Data Governance. Business Data Stewards are designated to take on these roles by the Data Governors, while Technical Data Stewards are designated by IT management. Business Data Stewards make recommendations and steward the data (including managing the metadata) owned by the business function they represent. Technical Data Stewards provide expertise around systems, ETL (Extract, Transform, and Load) jobs, data stores, data warehouses and marts, and business intelligence tools.
Overall Goals of Data Stewardship
Chapter 1 also discusses the overall goals of Data Stewardship—the milestones a good program can achieve. These milestones include having full participation from the business (Business Data Stewards) and IT (Technical Data Stewards), as well as having the organization in place to support these efforts, such as a Data Governance Board, a DGPO, and a Data Stewardship Council. In addition, policies and procedures are in place, and Data Stewardship processes are integrated into the enterprise, including project management and system development. The participants in the program are rated on how well they perform their stewardship duties, and the necessary supporting tools (e.g., a business glossary, issue log, and metadata repository) are in place, in use, and supported.
Metadata and Data Stewardship
At the most basic level, data needs to be moved to a governed state, where not only does a business function own the data, but people representing that organization (most notably the Data Governor and Data Stewards) are accountable for the data. A lot of the work of moving data to a governed state requires collecting and documenting metadata, such as the business name, definition, and business rules (including data quality rules, creation rules, and usage rules).
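As a minimal sketch of what such a metadata record might look like, the Python below models one business data element and checks whether it has reached a governed state. The class name, the field names, and the exact criteria for "governed" are assumptions made for illustration; this book does not prescribe a record layout, and a real business glossary or metadata repository will have its own structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BusinessDataElement:
    """Hypothetical metadata record for one business data element."""
    business_name: str
    definition: str = ""
    owning_business_function: Optional[str] = None
    data_governor: Optional[str] = None          # accountable for the data
    business_data_steward: Optional[str] = None  # stewards the data day to day
    data_quality_rules: List[str] = field(default_factory=list)
    creation_rules: List[str] = field(default_factory=list)
    usage_rules: List[str] = field(default_factory=list)

    def missing_for_governed_state(self) -> List[str]:
        """List what is still missing (illustrative criteria, not a standard)."""
        missing = []
        if not self.owning_business_function:
            missing.append("owning_business_function")
        if not self.data_governor:
            missing.append("data_governor")
        if not self.business_data_steward:
            missing.append("business_data_steward")
        if not self.definition.strip():
            missing.append("definition")
        if not self.data_quality_rules:
            missing.append("data_quality_rules")
        return missing

    def is_governed(self) -> bool:
        """True when nothing required by the criteria above is missing."""
        return not self.missing_for_governed_state()


# Example values are invented for illustration.
element = BusinessDataElement(
    business_name="Policy Effective Date",
    definition="The date on which coverage under the policy begins.",
    owning_business_function="Underwriting",
)
gaps = element.missing_for_governed_state()
```

In this example `gaps` names the accountability and data quality items that still need to be collected, which gives the Business Data Steward a simple worklist for moving the element to a governed state.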
Policies and Procedures
Generating policies, processes, and procedures is a key early deliverable from a Data Governance effort. And although executives approve policies, it is up to the DGPO, with solid support and contributions from the Data Stewards and Data Governors, to actually draft policies that everyone can live with. The policies drive processes to ensure that the goals of the policies are met, and procedures state how to execute on the processes. Data Stewards are the ones who follow processes and procedures, and thus must have a great deal of influence over what they contain.
An organizational structure is crucial to successful Data Governance and Data Stewardship. This organization often has three levels (though not always), with the Executive Steering Committee at the top, then the Data Governors in the middle, and the Data Stewards at the base. The organization is supported by various functions in IT, including analysts, developers, and Technical Data Stewards. The organization is also supported and run by the DGPO, which handles the running of Data Governance Board meetings and Data Stewardship Council meetings. The DGPO also takes care of logistics, such as instituting a web portal, ensuring that all work is properly documented, making progress reports to executives and the Data Governance Sponsor, and managing tools such as the issue log (among many other things).
Executive support (including that of the Data Governance Sponsor, who is often the COO or CFO) is crucial to Data Governance and Data Stewardship because a change in the culture of the organization may be necessary, as well as a change to how people are rewarded for data work. Additional headcount may be necessary, and people in a position of power must promote the vision for the program. Policies must be approved at the executive level, and business priorities need to be set for the corporation as a whole.
Data Governors make up the Data Governance Board. They are ultimately accountable for business data use, data quality, and prioritization of issues. They make most of the decisions that impact data based on recommendations from the Data Stewards. If a budget is provided for Data Governance, the board has the authority to spend that money on data management improvements. They ensure that performance measures align to Data Governance objectives, assign Business Data Stewards, and represent all data stakeholders in the Data Governance process. Data Governors own and govern the business processes that produce data and define data strategy based on the needs of the enterprise.
Data Governance Program Office and Data Governance Manager
The driving force behind the Data Governance program is the Data Governance Manager, sometimes also known as the Chief Data Steward. This individual runs the DGPO, and works directly with the executives, Data Governors, and Data Stewards.
The DGPO personnel coordinate the Data Governance program, making sure that everything accomplished in Data Governance is fully documented and available. The DGPO also defines best practices, creates and delivers the Data Governance training, enforces data-related policies and procedures, and manages the Data Governance collateral, such as the business glossary and issue log.
The Data Governance Manager ensures that the three committees (executives, Data Governors, and Data Stewards) are fully staffed and that all appropriate business areas are represented. This individual collaborates with leadership to implement capabilities and processes, ensures that Data Governance processes are integrated into the overall enterprise processes, chairs both the Data Governance Board and the Executive Steering Committee, works closely with the Data Steward community, and defines and implements the Data Governance program. In addition, he or she has the ultimate responsibility for getting the Data Governance message and mission out to the enterprise and for getting appropriate involvement from support organizations (e.g., IT).
Enterprise Data Steward
Another role in the DGPO is the Enterprise Data Steward, who runs the day-to-day stewardship effort. This role may be filled by the Data Governance Manager, but as the effort grows, it may become necessary to staff this role with a dedicated person. The Enterprise Data Steward manages any Domain Data Stewards or Project Data Stewards, and works with many stakeholders throughout the enterprise, such as Data Governors, IT developers, project managers, and the Data Governance Manager. This individual ensures that all work efforts are in line with the overall business objectives, establishes prioritization criteria, and makes sure that Data Governance is included on projects. The responsibilities also include formulating and maintaining processes, managing Data Stewardship Council meetings, reviewing and managing issues, formulating and implementing Data Governance metrics, and creating and updating progress scorecards.
Sponsorship of the Data Governance program is very important. There needs to be an overall business sponsor (who should be a “C”-level person) as well as an IT sponsor. Business sponsorship is important because Data Governance is, overall, a business initiative. Business Data Stewards are business people. One of the primary tenets of Data Governance is that the business, not IT, owns the data. Nevertheless, IT plays a crucial role in Data Governance, so an IT sponsor is needed. IT resources must maintain the Data Governance tools, provide expertise in how technical systems work, and serve as gatekeepers to ensure that changes proposed by the business do not violate Data Governance principles.
Types of Data Stewards and their Responsibilities
Chapter 2 explains the various types of Data Stewards, and Chapter 3 details their responsibilities and describes how the Data Stewards work together as part of the Data Stewardship Council.
Business Data Stewards
A Business Data Steward represents a business function and stewards the data owned by that function. Business Data Stewards are members of the Data Stewardship Council. Their recommendations are acted on by the Data Governors. They are people who work closely with the data, and can reach out to stakeholders and subject-matter experts to gather any information about the data that they lack. They care about the data—its meaning, quality, and how it is used. Appointing someone as a Business Data Steward usually involves formalizing a role that has been in place unofficially, and can actually end up saving time because questions only have to be answered once and then documented. It is very important to document who the Business Data Steward is for governed data so that analysts know who to turn to when clarification about the data is needed or when proposing changes to meaning, business rules, or usage.
Of course, not everyone is suitable for the role of Business Data Steward. Characteristics of a good Business Data Steward include being able to write well, being committed to improving data management processes, and having good people skills.
Business Data Stewards identify key business terms and collect and recommend metadata, such as definitions, business rules, and data quality rules and requirements. They work closely with the Data Governors and stakeholders to understand and remediate any issues or concerns, define data quality improvement opportunities, assess impacts of proposed changes to data, and ensure that the business has a practical understanding of the data. They also help define processes, execute on those processes, provide business requirements for data, participate in conflict resolution, and ensure that data decisions are communicated to their business partners and stakeholders.
Technical Data Stewards
It is important to understand where and how data is stored, as well as how the data is transformed as it moves through the information chain. Further, there needs to be expertise on how systems or processes work, and how to interpret IT collateral such as database structures and COBOL copylibs (yes, these still exist). Providing this information is the role of Technical Data Stewards. They belong to IT, and are assigned based on the systems and technical processes in which they have expertise. Technical Data Stewards are official participants in Data Stewardship and are responsible and accountable for providing answers to technical questions in a timely manner.
Project Data Stewards
Project Data Stewards represent the goals of Data Governance and Data Stewardship on projects and are trained to recognize issues and questions that should be brought to Business Data Stewards. They act as proxies for the Business Data Stewards (who cannot be present on every project). Project Data Stewards collect issues and questions (e.g., data meaning, usage, and data quality issues) and bring those requests for information to the Business Data Stewards for recommendations and decisions. Project Data Stewards then bring that information back to the project team. They must work together with other Data Stewards and stakeholders, as well as with the project manager and project team members.
Another key responsibility of Project Data Stewards is to consult with the Business Data Stewards and project manager to determine if (and when) data needs to be profiled, and then to assist in the data profiling effort.
Project Data Stewards are typically paid for by the project they are assigned to, at least on large projects.
Domain Data Stewards
Domain Data Stewards work with data that must be shared and cannot be owned by a single business function. The shared data has multiple Business Data Stewards as stakeholders. Domain Data Stewards take the role of a Business Data Steward, but must coordinate recommendations and proposed changes with the Business Data Stewards who have an ownership interest in the shared data. Domain Data Stewards must document which of the business functions have a stake in the data, as well as collecting and documenting the information about the shared (domain) data. Most of their responsibilities are the same as Business Data Stewards, but Domain Data Stewards must take multiple lines of business into consideration when assessing impacts and making recommendations, as well as achieving consensus among the various Business Data Stewards who have a stake in the shared data.
Operational Data Stewards
Operational Data Stewards provide support to Business Data Stewards in areas where the Operational Data Stewards interact closely with the data. For example, an Operational Data Steward may be responsible for entering data, and thus might notice and communicate issues with data quality (and help research those issues) before the issues are noticed in downstream systems.
Operational Data Stewards make sure that data creation and update procedures are followed when new values are created or existing values are modified. They also assist the Business Data Stewards in collecting data metrics, performing data analysis, and remediating data issues.
How Stewards Interact
The various kinds of Data Stewards must work together to achieve results. Business Data Stewards represent their business area and steward their data. They may get help from Operational Data Stewards, who often work directly with the data and are in a position to see data issues very early on. Domain Data Stewards take care of shared data (e.g., customer) and work with the Business Data Stewards who are stakeholders in the shared data. Project Data Stewards represent Data Stewardship on projects and bring issues to (and answers from) the affected Business Data Stewards. Technical Data Stewards provide technical input to the other types of stewards about systems, ETL, and other IT-centric resources.
Data Stewards must also interact with data stakeholders. Stakeholders include anyone who enters or uses the data, so Data Stewards need to know who these people are. The stakeholders must be consulted when changes to the data are being proposed to assess the impact that the change would have on everyone who uses the data. Failure to consider the needs of the stakeholders can lead to broken business processes, failed system and application jobs, and errors in the data warehouse and operational data stores.
Data Stewardship Council
The Data Stewardship Council is the venue for much of the collaboration work for Data Stewards. It is a formal organization within the Data Governance effort with a well-documented structure, members (the Data Stewards), and a leader (the Enterprise Data Steward). Business Data Stewards represent business functions that own data, though exactly how those functions are broken up depends very much on how the business (and the company) is organized.
The Data Stewardship Council has some major goals to achieve as a group. Its members need to find ways to get value out of the data; advise the enterprise on data standards, guidelines, and policies; resolve data issues; enforce the use of proper business terminology; and communicate their recommendations and decisions to data users. In addition, they must ensure that the Data Governance effort aligns to the goals of the business, perform the day-to-day processes, establish and communicate the rules for using data, help evaluate the effectiveness of Data Governance, and provide input for creating goals and metrics. Further, the council must collaborate with all the stakeholders to ensure that decisions about metadata and data issues consider the needs of all the data users.
Implementing Data Stewardship
Chapter 4 steps through the processes you’ll need to go through to start up a Data Stewardship effort. These steps include ascertaining how the company is organized, who the data subject-matter experts are, and which business functions potentially own data. It also includes setting up the structure of Data Stewardship, and finding out what information already exists in the organization that you can leverage as a starting point for Data Stewardship deliverables.
Getting the Word Out
It is crucial to get the word out that Data Stewardship exists, what it is, and why the community of data users should care and participate. Even with assigned Business Data Stewards, the cooperation and participation of the data analysts are necessary to support the effort, and it is important that the community of data analysts know who the stewards are and what they do. Data analysts can make information they have collected available, and advise others about Data Stewardship. IT developers are an audience that is often missed when getting the word out, yet it is these very developers who struggle with data issues when attempting to create or modify applications, and they can serve as “gatekeepers” by notifying Data Stewardship personnel when requirements are being given to IT that ignore policies and procedures, or do not include required information.
To effectively get the Data Stewardship message out, it is necessary to have various versions of the message prepared and be able to deliver the appropriate version of the message to audiences around the corporation. The messages (in addition to varying in length) must stress the idea of accountability and decision making around data and metadata. Longer versions can be presented as part of standardized company training vehicles, such as “brown bag” lunches, and the verbal messages and presentations should be supplemented using the company’s publications, such as newsletters and webcasts. Don’t forget to customize the message to the audience by using examples and illustrations of your points that are important to the people you are addressing. You also need to tailor the presentation to the level of detail that is appropriate to your audience. For example, while executives will focus on high-level policies, data analysts and developers are more interested in the processes they need to be aware of.
Data Stewardship has wide-ranging impact across the company, and it is thus important to obtain support for the effort. Support from upper management makes it clear that Data Stewardship is important to the organization. In addition, obtaining staffing and funding, adjusting the corporate culture, and changing the reward system to encourage accountability around data all require high-level management support. Support from the community of data users is also very important. The Data Stewardship effort will benefit greatly from the contributions of people who use the data regularly, benefit from improved data quality, and understand the issues that arise when data is not treated as a valued asset.
Understand the Organization
The organization’s structure and culture will have a tremendous influence on how Data Stewardship is organized. The business(es) that a company is engaged in specify what business functions are required. Many of these business functions will own and steward data, so it is very important to understand the structure and business functions to specify where Data Stewards will need to come from. Further, the complexity of the organization, as well as the number of business functions, tends to increase if the company has a lot of lines of business.
It is important to focus on business functions rather than business units. While business units change frequently (with every reorganization, it seems), business functions remain fairly stable because a business function is an area of the business. That is, unless the business a company is in changes materially, business functions remain the same. Of course, in a lot of cases high-level business functions do align with business units or departments. Examples include Sales, Marketing, and Finance. These particular business units also tend to be good places to start looking at data, since not only can they often quantify the cost of poor-quality data, but at least in the case of Finance, they are used to having stringent rules and governance around their data.
Complex organizations—such as those that have grown by acquisition or include multinational business units or even whole companies—may dictate more complex Data Stewardship organizations. In these cases, Data Stewards may need to be chosen who represent different companies or countries within the organization. These Data Stewards steward their data and, in the case of data that must be standardized across the entire corporation, work together to achieve consensus on the common data elements. An extra layer of Data Stewardship that stretches across the enterprise is often necessary to specify the common elements and work with the individual stewardship councils to obtain the needed results. This organizational structure is shown in Figure 4.5.
Another aspect of the company that is important to successful Data Stewardship is a clear understanding of how many levels of management are in place, as well as what decisions are made at what levels. This understanding is key to staffing Data Governance and Data Stewardship with people from the “right” levels. For example, it often does you no good to pick Data Governors from a level of management that isn’t authorized to make decisions about the data.
It is important to consider the culture of the organization as well. Data Stewardship usually requires a change in the behavior of the data users. The concept of accountability for data, data quality, and metadata is usually foreign to data users. The idea that someone has the right and responsibility to make decisions about the data can be very much at odds with the way the organization functions. Indeed, the company may operate totally on consensus-based decisions, and Data Stewardship is likely to change that. Realize that culture changes slowly, and if the corporation’s culture is at odds with Data Stewardship, then it will take some time to slowly change that, even with executive support.
Figuring Out Where to Start
At the beginning of a Data Stewardship effort, the primary focus will be on data, metadata, data quality improvement, processes, and setting up a toolset for documentation. In most cases, some work has been done, and “pockets” of information exist throughout the enterprise. The trick is to find these pockets and leverage them to jumpstart your effort.
Figuring Out What Data You’ve Got
Most companies have thousands of data elements, but it is a good idea to first understand what domains of data a company has and which must be stewarded. For example, virtually every company has financial data, and most have customer, sales, and marketing data. From there, the domains reflect what businesses the company is in, such as underwriting, actuarial, and claims data for an insurance company. Understanding the domains of data enables the stewardship effort to drill down to a more granular level to understand the pool of data elements. Very often, examining operational and business intelligence reports can give you a good starting point for understanding what data elements exist. In addition to figuring out what data a company has, it is important to know where that data originates (input system or system of record), how that data originates (hand-entered or input file), and where else it resides (data stores, business intelligence tool, data warehouse). Further, you need to know how the data moves from one place to another, how it is transformed, and what part of the organization is responsible for the movement and storage of the data.
Figuring Out What Metadata You’ve Got
A significant portion of what Business Data Stewards produce is metadata. Because metadata like definitions, data quality rules, and valid values are very useful to the people who use data, it is highly likely that many data and reporting analysts have been busily collecting this information for their own use and perhaps even sharing it with their coworkers. In some departments there may even be someone managing the metadata. And in IT, there may be a library of legacy system data dictionaries. These collections take all sorts of forms—spreadsheets, SharePoint lists, printed books, Access databases, and more. The Data Stewardship Council needs to locate these resources, gather them up, and, where appropriate, use the metadata as a starting point for defining the key business data elements.
Figuring Out What Data Quality Information You’ve Got
Improving data quality is one of the most convincing reasons to implement Data Stewardship. Poor-quality data is such a pressing problem in most enterprises that it is likely that the data users have identified data quality issues. They may even have tried to remediate those issues themselves—the most frequent reason that data users build their own databases and spreadsheets is to manipulate the data the way they need to, and that need is often driven by data that is of insufficient quality to meet their needs. A survey needs to be done to identify what data quality issues have been noted and documented. These issues may be in an IT issue tracking tool, a business issue log, project documentation, or even a quality assurance tracking tool. The company may have ongoing projects for which the primary purpose is to fix data quality issues. These projects are rarely (if ever) called “data quality” projects; instead, they are referred to by the business process that is failing due to poor-quality data. But it is eminently possible to figure out which projects are specifically dealing with data of insufficient quality by reading the project justification documents.
Another indicator that poor-quality data exists is whether the data is regularly inspected (profiled). If data users in the company are regularly running SQL queries to examine the data (or running a profiling tool), you should figure out who these people are, what they are looking for, and what criteria they are using to figure out whether the data is “right.” The answers to these questions not only give you some reasonable places to begin data quality improvement efforts, but also a starting point for defining data quality rules.
At a recent engagement, we discovered over 100 stored procedures (code that runs in the database) that examined data being loaded into the Data Warehouse. By examining the code, we were able to determine which data elements were of concern, as well as what data quality rule measurements were being applied. The business people running and using the Data Warehouse were also able to tell us why these particular data elements were of concern to the organization.
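To make the idea of inspecting data with SQL more concrete, the sketch below shows the kind of checks such queries and stored procedures typically perform: basic column profiling (row count, null count, distinct values) and a count of rows that violate a data quality rule. It uses Python's built-in `sqlite3` module with an in-memory database; the `policy` table, its columns, the sample rows, and the rule "premium must not be negative" are all invented for illustration and are not taken from the engagement described above.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return basic profiling counts for one column.

    The table and column names are interpolated into the SQL text, so they
    must come from trusted code, never from user input.
    """
    sql = (
        f'SELECT COUNT(*), '
        f'SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END), '
        f'COUNT(DISTINCT "{column}") '
        f'FROM "{table}"'
    )
    total, nulls, distinct = conn.execute(sql).fetchone()
    # SUM returns NULL on an empty table, so fall back to 0.
    return {"total_rows": total, "null_rows": nulls or 0, "distinct_values": distinct}


def count_rule_violations(conn, table, violation_condition):
    """Count rows matching a condition that describes a rule violation."""
    sql = f'SELECT COUNT(*) FROM "{table}" WHERE {violation_condition}'
    return conn.execute(sql).fetchone()[0]


# Hypothetical sample data standing in for a warehouse load.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policy (policy_id INTEGER, effective_date TEXT, premium REAL)"
)
conn.executemany(
    "INSERT INTO policy VALUES (?, ?, ?)",
    [
        (1, "2013-01-15", 1200.0),
        (2, None, 950.0),          # missing effective date
        (3, "2013-03-01", -50.0),  # negative premium violates the sample rule
        (4, "2013-03-01", 700.0),
    ],
)

profile = profile_column(conn, "policy", "effective_date")
violations = count_rule_violations(conn, "policy", "premium < 0")
```

Reading existing checks in these terms (which column, which condition, which threshold) is one way to turn code that users already run into a first draft of documented data quality rules for the Business Data Stewards to review.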
Figuring Out What Processes You’ve Got
Data Stewardship establishes repeatable processes and workflows for managing data. But you should not assume that you need to start from scratch. Many enterprises have processes that can be used or extended. The advantage to using existing processes is that people are used to them (or should be). It is especially common for IT development processes to include rudimentary data handling procedures, which can be extended to include Data Stewardship. Discussions with both business analysts and IT operations can ferret out existing processes. These processes can be analyzed as part of the Data Stewardship effort. Business Data Stewards and Technical Data Stewards can evaluate what processes make sense to keep, what processes should be upgraded to include Data Stewardship, and even what processes have outlived their usefulness or are counterproductive to proper data management.
Figuring Out What Tools You’ve Got
A variety of technical tools can streamline and automate Data Stewardship, as well as making stewardship deliverables available to a wide audience. These tools include a business glossary, metadata repository, data profiling tool, and a web portal. Realize that some of these tools may already exist in the enterprise, though that fact may not be well known. To discover what tools you may already have, check with the folks who manage software licenses. You may find that the company already has licenses for some or all of these tools as part of a suite licensing deal. Knowing what tools you have also can save you a lot of effort when the time comes to specify tools. If you already have a tool licensed, it is advisable to examine whether that tool meets all or most of your business and technical needs before going through the effort to specify requirements, perform tool comparisons (with vendor presentations), and license, install, and learn the new tool.
Training Data Stewards
People who are new to Data Stewardship must receive adequate training to understand and perform their duties and tasks. This includes the Data Stewards themselves, as well as those who must interact with the Data Stewards. Data Stewards must understand their role, what is expected of them, how to fulfill their duties, and how to work together with other Data Stewards and participants in the Data Stewardship effort. The initial group of Data Stewards must be trained, and any new Data Stewards who are added must receive the same training. In addition, as the scope of the Data Stewardship effort expands (e.g., to monitor ongoing data quality), new training must be provided. Keep in mind that some Data Stewards will require training on topics that others do not. This is because different Data Stewards have differing backgrounds, skills, and experience.
People who interact with Data Stewards must also receive training so that they understand why they are required to work with the Data Stewards, what value they can expect to receive from that collaboration, and how working with the Data Stewards impacts the participant’s job. Project managers and developers are among those who should receive training in conjunction with rolling out Data Stewardship.
Training is the responsibility of the DGPO, which also trains Data Governors and executives.
As with any other training, it is important to teach the right skills, at the right skill level, to the right audience, at the right time. Training time can be difficult to obtain, and coordinating the schedules of many different people to perform the training can be challenging as well. Don’t waste the opportunity!
Business Data Stewards Curricula
The biggest change from normal day-to-day tasks is likely to be for the Business Data Stewards. They must be taught their job, and that is covered in Chapter 5. Business Data Stewards must be taught what Data Governance and Data Stewardship are all about and what value these efforts add to the company. They must also be told why they are crucial to the effort, what value they add, and their main responsibilities. In addition, Business Data Stewards need to be taught what processes and procedures they will be using right away, with a focus on the practicalities of getting the work done and documented. There also needs to be a focus on working with metadata, especially defining terms and business rules, as well as logging and remediating data issues. Improved data quality is one of the main deliverables of the Data Stewardship effort, so the stewards need to understand how to define what data quality is, how to measure quality, and how to state quality in terms of intended usage.
There also needs to be a focus on the structure of Data Stewardship, including where it fits into the overall Data Governance organization and the many types of Data Stewards (including Business, Technical, Project, Domain, and Operational).
Data Stewards also need to understand how data flows through the information chain, and how each link in the chain can produce requirements and issues. Two different models of data in motion can help here: information producers and consumers, and SIPOC (Supplier–Input–Process–Output–Customer).
Data Stewardship Processes and Tools
Data Stewardship uses well-defined processes, so training needs to focus on key processes you will need to use in the early stages of the effort. These processes include defining and updating key business data elements and metadata around those elements, opening and working issues, defining and executing Data Stewardship procedures, working with other Data Stewards, and supporting related efforts such as projects and information security.
A good set of robust tools helps automate Data Stewardship and provides ways to easily communicate the results of the efforts. But Data Stewards must be trained how to use tools such as the Data Stewardship website, Data Governance Wiki, business glossary, and metadata repository (or however many of these tools you have available).
Training for Data Quality Improvement
Improving data quality provides much of the return on investment for Data Governance and Data Stewardship. However, many Business Data Stewards are not aware of the techniques and processes needed to drive this important activity. They need to be taught the framework for data quality and the principles and processes involved with data profiling. The framework includes how the organization defines quality data, how to define data quality rules, how to detect and document data quality issues, how to do root-cause analysis, the proper role of data cleansing, and ongoing logging of data quality levels. The principles of data profiling include the Business Data Steward’s role in analyzing profiling results, investigating and making decisions based on the results, and establishing required levels of data quality based on the actual state of the data.
Being Practical About Data Stewardship
Focusing on the practical and fundamental aspects of Data Stewardship (discussed in Chapter 6) leads to more job satisfaction for Business Data Stewards and a sense of adding value by performing their duties. These aspects include determining key business data elements, assigning stewardship for governed elements, creation of quality metadata, defining and following repeatable processes, and procedures for streamlining the logistics of working together. Much of the fundamental work revolves around a well-managed issue log.
Choosing Key Business Data Elements
Choosing the key business data elements from among the thousands of elements most companies have is important because it focuses activities on just those data elements that are worth the effort—that is, provide a return on investment for the Data Stewardship effort. These data elements typically include financial reporting data, compliance and regulatory data, data elements “suggested” by company executives, data used by high-profile projects, and data elements that Business Data Stewards themselves deem to be important and worth their efforts.
Assigning the Owning Business Function
Key business data elements must have at least one (and usually only one) business function that owns the data element. The Business Data Steward representing that business function is then responsible for the data elements. Although a data element may be used by many different business functions (perhaps for many different purposes), there are usually two key questions that you can ask to help determine which business function should own an element. The first question is: Whose business will fundamentally change if the definition or derivation of the data element were to change? This question separates those who use the data element from those who depend on it. The second question is: Where (within a business process) does the data element originate? This question enables you to determine what business function creates the data, as well as then trace it through the information chain, determining what business processes (and what business functions) are affected by the data element. Those who are affected are stakeholders, and must be consulted by the responsible Business Data Steward to understand the impacts of any proposed changes.
Building a Robust Definition
Defining the business terms is crucially important, as confusion and conflict around data usage often center on what the data means. A useful definition should describe the term using business language, state the purpose of the term to the business, be specific enough to discern the difference between the term and similar terms, and link to (rather than restate) other terms used in the definition. You can evaluate whether the definition is sufficient by asking whether the definition leaves you asking additional questions or needing more details, and whether someone new to the organization (but schooled in the business of the organization) can understand the definition.
Defining Business Rules
Creation, usage, and derivation rules are key to ensuring a full understanding of the data element. The creation rules ensure that all conditions (including data collected) are in place before data is created. If data is created that does not meet the creation rules, there are bound to be issues when the data is used. The creation rules should state at what point in the business process the data may be created, what other data must be present, and what business function is allowed to create the data. The usage rules state for what purpose the data is allowed to be used. This usually involves validity tests that must be applied, relationships to other data that must exist, and what business processes use the data. Some data elements are derived from other information, and for each derived data element a single equation should be defined to calculate the quantity consistently.
Setting Up Repeatable Processes
A successful Data Stewardship implementation requires a set of repeatable data management processes. These processes not only lead to better and more consistent management of data, but help to eliminate confusion on what comes next, who is responsible for a process step, and how to achieve the Data Stewardship deliverables.
Data Stewardship processes and procedures are created incrementally as the stewardship effort matures and takes on more responsibilities. Data Stewards must have significant input into the processes if they are going to be expected to follow them. The processes must be well documented and available for reference. Even better, the processes should drive workflows in a tool to ensure that all the steps are followed and the proper approvals are obtained.
Data Stewardship Logistics
Data Stewards, especially Business Data Stewards, need to work closely with the DGPO. To make sure Data Stewards are not overwhelmed with minutiae and that their time is respected, careful planning is necessary. First of all, frequent meetings should be avoided if it is possible to obtain the steward’s input in other ways. Alternatives can include collaborative websites, emails, and smaller workgroup meetings with limited audiences. These alternatives are more convenient and usually sufficient for purposes such as gathering input on metadata and rules, as well as obtaining feedback. Regular meetings should be limited to developments related to the overall effort, training, review of improved processes and procedures, and planning for major enterprise efforts.
Business Data Stewards work closely with Project Data Stewards as well. Project Data Stewards bring issues and items that need decisions to the Business Data Stewards for resolution, and then carry back those answers to the project team the Project Data Steward is working with. Project Data Stewards must be careful to not overwhelm the Business Data Stewards. They should collect as much information about the data elements in question as possible, examine (and possibly profile) the data for data quality questions, and collaborate with other Project Data Stewards so that questions about a data element are only asked once.
Working with an Issue Log
The issue log is a key piece of the Data Stewardship program because it helps to manage, document, and report on the issues. Anyone interested in the status of a particular issue can look it up, and the people working on an issue can provide updates and add information as needed. The issue log also provides a way to prioritize the work and assign resources.
The issue log must contain sufficient fields to fully document an issue and manage the status. These fields include an issue description, results of analysis, the issue resolution, priority, what other things (projects, data, data issues) the issue impacts, who requested the issue to be opened, which business function owns the issue, and a variety of roles, such as the reviewer and assignee. If the issue violates a Data Governance rule, policy, or procedure, this should be documented as well.
Managing the issue log requires using a set of well-defined processes. The first step is to log the issue, including a description, who reported it, the priority, and why the issue is important. The next step is to research the issue to validate it and locate the root cause. Proposing a solution comes next, and should state what the impacts of the solution are, including impacts to the information chain, systems, ETL, and Data Governance tools. If the proposed solution requires an expenditure of money or resources, it may be necessary to escalate it to the Data Governance Board for approval and to assign the necessary resources. Finally, the parties impacted by the solution need to be informed of how the issue will be resolved.
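The fields and process steps described above can be sketched as a small data structure. This is only an illustration: the class, field, and status names below are hypothetical, the 1-to-n priority scale is an assumption, and a real issue log would normally live in a tracking tool rather than in code.

```python
from dataclasses import dataclass, field
from enum import Enum


class IssueStatus(Enum):
    # Hypothetical status names that follow the process steps in the text.
    LOGGED = "logged"
    SOLUTION_PROPOSED = "solution proposed"
    ESCALATED = "escalated to Data Governance Board"
    COMMUNICATED = "communicated to impacted parties"


@dataclass
class Issue:
    description: str
    requested_by: str
    priority: int                  # assumed scale: 1 = highest
    importance: str                # why the issue matters
    owning_business_function: str
    assignee: str = ""
    reviewer: str = ""
    analysis: str = ""             # research results, including root cause
    resolution: str = ""
    impacts: list = field(default_factory=list)
    violated_policy: str = ""      # rule, policy, or procedure violated, if any
    status: IssueStatus = IssueStatus.LOGGED


def propose_solution(issue, resolution, impacts, needs_funding=False):
    """Record a proposed solution; escalate when money or resources are needed."""
    if not issue.analysis:
        raise ValueError("research the issue before proposing a solution")
    issue.resolution = resolution
    issue.impacts = list(impacts)
    issue.status = (IssueStatus.ESCALATED if needs_funding
                    else IssueStatus.SOLUTION_PROPOSED)
    return issue
```

Requiring the analysis before a solution can be proposed mirrors the order of the steps: research and root cause first, then the proposal and its impacts.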
Documenting and Communicating Decisions
A lot of decisions are made as part of Data Stewardship, and it is very important to document the decisions and make them easily available to all interested parties. That is, people need to know what information exists, how to ask for it, and where to find it. Part of the overall Data Governance deliverables includes a communications plan, and the plan should be built early in the Data Governance program. The communications plan should include a title and description of each communication, the intended audience and communication medium, the frequency, and which role within Data Governance is responsible for creating and presenting the communication.
Data Stewardship should also use established company publications to get the word out on what they are up to.
Data Stewardship Tools
There are four key Data Stewardship tools in addition to the issue log. The first is the Data Stewardship portal. This tool is a web portal that exposes the work done by Data Stewardship to those in the organization who need access to it. Status reports, policies, procedures, contact information, general stewardship information, and links to other tools are just some of what should be included in the Data Stewardship portal.
The Data Stewardship Wiki provides a reference to specialized terms, forms, and processes. As each new piece of Data Stewardship collateral is defined and built, it should appear in the Data Stewardship Wiki.
The business glossary contains business metadata (e.g., definitions, creation and usage rules, business function ownership, security classification, and data quality requirements) for the business terms that are being governed. The business glossary may also manage workflow for defining and approving the terms. A robust search function makes it easier to identify when a data element has already been entered into the glossary (called rationalizing), resolving naming inconsistencies in the process. The business glossary must be easily available to anyone who uses data. It should also enable building a hierarchy (or taxonomy) of the terms, which enables terms to be classified and have relationships between terms established and documented.
A metadata repository is the source system for physical and technical metadata. This can include data models, database structures, business intelligence tool metadata, ETL, and file structures. In conjunction with a business glossary, the complete set of metadata can be stored, including the links between business data elements and their physical counterparts. A metadata repository enables impact analysis and the tracing of lineage through the information chain.
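As a rough illustration of impact analysis, lineage can be treated as a graph and walked downstream from a changed element. The element names and the `FEEDS` structure below are invented for the example; an actual metadata repository would supply this information through its own interfaces.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the elements listed as its value.
FEEDS = {
    "CRM.CUSTOMER.BIRTH_DT": ["ETL.load_customer"],
    "ETL.load_customer": ["DW.DIM_CUSTOMER.BIRTH_DATE"],
    "DW.DIM_CUSTOMER.BIRTH_DATE": ["BI.AgeBandReport", "BI.RetentionDashboard"],
}


def downstream(element):
    """Return everything reachable from `element`, in breadth-first order."""
    seen = {element}
    queue = deque([element])
    order = []
    while queue:
        for target in FEEDS.get(queue.popleft(), []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


impacted = downstream("CRM.CUSTOMER.BIRTH_DT")
```

Walking the same edges in the opposite direction would give the lineage of a report field back to its sources.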
Roles of Data Stewards in Data Management
Data Stewards have extremely important roles in many of the processes involved in managing data. These processes include improving data quality, managing reference data, identity resolution, survivorship, exception handling, information security, quality assurance support, and metadata management. Without involvement from Data Stewards, these processes are often incomplete and only partially (if at all) successful. The roles of Data Stewards in these critical processes are covered in Chapter 7.
Improving Data Quality
Data Stewards play a critical role in improving data quality, which is one of the most important, visible, and impactful tasks they have. Data quality is measured against a set of requirements, or data quality rules. The rules—and the required level of data quality necessary for business processes—need input and recommendations from the Data Stewards. A data quality rule comprises two parts: the business statement (which explains what quality means in business terms) and the specification (the definition of quality at the physical database level). The specification has to be specific to a system, table (or file), and column (or field). That is because the specification can be different from one system to another even when the business statement is the same. The data quality rules may be for single columns, validation across multiple columns, cross-table validation (rules that span related tables), and content validation across multiple columns.
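The two-part structure of a data quality rule can be sketched as follows. The system, table, and column names and the code values are hypothetical; the point is only that one business statement can carry a different specification for each physical location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSpecification:
    # Physical-level definition, specific to one system, table, and column.
    system: str
    table: str
    column: str
    allowed_values: frozenset


@dataclass(frozen=True)
class DataQualityRule:
    business_statement: str   # what quality means in business terms
    specifications: tuple     # one RuleSpecification per physical location

    def specification_for(self, system):
        for spec in self.specifications:
            if spec.system == system:
                return spec
        raise KeyError(system)


# Hypothetical example: same business statement, different physical encodings.
marital_status_rule = DataQualityRule(
    business_statement="Marital status must be one of the approved values.",
    specifications=(
        RuleSpecification("PolicyAdmin", "CUSTOMER", "MARITAL_CD",
                          frozenset({"S", "M", "D", "W"})),
        RuleSpecification("Claims", "CLAIMANT", "MAR_STAT",
                          frozenset({"1", "2", "3", "4"})),
    ),
)
```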
Business Data Stewards need to make recommendations on how to handle poor-quality data, and who needs to be notified when poor-quality data is detected. Technical Data Stewards help figure out the root causes for poor-quality data as well. In some cases the root cause is technical: changes made to a system or an ETL job corrupt the data. In many cases, though, the root cause is a business issue, such as data producers being incented to be fast (but not accurate), data manipulation done by data users who don’t understand the data, or a lack of accountability, where no one is being held responsible for detecting poor-quality data and reporting it to the Data Stewards. Of course, if the data quality rules are not defined (often the case in companies that lack Data Governance), you don’t have any way to know if the data is of good quality or not.
Another role that Business Data Stewards have is to determine the priority of data quality issues that need to be addressed. This involves making a business case for improving the quality, often linked to regulatory compliance or increased costs of doing business. The business case needs to establish the criteria for understanding when the data quality has been improved sufficiently to meet business needs.
Data quality is often measured in terms of dimensions, which are ways to categorize types of data quality measurements. Dimensions measure different aspects of data quality. Certain dimensions are relatively easy for Business Data Stewards to define data quality rules for, after which the conformance to those rules can be measured by profiling the data. These dimensions include completeness, uniqueness, validity, reasonableness, integrity, timeliness, coverage, and accuracy.
Data quality is often measured using a data profiling tool, and Data Stewards need to analyze the results of running the tool to detect data quality issues. In Discovery mode, a tool can analyze the data and propose what it detects to be possible issues. However, Business Data Stewards must review the results to determine (and record) what the actual issues are, and which are false positives. In Assertion Testing mode, data quality rules are fed into the tool and the tool then returns how well the data fits the rule. Business Data Stewards not only need to provide a rule to test, but analyze the results to see how well the data fit the rule. In many cases, the results highlight where a data quality rule is wrong or incomplete, at which point the rule must be revised and tested again. When a rule is violated, the Business Data Stewards must decide whether the violation is worth remediating. The violating data may no longer be used, may be old and no longer valuable, may come from inactive records, or the number of outliers may be so small that the impact is negligible.
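Assertion testing can be pictured with a minimal sketch: a rule is applied to every row, and the result is a conformance rate plus the list of violating rows for the Business Data Steward to review. The rule and sample rows below are hypothetical, and a real profiling tool would do far more than this.

```python
def assert_rule(rows, rule):
    """Apply `rule` (returns True for a conforming row) to every row."""
    violations = [row for row in rows if not rule(row)]
    total = len(rows)
    conformance = (total - len(violations)) / total if total else 1.0
    return conformance, violations


def termination_not_before_hire(row):
    # Hypothetical cross-column rule: a termination date, when present,
    # must not precede the hire date. ISO date strings sort chronologically.
    end = row["termination_date"]
    return end is None or end >= row["hire_date"]


employees = [
    {"id": 1, "hire_date": "2010-03-01", "termination_date": None},
    {"id": 2, "hire_date": "2012-07-15", "termination_date": "2011-01-01"},
    {"id": 3, "hire_date": "2009-05-20", "termination_date": "2013-08-30"},
    {"id": 4, "hire_date": "2011-11-11", "termination_date": None},
]

conformance, violations = assert_rule(employees, termination_not_before_hire)
# One of the four rows violates the rule, so conformance is 0.75.
```

Whether a 75% conformance rate is acceptable, and whether the violating row is worth remediating, remains a business decision rather than something the code can settle.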
Business Data Stewards need to balance possible courses of remediation against the business impact of the issue: a cost–benefit analysis. In addition, Business Data Stewards must decide where the potential for harm from data quality rule violation is significant enough that automated detection of the violation is worthwhile. Automated detection involves not only stating the rule in a machine-readable form (usually IT is involved in this step) and having a tool that can apply the rule and detect and document the violations, but deciding what to do with the errors when they are discovered.
Managing Reference Data
Reference data refers to sets of values or classifications (e.g., hierarchies) of values. Code lists, state abbreviations, charts of accounts, and a product hierarchy are all examples of reference data. Reference data can be thought of as having two parts: a code (such as “M” or “F”) and a description (“male” or “female”). Business Data Stewards play an important role in ensuring that the description is accurate and represents a definition of the associated code. If the description is insufficiently rigorous, data users will often misinterpret what the code means and use the data incorrectly. In addition, inadequate descriptions of codes make it very difficult to map equivalent codes between systems (called harmonizing).
Business Data Stewards must document existing values to ensure they are well understood, as well as evaluate the need for proposed new values. They must ensure that proposed new values don’t overlap existing values, don’t duplicate information recorded elsewhere, and are consistent with the meaning of the valid value field. If the new value is approved, stakeholders must be informed, and the new value must be documented. In addition, Business Data Stewards must detect and act when new values are used that have not been authorized.
Business Data Stewards are key participants in efforts to map code/description pairs between systems, taking into account that the data elements may not have the same name from one system to another, or, alternatively, may have a different meaning but the same name. Business Data Stewards must also determine when data elements in different systems mix several different elements together, causing complexities in the mappings.
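A minimal sketch of harmonizing codes between systems, and of detecting values that have not been authorized, might look like this. The system names, codes, and enterprise values are assumptions made for the example.

```python
# Hypothetical crosswalk from (source system, source code) to an enterprise code.
CROSSWALK = {
    ("Billing", "M"): "MALE",
    ("Billing", "F"): "FEMALE",
    ("CRM", "1"): "MALE",
    ("CRM", "2"): "FEMALE",
}


def harmonize(system, code):
    """Return the enterprise code, or None when no approved mapping exists."""
    return CROSSWALK.get((system, code))


def unauthorized_values(system, observed_codes):
    """List observed codes that have no approved mapping, for steward review."""
    return sorted({c for c in observed_codes if (system, c) not in CROSSWALK})


flagged = unauthorized_values("CRM", ["1", "2", "9", "9", "U"])
```

Returning `None` instead of guessing a mapping keeps the decision with the Business Data Steward, who can judge whether an unmapped value is a new valid value or an error.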
Identity Resolution for Master Data Management
Identity resolution is the process of resolving multiple instances of data that represent the same entity into a single record for that entity. It is one of the most crucial processes in master data management. Business Data Stewards and Technical Data Stewards must collaborate to figure out which systems contain records that must be evaluated when creating the “single version of the truth” (sometimes called the golden copy). They must also determine what metadata that describes the entity is present in each system, and the quality of that metadata (usually data profiling is necessary).
Business Data Stewards have a number of important decisions to make during the identity resolution process. First of all, they need to help find and evaluate the quality of the identifying attributes—the attributes of the entity that enable you to tell the entity apart from all other entities and establish that entity’s existence in multiple source systems. To figure out the identifying attributes, Business Data Stewards need to establish the minimum set of high-quality potential identifying attributes that exist across the source systems, then evaluate those attributes to see if the identified attributes can serve the purpose. Tools can help with this, but Business Data Stewards often understand things about the data that tools do not, such as subtle data quality issues that may disqualify an attribute from use.
Another important input from Business Data Stewards is to determine the sensitivity of the determination that two records refer to the same entity (sensitivity to a false positive) or that two records do not refer to the same entity (sensitivity to a false negative). Determining the sensitivity is a balancing act: better quality attributes and more cleansing and standardization are usually needed to achieve a higher sensitivity. Whether the extra time and effort is justified depends on the business impacts of false negatives and false positives.
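One simple way to picture this trade-off is a weighted match score compared against a threshold. The weights, attribute names, and threshold values below are illustrative assumptions, not a recommended configuration, and commercial matching engines use considerably more sophisticated techniques.

```python
def match_score(a, b, weights):
    """Weighted share of identifying attributes on which two records agree."""
    total = sum(weights.values())
    agreed = sum(w for attr, w in weights.items()
                 if a.get(attr) is not None and a.get(attr) == b.get(attr))
    return agreed / total


def same_entity(a, b, weights, threshold):
    """Declare a match only when the score reaches the chosen threshold."""
    return match_score(a, b, weights) >= threshold


# Hypothetical weights: higher means the attribute is a stronger identifier.
WEIGHTS = {"tax_id": 5, "birthdate": 3, "last_name": 2}

crm = {"tax_id": "123-45-6789", "birthdate": "1980-02-14",
       "last_name": "Nguyen"}
billing = {"tax_id": "123-45-6789", "birthdate": "1980-02-14",
           "last_name": "Nguyen-Smith"}

lenient = same_entity(crm, billing, WEIGHTS, threshold=0.75)
strict = same_entity(crm, billing, WEIGHTS, threshold=0.90)
```

Here the two records agree on tax ID and birthdate but not on last name. The lenient threshold treats them as the same entity, while the strict one does not. Raising the threshold reduces the risk of false positives at the cost of more false negatives, unless the name is first standardized so that the records agree on more attributes.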
Business Data Stewards can also make recommendations on sources from which the identifying attributes can be enriched, corrected, or cleansed. Using alternate data sources (e.g., purchased data) may improve the matching accuracy considerably, but care must be taken to use reliable data.
Master Data Management Survivorship
Survivorship refers to resolving what information to keep in the golden copy when two records representing the same entity contain conflicting information. For example, two records for the same person might have different values in the “birthdate” field. Business Data Stewards must create a set of rules that can be enforced by the master data management engine (or hub) to determine which value to keep. The rules take into consideration factors such as the record date (which is the most recent record), which source is most trusted, and known data quality issues (you may know that one system has issues with the accuracy of birthdates). Once the business rules are established, they are used by the master data management tool to resolve attribute values in the survivorship process. The tool identifies the master records from multiple sources that have conflicting attribute values, applies the resolution rules, and creates (or updates) the golden copy with the selected value. The rules will need to be periodically revisited if they generate too many processing exceptions.
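The kinds of rules described above can be sketched in a few lines. The source names, trust ranking, and the list of known problem attributes are hypothetical, and the ordering used here (exclude known-bad sources, then prefer the most trusted source, then the most recent record) is just one possible rule set.

```python
# Hypothetical trust ranking: a higher number means a more trusted source.
SOURCE_TRUST = {"PolicyAdmin": 3, "CRM": 2, "WebSignup": 1}

# Hypothetical (source, attribute) pairs with known data quality issues.
KNOWN_BAD = {("WebSignup", "birthdate")}


def surviving_value(attribute, candidates):
    """Pick the golden-copy value for one attribute.

    candidates: list of dicts with keys source, record_date (ISO string), value.
    Returns None when no candidate is usable, which would be handled as a
    processing exception for steward review.
    """
    usable = [c for c in candidates
              if c["value"] is not None
              and (c["source"], attribute) not in KNOWN_BAD]
    if not usable:
        return None
    # Most trusted source wins; the most recent record date breaks ties.
    best = max(usable, key=lambda c: (SOURCE_TRUST.get(c["source"], 0),
                                      c["record_date"]))
    return best["value"]


birthdates = [
    {"source": "WebSignup", "record_date": "2014-01-05", "value": "1980-12-14"},
    {"source": "CRM", "record_date": "2013-06-01", "value": "1980-02-14"},
    {"source": "PolicyAdmin", "record_date": "2011-03-09", "value": "1980-02-12"},
]
golden_birthdate = surviving_value("birthdate", birthdates)
```

In this example the most recent record is ignored because its source is known to be unreliable for birthdates, and the value from the most trusted remaining source survives.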
Master Data Management Exception Handling
The master data management hub performs a large number of processes on incoming records, including cleansing, matching to existing records, inserting new records, updating existing records, merging records that represent the same entity, unmerging records that were merged in error, and deleting obsolete records. These processes follow a set of rules based on the “known” traits of the incoming data, usually established by profiling the data ahead of time. However, if the traits of the data (e.g., the length, data type, lookup value, format, or relationship to other records) vary from what is expected, the result may be a processing exception, that is, a condition under which the master data management hub cannot process the record. Business Data Stewards (with an assist from IT) are heavily involved in investigating and remediating these exceptions. This involvement includes conducting root-cause analysis and making recommendations on what steps need to be taken to fix the problem. The steps can include changes to a source system, additional data cleansing, updating the exception rules, and providing instructions to data input groups on changes they need to make to how they carry out their jobs. In the first three cases, IT will need to make the changes.
Assigning the correct information security classification to data is critical to ensuring that it is treated and protected correctly. Although legislation and regulations may specify the classification for some data elements, knowledgeable business people—Business Data Stewards, in other words—who understand what the data is and what it means, must make decisions about how to classify data appropriately. Business Data Stewards are also often charged with coming up with a process to obtain and document the proper permissions to access protected data.
Metadata management is another important role for Data Stewards. As discussed earlier, Business Data Stewards identify key business data elements and provide business metadata such as definitions and creation and usage business rules. In conjunction with Technical Data Stewards, Business Data Stewards also identify and document the physical data elements that correspond to the business data elements. This step is necessary for carrying out data profiling.
Business Data Stewards also make recommendations on customizations and additional functionality for their tools, such as the business glossary, metadata repository, and web portal.
Part of the quality assurance phase of any project is to test whether the application meets the business requirements that were laid out. The test cases run by the quality assurance analysts should detect issues and defects that can be prioritized for correction. But quality assurance frequently leaves out two important components of the requirements: the data requirements and the metadata requirements. Test cases for data requirements look for violations of stated data quality rules, such as invalid values from fields that have a valid value set; violations of range, pattern, data type or distribution; missing values in mandatory fields; and enabling the creation of a record when mandatory data is missing. Test cases for metadata requirements include ensuring that the user interface shows expected values based on the field definitions, correctly derived values based on the stated derivation rule, and correctly named field labels.
Data Stewardship Metrics
Measuring the progress you are making in the Data Stewardship effort is important because it allows you to show the advances you’ve made in various categories. There are two major categories of metrics: business results metrics and operational metrics. These are discussed in Chapter 8. Business results metrics measure the effectiveness of Data Stewardship in supporting the data program and adding value to the company through better data management. Operational metrics measure the acceptance of the Data Stewardship program and how effectively the Data Stewards are performing their duties.
Business Results Metrics
Business results metrics can only be measured if the company is willing to take a long-term view of value because the value that Data Stewardship adds can take a while to be felt. The business must also be willing to attribute improvements in data management to the Data Stewardship effort. Business value includes increased revenue and profits, increased productivity, reduced application development costs, and reduced compliance issues. Many of these items are difficult to measure directly, so it may be necessary to survey the data users. The survey can ask about the number of customer complaints, regulatory compliance issues, consistency of data definitions, decreased costs, and reduced time spent cleansing data.
Operational Metrics
Operational metrics measure the level of participation of the business in Data Stewardship, the level of importance given to the Data Stewardship effort, and how often and how effectively the Data Stewardship deliverables are used. Data Stewardship produces metadata, and another measure of operational success is how often the stewardship metadata has been accessed. Other metrics can include changes in the maturity level, consolidation of disparate data sources, the number of data elements brought under governance, and how many business functions have provided assigned stewards. Further indicators are whether business functions have added Data Stewardship performance to the compensation plan and whether they encourage active participation in meetings.
Data elements go through a number of statuses in reaching the fully governed state, and you can measure and graph the time needed to transition from one state to another (e.g., from owned to approved). You can also measure how many data quality issues have been resolved, and how long it took to resolve them.
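As a small illustration, the time taken to move between statuses can be computed from the dates on which each data element reached each status. The element names and dates below are invented, and the status names simply follow the example in the text.

```python
from datetime import date
from statistics import mean


def days_between(history, start_status, end_status):
    """Days one data element took between two statuses, or None if incomplete."""
    if start_status in history and end_status in history:
        return (history[end_status] - history[start_status]).days
    return None


def average_transition_days(histories, start_status, end_status):
    """Average transition time across all elements that completed the transition."""
    durations = []
    for history in histories.values():
        days = days_between(history, start_status, end_status)
        if days is not None:
            durations.append(days)
    return mean(durations) if durations else None


# Hypothetical status histories: the date each element reached each status.
histories = {
    "customer_birthdate": {"owned": date(2014, 1, 6), "approved": date(2014, 1, 20)},
    "policy_effective_date": {"owned": date(2014, 2, 3), "approved": date(2014, 2, 13)},
    "account_status": {"owned": date(2014, 3, 3)},  # not yet approved
}

average_days = average_transition_days(histories, "owned", "approved")
```

Elements that have not yet completed the transition are left out of the average, so the figure reflects only finished work. The count of unfinished elements is worth reporting alongside it.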
Data Stewardship Maturity
As with most other long-term efforts, Data Stewardship grows and matures over time. It is a good idea to measure the changing level of maturity to show progress. Chapter 9 presents a simple Data Stewardship maturity model consisting of five levels: Initial, Tactical, Well Defined, Strategic, and Optimized. Each level is defined in terms of response to data issues, attitude of management, handling of metadata, and development of formal organization and structure. With each increasing level, the categories grow more mature, culminating in a robust, innovative approach, where the collection and handling of metadata is a normal part of doing business and the organization and structure not only support the internal needs of Data Stewardship, but have expanded to incorporate outside business partners, and the company is recognized as an example of good Data Governance and Data Stewardship in the global business community.
Within each level of Data Stewardship maturity, there are four dimensions of Data Stewardship. Organizational awareness rates how well Data Stewardship is integrated into the organization, as well as sponsorship and the development of metrics. Roles and structures rates how well defined the Data Stewardship roles are, as well as how effectively those roles are being staffed and executed. It also rates the completeness and integration of supporting structures. Standards, policies, and processes rates how well defined the framework for supporting policies, processes, practices, and standards is. It also measures how much executive support and endorsement exists for the policies. Value creation rates the recognition of the increasing value of data, as well as the recognition within the organization of the value of Data Stewardship.
You can measure your advancing Data Stewardship maturity by laying out the levels and dimensions in a grid, and rating the level in each dimension periodically. As the maturity increases, the level of maturity should advance from left to right in each of the dimensions.
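The grid can be sketched in a few lines, using the five levels and four dimensions named above. The ratings in the example are invented, and the text layout is only one way to present the result.

```python
LEVELS = ["Initial", "Tactical", "Well Defined", "Strategic", "Optimized"]
DIMENSIONS = [
    "Organizational awareness",
    "Roles and structures",
    "Standards, policies, and processes",
    "Value creation",
]


def render_grid(ratings):
    """One row per dimension; an X marks the rated level, left to right."""
    lines = []
    for dim in DIMENSIONS:
        cells = ["X" if level == ratings[dim] else "." for level in LEVELS]
        lines.append(f"{dim:<36}" + " ".join(cells))
    return "\n".join(lines)


def advanced(previous, current):
    """Dimensions whose rated level moved to the right since the last rating."""
    return [d for d in DIMENSIONS
            if LEVELS.index(current[d]) > LEVELS.index(previous[d])]


# Hypothetical ratings from two periodic assessments.
last_year = {
    "Organizational awareness": "Initial",
    "Roles and structures": "Tactical",
    "Standards, policies, and processes": "Initial",
    "Value creation": "Initial",
}
this_year = dict(last_year, **{"Organizational awareness": "Tactical"})

progress = advanced(last_year, this_year)
grid = render_grid(this_year)
```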
Data Stewardship is the operational part of Data Governance, and without Data Stewardship the best that can be hoped for in Data Governance is theory and perhaps some policies. Data Stewardship and the Data Stewards are what make Data Governance a reality, with processes, procedures, and data being understood, governed, and improved in quality, and in general being treated like the enterprise asset it should be. There are many different kinds of Data Stewards, such as Business Data Stewards, Technical Data Stewards, Project Data Stewards, Domain Data Stewards, and Operational Data Stewards. Each has a role to play, and Data Stewards need to work together to achieve the desired results.
An organization with a robust and mature Data Stewardship effort can make decisions and answer questions about data efficiently and accurately, as well as being much more successful in improving data quality, executing on master data management, managing metadata, having more successful projects with more predictable timelines, and having the business take responsibility and accountability for the data they own. With successful Data Stewardship, champions of proper data management will emerge across the enterprise, from frontline data input specialists to data analysts to developers, spreading into management and to the executive levels, as company management sees how accountable and responsible data management leads to better business decisions, better customer satisfaction, and better business opportunities.
Good luck with your efforts, and I hope this book has been useful in your journey, at whatever stage you are in.