Introduction - Data Stewardship (2014)

Introduction

Introduces basic concepts of Data Stewardship, summarizes what you can do with governed data, what the book covers, and who the audience is for the book.

Keywords

managing data, audience, chapters, summary

Companies are getting serious about managing their data, including improving data quality, understanding the meaning of the data, leveraging the data for competitive advantage, and treating data as the enterprise asset it should be. But doing a proper job of managing data requires accountability; that is, business functions must take responsibility for the data they own and use. The formal recognition of the need to have a structure, organization, and resources in place to manage data, together with the actual implementation of that need, has come to be known as Data Governance.

Within the umbrella of Data Governance is Data Stewardship. Various kinds of Data Stewards (detailed later in the book) work closely with the data as well as other subject-matter experts and stakeholders to achieve the goals and deliverables laid out by the Data Governance effort. The Data Stewardship efforts are managed and coordinated by the Data Governance Program Office and supported by high-ranking officers of the company.

This book provides usable and actionable information and instructions on how to set up and run a Data Stewardship effort within Data Governance. The book is designed to provide everything a new Data Steward or Data Governance Manager needs to be effective and efficient when it comes to Data Stewardship. It also provides the details needed by those taking on the responsibilities of a Data Steward.

Problem Statement

There are many challenges that must be faced when working with data, including:

- Data doesn’t explain itself. Someone must provide an interpretation of the data, including what it means, how to use it properly, and how to evaluate whether the data is of good quality or not.

- Data is shared and used by many, for many different purposes. So who owns it? Who makes decisions about it and is responsible when the data goes “wrong”?

- Many processes that use data depend on people upstream of the process to “get it right.” But who says what “right” is? And who determines when it goes “wrong”?

- Software development life cycles require many handoffs between requirements, analysis, design, construction, and data use. Each handoff is a place where the data can be corrupted and the data quality endangered.

- Technical people tasked with data implementation are not familiar with the data’s meaning or how it is used.

- Those of us in the data community have a long history of tolerating ambiguity, and ingrained habits around it, in both data meaning and data content.

All of these factors lead to a poor understanding of the data and a perception of poor quality, with little ability to know the difference. The answer to these challenges is that data needs to be actively and efficiently managed. Keep in mind too that the rather haphazard “methodology” used by many companies to collect metadata does not represent true or effective data management. Some of the failures include:

- Data definitions: These are often written in haste by project staff (if they are written at all), and the definitions are not rationalized across the enterprise, leading to multiple definitions of the same term, often under different data element names.

- Data quality: The data quality rules are often not defined and the quality itself is rarely measured. Even when the rules are defined, the context for the rule (the data usage to which the rule applies) is often ignored. All of this leads to confusion about what data quality is required, as well as what data quality is achieved.

- Documentation: Documents containing metadata are rarely officially published, and are often lost, tucked away on shelves or in archived files. The documentation is not widely and easily available, nor is there a robust search engine so that interested users can find what they need.

- Creation and usage business rules: There is often a lack of understanding about the conditions under which an entity (such as customer or product) can or should be created, as well as how data should be used. This lack of understanding leads to incomplete or inaccurate information being collected about the entity, as well as data being used for purposes for which it was never intended. The end result is that business decisions based on the data may lead to suboptimal results.

Formal enterprise-wide Data Stewardship, as part of a Data Governance effort, is crucial in managing data and achieving solutions to these challenges. With Data Stewardship, the organization can begin treating data as an asset. Like other assets, the data needs to be inventoried, owned, used wisely, and understood. This requires different techniques with data than with physical assets, but the need is the same. With the data asset, inventorying and understanding the data takes the form of a formally published business glossary, often in conjunction with a metadata repository. Establishing ownership requires understanding how the data is collected and who uses it, then determining who can best be responsible for the content and quality of the data elements. Finally, ensuring that data is used wisely means understanding and managing how the data is created, for what purpose the data was created, and whether it is suitable for use in new situations that may arise—or even in the situations for which the data is being used currently.

Roles of the Data Steward in Managed Data

Properly managed data enables major enterprise efforts such as those listed next to be successful with fewer missteps and less wasted effort. The Data Stewards play important roles in each of these (and many other) efforts. They determine the following:

- In data warehousing:

  - What dimensions are needed and what they mean.

  - What facts are needed and what dimensions they depend on.

  - How to define the facts and the derivation and aggregation rules.

  - Where different terms proposed for dimensions or facts are really the same thing.

  - Who must take responsibility for the data elements that make up the dimensions and the facts.

  - How data will be transformed to use it in the data warehouse.

- In master data management:

  - What master data entities (customer, product, vendor, etc.) should be handled, in what order, and what the entities mean (e.g., What is a customer?).

  - What identifying attributes are needed (with good quality) to implement identity resolution.

  - What sensitivity is appropriate for determining uniqueness of identity (sensitivity to false positives and false negatives).

  - What the appropriate reference data values are for enumerated attributes, and how to derive the values from the available data.

- In data quality improvement:

  - What level of data quality is needed for a given purpose.

  - What data should be profiled to rigorously examine the values.

  - What constitutes “expected” values for the data. These expectations can include range, specific values, data type, data distribution, patterns, and relationships.

  - What possibilities exist for the root causes of poor-quality data.

  - What requirements must be given to IT to fix root causes and/or cleanse the data.

- In system development: The Data Stewards play a pivotal role in ensuring:

  - That data used by a system is well defined and that the definition and business rules meet enterprise standards. If the definitions or rules are missing or of low quality, the Data Stewards are instrumental in providing high-quality definitions and rules.

  - That data models meet enterprise standards as well as project requirements.

  - That the requirements for managing data as an asset don’t get ignored because of the project schedule.

These topics and the roles that Data Stewards play are discussed in much more detail later in this book.
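
To make the data quality item about “expected” values more concrete, here is a minimal sketch in Python of how steward-defined expectations (a range, a set of specific values, and a pattern) might be written down and checked against a handful of records. It is an illustration only and does not come from the book: the `Expectation` structure, the helper functions, the element names (`age`, `status`, `postal_code`), and the sample records are all hypothetical assumptions.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Expectation:
    """One steward-defined rule for a single data element (hypothetical structure)."""
    element: str                    # name of the data element the rule applies to
    description: str                # plain-language statement of the expectation
    check: Callable[[Any], bool]    # returns True when a value meets the expectation


def in_range(low, high):
    """Range expectation: a numeric value between low and high, inclusive."""
    return lambda v: (
        isinstance(v, (int, float)) and not isinstance(v, bool) and low <= v <= high
    )


def one_of(*allowed):
    """Specific-values expectation: the value must be in an agreed valid value list."""
    allowed_set = frozenset(allowed)
    return lambda v: v in allowed_set


def matches(pattern):
    """Pattern expectation: the whole string must match the regular expression."""
    compiled = re.compile(pattern)
    return lambda v: isinstance(v, str) and compiled.fullmatch(v) is not None


def profile(records, expectations):
    """Return (row_index, element, description) for every violated expectation."""
    violations = []
    for i, record in enumerate(records):
        for exp in expectations:
            # A missing element yields None, which fails every check above.
            if not exp.check(record.get(exp.element)):
                violations.append((i, exp.element, exp.description))
    return violations


# Hypothetical expectations and sample data, for illustration only.
EXPECTATIONS = [
    Expectation("age", "between 0 and 120", in_range(0, 120)),
    Expectation("status", "one of ACTIVE, CLOSED", one_of("ACTIVE", "CLOSED")),
    Expectation("postal_code", "five digits", matches(r"\d{5}")),
]

RECORDS = [
    {"age": 34, "status": "ACTIVE", "postal_code": "94105"},
    {"age": 150, "status": "ACTIVE", "postal_code": "94105"},
    {"age": 41, "status": "PENDING", "postal_code": "9410"},
]

violations = profile(RECORDS, EXPECTATIONS)
for row, element, description in violations:
    print(f"row {row}: {element} should be {description}")
```

In this sketch each rule carries a plain-language description, so a violation can be reported in terms a business user would recognize. Distribution and relationship expectations would need checks across many records at once and are left out here.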

What this Book Covers

This book is broken up into 10 chapters, with each chapter tightly focused on an aspect of Data Stewardship. The chapters are:

- Chapter 1: Data Stewardship and Data Governance: How They Fit Together. This opening chapter discusses Data Governance program deliverables, the roles and responsibilities of program participants (including Data Stewards), and how Data Stewardship fits into a Data Governance program.

- Chapter 2: Understanding the Types of Data Stewardship. This chapter describes each type of Data Steward. It also discusses the type of person needed in the role, and how the various types of Data Stewards are chosen and assigned.

- Chapter 3: Data Stewardship Roles and Responsibilities. This chapter provides a detailed list of the responsibilities for each type of Data Steward. It also describes how the stewards work together in a Data Stewardship Council, and the role of the Enterprise Data Steward, who manages the stewardship efforts on behalf of Data Governance.

- Chapter 4: Implementing Data Stewardship. This chapter describes how to kick off the Data Stewardship effort. It describes how you gain support, ascertain the structure of the organization, identify the types of Data Stewards who will be needed, figure out how information flows through the organization, determine what documentation you have already, and decide what tools you’ll need and what you already have. It also describes how to determine what metadata is already available, such as valid value lists and data quality rules.

- Chapter 5: Training Business Data Stewards. This chapter discusses how to train Data Stewards, since most of those selected for the role will not know how to perform their duties. It discusses the lesson plan, the various categories of training, and what tools the Data Stewards will need to learn about. It also provides guidelines on how to get the most out of your training efforts.

- Chapter 6: Practical Data Stewardship. This chapter describes the practical aspects of the main tasks and responsibilities of Data Stewardship. These include the identification of key business data elements and the collection of metadata about those elements, determining ownership, and working through the day-to-day issues with an issue log and repeatable processes. Also discussed is how the different types of stewards cooperate, and logistics like meeting schedules and working groups. The basic tools are also described.

- Chapter 7: Important Roles of Data Stewards. Data Stewards have an extremely important role in all the data management activities, but they play an especially key role in certain areas. This chapter describes how Data Stewards contribute to improving data quality, improving Metadata Quality, managing reference data, ascertaining identifying attributes (for identity resolution), resolving conflicts and sensitivity to errors in Master Data Management, managing information security, managing metadata, and supporting quality assurance.

- Chapter 8: Measuring Data Stewardship Progress: The Metrics. A Data Stewardship program requires resources and effort. This chapter shows you how to identify and measure the results you are getting from those efforts in two main areas: business results metrics (which measure the effectiveness in supporting the data program) and operational metrics (which measure the acceptance of the program and how well the Data Stewards are performing).

- Chapter 9: Rating Your Data Stewardship Maturity. The Data Stewardship effort should increase in maturity as you develop it. This chapter describes a maturity model with multiple levels and dimensions. The model helps you rate your maturity, as well as identify what a well-developed Stewardship program should look like.

- Chapter 10: Summing It All Up. This chapter reviews the material covered in previous chapters to provide an “in a nutshell” view of what you have learned in this book.

- Appendix A: Example Definition and Derivation. This appendix provides an example of a robust definition and derivation for a business data element.

- Appendix B: Sample Training Plan Outline. This appendix provides training plans for Technical Data Stewards and Project Managers. Other training plans are covered in Chapter 5.

What is not in this Book

Although this book talks about how Data Stewardship fits into Data Governance, it does not provide all the information needed to set up and run the wider Data Governance effort. For that, references are provided in the text.

Who Needs this Book?

This book is designed for anyone with an interest in Data Stewardship. It will be especially useful to someone charged with organizing and running a Data Stewardship effort because it is based on the actual experience of someone who has done it, and not just talked about it. Thus, the information reflects real life, not just theory. But this book will also be useful to those who will be Data Stewards, because it explains what is expected of you, offers tips and tricks, and shows how your role adds value to the company and the business function you represent. Finally, this book will be useful to those (including executives) charged with supporting Data Stewardship and Data Governance because it describes what is supposed to be going on and how progress and maturity are measured.