Data Stewardship (2014)
CHAPTER 3 Data Stewardship Roles and Responsibilities
Detailed responsibilities are listed and explained for each of the main types of Data Steward: Business Data Stewards, Technical Data Stewards, Operational Data Stewards, Project Data Stewards, Domain Data Stewards, and the Enterprise Data Steward. The responsibilities of the Data Stewardship Council are also discussed.
Business Data Steward; Technical Data Steward; Operational Data Steward; Project Data Steward; Domain Data Steward; Enterprise Data Steward; Responsibilities; Stewardship
Before we get into more details on how to set up and run Data Stewardship as part of a Data Governance program, we need to examine and explain the detailed tasks that the various types of Data Stewards perform as part of their participation. The tasks include not only what the Data Stewards do individually, but also the tasks they perform as a group (the Data Stewardship Council). Not only will this information let you start building job descriptions (if you’re just starting out) while you read the rest of this book, but later sections will make more sense in light of roles and responsibilities explained here.
Each type of Data Steward has a set of tasks, which can be further classified into categories.
The structure of Data Governance and Data Stewardship shown throughout this book is the recommended structure. It may not be possible to achieve the recommended structure right away due to lack of funding, personnel, or interest. However, the goal should be to move toward the recommended structure as the effort matures.
Data Stewardship Council
Although Business Data Stewards have many responsibilities as individuals (discussed later in this chapter), they also form the Data Stewardship Council, and have responsibilities as a group.
What is the Data Stewardship Council?
The recommended overall structure of the Data Stewardship Council is shown in Figure 1.4 (within the dotted-line box). Figure 3.1 shows a sample (in this case for an insurance company) of how the Data Stewardship Council might look, with each box representing a business function (and a Business Data Steward representing that function).
FIGURE 3.1 A sample structure for the Data Stewardship Council.
There are a couple of things to notice about the structure of the Data Stewardship Council. First, the council is led by the Enterprise Data Steward, a role that may be initially filled by the Data Governance Manager, but ideally should be staffed by a dedicated person in the Data Governance Program Office (DGPO). Note also that the business function (e.g., insurance services) may be further broken down into component parts, such as actuarial, claims, and underwriting operations. This may be necessary to designate Business Data Stewards who can manage the key data for that component of the business. In this case (a real example), there was no single Business Data Steward who could be responsible for all the insurance data, so it had to be broken down as shown in the figure.
Put another way, the assignment of Business Data Stewards depends on the structure of your business and the complexity of the data you need to manage. For example, a major healthcare provider has business functions around membership and eligibility. However, because they sell insurance to widely varying audiences (e.g., to commercial corporations versus through government programs), different Business Data Stewards have to be assigned to membership/eligibility for commercial programs than for government programs. This is necessary because the data is very different between the two types of programs despite the fact that the business functions (membership/eligibility) seem to be similar.
Responsibilities of the Data Stewardship Council
Much of the work noted as the responsibility of the Data Stewardship Council is carried out by individual members of the council, but it is listed here as part of the council’s responsibilities because it requires discussion and consensus with other Stewards, given its impact on the organization outside the owning business function.
The responsibilities of the Data Stewardship Council include the following:
- Focus on ways to improve how an organization obtains, manages, leverages, and gets value out of its data. Data is a valued asset, and the Data Stewardship Council should be looking for ways to improve the quality of the asset and use the data for competitive advantage.
- Be the advisory body for enterprise-level data standards, guiding principles, and policies. The standards and principles set guidelines that the Business Data Stewards (and others) need to use, so the council has to have a say in drafting them. The council should have approval rights for the standards, and manage any proposed changes to those standards. Recommending what policies are needed and what they need to say is also an important task because the Business Data Stewards are on the frontlines, and thus in a great position to see what policies are needed to make Data Governance a success.
- Mediate or arbitrate the resolution of issues. The Data Stewardship Council has to work together as a team to settle any data issues that arise. There can be many of these: disagreements over meaning or rules, differing requirements for data quality, modifications to how data is used, and which business functions should own key data elements.
- Communicate decisions of the Data Stewardship Council and Data Governance Board. It doesn’t make a whole lot of sense to institute Data Governance and Data Stewardship unless the decisions made about the data are communicated to the people who use it. For example, if the council determines that certain data is currently not of high enough quality for a particular use, then the council needs to identify the people using it that way and let them know that the data is not appropriate for that use.
It is common for the people using the data to tell the Data Stewards that the data is not of sufficient quality. These issues must be logged and evaluated as part of the regular Data Steward duties.
- Ensure alignment of the Data Governance effort to the business. It is all too easy to start designing Data Governance protocols (including processes and procedures) that are out of alignment with what is important to the business. Make no mistake: doing so is a recipe for failure. If Data Governance is perceived as being a roadblock, out of sync with the business priorities, or simply irrelevant, the effort will be quickly dismantled and the resources put to more effective use.
- Participate in and contribute to Data Governance processes. The day-to-day processes for executing on Data Governance and Data Stewardship are a little different in each company. The council (as a group) needs to define and design the processes, since its members are the people who will be expected to follow them. They will also be expected to provide feedback on the processes to determine which ones are working and which ones need to be changed or discarded. These processes, and the tuning of them, are a key part of the design and implementation of Data Governance. It is pure folly to try to build the processes without the input of the Data Stewardship Council.
- Communicate the Data Governance vision and objectives across the organization. In most organizations, the vision and objectives of Data Governance are new to the company, and few, if any, of the employees understand what is required. The Data Stewardship Council has to communicate the vision and objectives, especially into the business functions they represent. This constant communication is one of the most important things that the council is responsible for.
- Communicate the rules for using data. Rigorous business rules around data are new to most employees, yet using the data according to the rules is everyone’s responsibility.
- Review and evaluate Data Governance performance and effectiveness. The Data Stewards need to “buy in” to the performance and effectiveness measures, much as employees should have a say in how their performance is measured in other ways. The best way to have effective Data Stewards is for them to want to participate, and that requires that they be part of the measurement process.
- Provide input into Data Governance goals and scorecard development. The Data Governance goals must align with performance and effectiveness measures (the previous bullet point), so just as with those measures, the Data Stewards should have a voice in what the Data Governance effort hopes to accomplish (goals) and how progress is displayed to management (scorecard).
- Collaborate with other interested parties in the management of definitions, policies, procedures, and data issues. The Data Stewardship Council provides a forum for the Data Stewards to discuss and reach agreement (or at least consensus) on many topics. Key among these items are:
Definitions: Many people (commonly known as stakeholders) have an interest in how terms are defined, and it is especially important that the stakeholders have a common understanding of the data names and definitions. Managing definitions requires soliciting input from stakeholders during both the initial definition phase and for any changes to the definitions that are proposed.
Policies and procedures: Policies drive what must be accomplished; procedures say how the accomplishments will be met. Data Stewards must have input into policies and procedures, as they are eventually responsible for meeting many of the goals and executing on the established procedures. In addition, Data Stewards (who are knowledgeable about the data and care about the data) are the very people who are best able to suggest what policies are needed in the first place. If you just “hand over” policies and procedures to the Data Stewards without getting their input, it will be more difficult to get them to cooperate.
Data quality issues: Managing issues with the data and data quality is a key job of the Data Stewardship Council. The impacts of the issues must be assessed, proposed remediation options must be developed, the impacts of the various remediation options must also be assessed, and priorities must be established. This work is best done by members of the council and identified stakeholders, using a well-defined workflow. Establishing the workflow for resolving issues requires discussion and agreement within the council.
- Enforce use of agreed-on business terminology. When data users use different terminology to represent the same concept, confusion reigns supreme. As business terms are named, given a robust definition, and have business rules defined, the terms should be used consistently across the organization and synonyms should be actively discouraged by the Business Data Stewards. That is, when a synonym is used in conversations, discussions, and documentation, the Business Data Stewards should insist that the correct term be substituted. This is especially important when the incorrectly used term actually has been defined to mean something different (e.g., customer vs. client vs. account). This item is considered part of the duties of the council (as opposed to individual Business Data Stewards) because issues within a business function can be referred to the steward for that function.
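The terminology bullet above can be supported with a simple check of documents against the agreed terms. The following is a minimal sketch, assuming a hand-maintained mapping from each approved term to its discouraged synonyms. The terms, the mapping, and the function name are all hypothetical and are not taken from any real glossary. Terms that have their own distinct definitions (such as customer, client, and account in the example above) should not be listed as synonyms of each other.

```python
import re

# Hypothetical mapping: approved term -> discouraged synonyms (illustrative only).
DISCOURAGED_SYNONYMS = {
    "policyholder": ["insured party", "policy owner"],
    "claim": ["loss report"],
}


def find_discouraged_terms(text, synonyms=DISCOURAGED_SYNONYMS):
    """Return (synonym_found, approved_term) pairs in order of appearance."""
    hits = []
    for approved, variants in synonyms.items():
        for variant in variants:
            # Whole-word, case-insensitive match on the discouraged synonym.
            pattern = r"\b" + re.escape(variant) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((match.start(), variant, approved))
    hits.sort()  # order by position in the text
    return [(variant, approved) for _, variant, approved in hits]


for found, approved in find_discouraged_terms("The policy owner filed a loss report."):
    print(f"Replace '{found}' with the approved term '{approved}'")
```

A check like this only flags candidates. Deciding whether a flagged phrase really is a synonym remains a judgment for the Business Data Steward.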
Enterprise Data Steward
The Enterprise Data Steward is charged with running the day-to-day Stewardship effort, and these responsibilities can be broken up into three major categories: leadership, program management, and measurement.
The leadership responsibilities include:
- Report up to the Data Governance Manager. The Enterprise Data Steward is part of the DGPO, and as such is part of the Data Governance Manager’s staff.
- Lead the community of Data Stewards. The primary responsibility of the Enterprise Data Steward is to provide leadership for the Data Stewards. Although they don’t report functionally to the Enterprise Data Steward, they take direction for all Data Governance–related activities from the Enterprise Data Steward.
- Liaise with Data Governors/business leads or their appointees as well as IT project leads to implement and maintain Data Governance. The Enterprise Data Steward works closely with the Data Governors (members of the Data Governance Board) as needed to guide the Data Stewards.
- Work with the Data Governance Manager to develop the Data Governance vision and framework, short term and long term. Working closely with the Data Stewards, the Enterprise Data Steward is in a unique position to understand and drive the framework that Data Stewards work within.
- Identify and initiate projects to implement the vision. Although Data Stewards (working together as the Data Stewardship Council) may propose projects to fix specific data issues, it is up to the Enterprise Data Steward to propose and initiate projects that drive forward the vision of Data Governance. These projects may include instituting an overall workflow to handle data quality rule violations, collaborating with the screen designers to create standards for data entry (which improves input quality), and working with IT to install tools that make the business data and metadata available to the enterprise as a whole.
- Ensure all Data Governance work efforts are in line with overall business objectives and the Data Governance vision. This primarily means focusing the efforts of the Data Stewards and DGPO staff on projects and efforts that are of highest importance to the enterprise. For example, if new metrics are proposed by the company executives, the Enterprise Data Steward may need to have the Data Stewards work immediately on getting those metrics defined, temporarily stopping work on other data elements or data issues they were working on.
- Manage the Domain Data Stewards (see explanation later in this chapter) or, in the absence of Domain Data Stewards, take on their responsibilities to the extent possible.
- Define prioritization criteria. Data Stewards are responsible for reaching agreement on the priorities of competing projects that correct data issues. However, a standardized set of prioritization criteria is needed so that ranking is not done according to “who shouts the loudest.” The Enterprise Data Steward is responsible for proposing and gaining agreement on these criteria, which may include reduced cost, increased income, probability of regulatory violations, inability to achieve business goals, and deterioration of data quality.
- Provide direction to business and IT teams. The Enterprise Data Steward is (or should be) aware of what is going on and what is needed to drive Data Stewardship forward. Ideally, he or she should be the single point of contact for teams that need to interact with Stewardship either to get information needed or to provide the services needed.
- Lead implementation of Data Stewardship organization. As organizational changes need to be made, including identifying the need for additional or replacement stewards, the Enterprise Data Steward takes the lead on identifying what changes are needed as well as implementing those changes.
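One way to keep ranking from being decided by “who shouts the loudest” is to score each competing project against the agreed prioritization criteria. The sketch below is an assumption-laden illustration: the criteria names follow the list given above, but the weights, the 0 to 5 rating scale, and the project names are invented for the example and would need to be agreed on by the Data Stewards.

```python
# Criteria named in the text; the weights are assumptions chosen for illustration.
CRITERIA_WEIGHTS = {
    "reduced_cost": 0.20,
    "increased_income": 0.20,
    "regulatory_violation_probability": 0.30,
    "business_goal_impact": 0.15,
    "data_quality_deterioration": 0.15,
}


def priority_score(ratings, weights=CRITERIA_WEIGHTS):
    """Weighted sum of ratings (assumed 0-5 scale), one rating per criterion."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[name] * ratings[name] for name in weights)


def rank_projects(projects, weights=CRITERIA_WEIGHTS):
    """Return project names ordered from highest to lowest score."""
    return sorted(
        projects,
        key=lambda name: priority_score(projects[name], weights),
        reverse=True,
    )


# Hypothetical projects with ratings agreed on by the stewards.
projects = {
    "fix_birthdates": {
        "reduced_cost": 1, "increased_income": 1,
        "regulatory_violation_probability": 5,
        "business_goal_impact": 2, "data_quality_deterioration": 4,
    },
    "dedupe_customers": {
        "reduced_cost": 3, "increased_income": 3,
        "regulatory_violation_probability": 1,
        "business_goal_impact": 3, "data_quality_deterioration": 2,
    },
}
print(rank_projects(projects))
```

The value of the exercise lies less in the arithmetic than in having the weights agreed on in advance, so that a ranking can be explained and defended.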
The program management responsibilities include:
- Design the processes and procedures for Data Stewardship. The Enterprise Data Steward needs to collect specifications on how data should be managed from the Data Stewards and formulate them into a set of processes and procedures that the Stewards can use. Changes to those processes and procedures are unavoidable, as steps that seemed to be a good idea initially may not work out as intended. The Enterprise Data Steward needs to make the necessary changes, get the buy-in from the Stewards, and republish as necessary.
- Build and drive the agenda for Data Stewardship Council meetings. The Enterprise Data Steward is responsible for collecting issues that need to be discussed, status updates, and other agenda items; scheduling the council meetings; and running the meetings themselves. Publishing the meeting minutes is part of this as well.
- Maintain a repository of information and decisions. The Enterprise Data Steward must ensure that the work product (documentation) produced by the Data Stewards is completed and published. The documentation can include:
Definitions, data quality requirements, and business rules, which must be put into the business glossary.
New processes and procedures that are published and listed on the Data Governance/Data Stewardship website.
Presentations for information and training, published on the Data Governance/Data Stewardship website.
Rules for engaging Data Stewardship on data issues.
- Improve overall enterprise data quality and reliability through process improvement. This includes:
Develop and institute information life-cycle processes. Carefully planned, documented, and executed information life-cycle processes protect the quality of the data by ensuring that it is properly and completely captured and that the business rules for the data are consistently applied.
Improve the processes that capture and process the data. Although the focus of Data Stewardship is on data, poor-quality data is often a symptom of broken processes. The Enterprise Data Steward is in a unique position to see the “big picture” on data quality problems, and work with the Data Stewards to look for opportunities to proactively improve the data, as well as identify the process issues and remediate them.
According to Danette McGilvray (Executing Data Quality Projects, Morgan Kaufmann Publishers, 2008, p. 24), the information life cycle is the “processes required to manage the information resource throughout its life. The six high-level phases in the life cycle are referred to as POSMAD: Plan, Obtain, Store and Share, Maintain, Apply, Dispose.”
- Review and manage issues, and meet with the business to understand user needs and technical feasibility. Issue management includes keeping on top of the issue log, and making sure that issues are properly documented, impacts assessed, and priorities established. With this information, the Enterprise Data Steward can meet with the impacted groups and Data Stewards to work out a remediation plan.
- Work with Data Governors and Data Stewards to facilitate the issue resolution process. Act as point of resolution prior to Data Governance Executive Steering Committee involvement.
- Provide counsel to projects to ensure the project is in line with the vision of the Data Governance program. Projects (especially major ones) can have a significant impact on the goals of Data Governance. For example, without guidance from Data Governance, critical data elements may be misdefined or misused, and the proper representatives and tasks from Data Governance may not be involved. The Enterprise Data Steward can educate the project managers on what is needed and provide resources (in the form of Project Data Stewards, described later in this chapter) to the project to ensure that the goals of Data Governance are safeguarded.
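The issue management bullet above says that each logged issue should be properly documented, have its impacts assessed, and have a priority established before a remediation plan is worked out. A minimal sketch of an issue log entry that makes those gaps visible might look like the following. The record layout and field names are hypothetical and do not describe any particular issue-tracking tool.

```python
from dataclasses import dataclass

# Steps paraphrased from the text; the field names are hypothetical.
REQUIRED_STEPS = ("description", "impact_assessment", "priority")


@dataclass
class DataIssue:
    issue_id: str
    description: str = ""
    impact_assessment: str = ""
    priority: str = ""

    def missing_steps(self):
        """List the required steps that have not been filled in yet."""
        return [step for step in REQUIRED_STEPS if not getattr(self, step).strip()]

    def ready_for_remediation_planning(self):
        return not self.missing_steps()


def issues_needing_attention(issue_log):
    """Map issue id -> missing steps, for issues that are not yet complete."""
    return {
        issue.issue_id: issue.missing_steps()
        for issue in issue_log
        if issue.missing_steps()
    }


issue_log = [
    DataIssue("DQ-1", "Birthdates defaulted to 1900-01-01", "Affects age-based pricing", "high"),
    DataIssue("DQ-2", "Duplicate customer records"),
]
print(issues_needing_attention(issue_log))
```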
The measurement responsibilities include:
- Define, implement, and manage Data Governance metrics. Data Stewards have some responsibility in helping to define the metrics, but the Enterprise Data Steward takes “point” on getting the metrics defined and implemented, as well as actually running the measures to generate the scorecards (the next bullet).
- Track, monitor, and publish Data Governance scorecards. The scorecards are generated periodically and provide information on how Data Governance is doing in achieving its overall goals, as well as participation (see Chapter 8).
Business Data Stewards
The Business Data Steward’s responsibilities can be broken up into three major categories: business alignment, data life-cycle management, and data quality and risks.
The business alignment responsibilities include:
- Member of the Data Stewardship Council. As described previously, the council is the organization within which Data Stewards work together.
- Aligned to a business function. Business Data Stewards represent a business function, which is not the same thing as a unit on the organization chart. For example, in an insurance company, there might be Business Data Stewards for claims, actuarial, and underwriting, even though there might not be an organizational unit called “claims,” and so on. As we will discuss in Chapter 4, organizing Business Data Stewards this way makes them relatively unaffected by reorganizations.
- Responsible for Data Governance execution in their functional area. Each Business Data Steward represents (along with the Data Governor) Data Governance and Data Stewardship in their business function. The steward provides visibility into what is going on in the business function, and must actively ensure that policies, processes, and procedures are followed, escalating a situation to the Enterprise Data Steward if there is an issue.
- Identify key business terms, data quality requirements, and data quality metrics. Since it isn’t possible to bring all the data under governance at once, Business Data Stewards must identify the most important data (key business terms). For these key terms, Business Data Stewards must then document the data quality requirements, as well as how the quality will be measured (the data quality metrics).
- Provide input for Data Governance metrics. As stated earlier, the Enterprise Data Steward is responsible for putting together the Data Governance metrics, but Business Data Stewards provide input to that process. As with most other types of metrics, it is difficult to expect Business Data Stewards to “buy in” to the metrics if they had no say in what the metrics consist of.
- Represent interests of Data Governors. When the Data Governors have issues or concerns about the data, it is the responsibility of the Business Data Steward to review those items with the other members of the Data Stewardship Council to get feedback and make recommendations.
- Work with Data Governors to ensure business has a practical understanding of the data. It is important for all the data analysts and other data users to understand the data, what it means, and what business rules it must follow. This understanding will help the analysts use the data properly, as well as spot potential issues early and bring them to the attention of the Business Data Steward. Under these circumstances, the data analyst and data user population become the “eyes and ears” of the Business Data Stewards, leading to a more effective governing and stewarding of the data.
- Participate in process and standards definition. Business Data Stewards are in a good position to define efficient processes and standards that are not too onerous to follow. Since they will need to adhere to the processes and standards, it is important to get their input into the process definition. The Enterprise Data Steward then takes this input and creates the processes and standards.
- Ensure that data decisions are communicated and business users understand impacts of the decisions to their lines of business. It is very important that decisions about the data—and their impacts—are communicated to the people who use the data. The quality of the data and the quality of business decisions can seriously deteriorate if the decisions about the data are not communicated to the data users. For example, if an issue is discovered with the quality of birthdates in a particular system, that needs to be communicated to people using those birthdates, perhaps with a recommendation on where they can get reliable birthdate data.
- Provide business requirements on behalf of aligned function. Business Data Stewards need to provide their business function’s requirements for quality and usage of the data so that issues can be surfaced and projects considered to ensure the data meets those requirements.
Data Life-Cycle Management
The data life-cycle management responsibilities include:
- Facilitate the Data Governors through the change control process. The Data Governors have a role in approving recommendations made by the Business Data Stewards, and thus are part of the change control process workflow. However, they often need input and guidance from the Business Data Stewards to effectively perform that role.
- Coordinate business requirements and requests specific to stewarded data area. This responsibility includes not only establishing priorities within a business function, but reviewing requirements and requests to ensure that there are no duplicates and identify where requirements and requests can reasonably be combined into a single work stream.
- Define the business data definitions and appropriate data usage. Business Data Stewards are responsible for establishing business definitions for the data they steward that meet the requirements for robust definitions established by the DGPO. Business Data Stewards can then communicate the definitions and approved data usage to the stakeholders and Data Owners.
- Own data metrics for compliance with data governance policies and standards. Business Data Stewards need to take ownership as well as responsibility for the metrics used to measure compliance with what is required by Data Governance policies and standards. No one else can do this because it is the Business Data Stewards who understand what the data means, how it should be used, and how the quality needs to be protected and improved.
- Participate in conflict resolution. Resolve issues when able or manage issues through the escalation process. Different business areas use data differently, and these differences can lead to conflicts about definition, appropriate data usage, and required data quality. As these conflicts arise, the Business Data Steward responsible for the data elements in question must take a leadership role in getting them resolved to the satisfaction (if possible) of all stakeholders. However, it is not always possible for the stewards to resolve conflicts at their level, in which case the issues must be escalated to the Data Governors with the steward’s recommendations. The Enterprise Data Steward can guide this process, but the Business Data Stewards must take ownership of getting conflicts resolved.
- Assess enterprise impacts related to data changes. As decisions are made about appropriate data usage, required data quality, and what the data means, these decisions have impacts to the enterprise, as shown in Figure 3.2.
FIGURE 3.2 Decisions about data have impacts across the entire information chain.
- Organize and participate in Data Stewardship committees. It is often necessary for a small subset of Business Data Stewards to cooperate in identifying the correct usage of data as well as resolving issues around data use. Business Data Stewards need to work with their peers in these Data Stewardship committees. Doing so is more efficient than involving the entire council, as many Business Data Stewards have no stake in a particular data usage or issue.
- Work on behalf of Data Governors to ensure consistency of data usage and share best practices. Business Data Stewards carry the authority of Data Governors to ensure that everyone in their business function is aware of best practices around data. This work includes ensuring that everyone knows new uses of the data must be reviewed by the Business Data Steward.
- Work with the business stakeholders to define the appropriate capture, usage, derivation, and data quality business rules for all governed data elements within their data areas. Having these rules defined—and having everyone aware of them—is key to preventing misuse of data, such as when an analyst simply decides to use data in a way it was never intended (thus violating the usage rules) or for which the quality of the data is not sufficient. Business Data Stewards must work with the data users to identify data that meets the business needs or can be adjusted to meet those needs.
- Assist in the identification, definition, and population of the correct and required metadata.
Successful Data Stewardship requires managing the metadata. To manage the metadata effectively, a few key tools and processes are necessary. Primary among those tools is a metadata repository, with a business glossary either integrated into the repository or provided as a separate tool. These tools enable the Stewards to store and retrieve definitions, business rules, and other critical pieces of information about the data they are accountable for.
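To make the idea of storing and retrieving definitions and business rules concrete, here is a minimal in-memory sketch of a business glossary. It is a stand-in for illustration only: the class names, fields, and sample entry are assumptions, and a real metadata repository or glossary product would have its own data model and interface.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryEntry:
    term: str
    definition: str
    steward: str  # Business Data Steward accountable for the term
    business_rules: list = field(default_factory=list)


class BusinessGlossary:
    """In-memory stand-in for a glossary tool; not a real product's API."""

    def __init__(self):
        self._entries = {}

    def store(self, entry):
        # Key on the lowercased term so lookups are case-insensitive.
        self._entries[entry.term.lower()] = entry

    def retrieve(self, term):
        return self._entries.get(term.lower())

    def terms_for(self, steward):
        """List the terms a given steward is accountable for."""
        return sorted(e.term for e in self._entries.values() if e.steward == steward)


glossary = BusinessGlossary()
glossary.store(GlossaryEntry(
    term="Claim",
    definition="A request for payment under the terms of a policy.",  # illustrative wording
    steward="Claims steward",
    business_rules=["Every claim must reference exactly one policy."],
))
print(glossary.retrieve("claim").definition)
```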
Data Quality and Risks
The data quality and risks responsibilities include:
- Work with the business to define acceptable levels of data quality. “Acceptable” levels of data quality are based on “context,” that is, what the data will actually be used for. Thus, the Business Data Steward needs to understand the uses that the data will be put to, and also be aware whenever additional uses are planned for the data. This awareness ties back to the earlier point that best practices include notifying the Business Data Steward whenever data usage changes.
- Gather and report data quality metrics, and define improvement opportunities. Closely aligned with defining acceptable levels of data quality is gathering and reporting the metrics on data quality—that is, how closely the data quality matches what was defined as acceptable. Any time the data quality appears to be falling below acceptable levels, there is a business case for improvement (a cost–benefit analysis), though whether the business case is strong enough to execute on will depend on how difficult and expensive it will be to bring the data quality back up to acceptable levels.
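The two bullets above can be illustrated with a small calculation: measure one quality dimension of a data element, then compare the result with the acceptable level for each intended use. The sketch below uses completeness as the example dimension. The uses, the thresholds, and the sample values are assumptions made up for the illustration.

```python
def completeness(values):
    """Share of values that are populated (not None and not blank)."""
    if not values:
        return 0.0
    populated = sum(1 for v in values if v is not None and str(v).strip() != "")
    return populated / len(values)


# Hypothetical acceptable levels, one per intended use of the same data element.
ACCEPTABLE_COMPLETENESS = {
    "marketing_mailing": 0.80,
    "regulatory_reporting": 0.99,
}


def quality_report(values, thresholds=ACCEPTABLE_COMPLETENESS):
    """Compare measured completeness with the acceptable level for each use."""
    measured = completeness(values)
    return {
        use: {"measured": measured, "acceptable": level, "meets": measured >= level}
        for use, level in thresholds.items()
    }


# Ten sample birthdates, one of them missing.
birthdates = [
    "1970-01-01", "1985-06-30", "1990-12-12", "2001-02-03", "1999-09-09",
    "1964-04-04", "1977-07-07", "1988-08-08", "1992-03-15", None,
]
for use, result in quality_report(birthdates).items():
    print(use, result)
```

The same measured value can be acceptable for one use and unacceptable for another, which is why the acceptable level is tied to context rather than to the data element alone.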
Domain Data Stewards
The responsibilities of the Domain Data Stewards include alignment, data life-cycle management, and data quality. Many of the responsibilities are the same as for Business Data Stewards, so only items that are different are explained further.
The alignment responsibilities include:
- Aligned to a specific domain of shared data. The Domain Data Steward stewards data where ownership must be shared by multiple business functions. Individual Domain Data Stewards are assigned to a domain, such as customer or product. All data that is deemed to fall within this shared domain is the responsibility of the assigned Domain Data Steward.
- Represent the interests of the enterprise-wide use of the data. The shared domains have specifically assigned business functions from which input is needed (and coordinated) by the Domain Data Steward assigned to that domain. But just like other data elements, the needs of all the stakeholders enterprise-wide must be considered when making decisions. Considering the needs of all stakeholders is especially important with shared data.
- Make recommendations regarding data that is used across multiple business functions. The Domain Data Steward is responsible for recommendations that take into account the needs of the involved business functions. However, if no agreement among the stakeholders can be reached, the Domain Data Steward has the authority (unlike Business Data Stewards) to make the decisions needed. These decisions include the meaning and use of the data, as well as creation, usage, and data quality business rules.
In practice, the decisions of the Domain Data Steward can be overridden by Data Governors (members of the Data Governance Board). It is best for the Domain Data Steward to review decisions with the Data Governors representing the affected business areas before making a controversial decision (a decision that the Business Data Stewards are not in agreement with).
- Build a consensus around domain data usage among the data users across the enterprise. The ideal outcome for establishing definitions and business rules is for the affected Business Data Stewards to be in agreement (or at least in consensus) with the decisions made, and be able to continue to have their data needs met. This requires the Domain Data Steward to build a consensus for data meaning and usage with the stakeholders.
- Work with Business Data Stewards, Project Data Stewards, and Data Governors to ensure business and projects have a practical understanding of the data.
- Participate in processes and standards definitions. The processes and standards for Domain Data Stewards are more complex due to the need to coordinate with Business Data Stewards and (possibly) Data Governors. But just like the processes for the Business Data Stewards, the Domain Data Stewards must have input into the processes and standards they will be responsible for using.
Data Life-Cycle Management
The data life-cycle management responsibilities include:
- Coordinate business requirements and requests specific to the data domain.
- Participate in conflict resolution. Resolve an issue when able or manage the issue through the escalation process.
- Assess enterprise impacts related to data changes.
- Participate in relevant Data Stewardship committees.
- Ensure consistency of data usage across a data domain and share best practices. Achieving consistency of data usage requires achieving (where possible) a consensus of stakeholders on data usage rules, reviewing all usage of the data, and opening and remediating issues where data usage is not consistent with the meaning and quality of the data.
- Work with the business to define the appropriate capture, usage, and business rules for all data elements and data element derivations across their data domain.
- Assist in the identification, definition, and population of the correct and required metadata.
- Serve as a member of the Data Stewardship Council.
The data quality responsibilities include:
- Work with the business to define acceptable levels of data quality.
- Gather and report data quality metrics, and define improvement opportunities.
- Define a valid list of values (reference data) for data elements used across the enterprise. One of the key difficulties of data that is shared across multiple business areas is reaching a consensus on what valid values should be allowed at the logical level. These discussions often highlight subtle differences in data meaning that may lead to breaking a data element into additional data elements. The Domain Data Steward has the responsibility of coordinating these discussions to ensure that all data needs are met.
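One way to prepare for a valid-values discussion is to line up the lists each business function proposes and see where they differ. The following is a minimal Python sketch, not a prescribed method. The function name `compare_valid_values`, the business function names, and the status codes are all hypothetical and chosen only for illustration.

```python
def compare_valid_values(proposals):
    """Compare proposed valid-value lists from several business functions.

    proposals: dict mapping a business function name to the set of
    values it considers valid (hypothetical input shape).
    Returns (agreed, contested): values every function proposes, and a
    dict mapping each remaining value to the functions that propose it.
    """
    if not proposals:
        return set(), {}
    all_values = set().union(*proposals.values())
    agreed = set.intersection(*proposals.values())
    contested = {
        value: sorted(name for name, values in proposals.items() if value in values)
        for value in sorted(all_values - agreed)
    }
    return agreed, contested


# Illustrative data only: three functions proposing policy status codes.
proposals = {
    "Underwriting": {"ACTIVE", "LAPSED", "CANCELLED"},
    "Claims": {"ACTIVE", "CANCELLED", "SUSPENDED"},
    "Billing": {"ACTIVE", "LAPSED", "CANCELLED", "SUSPENDED"},
}
agreed, contested = compare_valid_values(proposals)
print("Agreed by all:", sorted(agreed))
for value, functions in contested.items():
    print(f"Needs discussion: {value} (proposed by {', '.join(functions)})")
```

The contested values are the ones most likely to reveal a difference in meaning, and so are a reasonable starting agenda for the discussion the Domain Data Steward coordinates.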
Project Data Stewards
The responsibilities of Project Data Stewards include metadata, data quality, and Data Governance project alignment.
The metadata responsibilities include:
- Maintain the names and descriptions of the data elements being used in a project. As data elements are exposed and discussed during a project, the Project Data Steward has the responsibility to ensure that a robust name and description is supplied for each element. That is, the name follows data element naming rules, and the description is as complete as possible, stating the best estimate of what the data element means, how it is used, and how it is collected. Having a robust name and description is necessary to ensure that the appropriate Business Data Steward has a clear understanding of what data the project is proposing to use.
- Work with Business Data Stewards to determine the appropriate Business Data Steward to take responsibility for the project data element.
- Review the name and description with the appropriate Business Data Steward and get a business definition. The Business Data Steward must provide a data definition that meets Data Governance standards. The Project Data Steward must be familiar with the standards and evaluate the proposed definitions for conformance to the standards.
- Collect and document business derivations and calculations. In the case that the project data is the result of a derivation or a calculation, the project subject-matter experts should propose the derivation or calculation, or request that the appropriate Business Data Steward do so.
- Review the derivations and calculations proposed with the appropriate Business Data Steward. The Business Data Steward can agree with the proposed derivation, or state what the derivation should be. The Project Data Steward must work with the Business Data Steward to ensure that the derivation conforms to the standards.
- Deliver Business Data Steward decisions to the project for incorporation in the project plan. Decisions made by the Business Data Steward, including definitions, derivations, data usage rules, and other metadata, must be communicated back to the project by the Project Data Steward.
As Project Data Stewards gain experience, there is a danger that they may begin to make recommendations about the data themselves, rather than consulting with the appropriate Business Data Stewards. This must not be allowed. Project Data Stewards do not own the data or the responsibility for the data, and thus cannot make decisions about the data. The Enterprise Data Steward must be watchful for signs that Project Data Stewards are not consulting with Business Data Stewards.
In reality, multiple projects may be trying to use the same data. In that case, multiple Project Data Stewards may be working on the same data. Project Data Stewards need a coordination mechanism (e.g., a weekly meeting) to review the data elements of interest to their projects and ensure that they aren’t doing the same work. In addition, Project Data Stewards must ensure that the projects are using the data consistently.
The data quality responsibilities include:
- Collect and document data quality rules and data quality issues from the project. Project discussions will often expose known issues with data quality as it relates to the intended usage. Alternatively, questions may arise about whether the quality of the data will support the intended usage. In such cases, the Project Data Steward should collect and document the data quality requirements that define what is expected of the data by the project.
- Evaluate the impact of the data quality issues on the project data usage and consult with the Data Governors or Data Stewards where appropriate. The consultation should discuss whether the perceived issues are real, and assess how difficult the data issues would be to fix. Data Stewards may suggest other sources of higher-quality data for project usage. The assessment will also feed into the decision on whether to expend the project work effort needed to profile the data in depth.
A major data quality issue, coupled with a lack of an alternative data source, may have a significant impact on the project. In extreme cases, the project may actually fail due to poor data quality that was not discovered before the project started. In such cases, the project scope may need to be expanded to fix the data quality issue. The data quality fix may expand the scope of the project to the point where it is no longer economical or feasible. Clearly it is better to know that before the project has gotten underway. To put it succinctly, there is no sense beginning a project until some level of data inspection is conducted and the quality of the data is assessed.
- Consult with the Business Data Steward and project manager to determine if data should be profiled based on data quality rules and expectations collected on a project. Although the rules may not be well known (or known at all), people who have requirements almost always have assumptions and expectations about what the data will look and act like. One of the nastiest surprises that can occur on a project is that data assumed to be fit for purpose is actually not of sufficient quality to serve the purposes of the project. The consultation with the Business Data Stewards and Technical Data Stewards serves to quantify the risk that the data quality is insufficient for the project’s purposes. The Project Data Steward can then work with the project manager to schedule a data profiling effort, adjusting the project schedule to allow for the extra time needed. As mentioned earlier, it is better to identify the need and profile the data prior to laying out the project schedule, as it can take a significant period of time to profile the data.
Business Data Stewards should either already know the condition of the data or they should insist that the data be profiled. Making the Stewards accountable for the data also makes them accountable for demanding data profiling be performed where they feel it is warranted. Of course, if the project manager refuses to spend the time and expense, this must be recorded in the project risks to document the fact that the steward was ignored.
It is possible to profile data without a tool, but tools are better. They can show you things that you might not be looking for, propose rules you might not know about, and store the results in a reusable form that can save time later. But the act of profiling is essentially one of comparison: what you have versus what you expect.
Data profiling is the process of examining the contents of a database or other data source and comparing the contents against the data quality rules (rules that define what is considered “good quality” in the data) or discovering those rules. Ideally, any project that makes use of data should profile that data. This is especially true of projects that make use of data of questionable quality. Adding data profiling to a project that has not included it in the project plan can lead to significant delays. But not knowing what shape your data is in will lead to much bigger problems! Data profiling is a many-step process that requires collaboration between IT, business data analysts, data profiling tool experts, and the DGPO. These steps can include:
1. Determining what data to profile. Once the project has determined what business data it will need, that business data has to be mapped to its physical sources.
2. Preparing the profiling environment. Data profiling is rarely, if ever, performed on the production server using the operational data. Instead, the data must be migrated to a profiling environment. This involves setting up the environment (server, disk, and database engine), creating the data structures, and migrating the data itself. This is largely an IT task, and it is not uncommon for it to be a challenge to accomplish.
3. Running the profiling tool and storing the results. The data profiling tool expert performs this task, and if all goes well, this usually takes only a day or two, depending, of course, on the complexity and amount of the data.
4. Analyzing the results. This is the most manpower-intensive task, as the results of the profiling have to be reviewed with the appropriate Business Data Stewards. Business Data Stewards must determine whether any potential issues discovered by the tool are actually issues and work with the Technical Data Stewards and the project manager to formulate what it would take to remediate the issues. This typically requires close examination of the profiling results and a clear understanding of both the data quality rules and the characteristics of the actual data as captured by the profiling tool.
5. Documenting the results. The profiling results and analysis are documented in the tool, and may also need to be published in the project documentation.
As you can see, data profiling is somewhat involved and can take time. However, the time saved by avoiding project delays usually outweighs the cost of the profiling. And as stated earlier, you either know the data is of high quality (probably because you have inspected it before) or it is simply foolish to proceed with the project without inspecting the data.
- Assist the data profiling efforts by performing data profiling tasks. If properly trained, Project Data Stewards can do some of the analysis and guide the work of others to ensure that standards are followed and the results are properly documented in the appropriate tools.
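The idea that profiling is essentially a comparison of what you have against what you expect can be illustrated with a small Python sketch. This is a hand-rolled illustration under stated assumptions, not a substitute for a profiling tool. The function names, the `marital_status` column, its valid values, and the 10% null threshold are all hypothetical.

```python
from collections import Counter


def profile_column(values):
    """Summarize what the column actually contains ("what you have")."""
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        "top_values": counts.most_common(3),
    }


def check_expectations(values, *, max_null_rate, valid_values=None):
    """Compare the column against expectations ("what you expect").

    Returns a list of findings; an empty list means no rule was violated.
    """
    profile = profile_column(values)
    findings = []
    rows = profile["row_count"]
    null_rate = profile["null_count"] / rows if rows else 0.0
    if null_rate > max_null_rate:
        findings.append(
            f"null rate {null_rate:.0%} exceeds expected {max_null_rate:.0%}"
        )
    if valid_values is not None:
        unexpected = sorted(
            {v for v in values if v not in (None, "") and v not in valid_values}
        )
        if unexpected:
            findings.append(f"unexpected values: {unexpected}")
    return findings


# Illustrative data only: a hypothetical marital status column.
marital_status = ["M", "S", "M", None, "X", "S", "", "M", "D", "M"]
profile = profile_column(marital_status)
findings = check_expectations(
    marital_status, max_null_rate=0.10, valid_values={"M", "S", "D", "W"}
)
print(profile)
for finding in findings:
    print("Potential issue:", finding)
```

Each finding is only a potential issue. As described in the analysis step, the Business Data Steward decides whether it is a real problem or whether the expectation itself needs to change.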
Data Governance Project Alignment
The Data Governance project alignment responsibilities include:
- Inform and consult with other Data Stewards, Data Governors, and the Enterprise Data Steward about definitions and data quality rules and issues that result from a project.
- Work with project managers and project members throughout the course of a project in a collaborative manner while ensuring the Data Governance–related concerns are addressed for each project.
- Where possible, align with projects that utilize a Project Data Steward’s previous experience and expertise. Leveraging a Project Data Steward’s previous experience can lead to a shortened learning curve, making the Project Data Steward more of an asset to the project.
Technical Data Stewards
Technical Data Stewards have the following responsibilities:
- Provide the technical expertise around source systems, ETL (Extract, Transform, and Load) processes, data stores, data warehouses, and business intelligence tools.
- Explain how a system or process works (or doesn’t). Technical Data Stewards are frequently the individual(s) assigned to support a system, and know the “guts” of the system. In addition, they have a historic perspective on “how things got the way they got.” For example, an odd distribution of birthdates was the result of a conversion from an earlier system and the choice made for defaults during that conversion.
- Check code, copylibs, internal database structures, and other programming constructs in search of how the information is structured, how the data moves, and how the data is transformed within a system or between systems.
Technical Data Stewards are most often IT resources assigned to this duty by IT management, with sponsorship from the CIO or IT Data Governance Sponsor (designated by the CIO).
Operational Data Stewards
Operational Data Stewards have the following responsibilities:
- Ensure adherence to data creation and update policies and procedures while creating new values or modifying existing ones. Operational Data Stewards are often the frontline folks putting in new data, new valid values, and updating existing information—or supervising people who are doing this work. This situation presents an opportunity to ensure that the creation and update policies and procedures are followed.
- Assist Business Data Stewards in identification and collection of data metrics. Very often the metrics for measuring conformance to policies, procedures, and standards involve gathering information about the data itself. For example, the metrics might include mandatory capture of key data elements, or usage of only valid values for certain fields. It can be a fair amount of work to gather this information as inputs to the metrics, and the Operational Data Stewards can assist with this work by running the queries and reporting back the results.
- Assist the remediation project team with changes to the data, application processes, and procedures. Project teams often need hands-on help with changes to data, application processes, and procedures. Typically, it is best to have these changes made by someone familiar with the processes and data. The Operational Data Steward can step in and help with this, again taking some of the load that would otherwise end up on a Subject Matter Expert (SME) or even the Business Data Steward.
- Assist Business Data Stewards in performing data analysis to research issues and change requests. Researching issues and change requests can involve knowing where the data is, what the data is being used for, and digging into the data itself to see what is going on. Operational Data Stewards can take on much of this work, usually under the direction of the Business Data Steward.
- Identify and communicate opportunities for data quality improvement. Operational Data Stewards are often closer to the data than Business Data Stewards, using it on a day-to-day basis. They are thus in the unique position to see where the quality of the data is insufficient for the need. The Operational Data Steward can open an issue or (depending on how Data Governance is set up) report the issue to the Business Data Steward for resolution.
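As an illustration of the kind of query an Operational Data Steward might run to gather data metrics, the sketch below computes two conformance measures mentioned above: mandatory capture of a key data element and usage of only valid values. It uses Python's standard `sqlite3` module with an in-memory database. The `policy` table, its columns, the sample rows, and the list of valid status codes are hypothetical.

```python
import sqlite3

# Build a small in-memory table standing in for a real source system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policy (policy_id INTEGER PRIMARY KEY, birth_date TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO policy (policy_id, birth_date, status) VALUES (?, ?, ?)",
    [
        (1, "1975-04-12", "ACTIVE"),
        (2, None, "ACTIVE"),          # mandatory birth date not captured
        (3, "1980-11-30", "LAPSED"),
        (4, "1968-02-01", "UNKNOWN"),  # status outside the valid list
    ],
)


def conformance_metrics(connection):
    """Return the capture rate for birth_date and the valid-value rate for status."""
    total, captured, valid = connection.execute(
        """
        SELECT COUNT(*),
               SUM(CASE WHEN birth_date IS NOT NULL AND birth_date <> ''
                        THEN 1 ELSE 0 END),
               SUM(CASE WHEN status IN ('ACTIVE', 'LAPSED', 'CANCELLED')
                        THEN 1 ELSE 0 END)
        FROM policy
        """
    ).fetchone()
    if total == 0:
        # SUM over an empty table yields NULL, so report no rates at all.
        return {"rows": 0, "birth_date_capture_rate": None, "status_valid_rate": None}
    return {
        "rows": total,
        "birth_date_capture_rate": captured / total,
        "status_valid_rate": valid / total,
    }


metrics = conformance_metrics(conn)
print(metrics)
```

In practice the valid values would come from the agreed reference data rather than being written into the query, and the results would be reported back to the Business Data Steward as inputs to the metrics.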
There are a large number of responsibilities that are split up among the various types of Data Stewards, as well as the Enterprise Data Steward and the Data Stewardship Council.
The Enterprise Data Steward leads the Stewardship effort, and thus has some leadership responsibilities. Business Data Stewards bear primary responsibility for the data owned by their business function, are supported with some of the hands-on work by Operational Data Stewards, and depend on Technical Data Stewards for technical information. In addition, Data Stewards have responsibilities as a team (the Data Stewardship Council).