Embrace Lean Thinking for Governance, Risk, and Compliance - Transform - Product Details Lean Enterprise: How High Performance Organizations Innovate at Scale (2015)

Part IV. Transform

Chapter 12. Embrace Lean Thinking for Governance, Risk, and Compliance

All things are subject to interpretation. Whichever interpretation prevails at a given time is a function of power and not truth.

Friedrich Nietzsche

Trust is not simply a matter of truthfulness, or even constancy. It is also a matter of amity and goodwill. We trust those who have our best interest at heart, and mistrust those who seem deaf to our concerns.

Gary Hamel

We often hear that Lean Startup principles and the techniques and practices we suggest in this book would never work in large enterprises because of governance. “This won’t meet regulatory requirements.” “That doesn’t fit in our change management process.” “Our team can’t have access to servers or production.” These are just a few examples of the many reasons people have given for dismissing the possibility of changing the way they work.

When we hear these objections, we recognize that people aren’t really talking about governance; they are referring to processes that have been put in place to manage risk and compliance and conflating them with governance. Like any other processes within an organization, those established for managing governance, risk, and compliance (GRC) [213] must be targets for continuous improvement to ensure they contribute to overall value.

Many large enterprises have applied lean engineering practices and developed a culture of experimentation as we have described earlier, even though they are subject to the same level of regulatory compliance and review as others. So we know it can be done.

In this chapter, we aim to guide you through the maze that is GRC, particularly as it relates to managing the concepts and practices required to be a lean enterprise. This area is sometimes poorly understood by those who have not made GRC their career focus, so we present some background to help you reach a common understanding with GRC teams. With that, it should be easier to discuss how GRC processes and controls can be improved to allow product teams to continuously explore and improve their work. We provide some examples of how lean concepts and principles can be applied to improve GRC processes, resulting in better governance and reduced overall risk, while still meeting compliance.

Throughout this chapter, we refer to “GRC teams.” For clarity, our discussion and examples focus on teams that strongly influence how technology can be used within organizations; the more common ones are the PMO, technical architecture, information security, risk and compliance, and internal audit teams.

Understanding Governance, Risk, and Compliance

In the introduction to Part I, we stated that the primary responsibility of leaders is to steer the larger organization towards its goals, adjusting course as necessary. This is governance. Unfortunately, within organizations the term governance is often misused and conflated with management theories, models, and processes designed to meet the needs of a bygone era.

Governance is about keeping our organization on course. It is the primary responsibility of the board of directors, but it applies to all people and other entities working for the organization. It requires the following concepts and principles to be applied at all levels:

Responsibility

Each individual is responsible for the activities, tasks, and decisions they make in their day-to-day work and for how those decisions affect the overall ability to deliver value to stakeholders.

Authority or accountability

There is an understanding of who has the power and responsibility to influence behaviors within the organization and of how it works.

Visibility

Everyone at all times can view the outcomes achieved by the organization and its components, based on current and real data. This, in turn, can be mapped to the organization’s strategic goals and objectives.

Empowerment

The authority to act to improve the delivery of value to stakeholders is granted at the right level—to the people who will deal with the results of the decision.

Risk is our exposure to the possibility of something unpleasant occurring. We all manage risks daily, at work, home, and play. As it is impossible to eliminate every risk, the question to be answered in managing risk is, “Which risks are you willing to live with?” As you take steps to mitigate risk in one area, you inevitably introduce more risk in another area. A classic example of this is restricting development team access to hardware and forcing them to rely on a separate centralized infrastructure team to set up access and environments for testing or experiments. This may serve the infrastructure team’s goal of reducing the risk of instability within systems, but it increases the risk of delayed delivery as teams have to submit requests to other teams and wait for them to be fulfilled.

Compliance is obedience to laws, industry regulations, legally binding contracts, and even cultural norms. The intention of mandated compliance is usually to protect the interest of stakeholders with regard to privacy of information, physical safety, and financial investments. When bound by law, regulation, or contract, compliance is not optional. If we choose not to comply, we increase our risk of fines, operational shutdowns, or damage to our reputation. In extreme cases, jail terms can be the outcome of knowingly and systematically misrepresenting an organization’s compliance.

MANAGEMENT IS NOT GOVERNANCE

COBIT 5 [214] clearly explains the difference between governance and management.

Governance ensures that stakeholder needs, conditions, and options are evaluated to determine balanced agreed-on enterprise objectives to be achieved; sets direction through prioritization and decision making; and monitors performance and compliance against agreed-on direction and objectives.

Management plans, builds, runs, and monitors activities in alignment with the direction set by the governance body to achieve the enterprise objectives.

For example, governance involves creating the vision and goals for implementing technology changes at a rate that will allow the business to succeed. It defines what should be measured to determine if we are headed in the right direction to achieve our goals. Management determines how the organization will achieve that vision. In the case of technology changes, that includes structuring of the delivery teams, their boundaries, and what level of decision they are empowered to exercise. Will it be a single, one-size-fits-all, top-down driven process, or will teams be granted autonomy and empowered to make decisions without having to wait for high-level approvals? Good GRC management maintains a balance between implementing enough control to prevent bad things from happening and allowing creativity and experimentation to continuously improve the value delivered to stakeholders.

Take an Evolutionary Approach to Risk Management

A struggle we often experience when implementing GRC structures and processes for compliance is thinking of them as something carved in stone, rather than something that should be changed, modified, and improved. To enable good governance, changes to GRC processes must happen over time in response to the changing needs of the organization and the market environment within which it exists.

When done well, GRC management processes improve value delivery through effective risk management. The intent is to improve communication, visibility, and understanding of who is doing what, when, how, and why, as well as the outcomes of the work that is done. This is strongly aligned with what product delivery teams are trying to achieve. The question then becomes: why are GRC processes viewed as blockers when looking for ways to improve our productivity and the value we deliver to customers and our organization?

Unfortunately, many GRC management processes within enterprises are designed and implemented within a command-and-control paradigm. They are highly centralized and are viewed as the purview of specialized GRC teams, who are not held accountable for the outcomes of the processes they mandate. The processes and controls these teams decree are often derived from popular frameworks without regard to the context in which they will be applied and without considering their impact on the entire value stream of the work they affect. They often fail to keep pace with technology changes and capabilities that would allow the desired outcomes to be achieved by more lightweight and responsive means. This forces delivery teams to complete activities that add no overall value, creates bottlenecks, and increases the overall risk of failing to deliver in a timely manner.

Apply Lean Principles to GRC Processes

As with everything else we address in this book, the journey to apply lean principles to GRC processes (and the ensuing results) will look different in every organization, depending on the nature of our business and where we operate. There is no cookbook recipe that fits all circumstances (as reputable frameworks like ITIL [215] and COBIT explain). However, lean principles and concepts can be applied to any GRC management process: visualizing the value stream, increasing feedback, amplifying learning, empowering teams, reducing waste and delays, limiting work in process, making small incremental changes, and continuously improving to achieve better outcomes.

A natural tension exists between GRC teams, who are charged with recommending and advising on how to reduce risks and meet compliance for applicable laws and regulations, and the rest of the organization, which just wants to get work done, the sooner the better. Tension can be good, though. It sparks creativity, but that creativity is only good if all parties involved know and strive to meet common objectives and are ultimately measured by the same standard. When tension is bad, the result is less collaboration, visibility, and compliance as individuals and teams develop secret ways to circumvent GRC processes. This leads to decisions based on inadequate or inaccurate information, which weakens overall governance.

The GRC teams’ goals and objectives usually result in more work for all teams. Some of this is good. Upfront attention to risks, threats, and controls can save a lot of pain during the final steps towards production. Being able to prove we have adequate control measures in place is important during audits and helps keep us in compliance. The challenge is to find the correct balance of control that allows teams to move forward quickly and keeps risks related to compliance down to an acceptable level.

Define the Value of GRC Processes from the Customer Perspective

To get value out of GRC processes such as access control, technical change management, and solution delivery lifecycle, we must always start with a shared understanding of our organization’s goals, values, and the intended outcomes of the process. We need a common view of how our daily work contributes to these at the organizational level, no matter with which team we associate ourselves. This means our GRC teams need to take responsibility for the outcomes (good and bad) of compliance and risk management activities and their impact on the ability of teams to deliver in a timely manner. As well, product delivery teams need to understand the language, intent, and purpose of the processes and controls established for compliance and governance. Only then will these teams, who are usually viewed as working at cross-purposes, be able to “stop fighting stupid and make more awesome.” [216]

Thus, GRC teams must view themselves as members of the product delivery team, learn about the capabilities of the technology and techniques used in lean engineering, and help teams leverage them to provide evidence of being in compliance without creating waste and bottlenecks. At the same time, the entire delivery team needs to start paying attention to the language and frameworks used by GRC teams to understand what exactly it is that the GRC teams are trying to achieve.

We have seen a lot of waste and destructive tension between GRC and delivery teams because many GRC processes and management practices are disconnected from how teams work. Typically, GRC teams focus on performing and measuring compliance (for example, by asking, “Did everyone follow the activity as described in our framework?”), not on improving the outcomes (“Are we doing what will allow us to meet compliance and continue to deliver value in a timely fashion?”).

Avoid the “Wouldn’t It Be Horrible If” Approach to Risk Management

In How to Measure Anything, Douglas Hubbard reports Peter Tippett of Cybertrust discussing “what he finds to be a predominant mode of thinking about [IT security]. He calls it the ‘wouldn’t it be horrible if…’ approach. In this framework, IT security specialists imagine a particularly catastrophic event occurring. Regardless of its likelihood, it must be avoided at all costs. Tippett observes: ‘since every area has a “wouldn’t it be horrible if…” all things need to be done. There is no sense of prioritization.’” [217]

When prioritizing work across our portfolio, there must be no free pass for work mitigating “bad things” to jump to the front of the line. Instead, quantify risks by considering their impacts and probabilities using impact mapping (see Chapter 9), and then use Cost of Delay (see Chapter 7) to balance the mitigation work against other priorities. In this way we can manage security and compliance risks using an economic framework instead of fear, uncertainty, and doubt.
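As a rough sketch of this economic framing, the Python below converts a risk into an expected weekly loss (probability times impact, spread over a year) and ranks mitigation work alongside feature work by cost of delay divided by duration. The item names, the figures, and the helper functions are all hypothetical, and the expected-loss formula is one simple assumption among many possible ways to quantify a risk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkItem:
    name: str
    cost_of_delay_per_week: float  # value lost for each week the work is not done
    duration_weeks: float          # estimated time to complete the work

    @property
    def cd3(self) -> float:
        """Cost of delay divided by duration: higher means do it sooner."""
        return self.cost_of_delay_per_week / self.duration_weeks


def risk_cost_of_delay(annual_probability: float, impact: float) -> float:
    """Expected weekly loss from leaving a risk unmitigated (assumed model)."""
    return annual_probability * impact / 52


def prioritize(items: List[WorkItem]) -> List[WorkItem]:
    """Order all work, mitigation or not, by the same economic measure."""
    return sorted(items, key=lambda item: item.cd3, reverse=True)


# Hypothetical backlog mixing feature work and risk mitigation work.
backlog = [
    WorkItem("rare catastrophe mitigation", risk_cost_of_delay(0.001, 5_200_000), 8),
    WorkItem("encrypt backups", risk_cost_of_delay(0.05, 2_600_000), 1),
    WorkItem("checkout redesign", 20_000, 4),
]

for item in prioritize(backlog):
    print(f"{item.name}: {item.cd3:.1f}")
```

With these made-up numbers the catastrophic but very unlikely risk ranks last: it gets no free pass to the front of the line, while the cheap, likelier mitigation competes on equal terms with the feature work.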

GRC teams are measured by “Are we compliant?”; product teams are measured by “How fast can we deliver value through use of technology?” Both of these are wrong because they measure a team’s performance from an isolated functional perspective and not as the net value for the organization. It is easy to be compliant with laws when GRC teams are allowed to mandate processes and force all boxes to be ticked. However, when team performance measures are not aligned at the organizational level, we can be compliant and still make remarkably bad decisions about delivering value to stakeholders. This is truly ironic, as most related laws and regulations have been established with the intent to protect and improve value to stakeholders.

RULES-BASED APPROACHES LEAD TO RISK MANAGEMENT THEATER

When GRC teams do not take a principles-based approach and instead prescribe the rules that teams must blindly follow, the familiar result is risk management theater: an expensive performance that is designed to give the appearance of managing risk but actually increases the chances of unintended negative consequences.

At one large European enterprise we worked at, the change approval process involved developers filling in a spreadsheet with seven tabs, which was emailed to a change manager in another country who then decided whether or not to approve it. The change could not proceed without this approval, and if the form was not filled out completely it got sent back. The change manager did not really understand the contents of the spreadsheet; before approving, he relied on conversations with the developers to determine what the risks were and whether the planned mitigation activities were appropriate. The developers knew this and did the minimum possible amount of work to fill in the spreadsheet, often just changing the date and title on a previous submission and sending it back as a new request. The change manager knew the developers were doing this, but it made no difference to him so long as the documented process was followed to the letter. The spreadsheet added zero value in terms of risk management, while making it unnecessarily painful for the team to get their changes live. However, compliance was being met through the “evidence” documented on the change request. The real value was realized in the conversations and in completing mitigation activities before the change proceeded.

When product teams push back on risk management theater, a common response is that it is required by some popular framework such as ITIL or COBIT, or by a law or regulation such as Sarbanes-Oxley. However, with a few exceptions, neither frameworks nor laws prescribe particular processes. For example, many people think that segregation of duties [218] is required by Sarbanes-Oxley section 404, so organizations set up elaborate controls over access to IT systems and environments to meet their interpretation of what this means. In fact, nowhere in the act, nor in the SEC rules that were created through the act, is segregation of duties mentioned.

If you find that you are expected to follow a process that compromises your ability to do a good job, it’s worth actually reaching out to the people who created the process to discuss its intent. Return to the Principle of Mission discussed in Chapter 1 and use it as an opportunity to collaborate, build relationships, and develop a shared understanding. You may be surprised to discover that you are able to have a productive conversation about how to meet their goals in a different way, or indeed to see if your work is even in scope for the law or regulation in question. If you are told that a particular process is “required” by some regulation, politely ask where you can find more information about that requirement. In many cases, onerous rules and GRC processes that are put in place are simply somebody’s interpretation of what is required, not mandated by the regulation in question.

Map the Value Stream, Create Flow, and Establish a Pull System

With a shared understanding of GRC processes and product delivery team goals and methods, the collaboration to achieve organization-level goals can really begin. As discussed in Chapter 7, value stream mapping is a powerful tool that can be used to provide us with a view of the current state and identify areas for improvement. In the context of GRC processes, it is important to layer these on top of the delivery team activities and understand how they influence the ability of the team to get their work done.

Most GRC processes are designed in isolation to apply controls such as required approvals, limited access, segregation of duties, monitoring, and review of activity. These are meant to provide visibility and transparency into who does what, when, and with what authority. More importantly, the frameworks commonly used by GRC teams to create the processes emphasize improving overall efficiency and effectiveness for the organization. Unfortunately, many of the processes and controls do the exact opposite when considered in the larger end-to-end value chain.

The Wrong Control Interrupts Flow

Controls can be preventive in nature by the application of a barrier. Alternatively, they can be detective—monitoring and reviewing events after they occur, and eliciting an appropriate response to the discovery of potential exceptions such as errors, omissions, or malicious actions.

Many of us make the mistake of thinking that preventive controls are more effective: if we can create barriers or take away people’s ability to do things, it won’t happen. The reality is, people need to get things done. If you try to stop them, many will get creative and figure out ways to work around whatever barriers have been put in place. The reactive response is then to lock everything down even more, which emboldens further creative underground solutions to get the work done, fomenting a subversive culture of risky behavior. A good example is teams who will share an elevated user ID and password to access different environments. It would be far better to give each team member access under their own IDs and then monitor their use of those privileges.

An even more tragic outcome of too many preventive controls is when teams just stop caring and assume an automaton mode of operation, abandoning all efforts to make things better.

Preventive controls, when executed on the wrong level, often lead to unnecessarily high costs, forcing teams to:

§ Wait for another team to complete menial tasks that can be easily automated and run when needed

§ Obtain approvals from busy people who do not have a good understanding of the risks involved in the decision and thus become bottlenecks

§ Create large volumes of documentation of questionable accuracy which becomes obsolete shortly after it is finished

§ Push large batches of work to teams and special committees for approval and processing and then wait for responses

If preventive controls are not executed properly and consistently, they are no longer effective. They must be continuously monitored to ensure they have been applied correctly and are still relevant. Without monitoring and resulting corrective actions, preventive controls are less effective than well-executed detective controls such as ongoing monitoring, early and frequent testing and review, and highly visible measurement of outcomes.

Although relying on preventive controls may contribute to a false sense of security, they are extremely valuable when applied at the right level, and are the best solution in certain circumstances. However, they should never be applied unilaterally but only in conjunction with other controls and to the correct level of granularity, and we must always consider their effect on the ability of teams to get their work done.

Therefore, when we perform value stream mapping of governance processes on top of delivery team processes, we need to look carefully at all of the controls and ask two questions:

§ Is the intent of the control being met?

§ Is it truly contributing to overall effectiveness and efficiency of the organization?

We need to look carefully at the level of authority granted to our teams. The goal is to bring the approval decisions to the right level and give teams as much authority as possible to enable them to keep moving. This involves defining boundaries and making sure the team knows how and when to escalate decisions that fall outside their authority. We also need to make sure documentation is kept to a sane level and, when done, make sure it is accessible, easy to understand, and updated as required, preferably automatically.

“Trust, but verify” [219] is a concept that is gaining acceptance in GRC circles. Instead of preventing teams from accessing environments and hardware so they can’t do anything bad, we trust people to do the right thing and give the team access to, and control over, the systems and hardware they need to use daily. We then verify the team is not abusing their authority by developing good monitoring and frequent review processes to ensure the established boundaries are observed and there is complete visibility and transparency built into the team’s work.
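A detective control of this kind can be very small. The following sketch assumes each team member acts under their own ID and that every action is recorded as an event; nothing is blocked, but events outside the agreed boundaries are surfaced for review. The environments, action names, and the `ALLOWED` table are hypothetical examples, not a recommended policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AccessEvent:
    user: str         # individual ID, never a shared account
    environment: str
    action: str


# Hypothetical boundaries agreed between the team and the GRC team.
ALLOWED = {
    "staging": {"deploy", "read_logs", "restart"},
    "production": {"deploy", "read_logs"},
}


def review(events: List[AccessEvent]) -> List[AccessEvent]:
    """Return events outside the agreed boundaries; the work itself was never blocked."""
    return [
        event for event in events
        if event.action not in ALLOWED.get(event.environment, set())
    ]


events = [
    AccessEvent("ana", "staging", "restart"),
    AccessEvent("raj", "production", "deploy"),
    AccessEvent("ana", "production", "drop_table"),
]

for event in review(events):
    print(f"review needed: {event.user} ran {event.action} on {event.environment}")
```

Because each event carries an individual ID, the review leads to a conversation with a specific person rather than to a tighter lock on everyone.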

REDUCING FEEDBACK LOOPS ON COMPLIANCE ACTIVITIES

Meeting compliance for Information Security has been a thorn in the side of many delivery teams. In the spirit of the big bang project delivery methodology, the security team is brought in at the latest possible moment—days before we go live—to run a final code review for security vulnerabilities and required compliance.

The Information Security community now realizes this approach doesn’t work. On most products, there is just too much complexity and volume to complete a meaningful review. When vulnerabilities or other breaches in compliance are discovered this way, it is generally too late to do much about it. It becomes more risky to fix the vulnerabilities in a fragile system, or wait for the changes, than it is to allow the vulnerabilities to go to production with a promise to fix them later.

To meet compliance and reduce security risks, many organizations now include information security specialists as members of cross-functional product teams. Their role is to help the team identify the possible security threats and the level of controls required to reduce them to an acceptable level. They are consulted from the beginning and are engaged in all aspects of product delivery:

§ Contributing to design for privacy and security

§ Developing automated security tests that can be included in the deployment pipeline

§ Pairing with developers and testers to help them understand how to prevent adding common vulnerabilities to the code base

§ Automating the process of testing security patches to systems

They also create their own environments for performing mandatory code reviews and security testing so they don’t block the team from performing other work while this is done.

As working members of the team, information security specialists help shorten feedback loops related to security, reduce overall security risks in the solution, improve collaboration and the knowledge of information security issues in other team members, and themselves learn more about the context of the code and the delivery practices. Everybody wins.

As we become better at creating flow for teams by changing governance processes, GRC teams benefit as well. Using controls designed in collaboration with GRC teams, product delivery teams are able to embed evidence of true compliance into daily work and tools, and do away with risk management theater. As we do with functional and performance quality, we build evidence of compliance into our daily work so we don’t have to resort to large batch inspections after most of the work has been done.

The net effect for GRC teams is that they can now pull information related to compliance from product delivery teams at any time without interrupting the team’s overall workflow, unless something untoward or unaccountable seems to be happening. Annual audits are less painful because the delivery teams understand the intent of the controls the auditors are asking for and can give evidence of meeting that intent through their processes.
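As a minimal sketch of how evidence might be embedded in daily work and pulled on demand, assume the deployment pipeline appends one JSON record per deployment to a log that GRC teams can read. The field names, identifiers, and the `evidence_record` and `pull_evidence` helpers are hypothetical and do not describe any particular tool.

```python
import json
from datetime import date
from typing import Dict, List


def evidence_record(change_id: str, author: str, approver: str,
                    tests_passed: bool, deployed_on: date) -> Dict:
    """Build the record the pipeline would write as a deployment completes."""
    return {
        "change_id": change_id,
        "author": author,
        "approver": approver,
        "tests_passed": tests_passed,
        "deployed_on": deployed_on.isoformat(),
    }


def pull_evidence(log_lines: List[str], start: date, end: date) -> List[Dict]:
    """Return records for deployments in [start, end] without involving the team."""
    records = (json.loads(line) for line in log_lines)
    return [
        record for record in records
        if start <= date.fromisoformat(record["deployed_on"]) <= end
    ]


# Hypothetical log written by the pipeline, one JSON document per line.
log = [
    json.dumps(evidence_record("CHG-1", "ana", "raj", True, date(2014, 3, 3))),
    json.dumps(evidence_record("CHG-2", "raj", "ana", True, date(2014, 4, 7))),
]

march = pull_evidence(log, date(2014, 3, 1), date(2014, 3, 31))
print([record["change_id"] for record in march])
```

The point of the design is that the record is a by-product of the normal deployment, so an auditor’s request becomes a query rather than an interruption.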

By using an economic framework (such as Cost of Delay, discussed in Chapter 7) we can quantify the economic trade-offs we make when we implement controls to mitigate risk. This allows us to prioritize GRC work against the other kinds of work we do—and thus pull additional work required for compliance at the right time for the business.

CASE STUDY: PCI-DSS IMPLEMENTATION AT ETSY

Etsy is an online handmade and vintage marketplace with over $1bn in gross merchandise sales in 2013. In Etsy’s high-trust culture, developers normally push their own changes live—indeed, as part of onboarding new engineers, developers use the automated deployment system to update their profile on the live site within their first few days. Engineers are also allowed to work on—and have access to—all parts of the system.

However, since Etsy processes credit-card transactions, it is subject to PCI-DSS, an industry standard that is quite prescriptive in how to manage systems that store or transmit payment cardholder data (these systems are known as the cardholder data environment, or CDE). For example, the CDE must be physically segregated, and there must be segregation of duties for people who work on systems within the CDE.

Segregation of duties is usually interpreted to mean (among other things) that developers should not have access to the production database and should not be able to push their own changes live. Both of these requirements conflict with the way Etsy typically operates. Here’s how they approached PCI-DSS compliance.

1. Minimize the fallout of the required compliance. Understand there is no one-size-fits-all compliance solution, and architect systems to separate the concerns related to different compliance demands.

Etsy’s mainstream engineering culture is optimized for speed of innovation. However, credit card processing is an area where user data security is paramount. Etsy recognizes that different parts of their system have different concerns and need to be treated differently.

Etsy’s most important architectural decision was to decouple the CDE from the rest of the system, limiting the scope of the PCI-DSS regulations to one segregated area and preventing them from “leaking” through to all their production systems. The systems that form the CDE are separated (and managed differently) from the rest of Etsy’s environments at the physical, network, source code, and logical infrastructure levels.

Furthermore, the CDE is built and operated by a cross-functional team that is solely responsible for the CDE. Again, this limits the scope of the PCI-DSS regulations to just this team.

2. Establish and limit the blast radius of frameworks and regulations.

Always start by asking, “What’s the smallest possible set of changes we can make to our ideal architecture and culture while still achieving compliance with regulations we are subject to?” Then take an incremental, iterative approach to implementing and validating those changes.

For example, while PCI-DSS mandates segregation of duties, that doesn’t prevent the cross-functional CDE team from working together in a single space. When members of the CDE team want to push a change, they create a ticket to be approved by the tech lead; otherwise, the code commit and deployment process is fully automated as with the main Etsy environment. There are no bottlenecks and delays, as the segregation of duties is kept local: a change is approved by a different person than the one doing it.

3. Use compensating controls.

It’s essential to respect the outcomes the regulations are trying to achieve, while recognizing there are many ways to achieve those outcomes. For example, PCI-DSS allows organizations to implement “compensating controls” (workarounds designed to create the same outcome) where there is a legitimate technical or business constraint preventing implementation of a particular control. [220]

In the case of PCI-DSS, you should talk to your qualified security auditor (QSA) and acquiring bank to discuss possible alternatives to controls that have an unacceptable technical or business impact. For example, the deployment pipeline described in Chapter 8 and used by Etsy provides a powerful set of compensating controls that can provide an alternative to segregation of duties in their other systems.
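The local segregation of duties described in step 2, where a change is approved by a different person than the one making it, could be enforced by a small check in an automated deployment process. This is an illustrative sketch only: the `ChangeTicket` type and `may_deploy` function are hypothetical and are not a description of Etsy’s actual tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeTicket:
    change_id: str
    author: str
    approver: Optional[str] = None  # None until someone approves the ticket


def may_deploy(ticket: ChangeTicket) -> bool:
    """Allow deployment only when someone other than the author has approved."""
    return ticket.approver is not None and ticket.approver != ticket.author


tickets = [
    ChangeTicket("CHG-10", author="ana", approver="raj"),  # approved by a peer
    ChangeTicket("CHG-11", author="ana", approver="ana"),  # self-approved
    ChangeTicket("CHG-12", author="raj"),                  # not yet approved
]

for ticket in tickets:
    status = "deploy" if may_deploy(ticket) else "hold"
    print(f"{ticket.change_id}: {status}")
```

Everything else in the process can stay automated; the check adds one recorded decision by a second person rather than a queue in front of another team.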

The advantage of using lean principles and continuous delivery in product development is that it enables a fine-grained, adaptive approach to risk management. As we work in small batches and are able to trace each change to our systems from check-in to deployment, we can quantify the risk of each change and manage it appropriately.

The best way to achieve the objectives of good GRC is by embedding compliance and risk management into the daily activities of product teams, including systems and UX design and testing. As organizations move away from the command and control paradigm and GRC teams adopt a collaborative approach to risk management, we begin to value them as trusted advisors and experts in their knowledge domain. For many GRC teams, this requires a major shift in their roles, responsibilities, and behavior within an enterprise organization. This is the move from a policing role to that of a contributing team member who is measured on the same outcomes as the product team, not solely from a compliance perspective.

Conclusion

Good governance requires everyone to focus on discovering ways to improve value and provide accurate information on which to base our decisions. We start with leadership and direction from the Board and Executives, and rely on the ability of employees to embrace their responsibility to make good decisions at work. A culture of openness, trust, and transparency is required for good governance.

GRC structures and processes must be developed collaboratively by both GRC teams and the product teams that work day to day to deliver value to customers. By identifying the intent of the laws and regulations we must comply with, our GRC teams can collaborate with product teams to determine local approaches that fit best with improving value delivery. We start by exploring, with GRC teams, how we can minimize the negative effects of relying on restrictive controls through creative use of system architecture, process improvement, containment of scope, applying compensating controls, and leveraging new technologies. We can then exploit our learning to continuously improve our processes to provide both better governance and better outcomes for all stakeholders.

Questions for readers:

§ How do your product teams view your current GRC processes? To what extent is your organization engaged in risk management theater?

§ What actions do leaders take to develop a shared understanding of GRC language and frameworks throughout the organization?

§ Do your GRC structures (policies, organization, and processes) prevent product teams from performing process improvement or require them to seek approval for any process change? If so, how might you support teams improving their processes while maintaining compliance?

§ How might you enable GRC teams to collaborate with your product delivery teams as trusted team members throughout the value creation process?

213 Typical GRC processes include access control, solution delivery (project management), change management, and related activities to reduce risks with the use of IT.

214 As set out in [COBIT5], COBIT formally stands for Control Objectives for Information and Related Technology. It strives to provide an end-to-end business view of the governance of enterprise IT. Auditors as well as risk and compliance teams use the framework and related tools to create and assess governance over the use of technology in delivering value. For more information, see http://www.isaca.org/cobit/pages.

215 ITIL (Information Technology Infrastructure Library, see http://www.itil-officialsite.com) is a framework, evolving over 20 years, providing recommended sets of practices for managing IT based on experience from both public and private sectors. It is largely used by IT management and practitioners.

216 Jesse Robbins, http://www.infoq/presentations/Hacking-Culture

217 [hubbard], p. 188.

218 Segregation of duties is a concept that seeks to prevent errors and malicious activities by an individual by requiring at least two people to complete any end-to-end transaction. Another way to approach it is to ensure no one person can complete a transaction without it being detected or controlled by at least one other person.

219 This saying, popularized by Ronald Reagan, is originally a Russian proverb.

220 http://bit.ly/1v732EU