Lean Enterprise: How High Performance Organizations Innovate at Scale (2015)
Part III. Exploit
Chapter 8. Adopt Lean Engineering Practices
Cease dependence on mass inspection to achieve quality. Improve the process and build quality into the product in the first place.
W. Edwards Deming
An effective innovation capability relies on being able to frequently test ideas with real users. Crucially, the rate at which we can learn, update our product or prototype based on feedback, and test again, is a powerful competitive advantage. This is the value proposition of the lean engineering practices we describe in this chapter. Andy Hertzfeld, one of the engineers who worked on the original Apple Macintosh, notes that “instead of arguing about new software ideas, we actually tried them out by writing quick prototypes, keeping the ideas that worked best, and discarding the others. We always had something running that represented our best thinking at the time.”138
In many organizations, getting software deployed in an integrated production-like environment is a process that still takes days or even weeks. But organizations that treat software as a competitive advantage rather than a necessary evil invest substantially in reducing this lead time. For a sense of what’s possible at scale, in May of 2011, Amazon achieved a mean time between deployments to production systems of 11.6 seconds, with up to 1,079 such deployments in a single hour, aggregated across the thousands of services that comprise Amazon’s platform. Some of these deployments affected upwards of 10,000 hosts.139 Amazon, of course, is subject to regulations such as Sarbanes-Oxley and PCI-DSS.
A major reason Amazon has invested in this capability is to make it extremely cheap and low-risk for employees to design and run safe-to-fail online experiments of the type we describe in Chapter 9 to gather data from real users. In many cases, running an experiment doesn’t require going through a bureaucratic change request process. This gives Amazon’s cross-functional delivery teams the ability to test out wild ideas—safe in the knowledge that if something goes wrong, the experiment can be turned off with only a tiny percentage of users impacted for a very short time.
Despite the name, continuous delivery is not about deploying to production multiple times a day. The goal of continuous delivery is to make it safe and economic to work in small batches. This in turn leads to shorter lead times, higher quality, and lower costs. It’s for these reasons that the HP FutureSmart team rearchitected their firmware from scratch to minimize the lead time between code check-in and validated, releasable software. Finally, continuous delivery results in boring, safe, push-button deployments rather than long, painful ordeals that must be performed outside of business hours.
This chapter is aimed at readers who wish to understand the principles and practices behind continuous delivery. For those who want just the high-level picture, we present an executive summary of lean engineering practices in the next section. Readers may then skip to the final section of this chapter.
The Fundamentals of Continuous Delivery
Continuous delivery is the ability to get changes—experiments, features, configuration changes, bug fixes—into production or into the hands of users safely and quickly in a sustainable way. Let’s examine each of those requirements.
In order to ensure deployments are safe, we construct a deployment pipeline which subjects each proposed change to a battery of automated tests of several different types, followed by manual validations such as exploratory testing and usability testing. We then enable push-button deployments of validated builds to downstream test and staging environments, and ultimately to production, release to manufacturing, or an app store (depending on the type of software). A major goal of the deployment pipeline is to detect and reject changes that are risky, contain regressions, or take us outside the envelope of acceptable performance. As a byproduct of implementing a deployment pipeline, we get an audit trail of where each change has been introduced, what tests have been run against it, which environments it has passed through, who deployed it, and so forth. This information is invaluable as evidence for compliance.
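To make the audit trail concrete, here is a minimal Python sketch of an append-only event log that a pipeline might keep. The `PipelineEvent` and `AuditTrail` names and the event kinds are hypothetical illustrations, not the API of any particular pipeline tool. Real tools persist this data and record far more detail.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class PipelineEvent:
    build: str       # build or package identifier
    kind: str        # hypothetical event kinds: "test", "deploy", "approve"
    detail: str      # test suite name or target environment
    actor: str       # person or automated agent responsible
    at: datetime
    passed: bool = True


class AuditTrail:
    """Append-only record of everything that happened to each build."""

    def __init__(self) -> None:
        self._events: List[PipelineEvent] = []

    def record(self, event: PipelineEvent) -> None:
        # Events are only ever appended, never edited or deleted.
        self._events.append(event)

    def history(self, build: str) -> List[PipelineEvent]:
        """Everything that happened to one build, in time order."""
        return sorted((e for e in self._events if e.build == build),
                      key=lambda e: e.at)

    def deployments_to(self, environment: str) -> List[Tuple[str, str, datetime]]:
        """Which builds reached an environment, who deployed them, and when."""
        return [(e.build, e.actor, e.at) for e in self._events
                if e.kind == "deploy" and e.detail == environment]
```

Answering an auditor's question ("what ran against build 42, and who put it in production?") then becomes a query over this log rather than an archaeology exercise.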
We must constantly monitor and reduce the lead time for getting changes into the hands of users. Mary and Tom Poppendieck ask, “How long would it take your organization to deploy a change that involves just one single line of code?”140 We reduce lead time by working to simplify and automate the build, deploy, test, and release process. We must be able to spin up test environments on demand, deploy software packages to them, and run comprehensive automated tests of several varieties rapidly in parallel on a grid of compute resources. Using this process, it is possible to get a high level of confidence that our software is releasable. Typically this involves architecting (or rearchitecting) with testability and deployability in mind. An important side effect of this work is that the product team can get rapid feedback on the quality of their work, and problems are found soon after they are introduced rather than in later integration and testing phases when they are more expensive to fix.
The point of all this is to make it economically viable to work in small batches. The reason large batches of work are released infrequently is because corralling releases is painful and expensive. The mantra of continuous delivery is: “If it hurts, do it more often, and bring the pain forward.” If integration, testing, and deployment are painful, we should aim to perform them every time anybody checks anything into version control. This reveals the waste and inefficiency in our delivery process so we can address it through continuous improvement. However, to make it economic to work in small batches, we need to invest in extensive test and deployment automation and an architecture that supports it.
There are two golden rules of continuous delivery that must be followed by everybody:
1. The team is not allowed to say they are “done” with any piece of work until their code is in trunk on version control and releasable (for hosted services the bar is even higher—“done” means deployed to production). In The Lean Startup, Eric Ries argues that for new features that aren’t simple user requests, the team must also have run experiments on real users to determine if the feature achieves the desired outcome.
2. The team must prioritize keeping the system in a deployable state over doing new work. This means that if at any point we are not confident we can take whatever is on trunk in version control and deliver it to users through an automated, push-button process, we need to stop working and fix that problem.141
We should emphasize that following these steps consistently will be hard and require discipline—even for small, experienced teams.
Enforcing Your Definition of “Done”
The HP FutureSmart managers had a simple rule to help enforce these golden rules. Whenever anybody wanted to demonstrate a new feature (which was required to be able to declare it “done”), they would ask if the code had been integrated into trunk, and if the new functionality was going to be demonstrated from a production-like environment by running automated tests. The demonstration could only proceed if the answer was “yes” to both questions.
In Chapter 6 we discussed the enormous improvements in quality and productivity, and the reductions in cost, that the HP FutureSmart team was able to achieve. These improvements were made possible by the team putting continuous delivery principles at the heart of their rebuild. The FutureSmart team eliminated the integration and testing phases from their software development process by building integration and testing into their daily work. It was also possible to shift priorities rapidly in response to the changing needs of product marketing and users:142
We know our quality within 24 hours of any fix going into the system…and we can test broadly even for small last-minute fixes to ensure a bug fix doesn’t cause unexpected failures. Or we can afford to bring in new features well after we declare “functionality complete”—or in extreme cases, even after we declare a release candidate.
Let’s look at the engineering patterns that enabled the HP FutureSmart team to achieve their eightfold productivity increase.
Continuous Integration and Test Automation
In many development teams, it is common for developers to work on long-lived branches in version control. On small, experienced co-located teams this can be made to work. However, the inevitable outcome of scaling this process is “integration hell” where teams spend days or weeks integrating and stabilizing these branches to get the code released. The solution is for all developers to work off trunk and to integrate their work into trunk at least once per day. In order to be able to do this, developers need to learn how to break down large pieces of work into small, incremental steps that keep trunk working and releasable.
We validate that trunk is working by building the application or service every time a change to it is made in version control. We also run unit tests against the latest version of the code, and give the team feedback within a few minutes if the build or test process fails. The team must then either fix the problem or—if the problem cannot be fixed in a few minutes—revert the change. Thus we ensure that our software is always in a working state during the development process.
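The fix-or-revert rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `build_and_test` is a hypothetical callback standing in for whatever builds the code and runs the unit tests, and `quick_fix` is a hypothetical callback standing in for a fix the author can land within a few minutes (returning `None` when there isn't one).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Change:
    author: str
    summary: str


# Hypothetical stand-in for "build the code and run the unit tests".
BuildAndTest = Callable[[List[Change]], bool]


def integrate(trunk: List[Change], change: Change,
              build_and_test: BuildAndTest,
              quick_fix: Callable[[Change], Optional[Change]]) -> Tuple[List[Change], str]:
    """Apply one check-in to trunk while keeping trunk in a working state."""
    candidate = trunk + [change]
    if build_and_test(candidate):             # every change is built and tested
        return candidate, "integrated"
    fixed = quick_fix(change)                 # a fix that can land within minutes, if any
    if fixed is not None and build_and_test(trunk + [fixed]):
        return trunk + [fixed], "fixed"
    return list(trunk), "reverted"            # otherwise revert: trunk is left as it was
```

In this model trunk only ever moves from one working state to another, which is the property the paragraph above asks for.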
Continuous integration is the practice of working in small batches and using automated tests to detect and reject changes that introduce a regression. It is, in our opinion, the most important technical practice in the agile canon, and it forms the foundation of continuous delivery, for which we require in addition that each change keeps the code on trunk releasable. However, teams that are not used to continuous integration can find it hard to adopt.
In our experience, people tend to fall into two camps: those who can’t understand how it is possible (particularly at scale) and those who can’t believe people could work in any other way. We assure you that it is possible, both at small scale and large scale, whatever your domain.
Let’s first address the scale problem with two examples. First, the HP FutureSmart case study demonstrates continuous integration being effective with a distributed team of 400 people working on an embedded system. Second, we’ll note that almost all of Google’s 10,000+ developers distributed over 40 offices work off a single code tree. Everyone working off this tree develops and releases from trunk, and all builds are created from source. Between 20 and 60 code changes are submitted every minute, and 50% of the codebase changes every month.143 Google engineers have built a powerful continuous integration system that, in 2012, was running over 4,000 builds and 10 million test suites (approximately 60 million tests) every day.144
Not only is continuous integration possible on large, distributed teams—it is the only process that is known to scale effectively without the painful and unpredictable integration, stabilization, or “hardening” phases associated with other approaches, such as release trains or feature branches. Continuous delivery is designed to eliminate these activities.
FUNDAMENTALS OF TEST AUTOMATION
As can be seen from the Google and HP FutureSmart examples, continuous integration relies on comprehensive test automation. Test automation is still controversial in some organizations, but it is impossible to achieve short lead times and high-quality releases without it. Test automation is an important and complex topic about which many good books have been written,145 but here are some of the most important points:
§ Test automation is emphatically not about reducing the number of testers—but test automation does change the role and the skills required of testers. Testers should be focused on exploratory testing and working with developers to create and curate suites of automated tests, not on manual regression testing.
§ It is impossible to evolve high-quality automated test suites unless testers collaborate with developers in person (irrespective of team or reporting structures). Creating maintainable suites of automated tests requires strong knowledge of software development. It also requires that the software be designed with test automation in mind, which is impossible when developers aren’t involved in testing.
§ Test automation can become a maintenance nightmare if automated test suites are not effectively curated. A small number of tests that run fast and reliably detect bugs is better than a large number of tests that are flaky or constantly broken and which developers do not care about.
§ Test automation must be designed with parallelization in mind. Running tests in parallel enables developers to get fast feedback and prevents bad practices such as dependencies between tests.
§ Automated tests complement other types of testing such as exploratory testing, usability testing, and security testing. The point of automated testing is to validate core functionality and detect regressions so we don’t waste time trying to manually test (or deploy) versions of the software that contain serious problems.
§ Reliable automated tests require comprehensive configuration and infrastructure management. It should be possible to create a production-like virtual test environment on demand, either within the continuous integration environment or on a developer workstation.
§ Only spend time and effort on test automation for products or features once they have been validated. Test automation for experiments is wasteful.
The main objection to continuous integration comes from developers and their managers. Breaking every new feature or rearchitecting effort into small steps is harder than completing it in isolation on a branch, and takes longer if you are not used to the discipline of working in small batches. That means it may take longer, at first, to declare stories “dev complete.” This may, in turn, drive the development velocity down and create the impression that the team’s efficiency has decreased—raising the blood pressure of development managers.
However, we should not be optimizing for the rate at which we declare things “done” in isolation on a branch. We should optimize for the overall lead time—the time it takes us to deliver valuable software to users. Optimizing for “dev complete” time is precisely what causes “integration hell.” A painful and unpredictable “last mile” of integration and testing, in turn, perpetuates the long release cycles that are a major factor in project overruns, poor quality software, higher overall costs, and dissatisfied users.
ARE YOU REALLY DOING CONTINUOUS INTEGRATION?
Continuous integration (CI) is hard, and in our experience most teams that say they are practicing it actually aren’t. Achieving CI is not simply a case of installing and running a CI tool; it is a mindset. One of our favorite papers on CI discusses how to do it without any CI tool at all—using just an old workstation, a rubber chicken, and a bell (of course you’ll need more than that on a large development team, but the principles are the same at scale).146
To find out if you’re really doing CI, ask your team the following questions:
§ Are all the developers on the team checking into trunk (not just merging from trunk into their branches or working copies) at least once a day? In other words, are they doing trunk-based development and working in small batches?
§ Does every change to trunk kick off a build process, including running a set of automated tests to detect regressions?
§ When the build and test process fails, does the team fix the build within a few minutes, either by fixing the breakage or by reverting the change that caused the build to break?
If the answer to any of these questions is “no,” you aren’t practicing continuous integration. In particular, reverting bad changes is an insufficiently practiced technique. At Google, for example, anyone is empowered to revert a bad change in version control, even if it was made by someone on a different team: they prioritize keeping the system working over doing new work.
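Two of these questions can be answered from data most teams already have. The sketch below assumes a list of (author, date) pairs for trunk commits and a time-ordered list of (finished_at, passed) results for trunk builds; the function names and the ten-minute threshold standing in for "a few minutes" are hypothetical choices. Whether every change triggers a build (the second question) is usually a matter of CI server configuration rather than something to compute.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List, Optional, Set, Tuple


def missing_trunk_checkins(team: Iterable[str],
                           trunk_commits: Iterable[Tuple[str, date]],
                           day: date) -> Set[str]:
    """Question 1: who on the team did not check into trunk on `day`?"""
    committed = {author for author, committed_on in trunk_commits
                 if committed_on == day}
    return set(team) - committed


def slow_repairs(builds: List[Tuple[datetime, bool]],
                 limit: timedelta = timedelta(minutes=10)) -> List[Tuple[datetime, timedelta]]:
    """Question 3: when did trunk stay broken for longer than `limit`?

    `builds` holds (finished_at, passed) pairs for trunk, oldest first.
    A breakage that is still unresolved at the end of the list is not reported.
    """
    slow: List[Tuple[datetime, timedelta]] = []
    broken_since: Optional[datetime] = None
    for finished_at, passed in builds:
        if not passed and broken_since is None:
            broken_since = finished_at            # trunk just went red
        elif passed and broken_since is not None:
            outage = finished_at - broken_since   # trunk is green again
            if outage > limit:
                slow.append((broken_since, outage))
            broken_since = None
    return slow
```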
Of course if you are in-flight working on a large application and using lots of branches, it’s not easy to move to continuous integration. In this situation, the goal should be to push teams towards working on trunk, starting with the most volatile branches. In one large organization, it took a year to go from 100 long-lived branches down to about 10–15.
The Deployment Pipeline
Recall the second golden rule of continuous delivery: we must prioritize keeping the system working over doing new work. Continuous integration is an important step towards this goal—but, typically, we wouldn’t feel comfortable exposing to users software that has only passed unit tests.
The job of the deployment pipeline is to evaluate every change made to the system, to detect and reject changes which carry high risks or negatively impact quality, and to provide the team with timely feedback on their changes so they can triage problems quickly and cheaply. It takes every check-in to version control, creates packages from that version that are deployable to any environment, and performs a series of tests against that version to detect known defects and to verify that the important functionality works. If the package passes these tests, we should feel confident deploying that particular build of the software. If any stage of the deployment pipeline fails, that version of the software cannot progress any further, and the engineers must immediately triage to find the source of the problem and fix it.
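Here is a minimal Python sketch of the gating behavior just described: build a package once, run that same package through an ordered list of stages, and stop at the first failure. The stage names, the `build` callback, and the `PipelineRun` type are hypothetical; a real pipeline would also include manual stages such as exploratory testing and push-button deployments between environments.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A stage is a name plus a check that runs against the built package.
Stage = Tuple[str, Callable[[str], bool]]


@dataclass
class PipelineRun:
    version: str
    package: str
    passed_stages: List[str] = field(default_factory=list)
    failed_stage: Optional[str] = None

    @property
    def releasable(self) -> bool:
        # Every stage ran and passed, because a failure stops the run.
        return self.failed_stage is None


def run_pipeline(version: str, build: Callable[[str], str],
                 stages: List[Stage]) -> PipelineRun:
    # Build the package once; every later stage tests this same artifact.
    run = PipelineRun(version=version, package=build(version))
    for name, check in stages:
        if check(run.package):
            run.passed_stages.append(name)
        else:
            # A failed stage stops this version from progressing any further.
            run.failed_stage = name
            break
    return run
```

Building once and promoting the same package is what lets a pass in an early stage count as evidence about what eventually reaches production.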
Even the simplest deployment pipeline, such as that shown in Figure 8-1 (a more complex deployment pipeline is shown in Figure 8-2), enables members of the team to perform push-button deployments of builds that have passed CI to production-like exploratory testing or user acceptance testing environments. It should be possible to provision test environments and deploy any good CI build to them using a fully automated process. This same process should be used to deploy to production.
Figure 8-1. Changes moving through a simple deployment pipeline
The deployment pipeline connects together all the steps required to go from check-in to deployment to production (or distribution to an app store). It also connects all the people involved in delivering software—developers, testers, release engineers, and operations—which makes it an important communication tool.
Figure 8-2. A more complex deployment pipeline
THE FUTURESMART DEPLOYMENT PIPELINE
The FutureSmart team’s deployment pipeline allows a 400-person distributed team to integrate 100–150 changes—about 75–100 thousand lines of code—into trunk on their 10-million-line codebase every day. Each day, the deployment pipeline produces 10–14 good builds of the firmware out of Level 1. All changes—including feature development and large-scale changes—are made on trunk. Developers commit into trunk several times every week.
All changes to any system—or the environments it runs in—should be made through version control and then promoted via the deployment pipeline. That includes not just source and test code but also database migrations and deployment and provisioning scripts, as well as changes to server, networking, and infrastructure configurations.
The deployment pipeline thus becomes the record of which tests have been run against a given build and what the results were, what builds have been deployed to which environments and when, who approved promotion of a particular build and when, what exactly the configuration of every environment is—indeed the whole lifecycle of code and infrastructure changes as they move through various environments.
This, in turn, means that a deployment pipeline implementation has several other important uses besides rejecting high-risk or problematic changes to the system:
§ You can gather important information on your delivery process, such as statistics of the cycle time of changes (the mean, the standard deviation), and discover the bottlenecks in your process.
§ It provides a wealth of information for auditing and compliance purposes. Auditors love the deployment pipeline because it allows them to track every detail of exactly which commands were run on which boxes, what the results were, who approved them and when, and so forth.
§ It can form the basis of a lightweight but comprehensive change management process. For example, Australia’s heavily regulated National Broadband Network telco used a deployment pipeline to automatically submit change management tickets when changes were made to the production infrastructure, and to automatically update their CMDB when provisioning new systems and performing deployments.147
§ It enables team members to perform push-button deployments of the build of their choice to the environment of their choice. Tools for implementing deployment pipelines typically allow deployment approvals to be issued on a per-environment basis and workflows around build promotion to be enforced.
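The cycle-time statistics mentioned in the first item above can be computed directly from pipeline records. This short sketch assumes each change's check-in and production deployment timestamps are available; it uses Python's `statistics.mean` and `statistics.stdev`, while the function names, stage names, and sample durations are hypothetical. Picking the stage with the longest mean duration is only a crude first guess at the bottleneck.

```python
from datetime import datetime
from statistics import mean, stdev
from typing import Dict, List, Tuple


def cycle_time_stats(changes: List[Tuple[datetime, datetime]]) -> Tuple[float, float]:
    """Mean and sample standard deviation, in hours, from check-in to production.

    Needs at least two changes, because statistics.stdev does.
    """
    hours = [(deployed - checked_in).total_seconds() / 3600
             for checked_in, deployed in changes]
    return mean(hours), stdev(hours)


def slowest_stage(stage_hours: Dict[str, List[float]]) -> str:
    """The stage with the largest mean duration: a first guess at the bottleneck."""
    return max(stage_hours, key=lambda stage: mean(stage_hours[stage]))
```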
CONTINUOUS DELIVERY AND CHANGE CONTROL
Many enterprises have traditionally used change advisory boards or similar change control systems as a way to reduce the risk of changes to production environments. However, the 2014 State of DevOps Report,148 which surveyed over 9,000 individuals across many industries, discovered that approval processes external to development teams do little to improve the stability of services (measured in terms of time to restore service and percentage of failed changes), while acting as a significant drag on throughput (measured in terms of lead time for changes and change frequency). The survey compared external change approval processes with peer-review mechanisms such as pair programming or the use of pull requests. Statistical analysis revealed that when engineering teams held themselves accountable for the quality of their code through peer review, lead times and release frequency improved considerably with negligible impact on system stability. Further data from the report, which supports the use of the techniques discussed in this chapter, is presented in Chapter 14.
The data suggests that it is time to reconsider the value provided by heavyweight change control processes. Peer review of code changes combined with a deployment pipeline provides a powerful, safe, auditable, and high-performance replacement for external approval of changes. The National Broadband Network case study (referenced above) shows one method to implement a lightweight change control process which is compatible with frameworks such as ITIL in a regulated environment. For more on compliance and risk management, see Chapter 12.
Implementing continuous delivery requires thinking carefully about systems architecture and process and doing a certain amount of upfront planning. Any manual activities which are repeated should be considered potential waste and thus candidates for simplification and automation. This includes:
§ It should be possible to create packages from source, deployable to any environment, in a single step using a script that is stored in version control and can be run by any developer.
§ Anybody should be able to self-service a test environment (including network configuration, host configuration, any required software and applications) in a fully automated fashion. This process should also use information and scripts that are kept in version control. Changes to environment configuration should always be made through version control, and it should be cheap and painless to kill existing boxes and re-provision from source.
§ Anybody should be able to deploy application packages to any environment they have access to using a fully automated process which uses scripts kept in version control.
§ It should be possible for any developer to run the complete automated test suite on their workstation, as well as any selected set of tests. Test suites should be comprehensive and fast, and contain both unit and acceptance-level tests.
We require, as a foundation for automation, excellent configuration management. In particular, everything required to reproduce your production system and to build, test, and deploy your services needs to be in version control. That means not just source code but build, test, and deployment scripts, infrastructure and environment configuration, database schemas and migration scripts, as well as documentation.
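One way to put version-controlled configuration to work is drift detection: compare what is actually running against the spec in version control and, when they differ, re-provision from source rather than patching the box by hand. The sketch below assumes the environment spec is a dictionary of settings loaded from a file in version control; `spec_fingerprint`, `detect_drift`, and the sample settings are hypothetical, and real configuration management tools do this far more thoroughly.

```python
import hashlib
import json
from typing import Any, Dict, Tuple


def spec_fingerprint(spec: Dict[str, Any]) -> str:
    """Stable hash of an environment spec kept in version control.

    Key order does not matter, so two environments provisioned from the same
    revision of the spec get the same fingerprint.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(spec: Dict[str, Any], live: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Settings where a running box differs from its version-controlled spec.

    Returns {setting: (expected, actual)}; None means the setting is absent.
    """
    keys = sorted(spec.keys() | live.keys())
    return {k: (spec.get(k), live.get(k)) for k in keys if spec.get(k) != live.get(k)}
```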
Decouple Deployment and Release
The most important principle for doing low-risk releases is this: decouple deployment and release. To understand this principle, we must first define these terms.
Deployment is the installation of a given version of a piece of software to a given environment. The decision to perform a deployment—including to production—should be a purely technical one. Release is the process of making a feature, or a set of features, available to customers. Release should be a purely business decision.
Often, these two terms are treated as synonyms—that is, we use deployment as our primary mechanism for performing releases. This has a very serious negative consequence: it couples the technical decision to deploy to the business decision to release. This is a major reason why organizational politics gets injected into the deployment process, to the detriment of all.
There are a number of techniques for deploying software to a production environment safely without making its functionality available to users—so we can validate that our system behaves correctly. The simplest—and one of the most powerful—is blue-green deployments (sometimes known as black-red deployments). This pattern requires two separate production environments, code-named blue and green. At any time, only one of these is live; in Figure 8-3, it’s green.
Figure 8-3. Blue-green deployments
When we want to release a new version of our service, we deploy the packages with the new features to the environment that is not currently live (blue in this example) and test it at our leisure. The release process then simply changes the router to point to the blue environment; to roll back, we point the router back to the green environment. A more sophisticated variation gradually ramps up traffic to the blue environment over time.
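The blue-green mechanics reduce to a small state machine. The `BlueGreenRouter` class below is a hypothetical in-memory model, not a real load balancer API: deployment writes only to the idle environment, while release and rollback are the same cheap operation, repointing the router.

```python
from typing import Dict, Optional


class BlueGreenRouter:
    """Two production environments; the router sends all traffic to one of them."""

    def __init__(self, live_version: str, live: str = "green") -> None:
        self.live = live
        self.versions: Dict[str, Optional[str]] = {"blue": None, "green": None}
        self.versions[live] = live_version

    @property
    def idle(self) -> str:
        return "blue" if self.live == "green" else "green"

    def deploy(self, version: str) -> None:
        # Deployment only touches the idle environment; users see no change.
        self.versions[self.idle] = version

    def switch(self) -> None:
        # Release and rollback are the same operation: repoint the router.
        self.live = self.idle

    def serving(self) -> Optional[str]:
        return self.versions[self.live]
```

A real implementation would reconfigure a load balancer or router rather than a field, and the gradual ramp-up variant would shift a percentage of traffic instead of flipping all of it at once.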
Crucially, for companies with painful deployment processes that cannot release during peak hours, blue-green deployments allow the deployment process to be done safely during normal business hours, days before a planned release if necessary. The much simpler release process (and rollback, if necessary) can then be performed at off-peak hours remotely by a much smaller group of people.
Some organizations use their main and backup data centers for their blue and green environments, thus verifying that they can perform a hot disaster-recovery process every time they deploy. However, the blue and green environments do not have to be physically segregated. They can be virtual or logical environments running on the same physical infrastructure (especially since the non-live environment typically consumes very few resources).
Deployment and release can also be decoupled at the feature or component level, instead of the system level, using a technique known as “dark launching.” In his talk on the Facebook release process, release manager Chuck Rossi says that all the major features that will launch in the next six months are already in production—you just can’t see them yet. Developers protect new features with “feature flags” so that administrators can dynamically grant access to particular sets of users on a per-feature basis. In this way, features can be made available first to Facebook staff, then to a small set of users as part of an A/B test (see Chapter 9). Validated features can then be slowly ramped up to 100% of the user base—and switched off under high load or if a defect is found. Feature toggles can also be used to make different feature sets available to different groups of users from a single platform.
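Feature flags with staff access, percentage ramp-up, and a kill switch can be sketched as follows. This is a minimal illustration, not Facebook's implementation: the `FeatureFlag` class and `bucket` helper are hypothetical, and hashing the flag name together with the user ID is one common way (assumed here) to give each user a stable bucket, so the same people stay in the rollout as the percentage grows.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Set


def bucket(flag: str, user_id: str) -> int:
    """Deterministically map a user to 0..99 for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


@dataclass
class FeatureFlag:
    name: str
    groups: Set[str] = field(default_factory=set)  # e.g. {"staff"}: always enabled
    percent: int = 0                                # share of other users, 0..100
    killed: bool = False                            # switch off under load or on a defect

    def enabled_for(self, user_id: str, user_groups: Set[str]) -> bool:
        if self.killed:
            return False
        if self.groups & user_groups:
            return True
        return bucket(self.name, user_id) < self.percent
```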
Dark Launching for Mobile Apps
Instead of launching new mobile apps directly to an app store, create a separate brand to deploy and validate them before launching them under your official brand.
Continuous delivery represents an alternative to large-batch development and release processes. It has been adopted by many large engineering organizations across different domains, including heavily regulated industries such as financial services. Despite its origins in web services, this engineering paradigm has been successfully applied to packaged software, firmware, and mobile development. It enables organizations to respond rapidly to changing customer needs and increase software quality while reducing both the risk of release and the cost of software development.
Culture also plays an important role in enabling continuous delivery. A culture in which interactions between development, operations, and information security teams are generally win-win is highly correlated with high performance, as is a culture that is at the “generative” end of Westrum’s typology (Chapter 1).
As organizations work to implement continuous delivery, they will have to change the way they approach version control, software development, architecture, testing, and infrastructure and database management. Figure 8-4 is synthesized from our study of a number of different organizations.149
Figure 8-4. Deployment g-forces, courtesy of Paul Hammant
Of course all these areas are interrelated. For example, building a maintainable, comprehensive, automated test suite requires an architecture which allows software to be deployed on local developer workstations, which in turn requires that production-like environments can be set up by version-controlled scripts. Working out what to attack first, in the case of existing systems, can be complex. We discuss evolutionary architectural change in Chapter 10.
We strongly recommend that you start by implementing comprehensive configuration management, continuous integration, and trunk-based development. Also important is creating a culture of test automation with developers, which in turn requires that test environments can be provisioned on demand. In our experience, attempts to attack problems in release or operations, discussed in Chapter 14, cannot produce significant improvement without continuous integration, test automation, and automated environment provisioning.
Questions for readers:
§ What is your definition of “done” in order for a feature to be accepted? Must it—at the very least—be integrated into trunk and demonstrated from a production-like environment by running automated tests?
§ Are you practicing continuous integration as we define it in this book? How would you work to introduce it?
§ Are the relationships between developers, testers, and IT operations personnel collaborative or adversarial? What steps might you take to improve them?
§ Are your production deployments painful, “big bang” events that involve planned outages outside of business hours? How could you change them so as to perform more of the work within normal business hours?
139 According to Jon Jenkins’ talk at Velocity 2011, “Velocity Culture (the unmet challenge in Ops).”
140 [poppendieck-06], p. 59.
141 This is the concept of jidoka in the Toyota Production System as applied to software delivery.
142 [gruver], p. 60.
145 We recommend [freeman] and [crispin].
146 James Shore’s Continuous Integration on a Dollar a Day.
147 See http://puppetlabs.com/blog/a-deployment-pipeline-for-infrastructure/
149 This diagram is adapted from one by Paul Hammant, http://paulhammant.com/2013/03/13/facebook-tbd-take-2.