Doing continuous delivery? Focus first on reducing release cycle times

It is all too easy to define a metric that drives the wrong behavior. That's true not just when it comes to DevOps and continuous delivery, but in all walks of life.

Dan Pink, an expert on the research into human motivation, provides a memorable example of this when he tells the story of a nursery school that hoped to eliminate late child pickups by enforcing a fine. But the nursery school ended up with more late parents because it had removed the social stigma of arriving late by replacing it with a monetary transaction.

In the software world, metrics in particular are one form of motivation that is very difficult to get right. I often hear organizations talk about being "data-driven" and "the importance of KPIs" in driving good behaviors, but do these things really work?

How many times have you seen something that looks crazy and asked, "Why on earth do we do that?" only to hear the answer, "Because that group of people is measured on xyz"?

"Why do our salespeople sell stuff that isn't ready?" "Because they are incentivized to sell."

"Why do our developers create poor-quality code?" "Because they are incentivized to create lots of features."

"Why do our operations people slow things down?" "Because they are incentivized to keep the system stable."

2016 State of DevOps Report

However, there is one metric that, as far as I have seen, does not lead to any inappropriate gaming or misdirected incentives—reducing cycle time.

DevOps and continuous delivery are really about one thing: reducing release cycle time. By cycle time, I mean the time involved in having an idea, getting that idea into the hands of our users, and gathering feedback. We should optimize our software development processes for that. Whatever it takes.

Defining cycle time more clearly

Imagine the simplest change to your production system that you can think of. You want it to be so simple that you can ignore the variable cost of development.

Now imagine that change going through all of the normal processes to get it prioritized, scheduled, defined, implemented, tested, verified, documented, and deployed into production: every step that a change to production would normally take. The time that it takes to complete all of those steps, plus the time that the change spends waiting between steps, is your cycle time. I use it as a proxy for the time from "idea" to "valuable software in the hands of users."
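To make the definition concrete, here is a minimal sketch of how that number could be calculated for a single change. Everything in it is an assumption made for illustration: the `Step` record, the `cycle_time` function, the step names, and the timestamps are hypothetical, and it assumes the steps run one after another without overlapping.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Step:
    """One stage a change passes through (hypothetical record)."""
    name: str
    started: datetime
    finished: datetime


def cycle_time(idea_raised: datetime, steps: list[Step]) -> tuple[timedelta, timedelta, timedelta]:
    """Return (total, working, waiting) for one change.

    total   = time from the idea being raised to the last step finishing
    working = time spent inside the steps themselves
    waiting = everything else, including the wait before the first step
    Assumes the steps are sequential and do not overlap.
    """
    if not steps:
        raise ValueError("at least one step is required")
    ordered = sorted(steps, key=lambda s: s.started)
    working = sum((s.finished - s.started for s in ordered), timedelta())
    total = ordered[-1].finished - idea_raised
    return total, working, total - working


# Illustrative data only: a trivial change that still takes days to ship.
idea = datetime(2024, 1, 1, 9, 0)
steps = [
    Step("prioritized", datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 10, 0)),
    Step("implemented", datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 30)),
    Step("deployed", datetime(2024, 1, 10, 14, 0), datetime(2024, 1, 10, 14, 30)),
]
total, working, waiting = cycle_time(idea, steps)
print(f"total={total}, working={working}, waiting={waiting}")
```

In this made-up example the change needs two hours of actual work but more than nine days to reach users. Separating working time from waiting time is a design choice of the sketch: it shows where an effort to shorten the cycle would have to look first.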

I believe that if you take an empirical, iterative approach to reducing cycle time, then pretty much all of the benefits of agile development, lean thinking, DevOps, and continuous delivery practice will fall out as a natural consequence.

My experience with a 57-minute cycle time

I once worked on a complex, demanding, high-performance system that processes billions of dollars of other people's money on a daily basis. It included account management, public APIs, Web UIs, administration tools, multiple third-party integrations in a variety of different technologies, data warehouses, and a whole lot of other enterprise trappings. We had a cycle time of 57 minutes. In 57 minutes we could evaluate any change to our production system and, if all the tests passed, be in a position to release that change into the hands of users.

Now think about the consequences of being able to do that.

  • If you have a cycle time of 57 minutes, you can't afford the communications overhead of large teams. You need compact, cross-functional, efficient teams.
     
  • You can't afford the hand-offs that are implicit in siloed teams. If you divide your development effort up into technical specialisms, you will be too slow. You need cross-functional collaborative teams to ensure a continual flow of changes.
     
  • You can't rely on manual regression testing. You need a great story on automated testing. Human beings are too slow, too inefficient, too error-prone, and too expensive.
     
  • You can't rely on manual configuration and management of your test and production environments. You need to automate the configuration management, automate deployment, and develop a good story on "infrastructure as code."
     
  • You can't have a cycle time of 57 minutes and have hand-offs between Dev and Ops.
     
  • You can't have a cycle time of 57 minutes if your business can't maintain a constant smooth flow of ideas.
     
  • You have to be very good at a lot of aspects of software development to achieve this kind of cycle time.

If you can confidently evaluate your changes to the point where you are happy to release changes into production in under an hour, without any further work, you are doing very well!

Optimizing for short cycle time drives good behaviors

You don't have to be great at every step in the release cycle before you start seeing the benefits of this mindset. Simply striving to improve your cycle time will help you (and, in some sense, force you) to improve your development process, culture, and technology. It will force you to address impediments and inefficiencies that get in your way. And the best part is that I've never seen this metric encourage bad behaviors.

Many people are nervous that reducing cycle time will reduce quality. In my experience, and in countless accounts I've heard from across the industry, the reverse is true.

What happens is that by reducing cycle time you reduce batch size. By reducing batch size, you reduce the risk of each change (ever heard colleagues warning against "big bang" releases?). Each change becomes simpler and lower-risk. According to the 2015 CA Technologies DevOps Survey, 66% of organizations that claim to practice continuous delivery say that quality goes up, not down. Personally, I'm not sure what the other 34% are doing wrong ;-)

Smaller releases are low-risk and lessen the cognitive load

If you have a short cycle time, you can, and will, release changes in small batches. Think about each change. Each change will be small, simple, and easy to understand.

If you release only once every few months, then you will be storing up lots of changes. If you imagine that each change has a small amount of risk associated with it, then the total risk for any release is going to be the sum of all of those risks.

Except it's worse than that, because there is a compounding effect. What if my change interacts with your change? There is an additional risk associated with the interaction between changes, and it grows much faster than the number of changes: each change you add roughly doubles the number of possible groups of two or more changes. The more changes that are released together, the higher the risk that two or more of them will interact in unexpected ways.

So the total risk is going to be something like the sum of all the risks associated with each change plus the risk that two or more changes will interact badly. If you release one change at a time, though, as I've previously suggested, you eliminate this secondary compounding of risk.
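A small counting sketch shows how quickly this secondary risk can pile up. It is not a risk model: it only counts how many pairs, and how many groups of two or more, exist among the changes in a release, on the assumption that any of them could interact. The function names are hypothetical.

```python
from math import comb


def pairwise_interactions(changes: int) -> int:
    """Number of distinct pairs of changes: n * (n - 1) / 2."""
    return comb(changes, 2)


def interacting_groups(changes: int) -> int:
    """Number of groups containing two or more changes: 2^n - n - 1.

    All subsets (2^n) minus the empty set (1) minus the single changes (n).
    """
    return 2 ** changes - changes - 1


for n in (1, 5, 20):
    print(f"{n} changes: {pairwise_interactions(n)} pairs, "
          f"{interacting_groups(n)} groups of two or more")
# 1 changes: 0 pairs, 0 groups of two or more
# 5 changes: 10 pairs, 26 groups of two or more
# 20 changes: 190 pairs, 1048555 groups of two or more
```

A release containing a single change has nothing to interact with, which is the point of releasing one change at a time.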

How to reduce cycle time: My experience in a complex software system

A few years ago, I worked with a team building some complex software in C++. This development team was very good. They had adopted an automated testing approach some years before. They were well ahead of industry norms, because they operated a process based on nightly builds.

Each night their automated systems would build and run their automated tests to evaluate their software. The build and tests took about nine hours to complete. Each morning the team would look at the results and there would be a significant number of test failures. I spoke to one of the developers who had been working this way for the previous three years. He told me that during those three years of doing nightly builds, there had been only four occasions when all of the tests had passed.

To get some work past this common blocker, they decided to release the individual modules that passed all of their tests. This was a reasonable strategy as long as none of the components interacted with any other. Mostly they didn't, but the few components that did interact caused numerous problems because the various combinations of modules that happened to get released interacted in unpredictable ways. Most of the issues arose from incompatibilities with old components.

I argued that cycle time was important, a driver for good behavior and outcomes. We worked hard on the build. We invested a lot of time, money, and effort on experimenting with different approaches. Features we added to the process included:

  • Parallelized builds
  • Improved incremental builds
  • Better servers
  • Tests triaged into groups
  • Builds divided into a deployment pipeline
  • A 12-minute commit stage (running the vast majority of the tests) to replace the 9-hour nightly build 
  • A slower (one-hour) acceptance test stage

The "acceptance test" designation was fairly arbitrary in this case. If a test was too slow, we moved it to the "acceptance test stage."
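The triage rule described above can be sketched in a few lines. This is an illustration and not the tooling the team used: the `MeasuredTest` record, the `triage` function, the test names, and the one-second threshold are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MeasuredTest:
    """A test and its measured run time (hypothetical record)."""
    name: str
    seconds: float


def triage(tests: list[MeasuredTest],
           slow_threshold_seconds: float) -> tuple[list[MeasuredTest], list[MeasuredTest]]:
    """Split tests into (commit_stage, acceptance_stage) by run time alone.

    Anything slower than the threshold moves to the acceptance stage,
    whatever kind of test it is.
    """
    commit_stage = [t for t in tests if t.seconds <= slow_threshold_seconds]
    acceptance_stage = [t for t in tests if t.seconds > slow_threshold_seconds]
    return commit_stage, acceptance_stage


# Illustrative data only.
tests = [
    MeasuredTest("parse_order", 0.02),
    MeasuredTest("price_rounding", 0.4),
    MeasuredTest("full_settlement_run", 95.0),
    MeasuredTest("ui_login_flow", 12.0),
]
commit_stage, acceptance_stage = triage(tests, slow_threshold_seconds=1.0)
print([t.name for t in commit_stage])
print([t.name for t in acceptance_stage])
```

Sorting purely by speed keeps the fast stage fast, which is what makes it practical to run on every commit.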

The results were quite dramatic. In the first two-week period following the introduction of this new build, we saw three builds in which all of the tests passed—compared to four in the previous three years. In the next two-week period, there were multiple successful builds every day (all tests passing).

Instead of cherry-picking modules with passing tests, we could now release all of the software together, or not at all. Each morning we could simply deploy the newest release candidate that had passed all the tests. We could have more confidence that these components would work together, and we could begin improving our test scenarios that crossed the boundaries between components.
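The "all together, or not at all" rule amounts to a simple selection: take the newest build that passed every stage, and release nothing if there isn't one. The sketch below assumes a hypothetical `Candidate` record with one pass/fail flag per pipeline stage; it does not reflect any particular build server.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """A build and its stage results (hypothetical record)."""
    build_number: int
    commit_stage_passed: bool
    acceptance_stage_passed: bool


def newest_releasable(candidates: list[Candidate]) -> Optional[Candidate]:
    """Return the highest-numbered build that passed every stage, or None."""
    passing = [
        c for c in candidates
        if c.commit_stage_passed and c.acceptance_stage_passed
    ]
    return max(passing, key=lambda c: c.build_number, default=None)


# Illustrative data only.
builds = [
    Candidate(101, True, True),
    Candidate(102, True, False),
    Candidate(103, True, True),
    Candidate(104, False, False),
]
choice = newest_releasable(builds)
print(choice.build_number if choice else "nothing to release")
```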

Reducing cycle time to improve everything

Reducing cycle time drives good behaviors. It encourages us to establish concrete, efficient feedback loops that allow us to learn and adapt. The team in my war story above was the same team before and after the change in process. The change in approach and the focus on cycle time gave them insight into what was going wrong and an opportunity to quickly and efficiently experiment with solutions to any problems that arose.

Cycle time drives us in the direction of lower-risk release strategies. It will move your team in the direction of higher-quality development practices. I encourage you to optimize your development process to reduce cycle time. If you do, I believe that you will see it improve almost everything that you do.
