
A tester's guide: How to become a continuous delivery leader

Matthew Heusser Managing Consultant, Excelon Development

It's been over a decade since the announcement that "test is dead," and even longer since Jez Humble and David Farley published the book, Continuous Delivery, in 2010. In the time since, continuous delivery (CD) has moved from a fringe Silicon Valley fad to the way software development is done for most startups.


But what about the non-startups? Here’s what you, "just" a testing professional interested in quality, can do right now, to move your organization toward CD.

Evolution or revolution?

The common argument in favor of CD is revolution. Throw out everything the organization has, put in a CD pipeline, and deploy to production multiple times a day. That works fine for the organizations it works for. It certainly gets rid of all the redundancy, inefficiency, and waste.

Sometimes, though, that waste evolved for a reason.

In his book Systemantics: How systems work and especially how they fail, John Gall argued that all that mess has a purpose. The human body, for example, has many "wasteful" and "redundant" systems: two ears, two nostrils, two eyes. In the case of eyes, the duplication adds value; in the others, the redundant systems cover for each other in the event of failure. The only system in the human body without redundancy, the heart, is the No. 1 system to fail. All that redundancy came through evolution.

The alternative is revolution: Throw everything out and start over. The space shuttle and the Titanic were both ground-up rewrites designed to improve safety. Digging into the subsequent failures is beyond the scope of this article, but the takeaway is that the big-bang approach has problems.

You want to inch toward CD, rather than throw out your existing processes. To that end, here are some ideas on how to influence things you can't control. Systemantics is a start.


Focus on mean time to recovery; isolate deploys

The classic way to look at test-deploy is that testing and deploying are expensive. So batch up testing, to get it all done at the same time. If possible, deploy less often, so you have to pay for testing less often.

The problem with that approach is that it allows uncertainty to build in the software, making testing (and fixing, and retesting) more expensive. So you test less often. So …

You get the point. The traditional thinking about test-deploy is that it has incentives that slow down delivery. Because delivery is slower, you want to get it right, with a very high mean time to failure.

That means you need to test better and longer, which increases cost. So you test less often, all in the name of process improvement. (One common outcome when companies implement SAFe is that the deploy cadence slows. This can actually increase cost.)

With CD, you focus more on mean time to identify and mean time to recovery (MTTR)—the length of time it takes to identify problems and fix them when something goes wrong. Imagine, for example, if you could find and fix a problem in two hours. How about in 15 minutes? It might be better for the customer to deal with two or three problems in production than to invest dozens of person-days in testing a deploy.

That 15-minute feedback loop still may not fly for, say, financial transactions. With classic software infrastructures, there may be no way to tell the difference between the financial logic and the text on the "about us" page. If you can separate that logic, and if you can be certain that your changes are only on the "about us" page, you can have a much more aggressive deploy process.

While that sounds extreme, changes to the "about us" page were exactly how engineers at Etsy once learned their CD process, without much risk.

In practice, companies tend to isolate services, such as login, search, or product information, so changes can be made to just that service. If something goes wrong in search, programmers will know exactly what to assess, perhaps checking the last few changes. If there has been only one change in the past day, made 30 minutes ago, and the problem is 30 minutes old, that makes for a pretty easy fix. Microservices are one common strategy to isolate deploys.

Once you understand the philosophy and the goal, it's time to get started.

Step 1: Measure the deploy chain

How long does it take to get a change to production, from the moment someone makes the commit to the moment the change goes live? The most common answer I get from my consulting clients is that it depends on the team.

So I say: "What about your team?"

Then they say it depends on the project, or the sprint, or the change.

So I tell them I'm asking about this project, this sprint, this change.

Then they smile sheepishly.

Diagram your process from commit to when it goes live in production. This might require talking to people outside your scope of control, or even your scope of influence.

Create one unified view of the process, at least for the software you support. Post it on the wall.
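Once you have timestamps for each stage, turning them into per-step durations and a total commit-to-live lead time is mechanical. Here is a minimal sketch; the event names and timestamps are hypothetical, stand-ins for whatever your own pipeline records.

```python
from datetime import datetime

# Hypothetical timestamps for one change, from commit to go-live.
events = [
    ("commit",         "2024-03-04T09:12:00"),
    ("build finished", "2024-03-04T09:40:00"),
    ("test complete",  "2024-03-05T16:05:00"),
    ("deployed",       "2024-03-06T10:30:00"),
]

def lead_times(events):
    """Return (per-step durations in hours, total lead time in hours)."""
    times = [(name, datetime.fromisoformat(ts)) for name, ts in events]
    steps = []
    for (prev_name, prev_t), (name, t) in zip(times, times[1:]):
        hours = (t - prev_t).total_seconds() / 3600
        steps.append((f"{prev_name} -> {name}", hours))
    total = (times[-1][1] - times[0][1]).total_seconds() / 3600
    return steps, total
```

Run it against real data and the "it depends" answer becomes a number you can post on the wall next to the diagram.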

Optimize the longest step

The next phase involves an optimization process. Identify the step that takes the longest, that can be reduced the most, with the least effort. Often, teams can cut the test-deploy time in half with simple, common-sense measures like removing the tests that never find bugs, which, even if they did find bugs, would not be show-stoppers.

Often the biggest delay is the wait between commit and release-test time. Isolation and cheap rollback can help reduce the cost of release testing, making it happen more frequently.

Sometimes the steps exist for good reason: to mitigate some risk. There is no "common-sense" way to remove them without incurring risk. They might, however, be eliminated if you can replace the risk-management step with a different one, or with a series of smaller steps.
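The "longest step, most reducible, least effort" heuristic above can be sketched as a simple scoring function. The step names and estimates below are hypothetical; plug in your own measurements from the deploy-chain diagram.

```python
# Hypothetical per-step estimates: hours the step takes now, hours after
# the proposed fix, and days of effort to implement the fix.
steps = {
    "regression test pass":  {"now": 26.0, "after": 6.0,  "effort_days": 5},
    "change advisory board": {"now": 18.0, "after": 18.0, "effort_days": 1},
    "build + unit tests":    {"now": 1.5,  "after": 0.5,  "effort_days": 2},
}

def best_target(steps):
    """Pick the step with the most hours saved per day of effort."""
    def payoff(item):
        _, s = item
        return (s["now"] - s["after"]) / s["effort_days"]
    return max(steps.items(), key=payoff)[0]
```

In this toy data, the regression pass saves 20 hours for 5 days of work, so it wins; the advisory board saves nothing no matter how cheap it is to change.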

Step 2: Use techniques to mitigate the risk

Here are three ways to reduce MTTR to make shipping more often (with less testing) practical.

Production monitoring

Most of the teams I work with have production monitoring. The monitoring is done by someone else—ops or a sysadmin or production support. The teams rave about the quality of the production monitoring. It is "amazing," they say. Anytime I ask a question about what customers are doing, I am referred to ops, which, they say, would or should be able to answer that using [name_of_tool].

Except they never can.

Two weeks after I've started my search, I'm back in the team, confused. There is always some reason. This or that microservice isn't hooked up to the monitoring tool yet, or production IDs are anonymized. Most likely, the people from production support were simply too busy to track down a half-dozen requests, or the tool doesn't quite answer that question.

That's bollocks.

When I talk about production monitoring, I mean the ability to get answers that matter to the programming team: the time a request lives on the server, the number of requests, and the number of 400, 500, and 504 errors, along with debugging information on which microservices are called, how often, and how long they take to run. In complex environments I want to see a dependency map of which services call which other services.
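As a sketch of the kind of aggregate those questions require, here is a minimal per-service summary over hypothetical access-log records. In practice this data comes from your monitoring tool; the records below are invented for illustration.

```python
from collections import Counter

# Hypothetical access-log records: (service, HTTP status, duration in ms).
requests = [
    ("search", 200, 120), ("search", 500, 2300), ("login", 200, 80),
    ("search", 200, 140), ("login", 504, 10000), ("search", 400, 95),
]

def summarize(requests):
    """Per service: request count, error counts by status, average latency."""
    by_service = {}
    for service, status, ms in requests:
        s = by_service.setdefault(
            service, {"count": 0, "errors": Counter(), "total_ms": 0})
        s["count"] += 1
        s["total_ms"] += ms
        if status >= 400:
            s["errors"][status] += 1
    for s in by_service.values():
        s["avg_ms"] = s["total_ms"] / s["count"]
    return by_service
```

If you can't produce a table like this for your own system, that gap is one of the prerequisites mentioned below.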

To lead the charge to CD, start with a list of questions you can't answer. Those are your prerequisites. Figure that out, split the services up, and you could deploy systems independently all day long.

If you want to go beyond that, start to look for the answers.

There's a reason I put production monitoring first on this list. Staged releases, quick rollback, and feature flags all help fast recovery. But to recover, you need to know there is a problem. Production monitoring is one way to do that. There are other methods, such as synthetic transactions in production. But here I've focused on what's important but often overlooked.

Canary (staged) releases

Once you can recognize a problem early, you can roll out 5%, 10%, 20% of the users onto the new functionality before rolling out the main body. These should be your employees, power users, friends, early adopters, and others signed up for early (beta) testing. The term canary release comes from the canary-in-the-coal-mine metaphor, where the canary would pass out before any miner did when exposed to bad air, giving miners time to escape.
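One common way to implement those percentages is to hash each user into a stable bucket, so the same user always sees the same version as the rollout widens. This is a sketch of that idea, not any particular vendor's API.

```python
import hashlib

def in_canary(user_id: str, feature: str, percent: int) -> bool:
    """Deterministically place a user in the first `percent` of 100 buckets.

    Hashing feature and user together means different features roll out
    to different slices of the user base, not always the same early group.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Raising the rollout from 5 to 10 to 20 percent is then just a config change; users already in the canary stay in it.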

Goranka Bjedov, a performance engineer at Facebook, once told me that Facebook uses New Zealand as a testbed. The country had a few million English-speaking people with browsing habits very similar to those in high-revenue countries, but with different daylight hours than the company's highest utilizers.

Thus, if Facebook shipped some code that contained a bug and caught it during the same "day" in New Zealand, it would be fixed before the newspapers opened for business in the United States, or even Europe. The fundamental issue here is segmenting your codebase in such a way that the new code only runs for certain users.

Feature flags are one way to do that.

Feature flags   

Imagine that every feature has a unique key, allowing the programmer to write this:

if (feature_turned_on("captcha", userid)) {
    // execute feature
} else {
    // leave that part of the screen blank
}
This feature flag is set in a database, or perhaps in a file on disk. Changing it is as easy as a SQL update or perhaps changing a file in version control and a copy or push. These changes do not have the scale or epic quality of a traditional deploy. No one needs to compile anything, and no special tests need to run. You simply push the change.
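A minimal file-backed version of `feature_turned_on` might look like the sketch below. The JSON file name and flag schema are assumptions for illustration; a database-backed store would follow the same shape.

```python
import json
from pathlib import Path

FLAGS_FILE = Path("feature_flags.json")  # hypothetical flags file

def feature_turned_on(feature: str, user_id: str) -> bool:
    """Check a flag in a JSON file; unknown features default to off."""
    try:
        flags = json.loads(FLAGS_FILE.read_text())
    except FileNotFoundError:
        return False
    flag = flags.get(feature)
    if flag is None:
        return False
    if flag.get("enabled_for_all"):
        return True
    return user_id in flag.get("enabled_users", [])
```

Turning a feature on for just the testers is then one edit to the file—no compile, no special test run, exactly the lightweight change the paragraph describes.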

Feature flags are one way to allow staged releases; they are also a way to reduce MTTR. By turning the feature on for just testers, they permit testing in production.

It's tempting to come up with one architecture to bind them all for feature flags, to "do it right," to fund it "like a project," to get executive buy-in, and so on.

Forget all that. Just do it.

For a major feature you are working on, work with the programmers on a config flags scheme that is good enough. It might just be one big file that stores a collection of features and on/off conditions. Or you might add user types, and offer canary releases for certain types of users. In that case, you'd turn the feature on for only those users.

You might not even have canary releases. Then at the sprint demo, announce it as a bonus, and describe what you can do with it. Suddenly everyone will want one, and want to know how.

And you will be leading.  

Distributed deploys

Earlier I mentioned code running in isolation. You want it isolated so that search, login, tag, profile edit, and update can be deployed separately. Yes, there may be database changes and dependencies and code libraries and all sorts of things that need to be deployed as a group. There may be a bunch of things that can't really be done continuously.

So find one thing, just one, that you can deploy. Again, work with the technical team to figure out how to do this. It might be an API, some sort of RESTFul service, a single-page web application, packaging "your" stuff in "your" own JAR file, or writing static HTML pages that call APIs that already exist.

It doesn't really matter. The point is, as with feature flags, figure out a way to make an independent deploy happen, and announce it as a bonus.

As with feature flags, suddenly everyone will want one.

Quick/easy deploys and rollbacks

Deploys and rollbacks are often the easiest thing to automate as a push-button. The next step is a little project to deploy any change independently. But doing that and not taking the server down for 10 minutes at a time might be a challenge.
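One classic pattern that makes both deploy and rollback a push-button, with no 10-minute outage, is keeping every release on disk and atomically swapping a `current` symlink. The directory layout below is an assumption for the sketch; real setups vary.

```python
import os
from pathlib import Path

RELEASES = Path("releases")  # hypothetical layout: releases/<version>/
CURRENT = Path("current")    # the symlink the web server actually serves

def deploy(version: str) -> None:
    """Point `current` at a new release; old releases stay on disk."""
    target = RELEASES / version
    if not target.is_dir():
        raise FileNotFoundError(target)
    tmp = Path("current.tmp")
    if tmp.is_symlink():
        tmp.unlink()
    tmp.symlink_to(target)
    # Atomic rename: there is never a moment with no `current` link.
    os.replace(tmp, CURRENT)

def rollback(version: str) -> None:
    deploy(version)  # rollback is just a deploy of an older version
```

Because rollback is the same operation as deploy, the "undo" button is as cheap and as well-tested as the "do" button.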

For some applications, like mobile apps deployed through a "store," CD might not make sense. So tread lightly. The point is to personally step into the build process and rework it. Once the process is one-step, it will be time to automate deploy. That might require the help of people in operations.

Focus on automating that one isolated deploy first. Just do it for one service. Then slowly add services.

Step 3: Build a coalition

Once you’ve identified the right thing to do, you need a team to do it. That team will likely be cross-disciplinary—which leads to the term DevOps. But doing this is probably harder than saying it. The IT operations groups may be in another building, another campus, another continent. Conversations with them might be discouraged. You might be told to just file a ticket and wait, or that these sorts of improvements are the job of operations, or that a DevOps team is coming in next year's budget.


Find one thing your team can do right now without anyone else’s permission. Talk to everyone involved with the work privately, including the product owner who might have to "pay" for your time to build it, and figure out how to get it done. From there you can build momentum.

Be prepared for blowback

Once the train is on the tracks, it will overwhelm you. People will start to do all kinds of crazy things you never thought of. The initiative will get beyond you. But you'll have started it, and people that matter will have noticed.

Forget about losing your job because of DevOps and CD. Be prepared to lose it. Because in losing that old job you don't want, you might just find the one you do.

Want to know more? Come to my tutorial on "Enabling Continuous Delivery [With Testing]" at Agile Testing Days USA. The conference runs from June 23 to 29; my session is on the 25th.
