Is GUI test automation a recipe for continuous delivery failure?

Matthew Heusser Managing Consultant, Excelon Development

Many people in the DevOps and agile communities believe that you can achieve continuous delivery (CD) through test automation, often at the GUI level. But GUI test automation alone won't get you there. And if you do it first, it will almost certainly slow you down. Here's why. 

Hurry up and wait

Most teams with which I work feel a great deal of uncertainty about their releases. This feeling arises from expected, but undiscovered, bugs.

Often, when a test process takes too long, the tester decides to automate it. Things work perfectly for the first month or two. The tester kicks off the process, gets a cup of coffee, and gets the results from a small battery of checks within minutes. But then, about six months after the proof of concept, everything slows down.

For the next step, operations needs to get the new build on the server. Soon, the tester needs to clear out the database and run tooling, which now takes hours. After the tooling finishes, errors need to be debugged. Some are defects that need fixing, while others simply reflect changes in the software.

This automation delay gradually takes more and more time, until the team eventually either gives up on tooling or puts up with delays that make the process take as long as it would for a skilled team to test the software by hand.

In other words, the problems are the single shared test server, the wait for a build, and the number of reruns. These bottlenecks tend to appear well before disaster strikes, and removing them is exactly what it takes to prevent automation delay.

Automation delay avoidance: Prerequisites

Rewrites and other infrastructure improvements can be hard to justify; they may feel like you're running to stay in place. These prerequisites are different. Regardless of when automated GUI checks come into the mix, they will return dividends immediately, with a payback measured in months, not years.

1. Five-minute provisioning

When I worked at Socialtext, we used the OpenVZ virtualization tool, which works with many flavors of Linux. I could log onto the server, run a one-line command, and have a new web server available in about three minutes. This meant I could test code changes for just one story while another tester checked a patch release for problems.

To make provisioning work, we first needed a continuous integration (CI) server that could create builds when requested, or on every push to version control. Once we had the builds in place, we used tools to spin up a virtual server running that build.
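A one-command provisioning script along those lines might do nothing more than fetch the latest CI artifact and boot a container running it. The sketch below is purely illustrative: the `ci` tool, paths, and container ID are hypothetical, and the `vzctl` invocations are simplified stand-ins for real OpenVZ usage.

```python
# Hypothetical sketch of "one-line" provisioning: fetch a CI build,
# then boot a container serving it. Tool names, flags, and paths are
# illustrative, not the actual Socialtext/OpenVZ commands.

def provision_commands(build_id: str, ctid: int = 101) -> list[str]:
    """Return the shell commands a provisioning script would run, in order."""
    return [
        # 1. Pull the artifact the CI server produced for this build.
        f"ci fetch-artifact {build_id} --out /srv/builds/{build_id}.tar.gz",
        # 2. Create and start a fresh container for this one test run.
        f"vzctl create {ctid} --hostname test-{build_id}",
        f"vzctl start {ctid}",
        # 3. Unpack the build into the container's web root.
        f"tar -xzf /srv/builds/{build_id}.tar.gz -C /vz/private/{ctid}/srv/app",
    ]

if __name__ == "__main__":
    for cmd in provision_commands("build-1234"):
        print(cmd)
```

The point is less the specific tools than the shape: every step is scripted, so "get me a test server on this build" is one command any tester can run.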

2. Test-driven development with unit tests

The feedback loop on TDD is typically faster than for automated GUI checks. Amazingly, TDD usually gives feedback to the developer before the code is even committed to version control. The red-green-refactor approach means fewer bugs will be created, while many of those that do appear will be noted and fixed quickly.
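The red-green-refactor loop can be sketched with Python's built-in unittest module; the discount function and amounts here are illustrative, written as if the failing test came first:

```python
# A minimal red-green-refactor example using Python's built-in unittest:
# the checks below were written first (red), then just enough code was
# written to make them pass (green).
import unittest

def apply_discount(price: float, percent: float) -> float:
    """Return the price after a percentage discount, rounded to cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class ApplyDiscountTest(unittest.TestCase):
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(20.00, 10), 18.00)

    def test_rejects_negative_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(20.00, -5)

if __name__ == "__main__":
    unittest.main()
```

Running this suite takes milliseconds, which is why the feedback arrives before the commit rather than hours later in a GUI run.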

Moving to TDD is a bit like pulling on a loose thread; you start with a problem, and addressing it seems to make things worse. When technical staff try TDD, they often say it’s impossible because the code is not structured to support it. Your team must learn code craft, or how to create simple designs that can be isolated and are testable. They also need to learn to refactor to improve the design of the existing codebase and make it testable.

These small changes add up to improved first-time quality. Teams pursuing this approach can dramatically reduce the number of problems that emerge from the codebase before testing. Pair programming, concrete examples, desk checking, and developer/tester pairing are all ways to improve first-time quality while reducing time-to-problem-discovery.

Once TDD is in full swing, your team members will find that they can create and test isolated components.

3. Isolated components, services, and integration checks

One of my clients developed a website that only serves up static text files. The files include JavaScript that makes API calls to handle login, search, tag, add to cart, and similar operations. All of those services are isolated, which means the search team can make a change to search, knowing that only search changed. To deploy a change to search, for example, they only need to retest search.

Because these are APIs and don’t have a GUI, it is possible to create contracts for how the software behaves. These contracts, called integration checks, specify the expected behavior by example. When you change only search, its integration checks all pass, and the clients base their assumptions only on those contracts, you suddenly need a lot less system-level regression testing before deploying.
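One lightweight way to express such a contract is to pin the response fields that clients depend on. This sketch assumes a hypothetical /search API; in a real check, the response dict would come from an HTTP call to the service rather than a literal.

```python
# Hedged sketch of an integration (contract) check for a hypothetical
# /search API. The contract lists the fields clients rely on; if the
# check passes, client assumptions about the response still hold.

SEARCH_CONTRACT = {"query": str, "total": int, "results": list}

def meets_contract(response: dict, contract: dict) -> bool:
    """True if every contracted field is present with the expected type."""
    return all(
        key in response and isinstance(response[key], expected)
        for key, expected in contract.items()
    )

# Stand-in for a parsed JSON response from the search service.
fake_response = {"query": "red shoes", "total": 2,
                 "results": [{"id": 1}, {"id": 2}]}
assert meets_contract(fake_response, SEARCH_CONTRACT)
```

Dedicated contract-testing tools go further (verifying both consumer and provider sides), but even a type-and-presence check like this catches the breaking changes that otherwise surface in system-level regression testing.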

4. Deploy by component

The next logical step is to deploy by component. Instead of deploying the entire application, you deploy a single service. If the service keeps its contracts, you can add more services, and overall reliability will continue to improve.

If the component is a single API, you might be able to keep the system up during the deploy. Search might be down for a few seconds, but by the time the customer notices and clicks “submit,” it should be back up. Teams that want to deploy multiple times during the business day need to deploy by component.

That being said, if deploys still require six levels of approval and a three-day change-control process, you will not have much success. Five-minute deploys fix this.

5. Five-minute deploys

Any one person on the team should be able to perform a deploy in five minutes or less. That makes ship decisions easier and enables quick rollback, reducing mean time to recovery (MTTR). Configuration flags, sometimes called “toggles,” are another way to enable quick rollback.
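A configuration flag can be as simple as a boolean checked at the branch point. The sketch below is a minimal illustration with made-up names; real systems usually read flags from a config service so they can flip without a redeploy.

```python
# Minimal sketch of a configuration flag ("toggle"): the new code path
# ships dark, and rollback means flipping the flag, not redeploying.
# Flag and function names are illustrative.

FLAGS = {"new_search_ranking": False}  # in practice, read from config, not code

def search(query: str) -> str:
    """Route between the old and new implementation based on the flag."""
    if FLAGS["new_search_ranking"]:
        return f"new-ranker results for {query}"
    return f"classic results for {query}"

print(search("red shoes"))            # classic path while the flag is off
FLAGS["new_search_ranking"] = True    # "release" the feature
print(search("red shoes"))            # new path
FLAGS["new_search_ranking"] = False   # instant rollback, no deploy needed
```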

The final piece of the puzzle is to reduce your mean time to identify, or MTTI.

6. Continuous monitoring

Your system will be more effective if it can point out defects within minutes of the defect appearing in production, along with the URL of the web page or API call that produced the error.

To do that, first visualize your production errors on a graph. If the number of errors, the processing time, or some other measure spikes, correlate the spike to the change that caused it, and roll that change back.
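The core of that idea fits in a few lines: bucket server errors by time, and flag any bucket above a threshold as the signal to investigate and correlate with what shipped. The log lines below are stand-ins for parsed entries from a real access log.

```python
# Hedged sketch of the monitoring step: count 5xx responses per minute
# from (already parsed) access-log entries and surface the spikes.
from collections import Counter

LOG_LINES = [  # (minute, status, url) - illustrative stand-in data
    ("12:01", 200, "/search"),
    ("12:01", 500, "/search"),
    ("12:02", 500, "/cart"),
    ("12:02", 500, "/cart"),
    ("12:02", 500, "/cart"),
]

def error_spikes(lines, threshold=2):
    """Return the minutes where the 5xx count meets or exceeds the threshold."""
    errors = Counter(minute for minute, status, _ in lines if status >= 500)
    return {minute: n for minute, n in errors.items() if n >= threshold}

print(error_spikes(LOG_LINES))  # only 12:02 crosses the threshold
```

A real pipeline would also carry the offending URL along with each spike, which is what lets you point at the specific page or API call that produced the error.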



Monitoring isn’t possible with every kind of defect, but it will work for server errors, such as 500 (internal server error) and 404 (file not found) that can be trapped and logged.

Testing will be much simpler once you have these six pieces in place. Your risk with each release will be lower, defects will be rare, and the time it takes to find and fix issues that do arise will be shortened. These changes radically affect the economics of software quality.

Implications and your situation

W. Edwards Deming, a pioneer of quality management in manufacturing, was critical of mass inspection. He advocated catching problems earlier instead. If you reduce the problems in the beginning by testing as you write, testing at the end will require a fraction of the effort you expend now.

High-functioning teams that have been using intense GUI automation to solve problems can instead solve them by doing less, not more. That's the key to going faster at anything: taking things away.

Teams that are low-performing and behind all the time have a high regression rate. They can't create a test environment quickly, can't deploy quickly, and are unlikely to be well served by trying to reverse engineer GUI automation. Adding new tasks that they are unqualified to perform on top of their current project, which is already behind, will not improve quality. Of course, you could add budget and hire an automator, but that approach has historically produced mixed results. You could also take your money to Vegas. That produces mixed results too!

A small slice of build verification GUI checks and checks that look at essential functions (for e-commerce, that is login, search, path to purchase, PCI) can be a great idea. These are also the very top of the pyramid.


Mike Cohn’s Test Automation Pyramid.

Why teams spend all that time working at the top of the pyramid

Teams often spend time on GUI automation because the user interface is often all testers can see. Testers who can’t see below the GUI often lack education, programming knowledge, and exposure to the code internals. 

It can be tempting to throw those testers at GUI automation. After all, that is the only attack surface they recognize. Yet having non-programmer testers attempt automation (which is programming) on a surface that is both slow and constantly changing is not exactly a recipe for success.

Given that system of forces, testers should learn the way the software is built, and then work to build delivery infrastructure. That means improving the build, the deploy, CI, provisioning, database refreshes, and so on. Back-end programming, scheduling, virtual environments, and even the cloud are easier and more straightforward to learn than are GUI hacks on top of a legacy system that wasn’t designed for it. If you want to be a DevOps shop, testers can pick up some of those tasks and have a higher chance of success.

By the time that's all done, the team can test and deploy a small change to a subsystem without risk to the larger system, and exhaustive GUI test tooling might not be a requirement at all.

The bottom line: Work on the six prerequisites outlined above and you might not need GUI automation. Do this, and if you still want GUI checks, you'll be well positioned to get started.

Note: This article was inspired by the humor video and Internet meme Do You Even Lift.
