How to stamp out intermittent testing issues with periodic automation

Paul Grizzaffi, Principal Automation Architect, Magenic

In the pop culture of the United States, Sasquatch (a.k.a. Bigfoot) is a legendary and elusive ape-like creature infrequently seen in the Pacific Northwest. In the software realm, we have our own version of Sasquatch: those irritating, sometimes catastrophic, issues that are hard to reproduce.

So, as a test engineer, how do you track down your own elusive Sasquatch? I use an approach I call "periodic automation," and it works quite well.

Traditionally, you run your automated tests on event boundaries, such as after a successful deployment. Why? A deployment happens because some code changed in your application, and points of change are where you expect issues to be introduced. To use your time effectively, you look for problems when you think they may have been injected, so you tend to look for issues only then.

Unfortunately, this approach limits your opportunities to reproduce the previously mentioned intermittent issues: hard-to-reproduce issues that don't occur on a predictable schedule or set of events. If, however, you also run your automation periodically, in addition to your event boundaries, you'll have additional opportunities to reproduce these types of issues.

Here's how periodic automation works.

Highly connected software is prone to intermittent issues

Today's software is highly connected, both to components in your production network and to servers and components outside of it. Analytics, payment, and social media services, for example, are often external to your application's network. Reliance on these services makes your environment harder to test.

Just enumerating all of the possible scenarios is generally an insurmountable challenge. But even if you could enumerate them, you couldn't possibly test them all, even using automation. For most of today's systems, testing all scenarios or paths is simply not possible.

That means you must accept the fact that some flows through your application will not be thought of, let alone tested. Many of those flows will be among the most complex, and those are the flows where intermittent issues tend to hide.

The more you look, the more you see

It’s a simple approach, really. If an animal is digging a hole in your yard, the more often you look for it, the more likely you are to see it. The same holds true for intermittent issues.

This is the basic principle of periodic automation: If you run your automation more often, you are more likely to reproduce intermittent issues. You can run your automation periodically in many ways, ranging from a simple shell script or .bat file that runs on a timer to a timed event in a continuous deployment tool. It doesn't have to be fancy; you just need it to run more often.
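As a minimal sketch of the "runs on a timer" idea, the Python below reruns a test command at a fixed interval and records whether each run passed. The function name `run_periodically`, its parameters, and the result fields are my own illustrative choices, not part of any particular tool; a scheduler such as cron or a CI tool's timed trigger would serve the same purpose.

```python
import subprocess
import time
from datetime import datetime, timezone


def run_periodically(command, interval_seconds, max_runs, sleep=time.sleep):
    """Run `command` up to `max_runs` times, pausing between runs.

    `command` is an argument list for whatever launches your suite.
    Returns one record per run so the results can be triaged later.
    """
    results = []
    for run_number in range(1, max_runs + 1):
        started = datetime.now(timezone.utc).isoformat(timespec="seconds")
        completed = subprocess.run(command, capture_output=True, text=True)
        passed = completed.returncode == 0  # assumes exit code 0 means "passed"
        results.append({
            "run": run_number,
            "started": started,
            "passed": passed,
            "output": completed.stdout + completed.stderr,
        })
        print(f"{started} run {run_number}: {'PASS' if passed else 'FAIL'}")
        if run_number < max_runs:
            sleep(interval_seconds)
    return results
```

For example, `run_periodically(["python", "-m", "unittest", "discover"], 900, 96)` would run a hypothetical unittest suite every 15 minutes for roughly a day. Keeping the output of every run, not just the failures, makes it easier to compare a failing run with the passing runs around it.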

Born from academia

While "the more you look, the more you see" is logical, the core of the approach is based in academia. The 2013 Workshop on Teaching Software Testing defined high-volume automated testing (HiVAT) as "a family of testing techniques that enable the tester to create, run and evaluate the results of arbitrarily many tests."

One of the intentions of HiVAT is to help find issues that are timing-related; those issues tend to be intermittent and therefore difficult to reproduce. One way of helping reproduce this kind of issue is to use what HiVAT calls "long-sequence regression testing" (LSRT).

This approach is simple:

  • Select automated regression test scripts that you know will pass.
  • Run those tests in a long sequence (e.g., run each script 1,000 times).
  • Investigate the failures.

By doing this, you repurpose your functional scripts as an endurance test of sorts. A specific script may succeed a few times and then fail on, say, its 10th execution because resources were exhausted or because the run hit a timing-related issue. Note that periodic automation, as I've defined it above, is like LSRT, but it does not typically execute often enough to cause resource exhaustion issues.
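The three LSRT steps above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: each regression script is modeled as a plain Python callable that raises an exception on failure, and `run_long_sequence` is a hypothetical name, not something defined by HiVAT.

```python
import traceback


def run_long_sequence(test_functions, iterations=1000):
    """Run each known-passing test `iterations` times and collect failures."""
    failures = []
    for test in test_functions:
        for iteration in range(1, iterations + 1):
            try:
                test()
            except Exception as error:  # includes AssertionError
                failures.append({
                    "test": test.__name__,
                    "iteration": iteration,
                    "error": repr(error),
                    "traceback": traceback.format_exc(),
                })
    return failures
```

Recording the iteration number alongside each failure is the useful part: a failure that always shows up around the same iteration hints at resource exhaustion, while failures scattered across iterations hint at timing.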

One consideration here is that you may need to adjust your scripts' setup and cleanup methods. If each script starts the application at the beginning and exits it at the end, you may not catch issues brought about by running for an extended period, because the application is never actually running for a long period.
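One way to picture that adjustment is to move application startup and shutdown out of the individual script and around the whole sequence. The sketch below assumes a hypothetical `FakeApp` class standing in for the application under test; the names `running_app` and `run_against_one_session` are likewise invented for illustration.

```python
from contextlib import contextmanager


class FakeApp:
    """Hypothetical stand-in for the application under test."""

    def __init__(self):
        self.running = False
        self.requests_served = 0

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def do_work(self):
        self.requests_served += 1
        return self.requests_served


@contextmanager
def running_app(app):
    """Start the app once and guarantee it is stopped afterward."""
    app.start()
    try:
        yield app
    finally:
        app.stop()


def run_against_one_session(app, test, iterations):
    """Run `test` repeatedly against a single long-lived app session."""
    failures = []
    with running_app(app) as session:
        for iteration in range(1, iterations + 1):
            try:
                test(session)
            except Exception as error:
                failures.append((iteration, repr(error)))
    return failures
```

Because the application stays up for all iterations, state that accumulates inside it has a chance to surface, which would not happen if every iteration restarted it.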

Beware failure fatigue

Everything has a downside. While periodic automation may sound like a quick, simple automation approach to reproduce intermittent failures, it also has its own challenges. Automation runs more often, which means you'll have more results—and more failures—to investigate.

It is easy to become desensitized to those failures. You may begin to say, "Oh, the script failed for that intermittent issue again; I know the developers are working on it, so I don’t have to look at those results." 

Unfortunately, the script may have failed in a different place, which might give the developers the information they need to locate the problem. Even worse, the script may have failed due to a different issue that you've now missed. This desensitization is called "failure fatigue."

Here are three ways to minimize failure fatigue:

  • Don’t run your tests more often than you are prepared to triage the results.
  • Minimize the number of steps in the script you're using to reproduce the intermittent issue. The fewer the steps, the faster the script runs and the fewer failure points it has that are unrelated to the intermittent issue.
  • Make the log and error messages as expressive and meaningful as possible so that the reason for the failure is easy to diagnose.
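A small amount of tooling can also help with the "failed in a different place" problem described earlier. The sketch below, which is my own illustration rather than an established technique from the sources above, reduces each failure to a signature (exception type plus the function and line where it was raised) and flags any signature that has not been seen before. The names `failure_signature` and `triage` are hypothetical.

```python
import traceback
from collections import Counter


def failure_signature(error):
    """Return (exception type, function name, line number) for where `error` was raised."""
    frames = traceback.extract_tb(error.__traceback__)
    if not frames:
        return (type(error).__name__, None, None)
    last = frames[-1]  # innermost frame: where the exception was raised
    return (type(error).__name__, last.name, last.lineno)


def triage(errors, known_signatures):
    """Count failures per signature and pick out signatures not seen before."""
    counts = Counter(failure_signature(error) for error in errors)
    new = {sig: n for sig, n in counts.items() if sig not in known_signatures}
    return counts, new
```

A familiar signature can then be counted and set aside, while a new one is a cue to look at the results rather than assume it is the same intermittent issue again.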

Is it right for you?

Periodic automation is not for everyone. For example, if your team is already busy addressing known issues that are directly reproducible, you might not have time to review additional automation results.

Or, if your testing or automation execution environments are fully utilized by traditional automation, you might have to choose between keeping your existing automation execution schedule and cutting back on traditional automation to make room for periodic automation.

If, however, you need help finding or reproducing intermittent problems, if you already have or are willing to create automated scripts to reproduce the problem, and if you can manage the potential for failure fatigue, periodic automation can be a valuable addition to your testing approach.

For more on how to find intermittent issues with periodic automation, come to my presentation on "Hunting Sasquatch" at AgileDev + DevOps, which runs June 7-9 in Las Vegas, Nevada.
