How to lift your test automation game: Tame your data

The most commonly given advice for handling test data is to make tests "atomic," which means that each test is self-contained and doesn't rely on other tests for data or state changes. It's good advice—but it isn't enough.

You need a data strategy. Here's how to test better by taming your data.

Testing in the Agile Era: Top Tools and Processes

What is a data strategy?

A data strategy is simply a method for controlling the data within a system. You accomplish this by planning how data is created before tests run, and whether and how you will clean it up once the tests are complete.

Paul Merrill identified three of the most common patterns—which he refers to as "elementary," "refresh," and "selfish"—in his article "3 highly effective strategies for managing test data." 

These three patterns also reflect the stages through which automation often passes as it is being implemented.

1. Elementary (no creation, no cleanup)

When using the elementary approach, automation uses only existing data within the system. This technique is frequently used when developing a proof of concept or initial implementation. This method rarely works in the long term because data within the system will eventually change—and your tests will fail.

2. Refresh (no creation, cleanup after each run)

This strategy responds to the problem of tests failing because of changed data by restoring the system's data source to a predetermined state. This helps ensure that the required data is present and in the correct state when automation begins running. But it does not prevent failures caused by scripts that change shared data during a run.
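A minimal sketch of the refresh pattern, assuming the system under test keeps its data in a SQLite database. The `accounts` table and its rows are hypothetical. A known-good baseline is loaded once, and Python's `sqlite3.Connection.backup` copies it over the test database before each run, discarding whatever the previous run changed.

```python
import sqlite3

# Hypothetical schema and rows standing in for the predetermined data set.
BASELINE_SQL = """
CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER, status TEXT);
INSERT INTO accounts VALUES ('ACC-1', 100, 'active');
INSERT INTO accounts VALUES ('ACC-2', 250, 'active');
"""

def build_baseline() -> sqlite3.Connection:
    """Load the predetermined data set once into a snapshot database."""
    baseline = sqlite3.connect(":memory:")
    baseline.executescript(BASELINE_SQL)
    return baseline

def refresh(baseline: sqlite3.Connection, target: sqlite3.Connection) -> None:
    """Overwrite the test database with the baseline snapshot."""
    target.rollback()  # discard uncommitted work; backup refuses to run mid-transaction
    baseline.backup(target)

baseline = build_baseline()
test_db = sqlite3.connect(":memory:")

refresh(baseline, test_db)
# A test run changes shared data...
test_db.execute("UPDATE accounts SET status = 'suspended' WHERE id = 'ACC-1'")
test_db.commit()

# ...and the next run starts from the predetermined state again.
refresh(baseline, test_db)
status = test_db.execute("SELECT status FROM accounts WHERE id = 'ACC-1'").fetchone()[0]
row_count = test_db.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
print(status, row_count)  # active 2
```

Note that the refresh happens between runs only. Two scripts that modify `ACC-1` during the same run can still collide, which is exactly the limitation described above.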

3. Selfish (tests create required data, no cleanup)

Automation implementing a selfish data strategy is driven by the desire to make tests atomic. Each test generates its own data during initialization and does not rely on other scripts to run successfully.

While this sounds like the perfect solution, two issues immediately come to light: You'll create excessive amounts of data, and you'll spend a lot of time doing test initialization.

There is no silver bullet

No one strategy can solve all of the possible data issues that can occur within an automation environment. Rather than seeking a silver bullet, then, think through the test requirements, and apply an appropriate method to ensure maximum stability.

And if you aren't sure whether a particular strategy is the right one for a test, don't worry. Your test suite will let you know by slowing down, returning false positives, creating intermittent failures, or exhibiting any number of other issues.

Common data issues in the automated era

The easiest way to understand the importance of having a data strategy is to look at some of the common issues caused by data in an automated environment. The following list of problems and solutions is by no means exhaustive.

Intermittent failures

Few things are more frustrating to deal with than intermittent failures within automation. When tests rely on the same data pool, race conditions can cause such problems.

For example, two tests reference the same account in a financial application. The first test processes a deposit to the account, and the second suspends the account. These tests run successfully most days but occasionally fail when running in parallel, because the account gets suspended before the deposit is made.

Configuring these tests to use separate accounts avoids this race condition. Alternatively, each test could put the account into the state it needs during setup, but this can still cause issues when the tests run in parallel.

False positives or negatives

Often lumped in with intermittent failures, or going completely unnoticed, false results can frustrate the team and cause a loss of confidence in automation. Improperly controlled data is one possible cause of these issues.

A common source of false positives is a test that runs correctly but leaves its data behind. On the next run, the test finds that leftover data and confirms it as correct, whether or not the application actually functioned correctly. You can handle this with a selfish approach, in which the test creates new data for every run, or by restoring the data to its original state before or after each run.
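A small sketch of how leftover data produces a false positive. The dictionary stands in for the system's data source, and `create_customer` is a hypothetical piece of application code with a deliberate bug: it never saves anything. Reusing a fixed name lets a stale record satisfy the check; generating a unique name per run exposes the bug.

```python
import uuid

datastore = {}  # stand-in for the system's data source

def create_customer(name: str) -> None:
    # Hypothetical application code with a bug: it never saves the customer.
    pass

def customer_exists(name: str) -> bool:
    return name in datastore

# An earlier run, when the code still worked, left this record behind.
datastore["Test Customer"] = {"name": "Test Customer"}

# Reused, fixed test data: the check passes even though create_customer is broken.
create_customer("Test Customer")
fixed_data_passes = customer_exists("Test Customer")

# Unique data per run (the selfish approach): stale records cannot satisfy the check.
unique_name = f"Test Customer {uuid.uuid4().hex}"
create_customer(unique_name)
unique_data_passes = customer_exists(unique_name)

print(fixed_data_passes, unique_data_passes)  # True False
```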

Slow performance

While there are many possible reasons for slow performance when running tests, two causes are directly related to the data strategy you use. If the test environment has no cleanup strategy in place, data accumulates with every test run. That volume may be useful for performance testing, but it hurts smoke and regression suites, which need to run as fast as possible while validating functionality.

Another common test performance issue is caused by the setup routine attempting to create all of the necessary data on each run. That can often take longer than testing the functionality itself.

Both of these issues can be resolved by implementing a base data load that provides the data required, and using that to restore the test environment's data source on a regular basis.

Develop your own data strategy

When developing a data strategy for a test environment, resist reaching for a generic solution; instead, take the time to fully understand the system's pain points. No data strategy is a silver bullet, so customize your strategies to match the needs of your environment.

The patterns covered above are helpful in addressing the needs of the environment and specific tests, so you should consider them when planning creation and cleanup strategies. You can use the following questions while reviewing the tests and data requirements to develop appropriate strategies for your environment.

    Creation strategy

    • Does the test require specific data, or can the data be randomly generated?
    • Is it faster to perform a data load prior to a test run or to allow the tests to create or manipulate the data at run time?
    • Should the data be generated as needed or via an asynchronous process?

    Cleanup strategy

    • Should data cleanup be performed immediately or periodically?
    • Is it feasible to wipe all data in the test environment after each test run?
    • Can tests be configured to quickly restore the environment to a pristine state as they run?
    • Will the system support asynchronous cleanup scripts?
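One common answer to "Can tests be configured to quickly restore the environment to a pristine state as they run?" is to wrap each test in a database transaction and roll it back afterward. The sketch below assumes a SQLite database and a hypothetical `accounts` table, and uses a small context manager named `pristine` (also hypothetical). It only works when the test and the code under test share the same connection; anything committed by another process or service escapes the rollback.

```python
import sqlite3
from contextlib import contextmanager

# isolation_level=None puts the connection in autocommit mode so we control BEGIN/ROLLBACK.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER, status TEXT)")
db.execute("INSERT INTO accounts VALUES ('ACC-1', 100, 'active')")

@contextmanager
def pristine(conn: sqlite3.Connection):
    """Run a test inside a transaction and roll back everything it changed."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")

with pristine(db) as conn:
    conn.execute("UPDATE accounts SET status = 'suspended' WHERE id = 'ACC-1'")
    conn.execute("INSERT INTO accounts VALUES ('ACC-2', 0, 'active')")
    inside_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]

after_count = db.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
after_status = db.execute("SELECT status FROM accounts WHERE id = 'ACC-1'").fetchone()[0]
print(inside_count, after_count, after_status)  # 2 1 active
```

Cleanup here is immediate and nearly free, which is why this pattern pairs well with a base data load: the load happens once, and each test leaves it untouched.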

There are many other environment-specific questions that can be added to the above, but these should serve as a good starting point. They should also serve as a reminder that test results are only as good as the data being used. Without dependable data in the system, any automation will produce questionable results.
