5 cures for your test data headaches

Tests are a crucial part of ensuring that our systems actually do what they’re meant to do. All tests inevitably rely on data in some form. The problem is that testers tend to get themselves into unnecessary tangles as a result of how they create and maintain their test data.

A recent IBM study revealed that testers spend up to 60% of their time worrying about their test data. And it's not just test engineers who fret about this: In my experience, developers also spend way too much time on test data.

Whether you're a tester or a developer, here are five things you can do to keep things simple and prevent pain.


1. Prefer builders

I’ve come across many codebases where individual automated tests were laden with test data. The tests either pull big chunks of test data out of files or dedicate many lines of code to defining the test data within the tests themselves. This leads to a high maintenance burden in two ways:

  • It makes individual tests much harder to work with, because it’s difficult to tell which parts of the data are relevant to the test at hand.
  • It makes the test suite as a whole harder to maintain because a single change to the data structure in your production code usually means updating dozens of tests.

The simple alternative is to use the builder pattern: call a function that constructs the test data you need, and pass in only the values your test cares about. If, for example, you had a test relating to a user’s passport number, instead of having:

const user = {
  firstName: 'Ada',
  lastName: 'Lovelace',
  email: 'ada@example.com',
  passportNumber: 12345
};

You could simply have:

const user = buildUser({ passportNumber: 12345 });

The builder can then take care of using sensible defaults for the rest of the object. The result: You'll have less noise in your tests and all of the common user attributes will reside in a single place in your test suite. In addition, you can abstract away what it means for a user to exist in your system. For example, your builder could insert the user into the correct database table.
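As a rough illustration, here is a minimal sketch of what `buildUser` might look like in plain JavaScript. The default values are taken from the example above; `createUser` and its `db.insert(table, row)` interface are hypothetical stand-ins for however your system actually persists users.

```javascript
// Minimal user builder: sensible defaults, overridable per test.
// Defaults mirror the example above; adjust them to your own domain.
function buildUser(overrides = {}) {
  const defaults = {
    firstName: "Ada",
    lastName: "Lovelace",
    email: "ada@example.com",
    passportNumber: 12345,
  };
  // Later spreads win, so any field the test passes replaces the default.
  return { ...defaults, ...overrides };
}

// Hypothetical persisting variant: `db` is assumed to expose
// insert(table, row). Swap in your real data-access layer.
function createUser(db, overrides = {}) {
  const user = buildUser(overrides);
  db.insert("users", user);
  return user;
}

// Usage: the test states only what it cares about.
const user = buildUser({ passportNumber: 67890 });
console.log(user.passportNumber); // 67890
console.log(user.firstName); // Ada
```

A shallow object spread is enough for a flat record like this one. If your user has nested fields (an address, say), you would need a deep merge so that overriding one nested field doesn't wipe out its siblings.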

2. Clean your data before your tests run

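If you like to keep your code neat and tidy, you’ll be tempted to have each test clean up after itself by deleting all the test data it created. However, this approach creates unnecessary headaches: if a test fails or is interrupted partway through, its cleanup may never run, and the leftover data can break whichever test runs next.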

Instead, clean out the old data before your tests run. In this way, each test will have a clean slate from which to start. That means you’ll spend less time chasing failures caused by another test that didn’t clean up after itself. As an added bonus, you'll find it easier to diagnose test failures because, just by peeking into the test database, you will be able to view the state that the test data was in when the test failed.
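The idea fits in a few lines. In this sketch an in-memory object stands in for the test database (an assumption; with a real database, `resetDatabase` would truncate or delete from the relevant tables), and a tiny hypothetical runner resets the data at the start of each test. In a framework such as Jest or Mocha, the reset would live in a `beforeEach` hook instead.

```javascript
// Stand-in for a real test database: each key is a table (an assumption).
const db = { users: [], orders: [] };

// With a real database, this would truncate or delete from each test table.
function resetDatabase() {
  for (const rows of Object.values(db)) {
    rows.length = 0; // empty the table in place
  }
}

// Minimal synchronous runner, for illustration only.
function runTests(tests) {
  const results = [];
  for (const [name, testFn] of Object.entries(tests)) {
    resetDatabase(); // clean slate BEFORE the test, not after
    try {
      testFn();
      results.push({ name, passed: true });
    } catch (err) {
      // Deliberately no cleanup: the failing test's data stays in `db`
      // for inspection until the next test starts.
      results.push({ name, passed: false, error: err.message });
    }
  }
  return results;
}

// Usage: the second test passes even though the first left data behind.
const results = runTests({
  "creates a user": () => {
    db.users.push({ firstName: "Bruce", lastName: "Wayne" });
  },
  "starts with no users": () => {
    if (db.users.length !== 0) throw new Error("leftover data from a previous test");
  },
});
console.log(results.every((r) => r.passed)); // true
```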

3. Make sure your test data reflects the real world

Use production data to guide the test cases you need. It’s easy to spend hours on scenarios that are unlikely to happen in the real world. Query your production database to find out what your data really looks like. For example, “what percentage of real production users have invalid email addresses?” or “how many customers request postal statements?”
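If you can export a sample of production records, a few lines of code turn questions like these into numbers. In practice you might run the equivalent query directly in SQL; this plain JavaScript sketch assumes hypothetical `email` and `wantsPostalStatements` fields and uses a deliberately loose email pattern rather than a full validator.

```javascript
// Deliberately loose check: something@something.something, no whitespace.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Summarize a sample of user records. Field names are assumptions.
function profileUsers(records) {
  const total = records.length;
  if (total === 0) return { total, invalidEmailPct: 0, postalStatementPct: 0 };

  const invalidEmails = records.filter((r) => !EMAIL_PATTERN.test(r.email ?? "")).length;
  const postal = records.filter((r) => r.wantsPostalStatements === true).length;

  // Percentage rounded to one decimal place.
  const pct = (n) => Math.round((n / total) * 1000) / 10;
  return {
    total,
    invalidEmailPct: pct(invalidEmails),
    postalStatementPct: pct(postal),
  };
}

// Usage with a tiny illustrative sample.
const sample = [
  { email: "bruce.wayne@example.com", wantsPostalStatements: true },
  { email: "not-an-email", wantsPostalStatements: false },
  { email: "jane.austen@example.com", wantsPostalStatements: true },
  { email: "ada@example.com", wantsPostalStatements: false },
];
console.log(profileUsers(sample)); // { total: 4, invalidEmailPct: 25, postalStatementPct: 50 }
```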

Make sure the data in your tests is production-like. For example, make a user ID look like a real user ID, and make sure the age field’s value isn’t 412. You can prevent many surprises and late defects by being consistent and accurate. Just make sure that you sanitize any data you pull from production so you don't unintentionally expose people’s private information. I like to have a little fun with my test data, so I use names like “Bruce Wayne” or “Jane Austen” to make it clear to everyone (and myself) that I'm working with test data.
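Sanitizing production data can be as simple as copying each record, stripping the personal fields, and filling them back in with obviously fake values. This sketch assumes the field names from the builder example; the `PERSONAL_FIELDS` list is illustrative, not exhaustive, so audit your own schema for everything that counts as personal information.

```javascript
// Illustrative only: list every personal field in YOUR schema here.
const PERSONAL_FIELDS = ["firstName", "lastName", "email", "passportNumber"];

// Realistic but obviously fake replacements.
const FAKE_PEOPLE = [
  ["Bruce", "Wayne"],
  ["Jane", "Austen"],
];

// Returns a copy of `record` with personal data removed or replaced.
// Non-personal fields (plan, age, flags) are kept so the data stays realistic.
function sanitizeUser(record, index, testDomain) {
  const safe = { ...record };
  for (const field of PERSONAL_FIELDS) delete safe[field]; // strip PII

  const [firstName, lastName] = FAKE_PEOPLE[index % FAKE_PEOPLE.length];
  return {
    ...safe,
    firstName,
    lastName,
    // Unique per record, and always on a domain you control.
    email: `${firstName}.${lastName}.${index}@${testDomain}`.toLowerCase(),
  };
}

// Usage
const prodRow = {
  firstName: "Real",
  lastName: "Person",
  email: "r.person@example.net",
  passportNumber: 987654,
  plan: "premium",
  age: 34,
};
console.log(sanitizeUser(prodRow, 0, "example.com").email); // bruce.wayne.0@example.com
```

The passport number is dropped rather than faked; a test that needs one can set it explicitly, for example through a builder.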

In short, your ideal test data should be realistic, but obviously fake.

Don’t use email addresses you don’t own. I once worked at a company where a customer received an email containing his personal production data. It had been sent from a test email address, and everyone in the company had been copied on it!

Keep it simple: When you put an email address in your test code or test database, make sure it’s for a domain that you control. If you worked at TechBeacon, you would use bruce.wayne@techbeacon.com. At my company, we have a whitelist in our email service that makes sure no emails go out to unsanctioned domains in our test environments. Even so, ever since that incident, I’m always a little wary.
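An allowlist like that can sit in front of whatever actually sends email in a test environment. In this sketch, `ALLOWED_DOMAINS`, `guardedSend`, and the `sendEmail` transport function are hypothetical; the message shape with `to`, `cc`, and `bcc` arrays is also an assumption. Note that `cc` and `bcc` are checked too, since a stray copy to everyone is exactly how the story above went wrong.

```javascript
// Replace with the domains your organization actually controls.
const ALLOWED_DOMAINS = new Set(["example.com"]);

// Extract the lowercase domain part of an address ("" if there is no @).
function domainOf(address) {
  const at = address.lastIndexOf("@");
  return at === -1 ? "" : address.slice(at + 1).toLowerCase();
}

// Refuse the WHOLE message if any recipient is outside the allowlist,
// rather than silently dropping recipients and sending the rest.
function guardedSend(sendEmail, message) {
  const recipients = [...(message.to ?? []), ...(message.cc ?? []), ...(message.bcc ?? [])];
  const blocked = recipients.filter((addr) => !ALLOWED_DOMAINS.has(domainOf(addr)));
  if (blocked.length > 0) {
    throw new Error(`Refusing to email unsanctioned addresses: ${blocked.join(", ")}`);
  }
  return sendEmail(message);
}

// Usage with a fake transport that just records messages.
const sent = [];
const fakeTransport = (message) => {
  sent.push(message);
  return "queued";
};
console.log(guardedSend(fakeTransport, { to: ["bruce.wayne@example.com"], subject: "Test" })); // queued
```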

4. Don’t treat your test data like pets

DevOps team members often say you should “treat your servers like cattle, not pets.” The same goes for test data.

I’ve worked in more than a few teams where test data accounts are like members of the family. I’ve often overheard a conversation like this:

“Hey Marge. Can I please borrow the Tess Tester user to test my account creation scenario?”

“I’m busy using her to test the reset-password bug. I’ve just locked her out of her account.”

“Please! We need to ship this feature.”

“No! Last time you managed to break her XYZ record and I had to spend all afternoon fixing it.”

Conversations like these are a clear sign that creating and maintaining test data is too difficult. This anti-pattern usually arises from architectural limitations in downstream systems, so it’s an understandable response to an obstacle that isn’t easy to overcome.

The answer lies in automation. The creation of test data, even for manual testing, should be automated.

Write scripts that create test data in various states. Sometimes the data will be looked after by other teams who take care of a downstream system. In these cases, visit those teams and help them write an API that you can call to create (and delete/modify) test data. That might be impossible, or it might be easier than you think. If it’s impossible, then document the manual steps involved in creating test data. That way, people will steal and break your test accounts less often.
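A script like that can be small. The sketch below mints a brand-new account in a named state each time it is called, so nobody has to borrow Tess Tester. The states echo the dialogue above; the fields, the in-memory `accounts` store, and `createTestAccount` itself are hypothetical. In a real system, each state would be reached through your application's API (or a downstream team's test-data API) rather than by setting fields directly.

```javascript
// In-memory stand-in for wherever accounts really live (an assumption).
const accounts = new Map();
let nextId = 1;

// How to put a fresh account into each named state (illustrative fields).
const STATE_SETUP = {
  active: (account) => account,
  locked: (account) => ({ ...account, failedLogins: 5, locked: true }),
  passwordResetPending: (account) => ({ ...account, resetTokenIssued: true }),
};

function createTestAccount(state = "active") {
  if (!Object.prototype.hasOwnProperty.call(STATE_SETUP, state)) {
    throw new Error(`Unknown test account state: ${state}`);
  }
  const id = nextId++;
  // Unique name and email per call, so two testers never share an account.
  const base = {
    id,
    name: `Tess Tester ${id}`,
    email: `tess.tester.${id}@example.com`,
    failedLogins: 0,
    locked: false,
    resetTokenIssued: false,
  };
  const account = STATE_SETUP[state](base);
  accounts.set(id, account);
  return account;
}

// Usage: Marge and her colleague each get their own account.
const forMarge = createTestAccount("locked");
const forColleague = createTestAccount("active");
console.log(forMarge.id !== forColleague.id); // true
console.log(forMarge.locked, forColleague.locked); // true false
```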

5. Write your test data automation scripts early

For many applications, you'll have at least a few test cases that require specific test data sets. It’s often easiest to write the scripts that create them at the time of development, or when you’re doing initial testing, because that’s when you understand all the details and intricacies of the scenario at hand.

It’s much harder to get back into a complex scenario later. Write a script to generate the test data you need as soon as you think of it. And if automation isn’t an option, document the manual steps. You’ll thank yourself later.
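One lightweight way to do that is to register each tricky scenario under a descriptive name, built on top of a builder like the one in tip 1, with the intricacies written down as comments while they are still fresh. The scenario, its business rule, the field names, and the `seedScenario` helper below are all hypothetical.

```javascript
// Builder in the style of tip 1 (fields are illustrative).
function buildCustomer(overrides = {}) {
  return {
    name: "Jane Austen",
    email: "jane.austen@example.com",
    emailValid: true,
    wantsPostalStatements: false,
    invoices: [],
    ...overrides,
  };
}

// Each entry records WHY the data looks the way it does.
const SCENARIOS = {
  // Hypothetical rule: statements fall back to post when email bounces,
  // so this customer needs BOTH an invalid email and the postal flag,
  // plus an overdue invoice so that a statement is generated at all.
  postalCustomerWithBouncedEmail: () =>
    buildCustomer({
      emailValid: false,
      wantsPostalStatements: true,
      invoices: [{ amount: 120, daysOverdue: 45 }],
    }),
};

function seedScenario(name) {
  if (!Object.prototype.hasOwnProperty.call(SCENARIOS, name)) {
    throw new Error(`No test data script for scenario: ${name}`);
  }
  return SCENARIOS[name]();
}

// Usage
const customer = seedScenario("postalCustomerWithBouncedEmail");
console.log(customer.wantsPostalStatements, customer.emailValid); // true false
```

Because each scenario is a function, calling it twice gives two independent copies of the data rather than one shared record.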

More organization and automation, less pain

There’s no magic cure for test data headaches. But with the help of automation and a bit of organization, you can make writing and maintaining test suites much less of a hassle.

Do you have your own tips for handling test data easily and reliably? If so, please share them with me in the comments below. 
