
Speed up your CI pipeline: Throw out your tests

In my last article, I talked about how important it is to implement test automation in a pragmatic and cost-effective way. My team has found integration-level testing to be so superior to unit testing that we now have almost no unit tests in our system. "Prefer Integration/API Tests over Unit Tests" was our mantra. But if you want to speed up your continuous integration (CI) pipeline, you also need to throw out some of the integration- and API-level tests that are slowing you down.

These types of tests are very good at detecting regressions. They are resilient to implementation detail changes and help you to focus on testing the most important parts of the system.

But focusing on integration-level testing has one problem: It increases the time it takes for tests to run. And if there is one thing developers hate, it's waiting for test feedback before being cleared for check-in.

How to speed up test execution times

There are many ways to optimize test execution times, such as running tests in parallel (easier said than done), optimizing test code to make it more efficient, and rooting out inefficiencies in data preparation. But there's an interesting phenomenon I uncovered on our team, a bad practice in disguise that introduced tremendous inflation to our automation execution times. 

For the sake of discussion, consider the standard system we like to use in examples: a retail system that manages items, orders, and customers, and displays a list of items for sale and a list of customers.

The lists are presented in a grid view. Each grid has a filter control, a column picker control, and a full-screen button.

For example, the items grid shows columns such as ID, title, description, price, category, and condition, while the customers grid shows columns such as name, country, birth date, and VIP status.

Based on the description so far, an automated test suite for the items grid will contain the following tests:

  • click full-screen mode and validate that full screen is active
  • click filter button and filter by title
  • click filter button and filter by description
  • click filter button and filter by price
  • click filter button and filter by category
  • click filter button and filter by condition
  • click column picker and remove the "id" column from view

A test suite for the customers grid might look something like this:

  • click full-screen mode and validate that full screen is active
  • click filter button and filter by name
  • click filter button and filter by country
  • click filter button and filter by birth date
  • click filter button and filter by VIP status

Notice that much of the functionality between the grids is identical (e.g., both support full-screen), or very close (e.g., both grids support filtering, but by different fields).

By now, the smell of reuse should be in the air. 

You could write a test for full-screen mode generically, with the grid that it tests being passed as a parameter, since the actual testing code and the assertions within the code are identical.

This seems like a great scenario that's similar to a data-driven test—a test that, once written, can validate full-screen functionality for all of the grids in the system, not just for items and customers. A compelling opportunity!

The same thing can be done for a filtering test. How cool would it be to write a test where the field that is being filtered gets passed into the test as a parameter, as well as the grid on which it operates? In this way you can apply the test to any other combination of grid + field. 

When you add a new grid to the system, or a new field to an existing grid, it's trivial to extend the test suite to cover the extras. You get extra coverage for free, right? 

A nasty surprise

As it turns out, this way of thinking introduced a huge amount of redundancy into our CI pipeline! At the time of writing, I don't have an exact number yet because we are still rooting out redundant tests, but we saw roughly a tenfold improvement in many of the suites. In absolute numbers, consider a suite of tests that used to run for 50 minutes and now runs in under five.

Where did the fat come from?

It's not just testers who can see the potential for reuse. Developers or architects looking at such a set of requirements might think to themselves, "Hmm, this grid looks the same everywhere it appears. Perhaps I should write some generic functionality to support the grids and just customize it for each grid that I need in my program." In other words, similar functionality is often backed by a generic infrastructure of some kind.

In the filtering scenario, the infrastructure may look something like this:

  • The client generates a query object and sends it to the server.
  • On the server side, assuming data is stored in a standard relational database, the query is compiled to SQL by way of metadata-driven mappings configured on the server.
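The two steps above can be sketched roughly as follows. The mapping tables and query shape here are invented for illustration; a real system would have richer metadata and operators.

```python
# Illustrative sketch of metadata-driven query compilation.
# The mapping tables and query shape are invented, not from a real system.

# Per-entity metadata: which table backs the grid, and how UI field
# names map to database columns.
MAPPINGS = {
    "items":     {"table": "items",     "fields": {"title": "title", "price": "price"}},
    "customers": {"table": "customers", "fields": {"name": "full_name", "VIP status": "is_vip"}},
}

def compile_query(entity, field, value):
    """Compile a client-side filter query object into parameterized SQL."""
    meta = MAPPINGS[entity]
    column = meta["fields"][field]
    sql = f"SELECT * FROM {meta['table']} WHERE {column} = ?"
    return sql, [value]

# The same generic code path serves every grid; only the metadata differs.
print(compile_query("items", "title", "lamp"))
print(compile_query("customers", "VIP status", True))
```

Note that `compile_query` never branches on the entity: swap the metadata and the exact same code path runs, which is the crux of the redundancy argument that follows.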

I've seen this so many times on different systems that I feel certain it's a widespread pattern. 

So you have this interesting situation where, from a functional perspective, filtering items is very different from filtering customers. It is done by different personas, it deals with different sets of data, and so on. But from an implementation perspective, the only difference between the scenarios lies in the server-side metadata mappings.

This means that once you prove that the metadata-driven infrastructure works, there is no point in validating it over and over again with other entities, since doing so won't introduce significant variety to the tests.

Consequently, the filtering test for items and the filtering test for customers are one and the same test! Running both costs you time and money without adding value. Worse, it gives you a false sense of security: you think you have extended coverage when in fact you haven't.

Similar reasoning can be applied to the full-screen and column-picker buttons. 

But what if the system has 50 entities that share such functionality, rather than just three? Implementing combinatorial coverage for each entity individually is easy, and it's just as easy to forget the costs it incurs.

A lean, cost-effective test suite must take relevant variety into account. When faced with a combinatorial test, the author of the test must ask, "What are the points that will generate different execution paths inside the flow?" All other points should be discarded in the name of sanity!

Consider the previous example. If you filter by the ID field, there’s no difference in behavior between the items grid and the customer grid. But the programmer might have customized the way that the code filters by the Description field. For example, the code that performs the filtering by description might (for reasons that don’t matter here) generate and execute a secondary search in a different database. If that’s the case, a test of filtering by the Description field must take that customization into account. That means that a generic test won’t be sufficient, and the developer will need to write a specific test for the customized table.
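Under that reasoning, the combinatorial matrix collapses: one representative case exercises the shared metadata-driven path, and only fields with customized behavior, such as the Description filter above, keep dedicated tests. A sketch, with illustrative names:

```python
# Sketch: pruning a combinatorial matrix down to relevant variety.
# The full matrix covers every grid/field pair; the pruned suite keeps
# one representative of the generic path plus each known customization.

ALL_CASES = [
    ("items", "title"), ("items", "description"), ("items", "price"),
    ("items", "category"), ("items", "condition"),
    ("customers", "name"), ("customers", "country"),
    ("customers", "birth date"), ("customers", "VIP status"),
]

# Pairs known (from the implementation) to take a customized code path,
# e.g. the Description filter that runs a secondary search elsewhere.
CUSTOMIZED = {("items", "description")}

def prune(cases):
    kept = [cases[0]]  # one representative of the generic path
    kept += [c for c in cases if c in CUSTOMIZED and c not in kept]
    return kept

print(prune(ALL_CASES))  # 9 cases shrink to 2
```

The pruning rule itself is the hard part: it encodes knowledge of the implementation, which only works when testers can actually get that knowledge from developers, as discussed below.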

Before you leap: A word of caution

To be sure, the practice described here is not new. The technique of searching for relevant variety in test scenarios while taking into account implementation details is known as gray-box equivalence partitioning and involves identifying sets of data that end up going down the same code path.

Refactoring the code and reducing the number of unnecessary tests is mandatory for implementing cost-efficient automation pipelines, especially those that focus on integration-level tests. And it has the added value of making you think about what you are testing. 

But one word of caution: A major challenge to implementing this approach is that it requires excellent communication channels between testers and developers. On an agile team, where developers and testers are part of the same team, this approach will be much easier. But in places where development and test sit far apart (literally and/or figuratively), going this route may make matters worse until such time as you can effect a culture change. 

Image credit: Flickr
