Intro to fuzz testing: How to prevent your next epic QA fail

Software doesn't always fail gracefully, and downtime is expensive. It's best to be proactive and prevent failures from happening. But how do you test for that?

Do you need to look through the code and find all of the places where a crash could happen? Fortunately, no. There's a way to do this quickly and automatically: fuzz testing. Here's what your team needs to know.

Gartner Magic Quadrant for Software Test Automation

What fuzz testing is and how it helps

There's more to testing than just functional tests. In many types of testing, the output isn't as important as how well code behaves when things go wrong. This is where fuzz testing can help. Instead of constructing tests with a known input and an expected output, a fuzz test feeds the program random, unknown input, and the only expected output is “don't crash.”

There are two basic ways to perform a fuzz test, depending on the kind of input data used: indirect random and directed random. The split reflects a pattern most programs share: they sanitize data before acting on it. Programs are written to check data thoroughly before allowing it into the deeper recesses of the code, where problems would otherwise happen.

A good fuzz test will combine elements from both types of random data, explained below.

Indirect random

Data sanitization is just as much about stopping bad data as it is about filtering and allowing good data through to the rest of the system. Fuzz testing with indirect random data uses completely random data: any length, any content. It could be a megabyte of Unicode characters or a single character. There's no pattern to it.
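As a rough illustration, a generator for this kind of data can be very small. The sketch below is only one way to do it; the function name `random_blob` and the one-megabyte default cap are assumptions made for the example, not something from a particular tool.

```python
import random


def random_blob(rng: random.Random, max_len: int = 1_000_000) -> bytes:
    """Return random bytes with a random length between 1 and max_len.

    Hypothetical helper: the name and the default cap are illustrative.
    """
    length = rng.randint(1, max_len)
    # getrandbits(8 * length) yields an integer that always fits in `length` bytes
    return rng.getrandbits(8 * length).to_bytes(length, "little")


if __name__ == "__main__":
    rng = random.Random(1234)  # a fixed seed makes a failing input reproducible
    blob = random_blob(rng, max_len=32)
    print(len(blob), blob.hex())
```

Seeding the generator is a design choice worth considering: if a run crashes the program under test, the same seed regenerates the same input.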

The goal of using this type of data is to quickly find out whether the data sanitization code hits any unexpected conditions. Places that could crash at the sanitization stage are likely to be where assumptions were made, such as:

Data length

If method A calls method B, and method B has a constraint of 50 characters but method A is capable of passing more than that, then logic needs to be added to method A to filter or truncate that data.

Tokenized strings

Extra delimiters, such as additional commas in a comma-separated string, can produce the wrong number of tokens after a string split, which could cause a crash later.

Unexpected data types

If it's assumed that data is always alphanumeric or always an integer, providing unexpected types can trigger a failure. Just because data is supposed to be a certain format doesn't mean it always will be, since specifications can change.
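To make the three assumptions above concrete, here is a toy example. The functions `store_symbol` and `parse_quote`, the eight-character field, and the `SYMBOL,PRICE` format are all invented for illustration; they stand in for "method B" and "method A" and are not taken from any real system.

```python
def store_symbol(symbol: str) -> list:
    """Hypothetical 'method B': copies a symbol into a fixed 8-slot field."""
    field = [" "] * 8
    for i, ch in enumerate(symbol):
        field[i] = ch  # IndexError if the symbol has more than 8 characters
    return field


def parse_quote(line: str) -> tuple:
    """Hypothetical 'method A': parses 'SYMBOL,PRICE' without sanitizing it."""
    symbol, price = line.split(",")  # ValueError unless there are exactly two tokens
    return store_symbol(symbol), int(price)  # ValueError if price is not an integer


def probe(line: str):
    """Return the name of the exception the input triggers, or None if it parses."""
    try:
        parse_quote(line)
    except Exception as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    for sample in ["ACME,101", "ACMEHOLDINGS,101", "ACME,,101", "ACME,10.5"]:
        print(repr(sample), "->", probe(sample))
```

The first sample parses cleanly. The other three each break one assumption: data length, token count, and data type, in that order.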

Directed random

Once properly sanitized, data enters the rest of the application and the whole system. Crashes anywhere are bad, but a crash in this part of the application is often more dangerous, so turning up weak spots here is usually a higher priority.

This type of fuzzing works by taking samples of good data and changing them slightly before passing them through the system. Since the data is mostly good, it gets deeper into the code before causing problems. The types of problems that can be found are similar to those found with indirect random fuzzing; the difference is where they're found.
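A minimal sketch of that idea, assuming the "small change" is a change to a single byte: the helper name `mutate_one_byte` and the sample record are made up for the example.

```python
import random


def mutate_one_byte(sample: bytes, rng: random.Random) -> bytes:
    """Return a copy of a known-good sample with exactly one byte changed."""
    if not sample:
        raise ValueError("sample must not be empty")
    pos = rng.randrange(len(sample))
    flip = rng.randint(1, 255)  # a nonzero XOR mask guarantees the byte changes
    mutated = bytearray(sample)
    mutated[pos] ^= flip
    return bytes(mutated)


if __name__ == "__main__":
    good = b"ACME,101"  # illustrative known-good record
    print(mutate_one_byte(good, random.Random(42)))
```

XOR with a nonzero mask is used rather than picking a fresh random byte, because a fresh byte could happen to equal the original and leave the sample unchanged.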

Fuzz testing in action: Real-world examples

I used to work in the financial sector, on high-frequency ticker plants for the stock market. The tickers had around 150 programs on the front end that would consume the raw market data, for which each market had a wildly different specification, and normalize that data for the rest of the system to use. Since it was fast-paced and downtime could cost customers thousands of dollars per second, graceful failures were a necessity. At worst, the program should write to the log when something bad happened, but it shouldn't dump a core file and halt the system.

So I came up with a way to feed different kinds of random data into the system to get it to fail—less than gracefully. Directed random data used samples of actual data—stock quotes, trades, and others—with a random change to one byte. And because this was done on a Linux system, using Linux's own random data generator (/dev/urandom) worked well for indirect random testing.

When a failure was detected, the core dump file was set aside, along with the dataset used, for engineers to use to troubleshoot later. This could be done dozens of times per second, and it found some very interesting bugs that would have taken days, weeks, or months to write specific test cases for.
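The loop described above could look roughly like the following. This is a sketch under stated assumptions, not the original harness: it assumes the program under test reads its input from stdin, it relies on the POSIX convention that a negative return code means the process was killed by a signal, and it saves only the offending input (collecting the core file is left out). The name `run_case` and the `.input` file naming are invented for the example.

```python
import hashlib
import subprocess
from pathlib import Path


def run_case(cmd, data: bytes, crash_dir: Path, timeout: float = 5.0) -> bool:
    """Feed data to cmd on stdin; save the input and return True if it crashed.

    Raises subprocess.TimeoutExpired if the target hangs past the timeout.
    """
    proc = subprocess.run(
        cmd,
        input=data,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    # On POSIX, subprocess reports death by signal N as return code -N
    if proc.returncode >= 0:
        return False
    crash_dir.mkdir(parents=True, exist_ok=True)
    # Name the file after a hash of the input so repeated crashes don't pile up
    name = hashlib.sha256(data).hexdigest()[:16]
    (crash_dir / f"{name}.input").write_bytes(data)
    return True
```

A caller would invoke `run_case` in a loop with freshly generated or mutated data, then hand the contents of the crash directory to whoever investigates.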

'But that would never happen!'

Tell that to Apple, which suffered crashes on certain models of iPhones whenever someone tweeted a certain Telugu character on Twitter.

“But that would never happen” are the famous last words before many major software crashes. Although the situations that fuzz testing turns up can seem unrealistic, they reveal where assumptions were made in the code. If an application crashes, something farther up the call stack didn't sanitize data properly, if at all, and an adjustment needs to be made.

Fuzz testing can seem unsporting, but a healthy team dynamic will use it to quickly find where some extra bulletproofing needs to be done in code.

How to get started

If you'd like to incorporate fuzz testing as part of your QA cycle, start finding ways to isolate applications—or parts of them—from the rest of the system. Develop a way to quickly create and then force data through the part that's under test, and determine whether a crash happened or not.

Keep track of any logs, crashes, core files, and the data that caused each crash, so they can be handed over to someone to examine. Also, consider fuzzing applications in parallel. With CI servers such as Jenkins, there's nothing stopping you from attacking an application with multiple threads.
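As one possible starting point for parallel runs, the sketch below uses Python's standard thread pool. Everything here is an assumption made for illustration: the names `fuzz_worker` and `fuzz_in_parallel`, the 64-byte input cap, and the idea that the part under test is a Python callable that raises an exception on bad input. For a native program that can dump core, each worker would launch the target as a separate process instead.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fuzz_worker(target, seed: int, iterations: int) -> list:
    """Run one fuzzing loop with its own seeded generator; return failing inputs."""
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        length = rng.randint(1, 64)
        data = rng.getrandbits(8 * length).to_bytes(length, "little")
        try:
            target(data)
        except Exception:
            failures.append(data)  # keep the input so someone can examine it later
    return failures


def fuzz_in_parallel(target, workers: int = 4, iterations: int = 1000) -> list:
    """Run several fuzz workers at once, one distinct seed per worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = pool.map(
            lambda seed: fuzz_worker(target, seed, iterations), range(workers)
        )
    return [data for batch in batches for data in batch]


if __name__ == "__main__":
    # Illustrative target: decoding random bytes as ASCII usually fails
    failures = fuzz_in_parallel(lambda data: data.decode("ascii"), iterations=100)
    print(len(failures), "failing inputs collected")
```

Giving each worker its own seed keeps the workers from repeating one another's inputs and makes any single worker's run reproducible.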

Happy fuzzing!

Share your experiences with fuzz testing in the comments section.
