
3 non-negotiable tests for microservices and distributed app environments

Bernard Golden, CEO, Navica

You can follow my 5 rules to build a microservices-based application properly, but you still need to verify that it operates as intended. I've seen three modes of testing used successfully in many microservices applications to verify that the services work despite the increased complexity of the architecture: base testing, scale testing, and resiliency testing. Here's a description of each test and how to use it.


Base testing: Test the overall functionality

Clearly, the foundation of quality assessment is to validate that the functionality of an application works. But in a microservices application, that validation is more complex than in a traditional, binary “monolithic” application.

First off, each service in a microservices application exposes its functionality through some kind of remote procedure call mechanism: either a formal RPC implemented by libraries that take responsibility for connection and communication, or a loosely coupled mechanism based on a RESTful interface. This service interface is where the functionality of the service must be tested.

This typically means a set of tests that make calls with arguments and, perhaps, a verb that requests an insert, delete, and so on. It’s important that the API functionality test covers all of the functionality in the service and completely exercises the API with a variety of arguments, both valid and invalid. For example, one test might check to see what happens if someone submits a customer ID that is too long. The same test might also check to see if the microservice operates properly when a correctly formatted customer ID is submitted.
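The customer ID check described above can be sketched as a small test suite. This is a minimal illustration, not a real client: the `get_customer` handler and the `MAX_CUSTOMER_ID_LEN` limit are hypothetical stand-ins for an actual service endpoint, which you would exercise over HTTP with a client library.

```python
# Sketch of an API functionality test for a hypothetical customer service.
# The handler below stands in for a real HTTP endpoint; in practice you
# would issue requests against the deployed service.

MAX_CUSTOMER_ID_LEN = 12  # assumed limit for illustration

def get_customer(customer_id: str) -> dict:
    """Simulated service handler: returns a status code and body."""
    if not customer_id.isdigit() or len(customer_id) > MAX_CUSTOMER_ID_LEN:
        return {"status": 400, "error": "invalid customer id"}
    return {"status": 200, "customer_id": customer_id}

def test_valid_id():
    # Correctly formatted ID: the service should accept it.
    assert get_customer("123456")["status"] == 200

def test_id_too_long():
    # An ID that is too long should be rejected, not crash the service.
    assert get_customer("9" * 40)["status"] == 400

def test_non_numeric_id():
    # Malformed input is part of "completely exercising" the API.
    assert get_customer("abc!")["status"] == 400

test_valid_id()
test_id_too_long()
test_non_numeric_id()
```

The key point is symmetry: every valid-input test should have invalid-input counterparts that probe the same argument.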

But things get tricky when testing microservices because the functionality extends beyond the individual microservice. It’s important to test the overall application functionality. You can't just test the individual services. This typically requires browser-based tests (for online applications) or mobile-device tests for mobile applications. Of course, some applications support both online and mobile clients, so they require two types of tests that check the same general functionality.
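To make the distinction between service-level and application-level testing concrete, here is a toy end-to-end check that exercises two cooperating services together rather than each in isolation. Both "services" are simulated in-process and their names are hypothetical; a real end-to-end test would drive the deployed application through its browser or mobile interface.

```python
# Toy end-to-end check across two cooperating (simulated) services.
# In a real application these would communicate over the network.

def customer_service(customer_id: str) -> dict:
    """Stand-in for the customer microservice."""
    return {"customer_id": customer_id, "name": "Ada"}

def order_service(customer_id: str, item: str) -> dict:
    """Stand-in for the order microservice, which calls the customer service."""
    customer = customer_service(customer_id)
    return {"order_for": customer["name"], "item": item, "status": "placed"}

def test_place_order_end_to_end():
    order = order_service("42", "widget")
    assert order["status"] == "placed"
    # Verifies that data flowed correctly across both services,
    # which no single-service test would catch.
    assert order["order_for"] == "Ada"

test_place_order_end_to_end()
```

Each service here could pass its own unit tests while the composed flow still failed, which is exactly why overall-application testing is required.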


Scale testing: Load testing

As I mentioned in my "5 rules" article, microservices-based applications are far more complex in how functionality flows through the various services. Depending upon the functionality that end users activate, different paths may be triggered across a number of services. In some systems, triggering the same exact path through the application functionality may be very rare. 

Additionally, there are many more variables that can affect the operation of a microservices application. Each call to a service traverses the network, which means that response time latency can vary based on other traffic on the network. This is a major difference between microservices applications and monolithic applications. In monolithic applications, functionality calls execute within the same binary, so all you need to do is add another function call onto the execution stack.

In a microservices application, there may also be supporting services or resources that operate faster or slower, depending upon total application traffic or the state of the resource. As an example, if a caching layer is present in the application topology, calls to data may run slower early in the operation of an application. That's because not much data is cached yet, so the application has to make calls into a relatively slower database. Later in the operation of an application, calls to data may run much faster, because most data can be retrieved in a call to the caching layer rather than requiring a call to the database.
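The cache-warming effect described above can be shown with a minimal cache-aside sketch. The latency figure and the data are invented for illustration: a miss falls through to a simulated slow database, and subsequent calls for the same key are served from the in-process cache.

```python
import time

# Minimal cache-aside sketch illustrating why early calls run slower:
# a cache miss pays the (simulated) database round-trip, while later
# calls for the same key hit the cache.

DB_LATENCY = 0.05  # simulated database round-trip, in seconds

database = {"customer:42": {"name": "Ada"}}
cache: dict = {}

def fetch(key: str):
    if key in cache:            # cache hit: fast path
        return cache[key]
    time.sleep(DB_LATENCY)      # cache miss: pay the database cost
    value = database[key]
    cache[key] = value          # populate the cache for later calls
    return value

start = time.perf_counter()
fetch("customer:42")            # cold: goes to the database
cold = time.perf_counter() - start

start = time.perf_counter()
fetch("customer:42")            # warm: served from the cache
warm = time.perf_counter() - start

assert warm < cold              # the warmed cache serves the same call faster
```

The same request costs very different amounts at different points in the application's lifetime, which is why single-point-in-time measurements mislead.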

In some respects, it is probably better to think of a microservices application as a dynamic environment with constant change occurring. It’s critical to go beyond simple functionality testing and implement load testing to observe how well the application performs when a high number of calls are made to services, or large amounts of data are transferred on the network between individual services. Again, the network will often be the bottleneck.

Load testing will expose parts of the application that are not designed to scale and can prevent meltdowns associated with high amounts of user traffic in production. Don't be tempted to have a few colleagues run some tests on the application and call that load testing. There are a number of sophisticated load-testing solutions available, and they excel at generating enough virtual traffic to truly test how well an application stands up to heavy load.
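As a minimal illustration of what a load generator does, the sketch below fires many concurrent calls at a simulated handler and reports a 95th-percentile latency. The handler and its 10 ms delay are invented; a real load test would use a dedicated tool driving virtual traffic against the deployed service over the network.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Toy load-generation sketch: issue many concurrent calls against a
# simulated service handler and measure per-call latency.

def handle_request(i: int) -> float:
    start = time.perf_counter()
    time.sleep(0.01)  # stand-in for service work plus network latency
    return time.perf_counter() - start

# 50 concurrent "virtual users" issuing 200 requests in total.
with ThreadPoolExecutor(max_workers=50) as pool:
    latencies = list(pool.map(handle_request, range(200)))

latencies.sort()
p95 = latencies[int(len(latencies) * 0.95)]
print(f"requests: {len(latencies)}, p95 latency: {p95 * 1000:.1f} ms")
```

Reporting a tail percentile rather than an average matters: under load, the slowest calls are the ones users notice and the ones that expose scaling limits.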

Resiliency testing: Simulate destructive behavior

In some respects, microservices applications evolved from monolithic architectures to better address application response to highly erratic user traffic. By partitioning application functionality into separately executing services, individual services can grow or shrink to ensure sufficient processing power is available as needed. However, as I noted earlier, this can cause an application to have a dynamic, constantly changing topology.

Microservices applications operate on an endlessly evolving infrastructure environment, and sometimes portions of those environments may encounter failures. For example, individual servers that are running part of a specific service may crash or become unavailable. Or network segments may stop reliably passing traffic. Larger aggregations of resources (e.g., entire racks or even entire data centers) may stop working.

Microservices applications must be resilient in the face of infrastructure failures. But production operation—especially during heavy load—is the wrong time to evaluate just how resilient your application is.

An appropriate approach to evaluating application resilience is to test whether it can continue operation if the underlying resources fail. Netflix pioneered the practice of suddenly removing portions of an application’s infrastructure, or portions of the application itself, and evaluating how well the application performed. Netflix dubbed this (now open source) tool for sudden, random resource destruction Chaos Monkey. Over time, use of the Chaos Monkey has allowed Netflix to improve the resilience of its video service, and has protected it from customer dissatisfaction due to service failures.
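The idea can be sketched in a few lines. This is a toy inspired by the Chaos Monkey approach, not the tool itself: it randomly terminates replicas of a simulated service and then checks that the survivors can still serve traffic. The instance names and replica counts are invented for illustration.

```python
import random

# Chaos-style resiliency sketch: randomly terminate instances of a
# simulated service, then verify the service as a whole still responds.

random.seed(7)  # deterministic for the example

# Five replicas of a hypothetical service, all initially healthy.
instances = {f"svc-{i}": "healthy" for i in range(5)}

def kill_random_instance() -> str:
    """Terminate one randomly chosen healthy replica."""
    victim = random.choice(
        [name for name, state in instances.items() if state == "healthy"]
    )
    instances[victim] = "terminated"
    return victim

def service_available() -> bool:
    # The service survives as long as at least one replica is healthy.
    return any(state == "healthy" for state in instances.values())

for _ in range(3):          # destroy three of the five replicas
    kill_random_instance()

assert service_available()  # two replicas remain; traffic is still served
```

The real value comes from running this kind of destruction continuously against staging (or, once confidence grows, production) infrastructure, so resilience gaps surface before a genuine outage does.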

Most IT organizations are only beginning to confirm application resiliency through the use of destructive testing. Only with the emergence of distributed application architectures has it become realistic to expect an application to keep running when a resource fails. In the past, resource failure meant the entire application would go down.

One can expect that this kind of resilience testing will become much more widespread, and even common, as applications become more critical to the overall business operations of companies. Just as Netflix would suffer mightily if its video service became unavailable, so too would most companies as their primary mode of customer interaction and product delivery moves online.

New architectures require new testing approaches

Microservice architectures have emerged as a popular response to the shortcomings of traditional monolithic applications. But these architectures come with their own set of complexities and concerns. In particular, testing microservices-based applications requires new approaches to confirm proper operation and continued availability under heavy load or in the face of resource failure. By adopting the three testing approaches outlined in this piece, IT organizations can be more certain that their microservices applications operate properly and perform well under stress.