How to put AI to work on your performance engineering right now

If you've attended a webinar on artificial intelligence (AI) and machine learning (ML) lately, you've likely heard that they are sweeping the globe, and perhaps you've heard that we'll be able to simply point software at a website, click "go," and get performance test results, all thanks to the magic of AI.

As with most software promises, a little healthy skepticism goes a long way. Yet the questions remain: Are there applications of AI/ML with implications for performance engineering? Are any of them being used in the wild with success? And if so, what are they and how can they be reproduced in our neck of the woods?

Here are six ways AI and ML are changing performance engineering, presented as questions that AI and ML can help answer.

Where is our performance in production headed?

The amount of data in production logs can be overwhelming. Even in the aggregate it can be confusing: an API result simply is not the same thing as an image file or other piece of static content, so the average time to serve all requests can be misleading. A list of delays by service might mean something to an IT manager, but not much. Median time is probably more informative. And the argument rages on.
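To make the mean-versus-median argument concrete, here is a minimal sketch. The response times and the two request types are invented for illustration and do not come from any real log.

```python
from statistics import mean, median

# Hypothetical response times in milliseconds, grouped by request type.
SAMPLES = {
    "static": [12, 15, 11, 14, 13, 12, 16, 14],
    "api": [220, 250, 1900, 240],
}


def summarize(samples):
    """Return the overall mean, overall median, and per-type medians."""
    all_times = [t for times in samples.values() for t in times]
    per_type = {kind: median(times) for kind, times in samples.items()}
    return mean(all_times), median(all_times), per_type


overall_mean, overall_median, per_type_median = summarize(SAMPLES)
print(f"overall mean:   {overall_mean:.1f} ms")
print(f"overall median: {overall_median:.1f} ms")
print(f"per-type median: {per_type_median}")
```

With this made-up data, one slow API call pulls the overall mean far above what most requests experience, while the overall median mostly reflects the cheap static files. Neither number describes the API on its own, which is why splitting by request type tends to be more useful than either aggregate.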

But is a look at today's production performance as valuable as knowing where we are going? The simplest application of ML is to use data sources to predict trends. Trends help leaders tell if a situation is under control, getting better, or getting worse—and how fast it's improving or worsening. An ML chart could predict performance problems, allowing the team to take corrective action. And as Benjamin Franklin said, an ounce of prevention is worth a pound of cure.
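The simplest version of trend prediction can be sketched with an ordinary least-squares line. This is a stand-in for whatever model a real ML tool would use, and the daily latency figures and the 500 ms threshold below are assumptions made up for the example.

```python
def fit_trend(values):
    """Fit a least-squares line through (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return slope, intercept


def days_until(threshold, values):
    """Estimate days from the last sample until the trend crosses threshold.

    Returns None when the trend is flat or improving.
    """
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - (len(values) - 1))


# Hypothetical daily 95th-percentile latency in milliseconds.
daily_p95 = [400, 410, 420, 430, 440]
slope, intercept = fit_trend(daily_p95)
print(f"latency is changing by {slope:.1f} ms per day")
print(f"days until 500 ms: {days_until(500, daily_p95)}")
```

A straight line is a crude model, and real traffic has weekly cycles and sudden shifts that it will miss. The point is only that even a simple fit turns "where are we today" into "how long do we have."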

And it may be possible to test the fix before deploying to production. Being able to verify a fix is great, but being able to do so before impacting a single live user is even better.

What are our users doing?

Historically, a great deal of performance testing has been guesswork. Testers guess at what the users will do, sometimes informed by a log. If the log is organized in a particularly helpful way, the tester might be able to write a Ruby script to figure out what actions are happening the most, or how often, or how long they take. It can take a great deal of time to build enough of these tools to get a full picture of what is happening, or, more likely, what was happening yesterday. And as new features and new APIs with new URLs are added, these tools become obsolete.
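A minimal sketch of the kind of one-off log script described above, written here in Python rather than Ruby. The log format (method, path, status, milliseconds, separated by spaces) is an assumption for illustration; a real log would need its own parser, which is exactly why these scripts go stale.

```python
from collections import Counter, defaultdict

# Hypothetical log lines: METHOD PATH STATUS MILLISECONDS
LOG_LINES = """\
GET /login 200 120
GET /search 200 340
POST /cart 200 95
GET /search 200 410
GET /search 500 2100
""".splitlines()


def summarize(lines):
    """Map each 'METHOD PATH' action to (request count, average milliseconds)."""
    counts = Counter()
    total_ms = defaultdict(int)
    for line in lines:
        method, path, _status, millis = line.split()
        action = f"{method} {path}"
        counts[action] += 1
        total_ms[action] += int(millis)
    return {action: (counts[action], total_ms[action] / counts[action])
            for action in counts}


summary = summarize(LOG_LINES)
# Print the busiest actions first.
for action, (count, avg_ms) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{action:<14} {count:>3} requests, {avg_ms:.0f} ms average")
```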

Distributed search engine tools flip this around by using unsupervised ML. You put all your logs on a shared network drive, the ML tool indexes them, and then you can search using something close to English. The natural-language processing (NLP) tool then tries to figure out what you mean and give you the answer.

I've used these sorts of tools to find the specific errors happening to customers in production, but also to find out how many times a particular activity is happening over a given time period. Performance engineers can ask today's questions using something similar to a Google search, perhaps with a few special keystrokes.

The next step beyond a search engine is an ML tool, perhaps aided by a few keywords or groupings, to look through logs. That may be on the horizon.

How do we make realistic data?

Classic performance tools have the tester perform some operation through the user interface, for example, log in, do a product search, click on the product, add to cart, log out. The tool will record the packets, then play them back.

But the next time the tool runs, a lot of niggling details in the packets might be different. UserID, SessionID, security codes, ProductID, CacheIDs, and any unique codes or timestamp-based codes will need to change in order to have a true simulation. Gil Kedem, product manager of the LoadRunner family of tools at Micro Focus, explained that the newest generation of load test tools can infer meaning in those changing fields and change the values that are sent back to match the ones that are sent in, just as a browser might.
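To show what "changing the values that are sent back" means in practice, here is a hand-written sketch of that substitution for a single field. The `SessionID=` pattern and the sample strings are assumptions for illustration. The tools Kedem describes are said to infer such rules on their own, and this sketch does not attempt that part.

```python
import re

# Assumed format of the dynamic field; real applications will differ.
SESSION_PATTERN = re.compile(r"SessionID=([A-Za-z0-9]+)")


def correlate(recorded_request, recorded_response, live_response):
    """Replace the session ID captured at record time with the live one."""
    recorded = SESSION_PATTERN.search(recorded_response)
    live = SESSION_PATTERN.search(live_response)
    if recorded is None or live is None:
        return recorded_request  # nothing to correlate
    return recorded_request.replace(recorded.group(1), live.group(1))


replayed = correlate(
    recorded_request="GET /cart?SessionID=abc123&ProductID=42",
    recorded_response="Set-Cookie: SessionID=abc123",
    live_response="Set-Cookie: SessionID=zz9",
)
print(replayed)
```

Every dynamic field needs a rule like this, and finding them all by hand is the tedious part of script maintenance that inference is meant to remove.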

What does our data mean?

Experienced performance testers do more than look at averages and medians; they look at percentiles. The worst 10% of performance, for example, could be caused by a bug in the system that creates a condition that always performs slowly. Or it could be a user in Alaska trying to download a huge attachment over dial-up.

One way to figure that out is to create a histogram, a chart of how response times are distributed for the feature, function, or system. Another is to figure out if the slow performance happens during load, say, or at the transition point, when the system is trying to add capacity with extra cloud servers that are not online yet.
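Both ideas fit in a few lines. The sketch below uses the nearest-rank definition of a percentile, which is one of several in common use, and invented response times with a 100 ms bucket width.

```python
import math
from collections import Counter


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value covering pct% of samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def histogram(values, bucket_ms):
    """Count samples per bucket, keyed by the lower bound of each bucket."""
    buckets = Counter((v // bucket_ms) * bucket_ms for v in values)
    return dict(sorted(buckets.items()))


# Hypothetical response times in milliseconds.
times_ms = [110, 120, 130, 125, 140, 135, 115, 128, 900, 950]

print("p50:", percentile(times_ms, 50))
print("p90:", percentile(times_ms, 90))
for lower, count in histogram(times_ms, 100).items():
    print(f"{lower:>4}-{lower + 99:<4} ms {'#' * count}")
```

In this made-up data the histogram has two separate clusters with nothing in between, which suggests a distinct slow condition rather than ordinary variation. That is the kind of pattern worth chasing before deciding whether it is the bug or the user in Alaska.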

Kedem pointed out that while this work is the bread and butter of performance analysis, it can often be painstakingly difficult to do. Unsupervised ML could connect the dots between time, utilization, frequency, and performance to help answer those questions. More immediately, Kedem foresees products that can help filter out those rare and slow cases, so we can decide if they are noise or something worth paying attention to.

Did we really fix it?

Unsupervised ML can figure out what the conditions were in production that caused a failure. With that in hand, capacity and performance engineers can create test scenarios that reproduce those conditions. Run the new code under the scenario to see it pass, yes, but if you like, you can also run the old version of the code under the same scenario to see it fail. This sort of two-factor test demonstrates that a change actually resolves the issues it was designed to address.
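The logic of that two-factor check is simple enough to sketch. The two lookup functions below are hypothetical stand-ins that return a simulated latency instead of measuring a real build, and the cart size and 500 ms budget are assumptions for the example.

```python
def old_lookup(cart_size):
    """Hypothetical pre-fix build: simulated cost grows with the square of the cart."""
    return 0.5 * cart_size ** 2


def new_lookup(cart_size):
    """Hypothetical post-fix build: simulated cost grows linearly."""
    return 2.0 * cart_size


def verify_fix(old_version, new_version, scenario_input, budget_ms):
    """Run one scenario against both builds and report both halves of the test."""
    old_ms = old_version(scenario_input)
    new_ms = new_version(scenario_input)
    return {
        "reproduced_on_old": old_ms > budget_ms,  # the scenario really triggers the bug
        "passes_on_new": new_ms <= budget_ms,     # the change really resolves it
    }


result = verify_fix(old_lookup, new_lookup, scenario_input=100, budget_ms=500)
print(result)
print("fix verified:", all(result.values()))
```

If the old build also passes, the scenario did not reproduce the production conditions, and a green result on the new build proves nothing.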

I was skeptical of this, but Andreas Grabner, a DevOps activist at Dynatrace, tells me he is seeing these sorts of features built into performance test and analysis tools. An active member of the performance community, Grabner is probably best known for the "performance test roadshow" he did pre-COVID in a dozen cities, running performance test analysis before an audience.

With nothing up his sleeves, he promised actionable capacity and system insights in a couple of hours, assuming he got an invitation and the proper permissions.

Is our system resilient when subsystems fail?

Many of us are familiar with Chaos Monkey, the Netflix tool that intentionally pulls down production instances, on the premise that every system should be redundant. Indeed, the redundancy should be redundant. Thus, if a system is pulled down, there should be no impact on overall performance.

By monitoring performance when tests are running, management can determine when redundancy is lacking. The resulting outage might last five seconds, but it gives the engineers at Netflix the information they need to fix the problem, so that when that subsystem really does go down, it will have the redundancy to allow traffic to route around.

Picking systems and randomly pulling them down is arguably a type of AI. Yet when most of us say "AI," we mean more than an "if" statement; we mean the capacity to learn. By noticing what is failing and what is working, Chaos Monkey can come up with more complex strategies for breaking things, a sort of true AI that has the ability to improve the customer experience today.
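To illustrate the difference between random selection and selection that learns, here is a hypothetical sketch. It is not how Chaos Monkey is implemented; the class, its scoring rule, and the service names are all invented. It keeps a running impact score per service and picks targets in proportion to that score.

```python
import random


class ChaosPicker:
    """Pick services to take down, favoring those whose failures have hurt before."""

    def __init__(self, services, seed=None):
        self.rng = random.Random(seed)
        # Start every service at the same score so early picks are uniform.
        self.impact = {service: 1.0 for service in services}

    def pick(self):
        """Choose one service at random, weighted by its current impact score."""
        services = list(self.impact)
        weights = [self.impact[s] for s in services]
        return self.rng.choices(services, weights=weights, k=1)[0]

    def record(self, service, caused_outage):
        """Blend the latest observation into the score (exponential moving average)."""
        observed = 1.0 if caused_outage else 0.1
        self.impact[service] = 0.5 * self.impact[service] + 0.5 * observed


picker = ChaosPicker(["auth", "search", "cart"], seed=7)
for _ in range(5):
    picker.record("auth", caused_outage=True)     # pulling auth keeps hurting users
    picker.record("search", caused_outage=False)  # search fails over cleanly
print(picker.impact)
print("next target:", picker.pick())
```

The floor of 0.1 keeps well-behaved services in rotation, since redundancy that worked last month can still break after the next deploy.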

What's coming next

I have tried to focus on places where ML and AI are actually being used. That leads to two questions: What did I miss? And what's next? Talk back to me on Twitter.
