Shadows of people in line

Are you using queuing theory to accelerate performance test analysis?

Now that agile and DevOps are in widespread use in software development environments, early and quick performance analysis of business-critical, high-traffic applications is essential. This has led to an increase in the adoption of performance modeling activities, which helps to predict system performance metrics during early lifecycle phases.

Because many performance testers fear mathematics, they mistakenly think queuing theory is too complex in nature and leave it to core performance modeling experts. But knowledge of simple mathematical queuing theory principles can make system performance analysis easier. It’s important for performance testers to understand the relationships that exist between performance metrics such as response time, throughput, and server utilization so that they can quickly identify performance-, scalability-, and capacity-related problems. But first I'll digress with a few basics for the uninitiated on how to use queuing theory principles in performance testing. 

Go to TB LearnMobile APM: An introduction to mobile performance monitoring and testing strategies

What is queuing theory?

Queuing theory is the mathematical study of waiting lines, or queues. A queuing system is represented using three primary elements: 

  • The input
  • The service center
  • The output

Input refers to the job arrivals to the system, service center refers to the processing component that performs the expected service, and output refers to the job completions that leave the system. Depending on their arrival pattern and time required to service the incoming jobs, jobs might need to wait in the queue before getting the intended service.


Figure 1 above shows an abstract system that represents a simple queuing system. It has incoming job arrivals, which wait in a queue for processing. It also has a service center that performs the intended service. The completed jobs leave the system upon processing completion.

Imagine that you are monitoring the system for a finite time (T). During this monitoring period, you observe several job arrivals to the system (A), number of request completions (C), and the amount of time the service center is found busy (B) during the monitoring period. You can use these directly measurable quantities to derive the quantities below:

  • The arrival rate, represented by the symbol lambda (λ) equals the number of request arrivals (A) divided by the time (T).
    Stated algebraically: λ = A ÷ T  
  • The throughput, or completion rate, represented by the symbol (X), equals the number of request completions (C) divided by the time (T).
    Stated algebraically: X = C ÷ T
  • Utilization (U) equals the total amount of time the service center is found busy (B) divided by the time (T).
    Stated algebraically: U = B ÷ T
  • Mean service time (S) equals the total amount of time the service center is found busy (B) divided by the number of request completions (C).
    Stated algebraically: S = B ÷ C
  • If there is more than one visit made to the service center to complete the job, then another quantity, called service demand (D), can be derived. Service demand represents the total demand for the service center required to complete the entire job, before it leaves the system. It equals the total visits (Vk) made to the service center multiplied by its mean service time (S).  
    Stated algebraically: D = Vk × S                        

The operational laws of queuing theory

Operational laws are simple equations used as an abstract representation or model of the average behavior of the queuing system. These laws, very simple and generic in nature, are popular because they do not make any assumptions about system behavior. You can derive five key operational laws using the relationship between the quantities previously discussed:

  • The Utilization Law states that service center utilization (U) equals throughput of the service center (X) multiplied by the mean service time (S) at the service center.
    Stated algebraically: U = X × S                       
  • Little’s Law states that the average number of jobs in a queuing system (N) equals the arrival rate (λ) multiplied by the average time that a job spends in the system (R). Based on flow balance assumption, arrival rate  (λ) equals throughput (X) for a stable system. Hence, Little's Law can be represented as:
    N = λ × R 
    N = X × R
  •  The Interactive Response Time Law states that the average number of jobs in the system (N) equals throughput (X) multiplied by the average response time (R) plus think time (Z). Here, think time is the time the end user spends waiting (thinking or entering the data in the form fields available in the web page, etc.) between successive actions. You can express this as:
    N = X × (R + Z)
  • The Forced Flow Law states that throughput at service center k (Xk) equals visits to the service center (Vk) multiplied by the overall system throughput (X). In other words:
    Xk = Vk × X
  • The Throughput Bound Law states that the maximum achievable system throughput (Xmax) is equal to or less than 1 divided by the value that represents the service center with maximum service demand in the system (Dmax). Here's how to express it: 
    Xmax  < = 1 ÷ Dmax

These simple operational laws can be used during performance testing for addressing  challenges such as validating the performance non-functional requirement (NFR), validating the performance test results, validating workload models, calculating the think time, predicting the maximum achievable system throughput, estimating the server capacity requirement, building performance prediction model, and so on.

Here's a simple case study that shows how you can use the above discussed operational laws during performance testing to accelerate system scalability and capacity analysis.

Case in point: The three-tier web app example

A three-tier web application under test is expected to support 1,000 users meeting a throughput target of 100 transactions per second (TPS) and a response time SLA of 2 seconds to complete its 3 transactions.

The business projects a 2.5x times increase in throughput due to promotional activity for the next quarter (a validated projection). What is the think time to be configured in the test script to meet this NFR? How should you validate the load test result? What is the maximum achievable system throughput by the system? And what is the additional server capacity required to handle the increase in throughput?

You can apply operational laws to answer all of these questions.

Step 1: Think time calculation

In order to run a realistic load test, you must apply Little’s Law to calculate the think time to be configured in the script.

Little’s Law tells you that N = X × R, and the Interactive Response Time Law tells you that N = X × (R + Z), where Z represents think time. With basic math skills, you can convert the equation:

Z = (N ÷ X) – R

You know that N is 1,000 users, X is 100 TPS, R is 2 seconds for its three transactions, which will be six (3 × 2) seconds. When you plug in these values discover:

Z = (1,000 ÷ 100) – 6 = 4 seconds

So a think time of four seconds should be configured in the test script to carry out a realistic load test to meet the system performance target.

[ Special coverage: PerfGuild performance testing conference ]


Step 2: Load test results validation

In the table provided in figure 2, you find that the actual configured users used to carry out the load test (Nactual = 1,000 users); that the transaction response times for three transactions are 2.2 seconds, 1.8 seconds, and 2.4 seconds; and that the average throughput measured during the test ( X = 100 TPS). Using these test observations,  you can apply Little’s Law to identify the calculated number of users, Ncalculated,  by applying the Interactive Response Time Law equation: 
Ncalculated = X × (R1 + R2 + R3 + Z)

Hence, by plugging in the test result values, Ncalculated, you get 1,040 users.

Nactual and Ncalculated values are almost equal, which confirms that you have a valid load test and that your test observations are valid.

Performance testers should perform the additional calculation described in step three below and share it, along with the load test report.

Step 3: Calculation of maximum achievable system throughput

During a 1,000-user load test, assume the web server's average CPU utilization (UW-CPU) is 25%, the application server's average CPU utilization (UA-CPU) is 60%, and the database server's average CPU utilization (UDB-CPU) is 30%.

With web, application, and database server CPU utilization values available, you can do the following calculations:

For a service center k, you can apply the Utilization Law, as follows:
Uk = Xk × Sk 

By applying the Forced Flow Law ( Xk = Vk × X) on the above equation, you get: 
Uk = (Vk × X) × Sk

This equation can be regrouped as Uk =  X × (Vk × Sk). You already know that Vk × Sk represents the service demand, Dk.

Hence, the utilization of service center (k) equals the overall system throughput (X) multiplied by its service demand (Dk).

Utilization of service center: Uk =  X × DK

Therefore, to find service demand on the web, application, and database servers, you can apply the Utilization Law as shown below:

DW-CPU = UW-CPU ÷ X,  which is 0.25 ÷ 100 = 0.0025
A-CPU =  UA-CPU ÷ X, which is  0.6 ÷ 100      = 0.006
DB-CPU =  UDB-CPU ÷ X, which is 0.3 ÷ 100 = 0.003


You can then use the Throughput Bound Law (Xmax  < = 1 ÷ Dmax) to determine the maximum achievable overall system throughput. In this case, the service center with maximum service demand is the application server (DA-CPU). The overall system throughput will depend on the throughput of this slowest service center, which is the application server CPU resource.


Xmax < = 1 ÷ 0.006 = 167 TPS

In the above equation, the maximum utilization value of the service center is 100% (U = 1) when calculating the maximum achievable system throughput, Xmax. Since the CPU utilization threshold for the application server is 85% for the system under test, the maximum achievable system throughput can be recalculated as:

Xmax < = 0.85 ÷ 0.006 = 142 TPS

Hence, the system under test can support a maximum throughput of 142 TPS in the current test environment.

As you have seen, during the 1,000-user test, the system throughput measured using the performance test tool is 100 TPS. With the help of this load test result, you have predicted that the maximum achievable system throughput value will be 142 TPS. Because the application server has the maximum service demand, overall performance improvement is possible only by improving the application server CPU resource performance (either by doing CPU profiling for code optimization or through CPU capacity upgrade).

Step 4: Server capacity sizing analysis

In order to support the projected system throughput (X = 250 TPS) provided by the business, you can calculate the additional capacity required by the system with the Utilization Law.

The web server CPU service demand is 0.0025. By applying the Utilization Law (Uk =  X × Dk), you get:

UW-CPU = 250 × 0.0025 = 0.63

Hence, the web server CPU utilization is projected as 63% to support the target 250 TPS load, which confirms that the web server has enough capacity to handle the projected 250 TPS, considerably less than web server CPU utilization threshold value of 85%.

The database server CPU service demand is 0.003. By applying the Utilization Law (Uk =  X × Dk), you get:

UDB-CPU = 250 × 0.003 = 0.75

Hence, the database server CPU utilization is projected as 75% to support the target 250 TPS load, which confirms that the database server has just the required capacity to handle the projected 250 TPS, just under the database server CPU utilization threshold value of 80%.

The application server CPU service demand is 0.006. By applying the Utilization Law (Uk =  X × Dk), you get:

UA-CPU = 250 × 0.006  = 1.5

Hence, the application server CPU utilization is projected as 150% to support the target 250 TPS load, which confirms that the application server CPU requires more capacity to handle the projected 250 TPS. Since the application server CPU utilization threshold is 85%, the additional utilization capacity requirement will be 1.76 (1.5 ÷ 0.85).

The application server CPU requires approximately 1.76 times the current CPU capacity to handle the projected 250 TPS load. Based on this calculation, you can include an additional buffer and use the TPC or SPEC benchmarks to recommend the right server configuration to meet the projected load levels.

Next step: Performance modeling

These types of simple performance test validations, scalability analyses, and capacity extrapolation activities offer a good representation of system performance. These calculations can be done quickly using less effort and can alert performance testers as to the next steps for performance tuning.

This knowledge of operational laws is the first step toward performance modeling using analytical modeling techniques. Now you can start exploring queuing network, simulation, or statistical modeling techniques as ways of predicting system performance with greater accuracy.

But performance modeling activities aren't a replacement for conducting performance tests. Rather, use them to get a quick ballpark estimate or direction when you don't have the time or resources available to perform more sophisticated performance tests.

Go to TB LearnMobile APM: An introduction to mobile performance monitoring and testing strategies

To learn more about queuing theory and how the mathematical principles can be applied in Performance Testing, see my presentation at the PerfGuild online conference. Registered attendees also have full access to my presentation after the event.

Topics: Quality