
How to handle API rate limits: Do your integrations work at scale?

Vinod Chandru Chief Technical Officer and Co-Founder, Kloudless

Have you ever gone to a public swimming pool and noticed a sign stating the maximum occupancy limit? Those limits were put in place to ensure public safety. APIs use a similar mechanism, called a "rate limit," to protect both the API's consumers and the API itself.

Rate limits can protect you against slow performance and denial-of-service (DoS) attacks, allow for scalability, and improve the overall user experience.

You need rate limits because, at the end of the day, you can't provide your users with the best possible experience if your API isn't functioning properly. Here's how to put rate limits to work.

Why you need rate limits

Rate limits govern how frequently a single user or entity can consume the API's data, in order to ensure the health and accessibility of your API. Think of rate limiting as a form of security.

If your API becomes overloaded, its performance suffers. Rate limits protect against that by curtailing the number of requests that come into your server. If your API is the target of a malicious DoS attack, for example, it can go down entirely. Rate limiting allows API developers to ensure an API will reject requests that exceed a set limit.

Rate limits also greatly help with scalability. As application developers, we dream of our product gaining popularity quickly and garnering an influx of users. But that influx can cause spikes in traffic, causing our APIs to slow to a crawl. Rate limiting can make sure that your API is equipped to handle the incoming horde of potential users.

Under the hood: How rate limits work

Rate limits act as gatekeepers to control the amount of incoming or outgoing traffic to or from a network. An API rate limit might enforce, say, 100 requests per minute. Once requests exceed that number, it generates an error message to alert the requester that it exceeded the number of allotted requests in a specific time frame.

With HTTP API requests, this error usually manifests itself as a response with a 429 status code. RFC 6585 sets this status code aside to represent "Too Many Requests." Usually, the server sends the response along with additional information about the allowed request rate to the requester, as well as a header to indicate the amount of time necessary to wait until trying another request.

This header is typically called "Retry-After," in keeping with the guidance in the RFC. While not strictly required, including it is good practice because it keeps users aware of the network's requirements.
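As a sketch of the mechanism described above, here is a minimal fixed-window limiter that returns a 429 status with a "Retry-After" header once the limit is exceeded. The class name and the 100-requests-per-minute defaults are illustrative, not drawn from any particular framework:

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds (fixed window)."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.window_start = time.monotonic()
        self.count = 0

    def check(self):
        """Return (status, headers) for one incoming request."""
        now = time.monotonic()
        if now - self.window_start >= self.window:
            self.window_start = now      # new window: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return 200, {}
        # Over the limit: tell the client how long to wait before retrying
        remaining = self.window - (now - self.window_start)
        return 429, {"Retry-After": str(max(1, int(remaining)))}
```

A real server would call `check()` once per incoming request and short-circuit the handler whenever it returns 429.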

Three types of rate limits

Rate limits consist of parameters that define the scope of enforcement. While anyone can devise a custom rate-limiting protocol for an API, developers most often implement three distinct types of rate limits.

Dev teams can implement a single rate limiting type, or any combination of the three, depending on the importance they place on each of the factors described below.

User rate limiting

The most common type of rate limiting, user rate limiting tracks a user's API key, session cookie, or IP address to count the requests being made. If the number of requests exceeds the limit, the user must wait until the time frame resets; the wait time is usually communicated via the "Retry-After" header.

In some cases, users can arrange with the API's developers to have their limit increased or their "Retry-After" time frame reset, giving them access to the network without having to wait.

Time-based rate limiting

This is usually based on the region and time of day at which the user is attempting to access a network. It ensures that the strictest rate-limiting rules apply only to the periods when traffic is highest. Often this means increasing the number of requests allowed between the hours of 12 am and 8 am, since traffic tends to be at its lowest overall in that period.
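A minimal sketch of time-based limiting is a function that maps the hour of day to an allowance; the function name and the specific numbers here are illustrative assumptions:

```python
def limit_for_hour(hour, peak_limit=100, off_peak_limit=300):
    """Return the per-minute request limit for a given hour of day (0-23).

    Off-peak hours (midnight to 8 am) get a higher allowance, matching the
    pattern described above; the numbers are illustrative.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be 0-23")
    return off_peak_limit if hour < 8 else peak_limit
```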

Server rate limiting 

Depending on the size of the API, you may have multiple servers handling different types of requests. Server rate limiting is the process of enforcing different limits on a server-by-server basis.

One example of this is an image processing service that consumes a lot of CPU cycles. The server handling the processing would be rate limited at a higher level than would a normal web server, so that API requests sent to the processing server get throttled more quickly, to be fair to all users.

This kind of rate limiting can also decrease request limits for other, less accessed servers to free up available network traffic for the server that generates more API requests.
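Server rate limiting can be as simple as a per-server limit table; the server names and numbers below are hypothetical, chosen to mirror the image-processing example above:

```python
# Per-server request limits (requests per minute). CPU-heavy servers get a
# tighter budget so expensive work is throttled sooner than cheap web traffic.
SERVER_LIMITS = {
    "image-processing": 20,   # CPU-intensive: throttle quickly
    "web": 200,               # cheap requests: generous limit
}
DEFAULT_LIMIT = 100

def limit_for_server(server_name):
    """Look up a server's request limit, falling back to a default."""
    return SERVER_LIMITS.get(server_name, DEFAULT_LIMIT)
```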

How to implement rate limiting

You can configure rate limiting on your APIs in many ways. You can implement rate limiting at the application level, but this is an arduous and lengthy process. If you'd rather use out-of-the-box tools, there are plenty of easy-to-implement tool sets and frameworks with rate-limiting capabilities built in.

Some monitoring and management tools offer robust rate-limiting capabilities using something known as a "leaky bucket algorithm." This analogy to a bucket with holes in the bottom sees the water poured into the bucket as requests coming in; only a certain amount can leak from the holes at the bottom in a certain amount of time.

This process adheres to a first-in-first-out (FIFO) scheduling algorithm, since the requests that came in first are processed before the requests that followed them in the queue. The water that leaks out of the holes at the bottom represents the requests that the server processes. When requests spike, they are stored in a temporary backlog to process at a steady rate, within the limits of the bucket.

If the water (requests) coming in exceeds the limit that the bucket can hold and it overflows, then the water is discarded and ignored.
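The leaky bucket described above can be sketched with a FIFO queue: requests join the backlog, drain out at a steady rate, and are discarded when the bucket overflows. This is an illustrative sketch, not the implementation used by any particular tool:

```python
import time
from collections import deque

class LeakyBucket:
    """Queue incoming requests and drain them at a steady rate (FIFO)."""

    def __init__(self, capacity, leak_rate_per_s):
        self.capacity = capacity            # max requests the bucket holds
        self.leak_rate = leak_rate_per_s    # requests processed per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def _leak(self):
        """Process (remove) requests that have leaked out since last check."""
        now = time.monotonic()
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()        # oldest request processed first
            self.last_leak = now

    def submit(self, request):
        """Accept a request into the backlog, or discard it on overflow."""
        self._leak()
        if len(self.queue) >= self.capacity:
            return False                    # bucket overflowed: request dropped
        self.queue.append(request)
        return True
```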

Another popular, and dead simple, means of implementing rate limiting is through using an API gateway. With rate limiting tools built right into these services, setting up your API’s rate limit protocol is about as simple as setting up a configuration file.

A major benefit of API gateways is their ability to rate-limit by the authentication layer when one is available to monitor, and to fall back to the client ID when it isn't.

Some frameworks even include basic rate-limiting functionality. These frameworks let you specify a rate-limit parameter directly on a group of routes and indicate the models you want it associated with, making rate limiting almost effortless to add.

So far I've covered how rate limits govern the usage of APIs from both the consumer and the server side. But there are other technologies that exist between these two sides that must abide by these rules as well.

Unified rate limiting

Unified APIs are a breakthrough for those looking to build connections with multiple cloud service providers in a single go. In the time it would take to build out a single API integration, unified APIs can integrate with dozens or even hundreds of services. Unified APIs also provide the added benefit of eliminating future maintenance of those connections.

A unified API works by abstracting out the differences between different API services that fall into a certain category and providing a set of unified endpoints to access all of them in the same set of routes. As a result, unified APIs face the added task of deciding how to handle rate limiting when it comes to delivering a 429 status code.

The APIs a unified API connects to all carry different rate limits and different standards for handling request limits, so the unified API must handle each accordingly when relaying data back to the users consuming it.

Tactics to consider for unified APIs

While there are no standardized best practices to apply when handling rate limits in a unified API, there are a handful of tactics that you can use to accommodate the API's users as best as possible.

Typically, rate-limit algorithms track the number of requests over a short period of time, such as one second or one minute. If requests exceed the threshold, you'll commonly see error responses with the 429 status code. This includes the "Retry-After" header.

Unified APIs should take steps to normalize the Retry-After header returned by each underlying API, and then back off and retry requests for up to 30 seconds, based on each specific API's recommendations and best practices. By automating this process, a unified API takes much of the manual work out of the user's hands, streamlining consumption of the API.
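The back-off-and-retry behavior might be sketched as follows; `send` is a stand-in for whatever function issues the underlying request, and the 30-second cap comes from the guideline above:

```python
def request_with_retry(send, max_total_wait=30.0, sleep=None):
    """Call `send()` and retry on 429, honoring Retry-After up to a cap.

    `send` must return (status, headers, body); `sleep` is injectable for
    testing. This is a sketch of the back-off behavior, not any specific
    library's API.
    """
    if sleep is None:
        import time
        sleep = time.sleep
    waited = 0.0
    while True:
        status, headers, body = send()
        if status != 429:
            return status, body
        delay = float(headers.get("Retry-After", 1))
        if waited + delay > max_total_wait:
            return status, body             # cap reached: surface the 429
        sleep(delay)
        waited += delay
```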

Quotas and unified APIs

Rate limits generally do a good job of handling spikes in traffic over short time intervals. However, an API sometimes also needs to regulate the total number of requests over much longer intervals, such as an hour, a day, or a month. In these scenarios, the API effectively provides a usage quota over the specified time period.

Quotas complement rate limits: with a quota in place, you can afford to set the rate limit itself higher. Otherwise, the API service might not sustain a constant level of requests near the rate-limit threshold from an ever-increasing number of applications.

By providing a quota, you permit applications to occasionally burst to high levels of usage but don't allow them to maintain that level. Quotas tend to pose more of an issue for unified APIs, depending on what the API you're accessing has set as its criteria for request cutoff.

Implementing quotas

Some APIs set a quota on the tenant. Salesforce, for instance, caps the number of requests per customer based on the customer's Salesforce edition and number of licenses. Salesforce places this cap on the tenant's use of API requests, rather than on a specific developer application.

Because of this, a misbehaving application that exhausts a tenant's daily quota could cause all of the tenant's other integrations to temporarily fail as well. This widens the impact to include a customer's usage of the API provider itself, rather than just affecting a single developer application.

You can place another type of quota on the developer application itself. Some APIs, like Google Drive, limit the overall number of API requests an application can perform across all users who have authorized access to that application. This becomes a concern as the application gains wider adoption and more users authorize access to their data.

Unified APIs should proactively reach out to these APIs to request an increase to the limit if justified, or let developers know to request a higher rate limit if needed.

"Quota by authorized user" is the most flexible of those mentioned so far, and requires no extra care on the part of the unified API. This type of quota is similar to a rate limit but operates over a longer time interval.

For example, the Egnyte API defaults to a limit of 1,000 requests per authorized user per day. This is in addition to its rate limit of two requests per second. Since these requests are set on the basis of an individual user, running into the rate limit for a user does not affect an application's ability to make requests to other users' accounts.
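A short-window rate limit and a longer-window quota can be enforced together per user. The sketch below is illustrative; its defaults mirror the Egnyte figures mentioned above (2 requests per second, 1,000 per day), but the class itself is not from any real SDK:

```python
import time

class PerUserRateAndQuota:
    """Combine a short-window rate limit with a longer-window quota per user."""

    def __init__(self, rate=2, rate_window=1.0, quota=1000, quota_window=86400.0):
        self.rate, self.rate_window = rate, rate_window
        self.quota, self.quota_window = quota, quota_window
        # user -> [rate_start, rate_count, quota_start, quota_count]
        self.state = {}

    def allow(self, user, now=None):
        now = time.monotonic() if now is None else now
        s = self.state.setdefault(user, [now, 0, now, 0])
        if now - s[0] >= self.rate_window:
            s[0], s[1] = now, 0            # rate window rolled over
        if now - s[2] >= self.quota_window:
            s[2], s[3] = now, 0            # quota window rolled over
        if s[1] >= self.rate or s[3] >= self.quota:
            return False                   # rate or quota exceeded
        s[1] += 1
        s[3] += 1
        return True
```

Because the counters are tracked per user, exhausting one user's allowance leaves every other user's account unaffected, matching the Egnyte behavior described above.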

Set your API rate limit strategy

Rate limits and quotas can prove to be a thorn in the side of developers and consumers of APIs, but they exist for important reasons. By enforcing strict rate limiting, you protect an API against multiple threats to its viability, namely downtime and harmful security attacks.

Whether your application is consuming or providing data, make sure that you take the time to craft a well-thought-out strategy for implementing or dealing with rate limits. Your users will be all the better for it.
