Design a Rate Limiter

Learn to design a rate limiter.

What is a rate limiter?

A rate limiter, as the name suggests, puts a limit on the number of requests a service fulfills. It throttles requests that cross the predefined limit. For example, if a service’s API is configured to allow 500 requests per minute from a client, the rate limiter blocks any further requests from that client once that limit is exceeded.
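To make the idea concrete, here is a minimal single-server sketch of such a check in Python. The fixed 60-second window, the in-memory counter store, and the `allow_request` helper are assumptions for illustration only, not a prescribed implementation.

```python
import time
from collections import defaultdict

LIMIT = 500          # assumed limit: requests allowed per window
WINDOW_SECONDS = 60  # assumed window length in seconds

# In-memory, per-client counters; a real deployment would use a shared store.
_counters = defaultdict(lambda: {"window_start": 0.0, "count": 0})

def allow_request(client_id: str) -> bool:
    """Return True if the client's request fits within the current window."""
    now = time.time()
    counter = _counters[client_id]
    # Start a fresh window once the current one has expired.
    if now - counter["window_start"] >= WINDOW_SECONDS:
        counter["window_start"] = now
        counter["count"] = 0
    if counter["count"] < LIMIT:
        counter["count"] += 1
        return True
    return False  # the request crosses the limit and should be throttled
```

A request handler would call `allow_request(client_id)` before doing any work and reject the request (for example, with an HTTP 429 response) when it returns `False`.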

Requirements

Our focus in this section is to design a rate limiter with the following functional and non-functional requirements.

Functional requirements

  • To limit the number of requests a client can send to an API within a time window.
  • To make the limit of requests per window configurable.
  • To make sure that the client receives a message (an error or notification) whenever the defined threshold is crossed, whether on a single server or across a combination of servers.
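As a rough sketch of the first two requirements, the rule table and error message below are hypothetical; the field names and limits are assumptions, not part of any particular product.

```python
# Hypothetical per-API rules: the limit and window length are configurable.
RATE_LIMIT_RULES = {
    "send_message": {"limit": 500, "window_seconds": 60},
    "create_account": {"limit": 3, "window_seconds": 3600},
}

def throttled_response(api_name: str) -> dict:
    """Message returned to the client once the configured threshold is crossed."""
    rule = RATE_LIMIT_RULES[api_name]
    return {
        "status": 429,  # HTTP "Too Many Requests"
        "error": (
            f"Rate limit exceeded for '{api_name}': at most "
            f"{rule['limit']} requests per {rule['window_seconds']} seconds."
        ),
    }
```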

Non-functional requirements

  • Availability: Essentially, the rate limiter protects our system. Therefore, it should be highly available.
  • Low latency: Because all API requests pass through the rate limiter, it should add minimal latency so that it doesn’t affect the user experience.
  • Scalability: Our design should be highly scalable. It should be able to rate limit an increasing number of clients’ requests over time.

Types of throttling

A rate limiter can perform three types of throttling.

  • Hard throttling: This type of throttling puts a hard limit on the number of API requests. So, whenever a request exceeds the limit, it’s discarded.
  • Soft throttling: Under soft throttling, the number of requests can exceed the predefined limit by a certain percentage. For example, if our system has a predefined limit of 500 messages per minute with a 5% allowance above the limit, we can let the client send up to 525 requests per minute.
  • Elastic or dynamic throttling: Under this type of throttling, the number of requests can cross the predefined limit if the system has excess resources available. However, there is no specific percentage defined for the upper limit. For example, if our system allows 500 requests per minute, it can let the client send more than 500 requests when free resources are available.
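The following sketch contrasts the three decisions. The 5% soft-throttling allowance and the `has_spare_capacity` flag are assumptions used purely for illustration.

```python
LIMIT = 500            # assumed configured limit per window
SOFT_ALLOWANCE = 0.05  # soft throttling permits 5% above the limit (525 here)

def is_allowed(request_count: int, mode: str, has_spare_capacity: bool = False) -> bool:
    """Decide whether the next request is allowed under each throttling type."""
    if mode == "hard":
        # Hard throttling: never exceed the configured limit.
        return request_count < LIMIT
    if mode == "soft":
        # Soft throttling: allow a fixed percentage above the limit.
        return request_count < LIMIT * (1 + SOFT_ALLOWANCE)
    if mode == "elastic":
        # Elastic throttling: exceed the limit only while free resources exist.
        return request_count < LIMIT or has_spare_capacity
    raise ValueError(f"unknown throttling mode: {mode}")
```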

Where to place the rate limiter

There are three different ways to place the rate limiter.

  • On the client side: It’s easy to place the rate limiter on the client side. However, this strategy is not safe because it can easily be tampered with by malicious actors. Moreover, it’s difficult to enforce the configuration on the client side in this approach.

  • On the server side: As shown in the following figure, the rate limiter can be placed on the server side. In this approach, a request reaches the server first and then passes through the rate limiter that resides on that server.

  • As middleware: In
...