...

Dynamic Horizontal Scaling

Learn how to dynamically adjust an application's capacity based on incoming or predicted traffic.

One important advantage of modern cloud-based infrastructure is the ability to dynamically adjust the capacity of an application based on the current or predicted traffic. This is also known as dynamic scaling. If implemented properly, this practice can reduce the cost of the IT infrastructure enormously while still keeping the application highly available and responsive.

The idea is simple: if our application is experiencing performance degradation caused by a traffic peak, the system automatically spawns new servers to cope with the increased load. Similarly, if we see that the allocated resources are underutilized, we can shut some servers down to reduce the cost of the running infrastructure. We can also perform scaling operations on a schedule; for instance, we can shut down some servers during the hours of the day when we know the traffic will be lighter, and restart them just before peak hours. These mechanisms require the load balancer to stay up to date with the current network topology, so that it knows which servers are up at any given time.
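To make the reactive variant of this idea concrete, here is a minimal sketch of an autoscaling decision loop. Everything in it is an assumption for illustration: the shape of the metrics, the thresholds, and the instance limits are placeholders, not the API of any particular cloud provider.

```typescript
// Minimal sketch of an autoscaling decision. Illustrative assumptions only:
// the Metrics shape, thresholds, and limits are not any provider's real API.

interface Metrics {
  avgCpuPercent: number;  // average CPU utilization across all instances
  instanceCount: number;  // instances currently running
}

const MIN_INSTANCES = 2;
const MAX_INSTANCES = 10;
const SCALE_UP_THRESHOLD = 75;   // % CPU above which we add capacity
const SCALE_DOWN_THRESHOLD = 25; // % CPU below which we remove capacity

// Decide how many instances we should be running given the current metrics.
function desiredInstances(m: Metrics): number {
  if (m.avgCpuPercent > SCALE_UP_THRESHOLD) {
    return Math.min(m.instanceCount + 1, MAX_INSTANCES);
  }
  if (m.avgCpuPercent < SCALE_DOWN_THRESHOLD) {
    return Math.max(m.instanceCount - 1, MIN_INSTANCES);
  }
  return m.instanceCount; // within the comfort zone: do nothing
}

// Example: 3 instances averaging 82% CPU -> scale up to 4.
console.log(desiredInstances({ avgCpuPercent: 82, instanceCount: 3 })); // 4
```

A scheduler-based variant would simply call the same scale-up/scale-down actions at fixed times of day instead of in response to metrics.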

Using a service registry

A common pattern to solve this problem is to use a central repository called a service registry, which keeps track of the running servers and the services they provide.

The illustration below shows a multiservice architecture with a load balancer on the front, configured dynamically using a service registry.

Figure: A multiservice architecture with a load balancer on the front, configured dynamically using a service registry

The architecture in the illustration above assumes the presence of two services, API and WebApp. There can be one or many instances of each service, spread across multiple servers.
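To show what interacting with a registry might look like, here is a minimal in-memory sketch. Production systems usually rely on a dedicated registry such as Consul, etcd, or ZooKeeper; the class below only illustrates the interface such a system exposes, and all identifiers and addresses in it are illustrative assumptions.

```typescript
// Minimal in-memory sketch of a service registry. In production this would
// be a dedicated, replicated system; the interface here is illustrative.

interface ServiceInstance {
  id: string;       // unique instance identifier
  service: string;  // logical service name, e.g., "api" or "webapp"
  address: string;  // host:port where the instance can be reached
}

class ServiceRegistry {
  private instances = new Map<string, ServiceInstance>();

  // A server calls register() when an instance starts...
  register(instance: ServiceInstance): void {
    this.instances.set(instance.id, instance);
  }

  // ...and unregister() when it shuts down (or fails a health check).
  unregister(id: string): void {
    this.instances.delete(id);
  }

  // The load balancer queries the registry to discover live instances.
  lookup(service: string): ServiceInstance[] {
    return Array.from(this.instances.values()).filter(
      (i) => i.service === service
    );
  }
}

// Usage mirroring the illustration: three API instances and one WebApp
// instance (hostnames and ports are made up for the example).
const registry = new ServiceRegistry();
registry.register({ id: 'api-1a', service: 'api', address: 'api1.example.com:3000' });
registry.register({ id: 'api-1b', service: 'api', address: 'api1.example.com:3001' });
registry.register({ id: 'api-2a', service: 'api', address: 'api2.example.com:3000' });
registry.register({ id: 'web-1', service: 'webapp', address: 'web1.example.com:8080' });

console.log(registry.lookup('api').length); // 3
```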

When a request to example.com is received, the load balancer checks the prefix of the request path. If it’s the /api prefix, the request is load balanced between the available instances of the API service. In the illustration above, we have two instances running on the api1.example.com server and one instance running on the api2.example.com server. For all the other path prefixes, the request is load balanced between the available instances of the WebApp service. In the illustration, we have only one WebApp instance, which is running on ...
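The following sketch shows how the load balancer's routing logic might be implemented: choose a service by path prefix, then round-robin among that service's instances. It uses only Node's built-in http module; the instance table, hostnames, and ports are illustrative assumptions, and in a real deployment the table would be kept up to date from the service registry rather than hardcoded.

```typescript
// Sketch of prefix-based routing with round-robin balancing.
// Hostnames, ports, and the hardcoded instance table are assumptions.
import * as http from 'http';

// Snapshot of live instances per service. In the pattern described above,
// this table would be refreshed from the service registry at runtime.
const instances: Record<string, string[]> = {
  api: ['api1.example.com:3000', 'api1.example.com:3001', 'api2.example.com:3000'],
  webapp: ['web1.example.com:8080'],
};

const counters: Record<string, number> = {};

// Round-robin: rotate through the instances of the given service.
function pickInstance(service: string): string | undefined {
  const live = instances[service] ?? [];
  if (live.length === 0) return undefined;
  const n = counters[service] ?? 0;
  counters[service] = n + 1;
  return live[n % live.length];
}

const balancer = http.createServer((req, res) => {
  // Route by path prefix: /api goes to the API service,
  // everything else to the WebApp service.
  const service = req.url?.startsWith('/api') ? 'api' : 'webapp';
  const target = pickInstance(service);
  if (!target) {
    res.writeHead(502);
    res.end('No available instances');
    return;
  }
  const [host, port] = target.split(':');
  // Forward the request to the chosen instance, streaming the body both ways.
  const upstream = http.request(
    { host, port: Number(port), path: req.url, method: req.method, headers: req.headers },
    (upstreamRes) => {
      res.writeHead(upstreamRes.statusCode ?? 502, upstreamRes.headers);
      upstreamRes.pipe(res);
    }
  );
  req.pipe(upstream);
});

balancer.listen(8000); // the entry point that would sit behind example.com
```

Because the instance table is driven by the registry, scaling a service up or down changes the routing behavior without any reconfiguration of the balancer itself.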