Dynamic Horizontal Scaling

Learn how to dynamically adjust the capacity of an application based on the incoming or predicted traffic.

One important advantage of modern cloud-based infrastructure is the ability to dynamically adjust the capacity of an application based on the current or predicted traffic. This is also known as dynamic scaling. If implemented properly, this practice can significantly reduce infrastructure costs, because we pay only for the capacity we actually need, while still keeping the application highly available and responsive.

The idea is simple: if our application is experiencing a performance degradation caused by a peak in traffic, the system automatically spawns new servers to cope with the increased load. Similarly, if the allocated resources are underutilized, the system can shut some servers down to reduce the cost of the running infrastructure. We can also perform scaling operations on a schedule; for instance, we can shut down some servers during the hours of the day when we know that traffic will be lighter, and restart them just before the peak hours. All of these mechanisms require the load balancer to always be up to date with the current network topology, so that it knows which servers are up at any given time.
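To make the scale-up and scale-down decision concrete, here is a minimal sketch of a proportional scaling rule. It is only an illustration under stated assumptions: the function name desired_server_count, the 60% target utilization, and the minimum and maximum server limits are all hypothetical and not part of any specific cloud provider's API.

```python
def desired_server_count(current_servers, avg_utilization_pct,
                         target_utilization_pct=60,
                         min_servers=1, max_servers=10):
    """Return how many servers should run to bring average utilization
    close to the target.

    All parameters are hypothetical. Utilization values are integer
    percentages (0-100) averaged across the running servers.
    """
    if current_servers < 1:
        return min_servers
    # Total load expressed in "percent of one server".
    total_load = current_servers * avg_utilization_pct
    # Ceiling division: the smallest server count that keeps the
    # average utilization at or below the target.
    needed = -(-total_load // target_utilization_pct)
    # Never go below the minimum or above the maximum fleet size.
    return max(min_servers, min(max_servers, needed))
```

With these assumed defaults, four servers running at 90% would be scaled up to six, while four servers running at 20% would be scaled down to two. A real autoscaler would typically also add a cooldown period between scaling operations to avoid oscillating.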

Using a service registry

A common pattern to solve this problem is to use a central repository called a service registry, which keeps track of the running servers and the services they provide.
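The pattern can be sketched as a small in-memory registry that a load balancer consults on every request. This is a simplified illustration rather than a production design: the class names ServiceRegistry and RoundRobinBalancer, the heartbeat-based expiry, and the 30-second time-to-live are assumptions made for the example. Real deployments usually rely on a dedicated registry service.

```python
import time


class ServiceRegistry:
    """Hypothetical in-memory registry of running servers per service."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps (service, address) to the time of the last heartbeat.
        self._entries = {}

    def register(self, service, address):
        """Called by a server when it starts, and again as a heartbeat."""
        self._entries[(service, address)] = self._clock()

    def deregister(self, service, address):
        """Called by a server when it shuts down gracefully."""
        self._entries.pop((service, address), None)

    def lookup(self, service):
        """Return the sorted addresses of servers seen within the TTL."""
        now = self._clock()
        return sorted(
            address
            for (name, address), last_seen in self._entries.items()
            if name == service and now - last_seen <= self._ttl
        )


class RoundRobinBalancer:
    """Picks a server for each request using the registry's current view."""

    def __init__(self, registry):
        self._registry = registry
        self._counters = {}

    def pick(self, service):
        servers = self._registry.lookup(service)
        if not servers:
            raise LookupError(f"no live server for service {service!r}")
        index = self._counters.get(service, 0) % len(servers)
        self._counters[service] = index + 1
        return servers[index]
```

Because the balancer reads the registry on every call to pick, servers that are added or removed by a scaling operation are taken into account immediately, and servers that stop sending heartbeats drop out once the TTL expires.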

The illustration below shows a multiservice architecture with a load balancer on the front, configured dynamically using a service registry.
