What Is Load Balancing?

We talked about load balancing briefly at the beginning of this course, but now it's time to take a deeper look.

The broadest definition of load balancing could be formulated like this: load balancing is the process of distributing a set of tasks over a set of resources.

For web applications, a narrower definition is more useful: load balancing distributes traffic or requests over a set of web servers.

Why do we need load balancing?

For applications with a small number of users, we might not even need load balancing. In other cases, just using a bigger server (vertical scaling) might be sufficient, especially when the user base is limited (e.g., a company's internal applications). But as soon as we create applications for the public, these scaling options are exhausted quickly.

We could just give different domains or IP addresses to different users to spread the load between servers, but this is obviously not very user-friendly and is error-prone. That’s where load balancing shines. A load balancer offers a single point of entry, e.g., a domain name or an IP address, for all users but distributes the traffic transparently to back-end servers (called targets).
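To make the idea concrete, here is a minimal sketch of a single entry point that spreads requests over several targets. It assumes a simple round-robin strategy, which is only one of several possible ways to distribute traffic. The class name `RoundRobinLoadBalancer`, the `handle` method, and the server names are hypothetical and chosen for illustration only. Nothing is sent over the network here.

```python
from itertools import cycle


class RoundRobinLoadBalancer:
    """Single entry point that hands each request to the next target in turn."""

    def __init__(self, targets):
        if not targets:
            raise ValueError("at least one target is required")
        # cycle() yields the targets in order and starts over after the last one.
        self._targets = cycle(targets)

    def handle(self, request):
        target = next(self._targets)
        # A real load balancer would forward the request to the target;
        # this sketch only reports which target would have served it.
        return f"{target} handled {request}"


# Hypothetical back-end servers (targets).
balancer = RoundRobinLoadBalancer(["server-a", "server-b", "server-c"])

# Every caller talks to the same balancer object, never to a server directly.
responses = [balancer.handle(f"GET /page/{i}") for i in range(5)]
for line in responses:
    print(line)
```

The caller only ever uses `balancer.handle(...)`, so which target serves a given request stays hidden from it, much like the single domain name or IP address described above.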

The user doesn’t even know which server they're talking to, and from the user's perspective it doesn’t matter. The diagram below shows a typical setup with a load balancer.
