Ensure Scalability and Replication
Learn how consistent hashing enables scalability and how we replicate such partitioned data.
Add scalability
Let’s start with one of the core design requirements: scalability. We store key-value data in storage nodes, and as demand changes, we might need to add or remove storage nodes. This means we need to partition data over the nodes in the system to distribute the load across all of them.
For example, let’s consider that we have four nodes, and we want 25% of the requests to go to each node to balance the load equally. The traditional way to solve this is through the modulus operator. Each request that arrives has a key associated with it. When a request comes in, we calculate the hash of the key. Then, we find the remainder by taking the modulus of the hashed value with the number of nodes, m. The remainder value, x = hash(key) mod m, is the node number, and we send the request to that node to process it.
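Here is a minimal sketch of this modulo-based routing in Python. The four-node count matches the example above; the MD5-based hash and the helper names are illustrative choices, not prescribed by the text:

```python
import hashlib

NUM_NODES = 4  # m: the current number of storage nodes (illustrative)

def hash_key(key: str) -> int:
    """Hash the request key to a large integer (MD5 is one possible choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def node_for_key(key: str, num_nodes: int = NUM_NODES) -> int:
    """Return x = hash(key) mod m, the node that should serve this key."""
    return hash_key(key) % num_nodes

# Route a few sample keys to nodes 0..3.
for key in ["user:42", "cart:7", "session:abc"]:
    print(key, "-> node", node_for_key(key))
```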
We want to add and remove nodes with minimal change to our infrastructure. But with this method, adding or removing a node forces us to move a large number of keys, which is inefficient. For example, suppose node 2 is removed. For the same key, the new server to process the request might now be node 1, because the modulus is now taken with three nodes instead of four (hash(key) mod 3 = 1). Nodes hold information, such as keys and their values, in their local caches. So, we need to move that request’s data to the new node that has to process the request. Copying and moving this data can be costly and can cause high latency.
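To make this cost concrete, the small experiment below (a sketch, reusing the same illustrative MD5-based hash) counts how many sample keys map to a different node when the cluster shrinks from four nodes to three. With modulo hashing and a uniform hash, roughly three-quarters of the keys move in this case:

```python
import hashlib

def node_for_key(key: str, num_nodes: int) -> int:
    """x = hash(key) mod m, as in the routing scheme above (MD5 is illustrative)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % num_nodes

# Count how many of 10,000 sample keys land on a different node
# when the cluster goes from four nodes to three.
keys = [f"key-{i}" for i in range(10_000)]
moved = sum(
    1 for key in keys
    if node_for_key(key, 4) != node_for_key(key, 3)
)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 75% with modulo hashing
```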
Next, we’ll learn how to copy data efficiently.
Point to Ponder
Why didn’t we use load balancers to distribute the requests to all nodes?
Consistent hashing
Consistent hashing is an effective way to manage the load over the set of nodes. In consistent hashing, we consider that we have a conceptual ring of hashes from ...