...

>

Ensure Scalability and Replication

Ensure Scalability and Replication

Explore methods to ensure key-value store scalability and high availability. Implement consistent hashing to partition data efficiently, using virtual nodes to distribute load uniformly and prevent hotspots. Define peer-to-peer replication strategies to achieve durability across multiple storage nodes.

Add scalability

Scalability requires distributing data across multiple storage nodes. As demand changes, we must dynamically add or remove nodes. To achieve this, we partition data to balance the load across the system.

A traditional partitioning method uses the modulus operator. For a system with 4 nodes, we want 25% of requests to go to each node. When a request arrives, we hash its key and compute the remainder modulo m. The result x (calculated as hash % m) determines which node processes the request.

The following slides explain this process:

We get the hash of the key and take modulus with the number of nodes to find the node that should process the request
1 / 3
We get the hash of the key and take modulus with the number of nodes to find the node that should process the request

We want to scale infrastructure with minimal disruption. However, modular hashing is inefficient for dynamic scaling. Adding or removing a node changes the divisor m, which alters the mapping for nearly all keys.

For example, if node 2 is removed, a key previously mapped to it might shift to node 1 because 10%3=110 \% 3 = 1. Since nodes cache data locally, this shift forces massive data migration (reshuffling) to the new target nodes, causing high latency and network congestion.

Next, we will examine how to distribute data efficiently. ...