Ensure Scalability and Replication
Explore methods to ensure key-value store scalability and high availability. Implement consistent hashing to partition data efficiently, using virtual nodes to distribute load uniformly and prevent hotspots. Define peer-to-peer replication strategies to achieve durability across multiple storage nodes.
Add scalability
Scalability requires distributing data across multiple storage nodes. As demand changes, we must dynamically add or remove nodes. To achieve this, we partition data to balance the load across the system.
A traditional partitioning method uses the modulus operator. For a system with 4 nodes, we want 25% of requests to go to each node. When a request arrives, we hash its key and compute the remainder modulo m. The result x (calculated as hash % m) determines which node processes the request.
The following slides explain this process:
We want to scale infrastructure with minimal disruption. However, modular hashing is inefficient for dynamic scaling. Adding or removing a node changes the divisor m, which alters the mapping for nearly all keys.
For example, if node 2 is removed, a key previously mapped to it might shift to node 1 because
Next, we will examine how to distribute data efficiently. ...