Being able to manage distributed groups enables the utilization of several distributed algorithms. One of them is named consistent hash rings.

A traditional way to spread keys across a distributed system, which is composed of n nodes, is to compute which node should be responsible for a key by using: hash( object) % n. Unfortunately, as soon as n changes, the result of the modulo operation changes and therefore all keys are remapped to a new node. This remapping causes a massive shuffling of data or processing in a cluster.

The point of consistent hashing is to avoid that. By using a different method of computing, when n changes, only K /n of the keys are remapped to the remaining nodes, where K is the number of keys and n the number of nodes handling those keys.

The hash ring

A hash ring is a hashing space that wraps around itself to form a circle. That’s why it is called a ring. Every key computed using the consistent hashing function maps somewhere on this hash space. That means that a key is always in the same place on the ring. The ring is then split into P partitions, where P is a magnitude larger than the number of nodes (a lot more partitions than the nodes). Each node is then responsible for 1/n partitions of the ring.

This implementation also has the upside of making it easy to add a replication mechanism, meaning a set of keys managed by more than just one node. Replication is handy in case of the failure of a node, as the keys are still managed/stored by another node.

Get hands-on with 1400+ tech skills courses.