Partitioning Techniques

Learn different partitioning techniques for a distributed system.

How we partition data in a distributed system matters significantly, so we must choose a partitioning strategy that suits the data we have. Otherwise, things get complicated in the long run.

A good partitioning technique distributes data evenly among the nodes. If we have n nodes and scale out to 2n nodes, a good partitioning strategy helps us gain roughly two times the performance from our system.

Before discussing partitioning techniques, let’s first assume that every row of our data has a key, and that each key identifies its row uniquely. Most databases make this assumption in one form or another.

Let’s discuss a few techniques.

Range-based partitioning

In range-based partitioning, data is partitioned based on ranges of the key. For example, say the keys are strings. Data with keys starting with ‘a’ to ‘j’ is stored in node 1, ‘k’ to ‘p’ in node 2, and ‘q’ to ‘z’ in node 3.
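The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names UPPER_BOUNDS, NODES, and node_for_key are hypothetical, keys are assumed to start with an ASCII letter, and a real system would typically store the range boundaries in metadata rather than hard-code them.

```python
import bisect

# Hypothetical layout mirroring the example in the text: each entry is the
# last first-letter that the node at the same index accepts, in sorted order.
UPPER_BOUNDS = ["j", "p", "z"]
NODES = ["node 1", "node 2", "node 3"]


def node_for_key(key: str) -> str:
    """Return the node whose range covers the first letter of the key."""
    if not key:
        raise ValueError("key must not be empty")
    first = key[0].lower()
    if not "a" <= first <= "z":
        # Assumption of this sketch: only keys starting with a-z are handled.
        raise ValueError(f"unsupported key: {key!r}")
    # Find the first range whose upper bound is >= the key's first letter.
    index = bisect.bisect_left(UPPER_BOUNDS, first)
    return NODES[index]


if __name__ == "__main__":
    for sample in ["apple", "kiwi", "quince"]:
        print(sample, "->", node_for_key(sample))
```

Because the boundaries are kept sorted, a binary search finds the owning node, and range queries such as "all keys from ‘b’ to ‘d’" only need to touch the nodes whose ranges overlap the query.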

A simple demonstration of range-based partitioning

This technique is very simple. However, in many cases range-based partitioning distributes data unevenly: a few nodes store most of the data while the others hold very little. In the above example, the partition ‘k’ to ‘p’ ...