Partitioning Techniques
Learn different partitioning techniques for a distributed system.
How we partition data in a distributed system matters significantly, so we must choose a partitioning strategy that fits the data we have. A poor choice makes the system harder to operate and scale in the long run.
A good partitioning technique results in an even distribution of data among the nodes. If we double the number of nodes, a good partitioning strategy helps us get close to twice the performance from our system.
Before discussing partitioning techniques, let’s assume that each row of our data has a key and that each key identifies its row uniquely. Most databases make a similar assumption.
Let’s discuss a few techniques.
Range-based partitioning
In range-based partitioning, data is partitioned based on ranges of the key. For example, say our keys are strings. Data whose keys start with ‘a’ to ‘j’ is stored in node 1, ‘k’ to ‘p’ in node 2, and ‘q’ to ‘z’ in node 3.
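The mapping in this example can be sketched in a few lines of Python. This is only an illustration, assuming keys are non-empty strings that start with an ASCII letter. The names UPPER_BOUNDS, NODES, and node_for_key are hypothetical and do not come from any particular database.

```python
from bisect import bisect_left

# Hypothetical layout taken from the example above:
# each entry is the last first-letter that the matching node owns.
UPPER_BOUNDS = ["j", "p", "z"]
NODES = ["node 1", "node 2", "node 3"]


def node_for_key(key: str) -> str:
    """Return the node responsible for a key, based on its first letter."""
    if not key:
        raise ValueError("key must be a non-empty string")
    first = key[0].lower()
    if not "a" <= first <= "z":
        raise ValueError(f"key {key!r} is outside the configured ranges")
    # bisect_left finds the first upper bound that is >= the first letter,
    # which is the index of the range containing it.
    return NODES[bisect_left(UPPER_BOUNDS, first)]


if __name__ == "__main__":
    for key in ["apple", "kiwi", "quince"]:
        print(key, "->", node_for_key(key))
```

Because the boundaries are kept sorted, a lookup is a binary search, and keys that sort next to each other land on the same node.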
This technique is very simple. However, in many cases range-based partitioning distributes data unevenly: a few nodes store most of the data while the others store very little. In the above example, the partition ‘k’ to ‘p’ ...