Data Partitioning

Learn about data partitioning in a distributed system.

When we discussed replication, we assumed that the My Cool App had such an amount of data that it was always possible to store the whole copy of the app in a single machine. But in reality, as a system grows, it is no longer feasible for all the data to be stored in a single machine. Businesses grow with time, new use-cases show up, data become an asset, and obviously, data size increases.

Press + to interact

Apart from data size, sometimes supporting a large number of queries becomes pretty complicated if all the data is stored in one machine.

This is where partitioning is used.

What is partitioning?

Partitioning is a mechanism in which data is divided into smaller chunks based on some specific attributes. These chunks are called partitions.

One partition is independent of another partition. Two partitions can be stored in two different machines in a distributed system. A partition can also be treated as a standalone database table.

Note that another well-known term for this concept is called sharding ...