Introduction

Rebalancing in a distributed system is data movement between multiple host instances. In the context of database partitioning, rebalancing means the movement of partitions between multiple host instances.

These are scenarios where a distributed database requires rebalancing:

  • If an existing host instance crashes, the database must migrate partitions from the existing host to other hosts.

  • If a new host instance joins the cluster, the database must reassign partitions to the new host from the existing ones for uniform distribution.

  • Scaling up host instances by adding more CPU, memory, etc., requires redistributing partitions between the hosts.

  • The database must redistribute partitions between host instances as the query throughput and dataset increase over time.

Prerequisites

There are certain prerequisites for rebalancing partitions in a distributed database:

  • After the rebalance operation, the data storage and read/write distribution on the partitions should be uniform between the host instances.

  • During the rebalance process, the database should continue processing read and write requests on the existing partitions.

  • The rebalance process should involve minimal data movement between the host instances to reduce the network and I/O load.

Strategies for rebalancing

  • Fixed partitions

  • Dynamic partitioning based on the dataset

  • Dynamic partitioning based on the host instance

Fixed number of partitions

In this strategy, the database administrator preconfigures the number of partitions for the database. Then, these partitions are distributed uniformly among the host instances of the database:

  • The database removes a few partitions from the existing host instances if we add a new host, instance and assigns them to a new host ensuring uniform distribution of partitions across the hosts.

  • If an existing host instance crashes, the database redistributes the partitions from the crashed instance and assigns them to live host instances, ensuring uniform distribution of partitions across the hosts.

In this strategy, the database moves only entire partitions between the host instances. The number of partitions and the assignment of data to partitions doesn't change.

Get hands-on with 1300+ tech skills courses.