Partitioning

Explore Partitioning, a core concept in Apache Cassandra that enables scalability by dividing and physically distributing a table's data across multiple cluster nodes.

Consistent hashing

Partitioning is a core feature of Cassandra that dictates how data is stored and queried. Cassandra utilizes consistent hashing to partition and distribute each table across multiple nodes. The output range of a hash function is treated as a ring or fixed circular space. Thus, Cassandra can be conceptualized as a giant hash ring, where all nodes are equal, and each node is responsible for storing a range or bucket of hashes.

Cassandra requires a primary key for each table. Part of the primary key is a partition key defined as “a single or multi-field value that determines data placement by consistent hash”. The partition key is used to distribute the table around the ring. Once a partition key is defined for a table, Cassandra automatically distributes data across nodes based on the value of the partition key column(s).

When a record is to be inserted in a table, its partition key is run through a consistent hashing function, resulting in a value that determines which bucket/hash-range the record belongs to, thus identifying the node responsible for saving the record.

For example, consider the courses table partitioned on the category column.

category

title

instructor

target_audience

Cassandra

Cassandra Fundamentals

DataJek

Beginner

Cassandra

CQL

Nancy

Intermediate

Cassandra

Cassandra Architecture

Adam

Advanced

Amazon DynamoDB

Introduction to DynamoDB

DataJek

Beginner

Amazon DynamoDB

Amazon DynamoDB Basics

Bob

Beginner

Google Bigtable

How Google Bigtable works

Adam

Intermediate

The diagram below demonstrates the partitioning and distribution of the table’s records among cluster nodes. The partitioner computes a token for each partition ...