Partitioning
Explore Partitioning, a core concept in Apache Cassandra that enables scalability by dividing and physically distributing a table's data across multiple cluster nodes.
Consistent hashing
Partitioning is a core feature of Cassandra that dictates how data is stored and queried. Cassandra uses consistent hashing to partition each table and distribute it across multiple nodes. The output range of the hash function is treated as a fixed circular space, or ring. Cassandra can therefore be pictured as a giant hash ring in which all nodes are peers and each node is responsible for storing a range, or bucket, of hash values.
Cassandra requires a primary key for each table. Part of the primary key is a partition key defined as “a single or multi-field value that determines data placement by consistent hash”. The partition key is used to distribute the table around the ring. Once a partition key is defined for a table, Cassandra automatically distributes data across nodes based on the value of the partition key column(s).
When a record is inserted into a table, its partition key is passed through the hash function. The resulting value determines which hash range (bucket) the record belongs to, and therefore which node is responsible for storing the record.
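The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Cassandra's implementation: the HashRing class, the token_for helper, the node names, and the 32-bit ring size are all hypothetical, and MD5 stands in for whatever hash function the cluster's partitioner actually uses.

```python
import bisect
import hashlib

RING_SIZE = 2**32  # assumed size of the hash space, chosen for readability


def token_for(partition_key):
    """Hash a partition key to a position on the ring (MD5 is a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")  # always in [0, RING_SIZE)


class HashRing:
    """Toy ring: each node owns the range that ends at its own token."""

    def __init__(self, node_tokens):
        # Sort nodes by token so that ranges can be found with binary search.
        pairs = sorted((token, name) for name, token in node_tokens.items())
        self._tokens = [token for token, _ in pairs]
        self._nodes = [name for _, name in pairs]

    def node_for_token(self, token):
        # First node whose token is >= the given token owns it.
        idx = bisect.bisect_left(self._tokens, token)
        # Tokens past the last node wrap around to the first node.
        return self._nodes[idx % len(self._nodes)]

    def node_for_key(self, partition_key):
        return self.node_for_token(token_for(partition_key))


# Hypothetical four-node cluster with evenly spaced tokens.
ring = HashRing({
    "node-a": 0x3FFFFFFF,
    "node-b": 0x7FFFFFFF,
    "node-c": 0xBFFFFFFF,
    "node-d": 0xFFFFFFFF,
})

if __name__ == "__main__":
    key = "some-partition-key"
    print(key, "->", token_for(key), "->", ring.node_for_key(key))
```

Because the same key always hashes to the same token, every read and write for that key is routed to the same node without consulting a central directory.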
For example, consider the courses table partitioned on the category column.
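A rough Python sketch of this example follows. The sample rows, the title column, the three node names, and the use of MD5 with equal-sized token ranges are all assumptions made for illustration. Only the courses table and its category partition key come from the text.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster
RANGE_SIZE = 2**64 // len(NODES)        # assumed equal-sized token ranges


def owner(category):
    """Map a category (the partition key) to the node owning its token."""
    digest = hashlib.md5(category.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")  # 64-bit token, MD5 as stand-in
    # Clamp so the very last token still falls in the final node's range.
    return NODES[min(token // RANGE_SIZE, len(NODES) - 1)]


# Made-up rows of the courses table: (title, category).
courses = [
    ("Intro to Java", "programming"),
    ("Python Basics", "programming"),
    ("Linear Algebra", "math"),
    ("Calculus I", "math"),
    ("Cloud Fundamentals", "devops"),
]

# Group rows by the node that owns their partition.
placement = defaultdict(list)
for title, category in courses:
    placement[owner(category)].append((category, title))

if __name__ == "__main__":
    for node in NODES:
        print(node, placement.get(node, []))
```

Whatever nodes the hash happens to pick, all rows that share a category land on the same node, because placement depends only on the partition key value.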