Partitioning

Explore Partitioning, a core concept in Apache Cassandra that enables scalability by dividing and physically distributing a table's data across multiple cluster nodes.

We'll cover the following...

Consistent hashing
Partitioning related nodetool commands
Hands-on experience

Consistent hashing

Partitioning is a core feature of Cassandra that dictates how data is stored and queried. Cassandra utilizes consistent hashing to partition and distribute each table across multiple nodes. The output range of a hash function is treated as a ring or fixed circular space. Thus, Cassandra can be conceptualized as a giant hash ring, where all nodes are equal, and each node is responsible for storing a range or bucket of hashes.

Cassandra requires a primary key for each table. Part of the primary key is a partition key defined as “a single or multi-field value that determines data placement by consistent hash”. The partition key is used to distribute the table around the ring. Once a partition key is defined for a table, Cassandra automatically distributes data across nodes based on the value of the partition key column(s).

When a record is to be inserted in a table, its partition key is run through a consistent hashing function, resulting in a value that determines which bucket/hash-range the record belongs to, thus identifying the node responsible for saving the record.

For example, consider the courses table partitioned on the category column.

category	title	instructor	target_audience
Cassandra	Cassandra Fundamentals	DataJek	Beginner
Cassandra	CQL	Nancy	Intermediate
Cassandra	Cassandra Architecture	Adam	Advanced
Amazon DynamoDB	Introduction to DynamoDB	DataJek	Beginner
Amazon DynamoDB	Amazon DynamoDB Basics	Bob	Beginner
Google Bigtable	How Google Bigtable works	Adam	Intermediate

Getting Started

Apache Cassandra Overview

Apache Cassandra Architecture

Apache Cassandra Data Modeling

Apache Cassandra Table

Apache Cassandra Data Types

Tunable Consistency

Apache Cassandra Read and Write Path

Wrap Up

Partitioning

Consistent hashing