Cassandra's Data Model

Explore Cassandra's data model, including keyspaces, tables, schemas, and primary keys. Understand how partition keys distribute data, how clustering columns organize rows, and how consistent hashing with virtual nodes ensures balanced partition replication across nodes. Learn about Cassandra's design goals, which focus on high availability, write performance, and scalability, as well as its storage engine, which is inspired by Bigtable.

Cassandra is a distributed datastore that combines ideas from the Dynamo [G. DeCandia et al., “Dynamo: Amazon’s Highly Available Key-value Store,” in Proceedings of the Twenty-first ACM SIGOPS Symposium on Operating Systems Principles, 2007] and Bigtable [F. Chang et al., “Bigtable: A Distributed Storage System for Structured Data,” in Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2006] papers.

Note: Besides Dynamo, there is also a separate distributed system called DynamoDB. It is commercially available, but details about its internal architecture had not been shared publicly at the time of writing. However, it has many similarities with Cassandra, such as its data model and tunable consistency.

Cassandra [A. Lakshman and P. Malik, “Cassandra - A Decentralized Structured Storage System,” Operating Systems Review, 2010] was originally developed at Facebook, was later open-sourced, and became an Apache project. Since then, it has evolved significantly from its original implementation.

Note: The information in this chapter reflects the state of the project at the time this course was written.

Design goals of Cassandra

The main design goals of Cassandra are:

  • Extremely high availability
  • Performance (high
...