Storage Structure

Learn the fundamental terminology related to data modeling, crucial for effectively designing and organizing data in Apache Cassandra.

The following set of terms describes Apache Cassandra’s storage structure hierarchy.

Data model

An abstract model for organizing elements of data, i.e., how the database gathers, stores, and uses the data. In Apache Cassandra, the data model is based on the queries to be performed on the database.

Cluster

The cluster is the outermost structure in Apache Cassandra. Apache Cassandra is a distributed database that spreads data across multiple machines while appearing to the end user as a single instance. Data is distributed across instances called nodes, which are grouped into datacenters and logically arranged in a ring.

Keyspace

A keyspace is a container for all tables belonging to an application. It is similar to a relational database schema and provides the top-level grouping of data within a cluster. Apache Cassandra defines replication at the keyspace level.

Table

An Apache Cassandra table is a collection of rows and columns partitioned and stored across the cluster based on part of the primary key. In the table definition, Cassandra requires a data type for each column. Tables are created to satisfy one or more queries. Because Cassandra does not support joins, tables are denormalized so that each one contains all the data required for the queries it serves.
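
The query-first, denormalized design described above can be sketched in plain Python. This is only an illustration, not Cassandra code: the records, field names, and the two dictionaries standing in for tables are all hypothetical. Each "table" keeps its own full copy of the rows so that one query can be answered from one table alone.

```python
# Hypothetical video records; every name here is illustrative only.
videos = [
    {"video_id": 1, "title": "Intro to Cassandra", "uploader": "alice", "tag": "database"},
    {"video_id": 2, "title": "Data Modeling", "uploader": "alice", "tag": "modeling"},
    {"video_id": 3, "title": "Ring Basics", "uploader": "bob", "tag": "database"},
]

# One "table" per query. Each stores full copies of the rows it must return,
# so a read never has to combine data from another table (no join needed).
videos_by_uploader = {}  # serves: "find all videos uploaded by a user"
videos_by_tag = {}       # serves: "find all videos with a given tag"

for video in videos:
    videos_by_uploader.setdefault(video["uploader"], []).append(dict(video))
    videos_by_tag.setdefault(video["tag"], []).append(dict(video))

# Each query reads from exactly one structure.
alice_titles = [v["title"] for v in videos_by_uploader["alice"]]
database_titles = [v["title"] for v in videos_by_tag["database"]]
```

The cost of this layout is duplicated data: the same video is written once per table, which is the trade-off denormalization accepts in exchange for single-table reads.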

Partition

Because Apache Cassandra is a distributed database, the rows of a table are distributed around the cluster based on the partition key specified in the table definition. A partition is a row or a group of rows with the same partition key. A partition can be visualized as a chunk of the table’s rows residing on a particular node.
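
To make the idea of "rows with the same partition key" concrete, here is a minimal Python sketch that groups rows into partitions. The sensor readings, the column names, and the helper group_into_partitions are hypothetical and exist only for this illustration; the sketch says nothing about how Cassandra lays out data on disk.

```python
from collections import defaultdict

# Hypothetical rows of a sensor-readings table; sensor_id plays the
# role of the partition key in this illustration.
rows = [
    {"sensor_id": "s1", "reading_time": "2024-01-01T00:00", "value": 20.5},
    {"sensor_id": "s2", "reading_time": "2024-01-01T00:00", "value": 18.0},
    {"sensor_id": "s1", "reading_time": "2024-01-01T00:05", "value": 20.7},
]


def group_into_partitions(rows, partition_key):
    """Group rows so that all rows sharing a partition key value end up together."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    return dict(partitions)


partitions = group_into_partitions(rows, "sensor_id")

for key, partition_rows in sorted(partitions.items()):
    print(key, "holds", len(partition_rows), "row(s)")
```

Here the three rows form two partitions: one for s1 with two rows and one for s2 with a single row.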

Partition key

The partition key is the part of the table’s primary key that determines the node on which the partition is stored.
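
One way to picture how a partition key leads to a node is a simplified token ring, sketched below in Python. Several things here are assumptions made for illustration: the Ring class, the token_for helper, the node names, and the evenly spaced tokens are all hypothetical, and MD5 is used only as a convenient stand-in hash (Cassandra's default partitioner is based on Murmur3). Replication and virtual nodes are not modeled.

```python
import bisect
import hashlib

TOKEN_SPACE = 2 ** 64  # size of the illustrative token space


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token (stand-in hash, not Cassandra's)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # value in [0, 2**64)


class Ring:
    """A toy ring in which each node owns the token range ending at its token."""

    def __init__(self, node_tokens):
        # node_tokens maps a node name to the token assigned to that node.
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def node_for(self, partition_key: str) -> str:
        token = token_for(partition_key)
        # First node whose token is >= the key's token; wrap around past the end.
        index = bisect.bisect_left(self._tokens, token) % len(self._ring)
        return self._ring[index][1]


# Three hypothetical nodes with evenly spaced tokens.
ring = Ring({
    "node1": 0,
    "node2": TOKEN_SPACE // 3,
    "node3": 2 * (TOKEN_SPACE // 3),
})

for key in ["s1", "s2", "s3"]:
    print(key, "->", ring.node_for(key))
```

Because the node is derived only from the hash of the partition key, every row with the same partition key maps to the same node, which is why a partition can be read from one place.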
