Partitioning and Replication in DynamoDB

Learn how tables are partitioned and replicated in DynamoDB.

We'll cover the following...

Partitioning
Replication
- Consistency
Quiz
What's next?

We chose a NoSQL schema because it allows for easy partitioning of tables. Let's quickly refresh our understanding of partitioning and learn how we will partition our tables. We will conclude this lesson by understanding how our design replicates partitions.

Partitioning

As the name suggests, partitioning means dividing a database or a table and storing the pieces on multiple nodes. This concept is also known as sharding. Partitioning comes in two types, vertical and horizontal. We will briefly revisit both types in this lesson. Then, we will discuss how we will partition our design.

The purpose of partitioning is to distribute the load of read and write requests across several nodes. There are two ways to achieve this.

Vertical

Vertical partitioning is the splitting of a table by columns. The illustration below demonstrates vertical sharding.

In the example above, we have partitioned a table into two tables. Note how both tables have the same primary key.
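As a rough illustration of the idea, the sketch below splits a small in-memory table by columns. The table contents, the column names, and the `vertical_partition` helper are all hypothetical and only serve the example; they are not part of DynamoDB or of our design.

```python
# Hypothetical table: each row is a dict, and "id" is the primary key.
users = [
    {"id": 1, "name": "Alice", "email": "alice@example.com", "bio": "Likes hiking"},
    {"id": 2, "name": "Bob", "email": "bob@example.com", "bio": "Plays chess"},
]


def vertical_partition(rows, primary_key, column_groups):
    """Split rows by columns; every partition keeps the primary key."""
    partitions = []
    for columns in column_groups:
        # Always carry the primary key so rows can be matched up again later.
        kept = [primary_key] + [c for c in columns if c != primary_key]
        partitions.append([{c: row[c] for c in kept} for row in rows])
    return partitions


profile, details = vertical_partition(users, "id", [["name"], ["email", "bio"]])
```

Both resulting tables contain one entry per original row and share the `id` column, so a full row can be reassembled by joining the two tables on that key.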

Horizontal

Horizontal partitioning is the splitting of a table by rows. It is useful for large tables because each partition holds only a subset of the rows. Unlike vertical partitioning, where every partition holds the same number of rows but fewer columns, horizontal partitions all have the same columns but fewer rows each. This makes horizontal partitioning the better choice when the number of rows is expected to be large. Read-write access to a large table stored on a single server is limited to the throughput capabilities of that server. If we split the entries in the table equally and store them on two different servers, requests to the same table can be served by both servers, which raises the throughput available to it. The illustration below demonstrates horizontal partitioning.

Here, we can see that the resultant tables have the same schema. We've only split the entries in the original table into two tables with the same schema.
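A minimal sketch of the same idea in code, assuming rows are split by ranges of the primary key. The `orders` table, the split point, and the `horizontal_partition` helper are hypothetical choices made for illustration only.

```python
import bisect

# Hypothetical table: four rows keyed by "id".
orders = [{"id": i, "item": f"item-{i}"} for i in range(1, 5)]


def horizontal_partition(rows, primary_key, split_points):
    """Split rows into len(split_points) + 1 partitions by key range.

    split_points must be sorted. A key equal to a split point goes to
    the partition on the right of that split point.
    """
    partitions = [[] for _ in range(len(split_points) + 1)]
    for row in rows:
        index = bisect.bisect_right(split_points, row[primary_key])
        partitions[index].append(row)
    return partitions


low, high = horizontal_partition(orders, "id", [3])
```

Here `low` receives the rows with keys below 3 and `high` receives the rest. Every row keeps all of its columns, so both partitions have the same schema as the original table.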

We will use horizontal sharding for two reasons. First, our data will not have a fixed schema, so there is no stable set of columns to split on. Second, we expect a vast number of rows in our tables, and we want to spread the read and write load for those rows across the throughput of several nodes. Horizontal sharding is also usually much easier to automate than vertical sharding: achieving a fair distribution of throughput among partitions with vertical sharding requires knowing how frequently each column is accessed.

Note: For a detailed explanation of partitioning, visit the Data Partitioning lesson.

Our partitioning

We will horizontally divide every table in our database into partitions. This will help us cater to the different throughput and storage requirements of partitions. We can think of a partition as an allocation of storage that is backed by SSDs. Every partition will host a disjoint part of the table's key range. Our design will increase the number of partitions of a table in the following scenarios: ...
