Detailed Design of Bigtable: Part II

Learn how to perform different operations on tablets in Bigtable.

We'll cover the following...

How to locate the tablets
How to assign the tablets
Tablet serving
- Write operation
- Read operation
Compactions

A Bigtable cluster stores many tables. Each table is made up of tablets, and each tablet holds all the data for a contiguous range of rows. A table starts out with a single tablet. As it grows, it is automatically split into multiple tablets, each roughly 100 to 200 MB in size by default. Let’s look at how tablets are located and assigned, and how reads and writes work in Bigtable.

How to locate the tablets

Tablets move from server to server because of load balancing, tablet server failures, and so on. Given a row, how do we find the right tablet server? We must locate the tablet whose row range includes the target row. To store tablet location data, Bigtable uses a three-level hierarchy similar to a B+ tree.

Of these three levels, the first is a Chubby file that stores the location of the root tablet.
The second level contains all Metadata tablets.
The third level contains all user tablets.
The root tablet records the locations of all tablets of a special Metadata table, and each Metadata tablet records the locations of a set of user tablets.
The root tablet is simply the first tablet in the Metadata table, but it is treated differently from the others: it is never split, which ensures that the tablet location hierarchy never exceeds three levels.
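The lookup path through this hierarchy can be sketched in a few lines of Python. This is a minimal illustration, not Bigtable's actual implementation: the `Tablet` class, the `locate` function, the `chubby_file` dictionary, the `MAX_ROW` sentinel, and the server names are all hypothetical. Each level is modeled as a sorted list of entries keyed by `(table_id, end_row)`, so a binary search finds the first entry whose end row is greater than or equal to the target row.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

# Sentinel end row for the last tablet of a table (an assumption of this sketch).
MAX_ROW = "\uffff"


@dataclass
class Tablet:
    """Hypothetical index tablet: entries are (key, target) pairs sorted by key."""
    entries: list = field(default_factory=list)

    def find(self, key):
        # The first entry whose key is >= the lookup key covers that key.
        keys = [k for k, _ in self.entries]
        i = bisect_left(keys, key)
        if i == len(self.entries):
            raise KeyError(f"no tablet covers {key!r}")
        return self.entries[i][1]


def locate(chubby_file, table_id, row_key):
    """Walk Chubby file -> root tablet -> Metadata tablet -> tablet server."""
    key = (table_id, row_key)
    root = chubby_file["root_tablet"]   # the Chubby file points at the root tablet
    metadata_tablet = root.find(key)    # the root tablet points at a Metadata tablet
    return metadata_tablet.find(key)    # the Metadata tablet points at a user tablet


# Two Metadata tablets describing the user tablets of tables "users" and "web".
meta1 = Tablet([(("users", "g"), "ts-1"), (("users", "p"), "ts-2")])
meta2 = Tablet([(("users", MAX_ROW), "ts-3"), (("web", MAX_ROW), "ts-4")])

# The root tablet is keyed by the last row key held in each Metadata tablet.
root = Tablet([(("users", "p"), meta1), (("web", MAX_ROW), meta2)])
chubby_file = {"root_tablet": root}

print(locate(chubby_file, "users", "alice"))
print(locate(chubby_file, "web", "index"))
```

Because the root tablet is never split, a lookup in this model always takes exactly three steps, regardless of how many user tablets exist.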
The Metadata table stores the information in the following way:
- The location of each tablet is stored in the Metadata table under a row key that encodes the tablet’s table identifier and its end row (the end row also marks where the next tablet’s range begins). Each Metadata row holds about 1 KB of data in memory. The three-level hierarchy method can handle $2^{34}$

...
