Database Buckets and Data Model of Spanner
Learn how we can achieve higher performance by utilizing data locality.
Buckets and placement
Spanner adds a layer of abstraction on top of its bag of key-value mappings: the directory, or bucket, which is a set of contiguous keys that share a common prefix. Because Spanner supports buckets, applications can control the locality of their data by choosing keys carefully.
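To make the idea of a shared key prefix concrete, here is a minimal sketch of how sorted keys fall into buckets. The key layout ("user/item") and the helper names bucket_prefix and group_into_buckets are hypothetical choices for illustration, not part of Spanner.

```python
def bucket_prefix(key: str) -> str:
    """Hypothetical convention: the bucket prefix is the text before the first '/'."""
    return key.split("/", 1)[0]


def group_into_buckets(keys):
    """Group keys by prefix; within each bucket the keys stay in sorted order."""
    buckets = {}
    for key in sorted(keys):
        buckets.setdefault(bucket_prefix(key), []).append(key)
    return buckets


keys = [
    "user2/album1",
    "user1/album2",
    "user1/album1",
    "user2/profile",
    "user1/profile",
]
buckets = group_into_buckets(keys)

for prefix, members in buckets.items():
    print(prefix, members)
```

Because every key for a given user starts with the same prefix, those keys sort next to each other and land in the same bucket, which is how key choice translates into locality.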
A bucket is the unit of data placement. All data in a bucket shares the same replication settings. Consider the illustration below. When data moves between Paxos groups, it moves bucket by bucket. Spanner might move a bucket to shed load from a Paxos group, to put buckets that are frequently accessed together into the same group, or to place a bucket in a group that is geographically closer to its accessors. Buckets can be moved while client operations are ongoing, so relocation does not interrupt service. A 50 MB bucket can typically be moved in a few seconds.
Because a Paxos group may contain several buckets, a Spanner tablet differs from a Bigtable tablet: it is not necessarily a single, lexicographically contiguous partition of the row space. Instead, a Spanner tablet is a container that may hold multiple row ranges. This makes it possible to co-locate several buckets that are frequently accessed together.
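The difference between one contiguous range and a container of several row ranges can be sketched as follows. The RowRange and Tablet classes are hypothetical illustrations, and using "\uffff" as an upper-bound sentinel is a simplifying assumption, not how Spanner encodes ranges.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RowRange:
    start: str  # inclusive lower bound
    end: str    # exclusive upper bound

    def contains(self, key: str) -> bool:
        return self.start <= key < self.end


@dataclass
class Tablet:
    """Toy tablet: a container of several, possibly non-adjacent, row ranges."""
    ranges: list = field(default_factory=list)

    def add_bucket(self, prefix: str) -> None:
        # Assumption: "\uffff" sorts after every character used in real keys.
        self.ranges.append(RowRange(prefix, prefix + "\uffff"))

    def owns(self, key: str) -> bool:
        return any(r.contains(key) for r in self.ranges)


# A tablet holding two buckets that are not adjacent in the key space.
tablet = Tablet()
tablet.add_bucket("user1/")
tablet.add_bucket("user7/")

# A single contiguous range covering both buckets also covers everything between.
single_range = RowRange("user1/", "user7/\uffff")

print(tablet.owns("user3/album1"))
print(single_range.contains("user3/album1"))
```

In this sketch the multi-range tablet can co-locate the user1 and user7 buckets without also taking ownership of the users that sort between them, which a single contiguous partition cannot do.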
Relocating buckets
Spanner uses movedir, a background task, to relocate buckets across Paxos groups. movedir is also used to add replicas to, or remove replicas from, Paxos groups. movedir is not implemented as a single transaction, so that a large data move does not block ongoing reads and writes. Instead, it registers the fact that it is starting a move and then moves the data in the background. Once all but a small amount of the data has been moved, movedir uses a transaction to atomically move the remainder and update the metadata for the two Paxos groups.
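The two-phase shape of this procedure can be sketched as below. This is a toy, single-process model under stated assumptions: PaxosGroup is a plain dictionary wrapper, the placement dictionary stands in for group metadata, a lock stands in for the final transaction, and the chunk_size and nominal parameters are hypothetical. It ignores writes that arrive during the background phase and is not Spanner's actual implementation.

```python
import threading


class PaxosGroup:
    """Toy stand-in for a Paxos group: a name and a key-value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


def movedir(prefix, src, dst, placement, lock, chunk_size=2, nominal=1):
    """Move every key under `prefix` from src to dst in two phases."""
    # Register that the move has started.
    placement[prefix] = {"owner": src.name, "moving_to": dst.name}
    pending = sorted(k for k in src.data if k.startswith(prefix))

    # Phase 1: copy chunks in the background without holding the lock.
    while len(pending) > nominal:
        chunk, pending = pending[:chunk_size], pending[chunk_size:]
        for key in chunk:
            dst.data[key] = src.data[key]

    # Phase 2: move the small remainder and flip ownership in one critical section.
    with lock:
        for key in pending:
            dst.data[key] = src.data[key]
        for key in [k for k in src.data if k.startswith(prefix)]:
            del src.data[key]
        placement[prefix] = {"owner": dst.name, "moving_to": None}


g1, g2 = PaxosGroup("g1"), PaxosGroup("g2")
g1.data = {"user1/a": 1, "user1/b": 2, "user1/c": 3,
           "user1/d": 4, "user1/e": 5, "user2/a": 6}
placement = {}

movedir("user1/", g1, g2, placement, threading.Lock())
print(placement["user1/"], sorted(g2.data), sorted(g1.data))
```

Only the last step holds the lock, so in this model the time during which readers and writers could be blocked depends on the small remainder rather than on the total size of the bucket.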
A bucket is the smallest unit for which an ...