Concurrency Control in Spanner
Learn how concurrency control is done in Spanner.
In this lesson, we’ll learn how TrueTime guarantees correctness properties around
1.
2. Reads from the past that do not block any ongoing transaction
3. Read-only transactions that do not require any lock
These features make it possible to ensure that at any time, t
, the full database audit read will see the exact results of all transactions that are committed as of t
.
We will use Paxos write and Spanner clients write to differentiate between both writes.
Timestamp management
The implementation of Spanner supports reads from a snapshot and read-only and read-write transactions. The system implements the read-write transactions as standalone writes, whereas the read-only transactions are implemented as non-snapshot standalone reads. The following table shows the operation types Spanner supports and their respective concurrency control.
Operation Types and Their Concurrency Control
Operation | Concurrency Control |
Read-Write Transaction | Pessimistic |
Read-Only Transaction | Lock-free |
Snapshot Read, client-provided timestamp | Lock-free |
Snapshot Read, client-provided bound | Lock-free |
Read-only transactions
The advantages of snapshot isolation are utilized by the read-only transaction. A read-only transaction requires pre-declaration of being read-only and not having any writes, since it is not a read-write transaction without any writes. Spanner executes reads within a read-only transaction without locking at a timestamp
A snapshot read is where we read from the past without acquiring a lock. When performing a snapshot read, a client can either provide a timestamp themselves, or the system can determine the timestamp on the basis of an upper bound on the timestamp’s expiration provided by the client. A snapshot read can be executed on any up-to-date replica in either scenario.
Once a timestamp has been chosen, the commit is unavoidable for both snapshot reads and read-only transactions. The commit does not happen if the data has been garbage collected at that timestamp. Therefore, clients can escape a retry loop's buffering of results by receiving them immediately. Clients can continue their queries on other servers if a server goes down by sending the same timestamp and reading location.
Paxos leader leases
Timed leases are used in our Paxos implementation for sustainable leadership, that is, by default, 10 seconds. A node sends the request for timed lease votes to other participants to become a leader. When that node has a quorum of lease votes from other participants, it means it has the lease to become a leader.
For leader election, Paxos uses the Bully algorithm. If the proposer is not proposing to be a leader, any node can aim to be a leader. For a particular node to be accepted by the cluster, the node sends its server IDs to all peers within the cluster. The peers respond with their server ID. If all responses are smaller than the node's server ID, that node becomes the leader.
If a peer receives the ID of a node that wants to be a leader, and its ID is greater than the received ID, then instead of returning its ID, it starts an election to become the leader. The bully algorithm avoids livelock issues.
After a successful write, a replica's lease vote is automatically extended, and the leader will ask for a lease vote extension if it is about to expire. The lease interval of the leader starts when it receives the necessary number of votes to constitute a quorum, and the lease interval will end when the leader no longer has a quorum of votes. Within a Paxos group, a Paxos leader can change over time for different reasons. Spanner enforces a disjointness invariant, which implies that lease intervals of leaders are disjointed across all leaders of the shard over time.
Spanner implementation allows an abdication of a Paxos leader by freeing the lease votes of the participants. Spanner limits the conditions under which abdication is allowed to ensure that the disjointness invariant is maintained. The conditions are as follows:
Consider that the Paxos leader defines the maximum timestamp as
. Before abdication, the Paxos leader should wait for
.
The abdication of the Paxos leader is explained in the following slides:
To work, for each Paxos group, we require a disjointness invariant to be true, which means that the lease intervals of each Paxos group leader should be disjointed from other Paxos group leaders.
Read and write transactions
The read and write transactions use two-phase locking. The timestamp can be assigned to a transaction after acquiring all locks and before releasing them. Spanner assigns a timestamp for any given transaction. Spanner enforces the following invariants:
The disjointness invariant
The external consistency invariant
Spanner assigns timestamps to Paxos writes in increasing order in each Paxos group. The assignment is in monotonically ordered across leaders too. The assignment of timestamps in monotonically ascending order is a simple task for a single leader replica. Using the disjointness invariant, all leaders must comply with this rule: a leader can assign timestamps only during their leader lease. It means when a timestamp
In the external consistency invariant, if the
Start of
...