Concurrency Control in Spanner

Learn how concurrency control is done in Spanner.

In this lesson, we’ll learn how TrueTime guarantees correctness properties around concurrency control In information technology and computer science, especially in the fields of computer programming, operating systems, multiprocessors, and databases, the concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible. [source: Wikipedia]and how those properties ensure the following:

1. Externally consistentExternal consistency is a property of transaction-processing systems, where clients dynamically synthesize transactions that contain multiple read and write operations on arbitrary objects. [source: Google Cloud] transactions

2. Reads from the past that do not block any ongoing transaction

3. Read-only transactions that do not require any lock

These features make it possible to ensure that at any time, t, the full database audit read will see the exact results of all transactions that are committed as of t.

We will use Paxos write and Spanner clients write to differentiate between both writes.

Timestamp management

The implementation of Spanner supports reads from a snapshot and read-only and read-write transactions. The system implements the read-write transactions as standalone writes, whereas the read-only transactions are implemented as non-snapshot standalone reads. The following table shows the operation types Spanner supports and their respective concurrency control.

Operation Types and Their Concurrency Control

Operation

Concurrency Control

Read-Write Transaction

Pessimistic

Read-Only Transaction

Lock-free

Snapshot Read, client-provided timestamp

Lock-free

Snapshot Read, client-provided bound

Lock-free

Read-only transactions

The advantages of snapshot isolation are utilized by the read-only transaction. A read-only transaction requires pre-declaration of being read-only and not having any writes, since it is not a read-write transaction without any writes. Spanner executes reads within a read-only transaction without locking at a timestamp tt which is determined by the system. Executing the read at tt avoids blocking the incoming writes. This is an example of any suitably up-to-date replica that may handle the reads in a read-only transaction.

A snapshot read is where we read from the past without acquiring a lock. When performing a snapshot read, a client can either provide a timestamp themselves, or the system can determine the timestamp on the basis of an upper bound on the timestamp’s expiration provided by the client. A snapshot read can be executed on any up-to-date replica in either scenario.

Snapshot reads do not require locks
Snapshot reads do not require locks

Once a timestamp has been chosen, the commit is unavoidable for both snapshot reads and read-only transactions. The commit does not happen if the data has been garbage collected at that timestamp. Therefore, clients can escape a retry loop's buffering of results by receiving them immediately. Clients can continue their queries on other servers if a server goes down by sending the same timestamp and reading location.

Paxos leader leases

Timed leases are used in our Paxos implementation for sustainable leadership, that is, by default, 10 seconds. A node sends the request for timed lease votes to other participants to become a leader. When that node has a quorum of lease votes from other participants, it means it has the lease to become a leader.

For leader election, Paxos uses the Bully algorithm. If the proposer is not proposing to be a leader, any node can aim to be a leader. For a particular node to be accepted by the cluster, the node sends its server IDs to all peers within the cluster. The peers respond with their server ID. If all responses are smaller than the node's server ID, that node becomes the leader.

If a peer receives the ID of a node that wants to be a leader, and its ID is greater than the received ID, then instead of returning its ID, it starts an election to become the leader. The bully algorithm avoids livelock issues.

After a successful write, a replica's lease vote is automatically extended, and the leader will ask for a lease vote extension if it is about to expire. The lease interval of the leader starts when it receives the necessary number of votes to constitute a quorum, and the lease interval will end when the leader no longer has a quorum of votes. Within a Paxos group, a Paxos leader can change over time for different reasons. Spanner enforces a disjointness invariant, which implies that lease intervals of leaders are disjointed across all leaders of the shard over time.

Spanner implementation allows an abdication of a Paxos leader by freeing the lease votes of the participants. Spanner limits the conditions under which abdication is allowed to ensure that the disjointness invariant is maintained. The conditions are as follows:

  • Consider that the Paxos leader defines the maximum timestamp as SmaxS_{max}.

  • Before abdication, the Paxos leader should wait for TT.after(smax)TT.after(s_{max}).

The abdication of the Paxos leader is explained in the following slides:

To work, for each Paxos group, we require a disjointness invariant to be true, which means that the lease intervals of each Paxos group leader should be disjointed from other Paxos group leaders.

Read and write transactions

The read and write transactions use two-phase locking. The timestamp can be assigned to a transaction after acquiring all locks and before releasing them. Spanner assigns a timestamp for any given transaction. Spanner enforces the following invariants:

  1. The disjointness invariant

  2. The external consistency invariant

Spanner assigns timestamps to Paxos writes in increasing order in each Paxos group. The assignment is in monotonically ordered across leaders too. The assignment of timestamps in monotonically ascending order is a simple task for a single leader replica. Using the disjointness invariant, all leaders must comply with this rule: a leader can assign timestamps only during their leader lease. It means when a timestamp ss is assigned to Paxos leader, smaxs_{max} is incremented to maintain disjointness.

In the external consistency invariant, if the T2T2 transaction is started after T1T1's commit, we can infer from it that T2T2's commit timestamp should be greater than T1T1's commit timestamp. Consider the following for the transaction TiT_{i}:

  • Start of TiT_{i} ...

Access this course and 1400+ top-rated courses and projects.