Spanner Operations

Let's study the operations that Spanner supports.

Spanner supports the following types of operations:

  • Read-write transactions
  • Read-only transactions
  • Standalone (strong or stale) reads

Read-write transaction

A read-write transaction can contain read operations, write operations, or both. It provides full ACID properties for the operations of the transaction. More specifically, read-write transactions are not merely serializable; they are strictly serializable.

Note: Spanner documentation also refers to strict serializability as “external consistency”; the two terms describe essentially the same guarantee.

A read-write transaction executes a set of read and write operations atomically at a single logical point in time.

Note: As explained earlier, Spanner achieves these properties using two-phase locking for isolation and two-phase commit for atomicity across multiple splits.

Workflow

A read-write transaction proceeds through the following steps:

  • After opening a transaction, a client directs all the read operations to the leader of the replica group that manages the split with the required rows. This leader acquires read locks for the rows and columns involved before serving the read request. Every read also returns the timestamp of any data read.

  • Any write operations are buffered locally in the client until the transaction is committed. While the transaction is open, the client sends keepalive messages to prevent participant leaders from timing out the transaction.

  • When a client has completed all reads and buffered all writes, it starts the two-phase commit protocol. (The two-phase commit is required only if the transaction accesses data from multiple replica groups. Otherwise, the leader of the single replica group can commit the transaction through Paxos alone.) The client chooses one of the participant leaders as the coordinator leader and sends a prepare request to all the participant leaders along with the identity of the coordinator leader. The participant leaders involved in write operations also receive the buffered writes at this stage.

  • Every participant leader acquires the necessary write locks, chooses a prepare timestamp s_i that is larger than the timestamps of all previous transactions, and logs a prepare record in its replica group through Paxos. The leader also replicates the lock acquisition to the replicas to ensure the locks remain held even if the leader fails. It then responds to the coordinator leader with the prepare timestamp.
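The steps above, up to the prepare response, can be sketched in a few lines of Python. This is only a single-process illustration under stated assumptions, not Spanner's implementation: the names `ParticipantLeader`, `ClientTransaction`, and `start_commit` are hypothetical, integers stand in for timestamps, a plain list stands in for the Paxos-replicated log, and keepalives, lock replication, and the commit phase are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ParticipantLeader:
    """Hypothetical stand-in for the leader of one replica group."""
    name: str
    last_timestamp: int = 0                          # largest timestamp assigned so far
    data: dict = field(default_factory=dict)         # key -> (value, timestamp)
    read_locks: dict = field(default_factory=dict)   # key -> set of transaction ids
    write_locks: dict = field(default_factory=dict)  # key -> transaction id
    paxos_log: list = field(default_factory=list)    # stands in for Paxos replication

    def read(self, txn_id, key):
        # Acquire a read lock first, then return the value with its timestamp.
        self.read_locks.setdefault(key, set()).add(txn_id)
        return self.data.get(key, (None, 0))

    def prepare(self, txn_id, coordinator, writes):
        # Acquire write locks for the buffered writes sent with the prepare request.
        for key in writes:
            holder = self.write_locks.get(key)
            readers = self.read_locks.get(key, set()) - {txn_id}
            if holder not in (None, txn_id) or readers:
                raise RuntimeError(f"lock conflict on {key!r}")
            self.write_locks[key] = txn_id
        # Choose a prepare timestamp larger than any timestamp assigned before.
        self.last_timestamp += 1
        prepare_ts = self.last_timestamp
        # Log the prepare record (a real leader would replicate it via Paxos).
        self.paxos_log.append(("prepare", txn_id, coordinator, prepare_ts, dict(writes)))
        return prepare_ts


class ClientTransaction:
    """Hypothetical client side: reads go to leaders, writes are buffered."""

    def __init__(self, txn_id, route):
        self.txn_id = txn_id
        self.route = route       # key -> ParticipantLeader managing that key
        self.buffered = {}       # leader name -> {key: value}
        self.participants = {}   # leader name -> ParticipantLeader

    def read(self, key):
        leader = self.route[key]
        self.participants[leader.name] = leader
        return leader.read(self.txn_id, key)

    def write(self, key, value):
        # Writes stay in the client until commit time.
        leader = self.route[key]
        self.participants[leader.name] = leader
        self.buffered.setdefault(leader.name, {})[key] = value

    def start_commit(self):
        # Pick one participant as coordinator and send prepare to all of them.
        leaders = list(self.participants.values())
        coordinator = leaders[0].name
        return {
            leader.name: leader.prepare(
                self.txn_id, coordinator, self.buffered.get(leader.name, {})
            )
            for leader in leaders
        }
```

For example, with one leader at timestamp 5 and another at timestamp 9, a transaction that reads from the first and writes to the second receives prepare timestamps 6 and 10, which the sketch returns directly where a real participant would send them to the coordinator leader.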

The following illustration contains a visualization of this sequence:

Spanner mitigating availability problems

It is worth noting that the availability problems of the two-phase commit are partially mitigated in this scheme because each participant, as well as the coordinator, is a Paxos group rather than a single node. So, if one of the leader nodes crashes, another replica from that replica group will eventually detect the failure, take over, and help the protocol make progress.

Spanner handling deadlocks

The two-phase locking protocol can result in deadlocks. Spanner resolves these situations via a wound-wait scheme (D. J. Rosenkrantz, R. E. Stearns, and P. M. Lewis II, “System Level Concurrency Control for Distributed Database Systems,” ACM Transactions on Database Systems (TODS), vol. 3, no. 2, June 1978), where a transaction TX_1 is allowed to abort a transaction TX_2 that holds the desired lock only if TX_1 is older than TX_2 ...
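As a minimal sketch of the wound-wait decision, assuming each transaction carries a start timestamp and that a smaller timestamp means an older transaction: an older requester wounds (aborts) the younger lock holder, while a younger requester waits. The names `Txn` and `request_lock` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    start_ts: int          # assumption: smaller start timestamp = older transaction
    aborted: bool = False


def request_lock(requester, holder):
    """Return what happens when `requester` asks for a lock held by `holder`."""
    if holder is None or holder is requester:
        return "granted"
    if requester.start_ts < holder.start_ts:
        # The older requester wounds the younger holder, which must abort.
        holder.aborted = True
        return "wound"
    # A younger requester never aborts an older holder; it waits instead.
    return "wait"
```

Because only older transactions can force younger ones to abort, waits always point from younger to older transactions, so no cycle of waiting transactions can form.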