Database Operations in Spanner

Learn how read-write, read-only, and schema-change transactions work in detail.

In this lesson, we will learn about read-write, read-only, and schema-change transactions and how they use Spanner's timestamping mechanism.

Read-write transactions

A transaction's writes are buffered on the client until commit. As a result, reads inside a transaction do not see the effects of that transaction's own writes. This design suits Spanner well: a read returns the timestamps of the data it reads, and uncommitted writes have not been assigned timestamps yet.
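A minimal Python sketch of the client-side write buffer described above. The `ClientTransaction` class, the in-memory `store` dictionary, and integer timestamps are hypothetical stand-ins, not Spanner's client API. The sketch only shows the behavior in question: reads consult committed data and return its commit timestamp, while writes sit in a local buffer with no timestamp.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical committed store: key -> (value, commit_timestamp).
Store = Dict[str, Tuple[str, int]]

@dataclass
class ClientTransaction:
    """Hypothetical read-write transaction that buffers writes on the client."""
    store: Store
    write_buffer: Dict[str, str] = field(default_factory=dict)

    def read(self, key: str) -> Optional[Tuple[str, int]]:
        # Reads see committed data only and return it with its commit timestamp.
        # The transaction's own buffered writes are never consulted.
        return self.store.get(key)

    def write(self, key: str, value: str) -> None:
        # Writes stay in the client-side buffer; no timestamp is assigned yet.
        self.write_buffer[key] = value

    def buffered_writes(self) -> Dict[str, str]:
        # Copy of the buffer, to be handed to participant leaders at commit time.
        return dict(self.write_buffer)
```

For example, after `txn.write("x", "2")`, a call to `txn.read("x")` still returns the committed value and its timestamp, because the new value exists only in the buffer.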


Reads within read-write transactions use the wound-wait scheme to avoid deadlocks. To read up-to-date data, the client sends each read to the leader replica of the appropriate Paxos group, which acquires the necessary read locks and then reads the most recent data. While a transaction remains open, the client periodically sends keepalive messages so that the participant leaders do not time out the transaction. Once the client has completed all its reads and buffered all its writes, it is ready to commit.
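The wound-wait rule can be sketched as a single decision function. This sketch assumes each transaction carries a hypothetical `start_ts` that serves as its age (a smaller value means older and higher priority), with the transaction ID as a tie-breaker; how Spanner assigns priorities internally is not covered here.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Txn:
    txn_id: str
    start_ts: int  # hypothetical age: a smaller value means an older transaction

def priority(txn: Txn) -> Tuple[int, str]:
    # Older transactions win; the ID breaks ties so the order is total.
    return (txn.start_ts, txn.txn_id)

def on_lock_conflict(requester: Txn, holder: Txn) -> Tuple[str, str]:
    """Decide what happens when `requester` asks for a lock that `holder` has.

    Returns ("wound", holder_id) if the older requester aborts the younger holder,
    or ("wait", requester_id) if the younger requester must wait.
    """
    if priority(requester) < priority(holder):
        # Older requester: wound (abort) the younger holder and take the lock.
        return ("wound", holder.txn_id)
    # Younger requester: wait for the older holder to release the lock.
    return ("wait", requester.txn_id)
```

Because a transaction only ever waits for an older one, the waits-for graph cannot contain a cycle, which is why wound-wait rules out deadlock.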

Two-phase commit in Spanner

Spanner uses two-phase commit (2PC) to guarantee isolation and strong consistency. 2PC begins once the client has finished all its reads and buffered all its writes.

When the participants in a 2PC are physically close to one another, the round trips between them are shorter, so commit latency is lower. Spanner ensures serializability by running 2PC and two-phase locking on the Paxos leaders. The client chooses one of the participating groups as the coordinator group; that group's leader acts as the 2PC coordinator, and the leaders of the other participating Paxos groups are non-coordinator participants. The client then sends a commit message to each participant leader that carries the coordinator's identity and the buffered writes.
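A sketch of how a client might assemble its commit messages once it has picked a coordinator. `build_commit_messages`, the `group_of` key-to-group mapping, the `participants` list, and the `CommitMessage` type are hypothetical names used for illustration. Each participant leader receives the coordinator's identity plus the buffered writes that belong to its own group.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class CommitMessage:
    coordinator: str                                      # identity of the coordinator group
    writes: Dict[str, str] = field(default_factory=dict)  # buffered writes owned by this group

def build_commit_messages(buffered_writes: Dict[str, str],
                          group_of: Dict[str, str],
                          participants: List[str],
                          coordinator: str) -> Dict[str, CommitMessage]:
    """Return one commit message per participant leader (hypothetical helper)."""
    if coordinator not in participants:
        raise ValueError("the coordinator must be one of the participant groups")
    # Every participant leader learns who the coordinator is.
    messages = {group: CommitMessage(coordinator) for group in participants}
    # Route each buffered write to the leader of the group that owns the key.
    for key, value in buffered_writes.items():
        messages[group_of[key]].writes[key] = value
    return messages
```

A participant that only served reads still gets a message (with no writes), since it must learn the coordinator's identity and eventually release its locks.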

In plain 2PC, a coordinator crash can leave the protocol stuck. To avoid this and keep the system fault tolerant, Spanner stores all 2PC state, for both the coordinator and the participants, in the Paxos state machines of their groups. If a leader goes down in the middle of a 2PC round, the newly elected leader has all the information it needs to complete the commit.
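To illustrate why keeping 2PC state in Paxos helps, here is a toy model in which an ordinary Python list stands in for the Paxos-replicated log. `ReplicatedLog`, `recover_2pc_state`, and the state strings are hypothetical; real Paxos replication, leader election, and lock state are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Entry = Tuple[str, str, Optional[int]]  # (txn_id, 2PC state, timestamp or None)

@dataclass
class ReplicatedLog:
    """Toy stand-in for a Paxos-replicated log: its entries outlive any single leader."""
    entries: List[Entry] = field(default_factory=list)

    def append(self, txn_id: str, state: str, ts: Optional[int] = None) -> None:
        self.entries.append((txn_id, state, ts))

def recover_2pc_state(log: ReplicatedLog, txn_id: str) -> Tuple[Optional[str], Optional[int]]:
    """Run by a newly elected leader: return the last logged 2PC state for txn_id."""
    state: Optional[str] = None
    ts: Optional[int] = None
    for entry_txn, entry_state, entry_ts in log.entries:
        if entry_txn == txn_id:
            state, ts = entry_state, entry_ts
    return state, ts
```

If the old leader logged `("T1", "prepared", 105)` and then crashed, the new leader reads the same log, finds that T1 is prepared at timestamp 105, and continues the 2PC round from there instead of starting over.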

The leader of partition 2 is the 2PC coordinator, which communicates with the non-coordinator leaders of the other participating Paxos groups

Non-coordinator role

A leader that is not the coordinator first acquires write locks. To preserve monotonicity, it then chooses a prepare timestamp larger than any timestamp it has assigned to previous transactions and logs a prepare record through Paxos. Each participant then reports its prepare timestamp to the coordinator.
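The non-coordinator's prepare step can be sketched as follows. The sketch assumes integer timestamps and starts from a caller-supplied clock reading `now_ts` (an assumption), which it bumps past the last timestamp the leader assigned. `ParticipantLeader` is hypothetical, lock acquisition is only marked with a comment, and the `paxos_log` list stands in for logging through Paxos.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ParticipantLeader:
    """Hypothetical non-coordinator participant leader."""
    group: str
    last_assigned_ts: int = 0  # largest timestamp this leader has handed out so far
    paxos_log: List[Tuple[str, str, int]] = field(default_factory=list)  # stand-in for Paxos

    def prepare(self, txn_id: str, now_ts: int) -> int:
        # (Write locks for txn_id would be acquired here; omitted in this sketch.)
        # Choose a prepare timestamp above every timestamp assigned before, so the
        # timestamps this leader issues keep increasing.
        prepare_ts = max(now_ts, self.last_assigned_ts + 1)
        self.last_assigned_ts = prepare_ts
        # Record the prepare "through Paxos" before answering the coordinator.
        self.paxos_log.append(("prepare", txn_id, prepare_ts))
        # This is the value the participant reports to the coordinator.
        return prepare_ts
```

Even when the clock reading lags behind (for example, `now_ts=90` after a timestamp of 100 was already assigned), the leader still returns 101, so its timestamps never go backward.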

Coordinator role

The coordinator leader skips the prepare step but also acquires write locks. After hearing from all the other participant leaders, it chooses a single timestamp for the entire transaction. Denoting this commit timestamp as $s$, it must be:

  1. Greater than or equal to all prepare timestamps to satisfy the invariants of read-write transactions

  2. Greater than $TT.\mathrm{now}().\mathrm{latest}$, evaluated at the time the coordinator receives the client's commit message

  3. Greater than any timestamp the coordinator leader has assigned to previous transactions
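The three conditions above reduce to taking a maximum. The sketch below assumes integer timestamps, so "greater than $x$" becomes "at least $x + 1$", and models the TrueTime interval with a hypothetical `TTInterval` type holding `earliest` and `latest` bounds. `choose_commit_timestamp` is an illustrative helper, not Spanner's actual code.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class TTInterval:
    """Hypothetical model of a TrueTime reading: the true time lies in [earliest, latest]."""
    earliest: int
    latest: int

def choose_commit_timestamp(prepare_timestamps: Iterable[int],
                            now_at_commit_msg: TTInterval,
                            last_assigned_ts: int) -> int:
    # Condition 1: s >= every participant's prepare timestamp.
    bound_prepares = max(prepare_timestamps, default=0)
    # Condition 2: s > TT.now().latest, read when the commit message arrived.
    bound_truetime = now_at_commit_msg.latest + 1
    # Condition 3: s > every timestamp this coordinator leader assigned before.
    bound_history = last_assigned_ts + 1
    # The smallest timestamp that satisfies all three conditions.
    return max(bound_prepares, bound_truetime, bound_history)
```

For example, with prepare timestamps 101 and 120, a `TT.now().latest` of 110, and a last assigned timestamp of 115, the prepare bound dominates and the result is 120.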

Together, these conditions preserve monotonicity and the timestamp invariants of read-write transactions.

Another constraint is commit wait: the coordinator leader waits until ...
