Home/Blog/Programming/The Zoo of Consistency Models
Home/Blog/Programming/The Zoo of Consistency Models

The Zoo of Consistency Models

Abdul Qadeer
Sep 03, 2023
12 min read

Data has always been important—since the inception of relational databases to today’s distributed databases. System designers provide many consistency models and isolation levels for data consumers. These models set the expectations for data consumers and help them prove the correctness properties of their data processing systems. A tradeoff exists between model implementation in a real system and end-programmer ease of using the model.

Our purpose in this blog is to classify major consistency models in a coherent way and to explain the models in easy-to-understand discussion.

Prerequisite concepts in consistency models#

We will use two criteria to classify consistency models—transactions and granularity of data used in some processing. Let’s discuss them one by one.

Transactional and non-transactional consistency models#

Many of the consistency models were developed in the context of database transactionsA database transaction is an abstraction provided by the database designer to the application developer to enable grouping of many statements as one indivisible group. This either takes effect (commits) as a whole or nothing happens (abort). Transactions are the bread and butter of modern data processing because they make data processing and reasoning about it particularly easy.. Later, additional models were introduced in the distributed computing domain. When a database becomes distributed, consistency models from both worlds can interact. Models can be classified as transactional and non-transactional.

Single-object and multiple-object-based consistency models#

Some operations only involve one object at a time, for example, changing one row of a database or updating a single key in a key-value store. However, many use cases involve multiple objects, such as rows from different database tables or many keys at once. We need transactions with multiple objects for such purposes.

Multi-object transactions are more involved as compared to single-object operations. Multi-object transactions in distributed settings are even more challenging because, for example, different shardsWhen we partition data, each partition is called a shard. Shards are often distributed among many nodes for performance and availability reasons. of data might be physically far away in different data centers.

What constitutes a single object is subjective. For some applications, a single object might be a single key, or one row of a table of a database or a document in some document database. Usually, dealing with multiple keys or rows in different tables or different documents is considered multi-object access.

Note: Consistency models that are easier to implement for system designers are usually hard to use by the application developers. Often, ease for developers comes at a cost, such as reduced performance.

Consistency model as a hierarchy#

Consistency models are usually related to each other in subtle ways. The following graph depicts a hierarchy of consistency models where models at the bottom provide weak consistency guarantees. However, as we move up the graph, the consistency guarantees become stronger.

The following graph is adapted from Jepsen, which was adapted from Bailis, Davidson, Fekete, et al., and Viotti and Vukilic.

A hierarchy of consistency models
A hierarchy of consistency models

The above illustration shows the division of consistency models as per our previous discussion.

  • The models on the right are primarily used in distributed systems and are single-object models. The models on the left are primarily transactional models, and their origin is from the database systems. The model on the top (strict serializability) is a convergence point for both transactional and non-transactional.

  • The illustration shows the availability status under each consistency model when the network partitions. Some models allow offline operations, some allow sticky availability (where a client needs to stick to a specific replica to achieve a specific consistency model), and the remaining provide no availability if network partitions.

In this blog, we will start with the linearizability consistency model that sits on top of the non-transactional, single-object models.

Understanding linearizability#

Linearizability, being toward the top of the consistency models hierarchy, is one of the strongest single-object consistency models. Informally, operation semantics under linearizability provide a total order of operations such that the results are consistent with real-world clock times.

Note: Linearizability, is also known as external consistency, strong consistency, immediate consistency, or atomic consistency.

Let’s first see what can go wrong if a system is not linearizable.

An example of a non-linearizable read: a hypothetical game server causing confusion for viewers of Super Bowl LVII
An example of a non-linearizable read: a hypothetical game server causing confusion for viewers of Super Bowl LVII

The above illustration is a hypothetical situation that can occur if a system is not linearizable. Two friends (Alice and Bob) use an online site to check the latest score of a football game. Because the customer load on the service was high, both of their calls hit different replicas. Alice sees the latest score, while Bob gets a stale score. Alice happily tells Bob that her team has won, but Bob can’t believe it and tells Alice that the game is still on.

The primary issue is that Bob issued a read command after Alice and got back a stale result. While in our example, there was only confusion among friends momentarily because, hopefully, after a while, when Bob retries, he will get the latest result too. But such non-linearizability can have more damaging effects, such as data loss.

Returning stale results is not allowed under linearizability. Once any reader has read a new value, no other client should be able to read the stale value.

Read forward in time#

One way to see if a system is linearizable is to see when (in terms of real-world clock) writes and reads happen. These should be forward in terms of real time (not backward). That alludes that the entities in the system should have some notion of time (for example, the wall-clock time with bounded delays, such as in the Spanner database).

Note: The Spanner database is a complex topic that Educative has presented in an easy-to-understand manner in the Spanner database chapter of the Grokking the Principles and Practices of Advanced System Design course.

Let’s see an example where four clients are reading and writing to a database. The bar width of the writes and reads indicate the amount of time the database can take to complete those actions and to reply back to the clients. These times are unequal because the load on the database or the server hosting the database changes over time. We indicate when a read or write takes effect using a marker on the database timeline. We number those times when the database has applied the operation in ascending time order.

The following example has one issue stopping it from being a linearizable system. The last read by Client B reads a stale value (2), while another client (Client A) has read a newer value (4) before. It is a similar situation that we had in our hypothetical football game scenario.

Vertical bars along the database tell the point in time when a write or read actually happens on the database
Vertical bars along the database tell the point in time when a write or read actually happens on the database

The concept of concurrent requests#

There can be concurrent requests that overlap in time in the system. There can be more than one way to put them in some total order. These requests give us some leeway on how to order requests, as shown in the illustration below.

In the following example, requests labeled as G, E, and H are concurrent with request F. Either of E, G, and H can return the value of 0 (the old one) or 1 (the new one). That leeway is there due to concurrent requests. Though once someone sees a new value, no one should see the old value after that point. Note that once Client C is notified that its write of x=1x=1 is successful, Client A (marked as I) must now get the new value because another client (Client C) now knows it.

Concurrent reads with the writes can return old or new value
Concurrent reads with the writes can return old or new value

Challenges in the implementation of linearizable systems#

We might think that if all the clients and the database are on the same server, sharing the same clock, providing linearizability will be easy. However, that is not the case. Modern multi-core systems employ NUMA“Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory (memory local to another processor or memory shared between processors).” Source: Wikipedia. memory architecture; therefore, a local write to a core might not be visible to other cores right away.

Modern processor architecture. It is a little distributed system in itself
Modern processor architecture. It is a little distributed system in itself

The next example shows how consistency models come into play in somewhat nonintuitive ways. In the following example, a client uploads an image and then asks the service to resize it. Now first, the file is uploaded to the storage service, and then the resize request is put in the message queue. If the replicated file storage is not linearizable, it is possible that the image resizer dequeues a request to resize but does not find it on the storage service.

This example highlights that, as humans, the chronological order of events is embedded in our thinking, but if the underlying system is not linearizable, the outcomes can conflict with our expectations.

Two operations (marked as 2 and 3) were done in that order but non-linearizable system didn't respect that order
Two operations (marked as 2 and 3) were done in that order but non-linearizable system didn't respect that order

We might assume that the system will be linearizable if we use appropriate quorums with the quorum property (r + w > n, meaning that there should be at least one node common between read and write quorums). But that is not the case, as depicted in the following illustration. Reader A fulfills the quorum property (read from two replicas and 2 + 3 > 3) but get one old and one new value. Reader B gets a stale value after A has gotten a new value.

Note: Educative has a course for practitioners to learn about distributed systems concepts such as quorums and much more.

Using strict quorum does not guarantee linearizable execution
Using strict quorum does not guarantee linearizable execution

Note: There are many subtle ways different implementations of linearizable systems fail. Tools like Jepsen provide an automated way to detect if a database’s claimed consistency model holds or not.

Implementation of linearizability via consensus algorithms#

Using consensus algorithms, such as Paxos and Raft,We are in the process of adding a new section on consensus that includes Paxos and Raft in our course: Grokking the Principles and Practices of Advanced System Design". https://www.educative.io/courses/grokking-the-principles-and-practices-of-advanced-system-design is one of the ways to achieve linearizable systems. Such systems use replicated logs to implement state machines (encoding the state). These state machines execute commands in order from left to right from the log. That means no client can see the stale value.

We model each replica of the data as a state machine. Each state machine starts with a common initial state. As clients’ write and read requests come in to mutate or read the state, all replicas need a consensus on which request should be executed next so that all the state machines transition from one consistent state to the next. The consensus algorithms, such as Paxos and Raft, are used by these state machines to have a consensus on the next request to execute.

Due to the consensus algorithms, the sequence of execution of the commands is exactly the same on all replicas, which is why we get the same total order on each replica. Therefore, we get a linearizable system. However, we should understand that linearizability comes at a cost. For each client request (write or read), the consensus machinery first needs to do possibly multiple rounds of communication with all the replicas to reserve a slot for the client’s request. Once that is done, only then will the client’s request be executed. So the first cost is in terms of excessive network bandwidth use, and the second cost is client visible latency—the time a client must wait until consensus happens and the request executes to get a result. When many concurrent clients want their requests fulfilled, the costs mentioned above can exacerbate.

Note: The Spanner system provides linearizable operations and manages associated challenges in a large distributed database system. Spanner uses Paxos and two-phase commit (2PC) for consensus.

Our course on Grokking the Principles and Practices of Advanced System Design explains the Spanner system and consensus algorithms in detail. The consensus algorithms are subtle algorithms that are hard to understand. Our explanations go at length to highlight those subtle but important aspects of the algorithms.

The following slide deck explains a round of consensus among three replicas.

There is a client and three replica servers. Each replica server has a local log replica where client’s commands are logged after consensus
There is a client and three replica servers. Each replica server has a local log replica where client’s commands are logged after consensus
1 of 9

The C in the CAP theorem is linearizability#

The CAP theorem highlights a tradeoff between data consistency (C), availability of the system (A), and tolerance of the system to the network partitions (P). If the network partitions at some instance in time, a large distributed system can either provide strong data consistency or total availability (but not both at the same time).

The CAP theorem uses the word consistency, which means linearizability. So now we know that we must choose between linearizability and availability if a network partition happens (we can’t have both).

Note: If we don’t need real-time aspects, we can use sequential consistency. If we need support for multiple objects, we should use strict serializability. As we can see, the names of these consistency models resemble each other; however, they are fairly different. This is an unfortunate instance of terminology that we are stuck with.

Conclusion#

Many consistency models originate from database systems and distributed systems. Many of these models are subtly different from each other, and this is one of the reasons for bugs in the implementations of these models. This blog is the first in a series of blogs to discuss frequently used models in real systems with examples. We started with linearizability, which is a strong consistency guarantee, and we discussed how hard it is to implement it right.

Note: The topic of distributed systems has many subtle concepts, such as consistency models. For an in-depth study of these concepts and to see them in action, we recommend Educative’s following courses:


  

Free Resources