Data has always been important, from the inception of relational databases to today's distributed databases. System designers provide many consistency models and isolation levels for data consumers. These models set expectations for data consumers and help them prove correctness properties of their data processing systems. A tradeoff exists between how easily a model can be implemented in a real system and how easily end programmers can use it.
Our purpose in this blog is to classify the major consistency models in a coherent way and to explain each model in an easy-to-understand discussion.
We will use two criteria to classify consistency models: whether they involve transactions, and the granularity of the data an operation touches. Let's discuss them one by one.
Many of the consistency models were developed in the context of database systems, while others originated in distributed systems.
Some operations only involve one object at a time, for example, changing one row of a database or updating a single key in a key-value store. However, many use cases involve multiple objects, such as rows from different database tables or many keys at once. We need transactions with multiple objects for such purposes.
Multi-object transactions are more involved than single-object operations. Multi-object transactions in distributed settings are even more challenging because, for example, different nodes can fail independently, and messages between them can be delayed, reordered, or lost.
What constitutes a single object is subjective. For some applications, a single object might be a single key, one row of a database table, or a document in a document database. Usually, dealing with multiple keys, rows in different tables, or different documents is considered multi-object access.
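To make the distinction concrete, here is a minimal sketch using a hypothetical in-memory key-value store (the account names and functions are illustrative, not from any real system). A deposit touches one key; a transfer touches two keys and therefore needs transactional protection that this sketch deliberately lacks.

```python
# A hypothetical key-value store holding account balances.
store = {"alice": 100, "bob": 50}

def deposit(account, amount):
    # Single-object operation: touches exactly one key.
    store[account] += amount

def transfer(src, dst, amount):
    # Multi-object operation: a crash between the two updates would
    # lose money, so a real system wraps this in a transaction.
    store[src] -= amount
    store[dst] += amount

deposit("bob", 20)            # single-object: only "bob" is touched
transfer("alice", "bob", 30)  # multi-object: "alice" and "bob" together
print(store)  # {'alice': 70, 'bob': 100}
```

The sketch runs fine when nothing fails; the point is that `transfer` has a window between its two updates where the data is inconsistent, which is exactly what multi-object transactions exist to prevent.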
Note: Consistency models that are easier for system designers to implement are usually harder for application developers to use. Conversely, ease for developers often comes at a cost, such as reduced performance.
Consistency models are usually related to each other in subtle ways. The following graph depicts a hierarchy of consistency models where models at the bottom provide weak consistency guarantees; as we move up the graph, the guarantees become stronger.
The following graph is adapted from Jepsen, which in turn adapted it from Bailis, Davidson, Fekete, et al., and Viotti and Vukolić.
The above illustration shows the division of consistency models as per our previous discussion.
The models on the right are primarily used in distributed systems and are single-object models. The models on the left are primarily transactional models, and they originate from database systems. The model on the top (strict serializability) is a convergence point for both transactional and non-transactional models.
The illustration also shows the availability status of each consistency model when the network partitions. Some models allow offline operation, some allow sticky availability (where a client must stick to a specific replica to achieve the model's guarantees), and the rest provide no availability if the network partitions.
In this blog, we will start with the linearizability consistency model that sits on top of the non-transactional, single-object models.
Linearizability, being toward the top of the consistency models hierarchy, is one of the strongest single-object consistency models. Informally, linearizability requires that operations appear to execute in a single total order that is consistent with real-world clock time.
Note: Linearizability is also known as external consistency, strong consistency, immediate consistency, or atomic consistency.
Let’s first see what can go wrong if a system is not linearizable.
The above illustration is a hypothetical situation that can occur if a system is not linearizable. Two friends (Alice and Bob) use an online site to check the latest score of a football game. Because the customer load on the service was high, both of their calls hit different replicas. Alice sees the latest score, while Bob gets a stale score. Alice happily tells Bob that her team has won, but Bob can’t believe it and tells Alice that the game is still on.
The primary issue is that Bob issued a read after Alice did, yet got back a stale result. In our example, the harm is only momentary confusion between friends: when Bob retries after a while, he will hopefully get the latest result too. But such non-linearizable behavior can have more damaging effects, such as data loss.
Returning stale results is not allowed under linearizability. Once any reader has read a new value, no other client should be able to read the stale value.
One way to see if a system is linearizable is to check when (in terms of a real-world clock) writes and reads take effect. These should move forward in real time, never backward. This implies that the entities in the system need some notion of time (for example, wall-clock time with bounded uncertainty, as in the Spanner database).
Note: The Spanner database is a complex topic that Educative has presented in an easy-to-understand manner in the Spanner database chapter of the Grokking the Principles and Practices of Advanced System Design course.
Let’s see an example where four clients are reading and writing to a database. The bar width of the writes and reads indicates the amount of time the database can take to complete those actions and reply to the clients. These times are unequal because the load on the database, or on the server hosting it, changes over time. We indicate when a read or write takes effect using a marker on the database timeline. We number those times when the database has applied the operation in ascending time order.
The following example has one issue stopping it from being a linearizable system. The last read by Client B returns a stale value (2), while another client (Client A) has already read a newer value (4). This is similar to the situation in our hypothetical football game scenario.
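The stale-read rule in this example can be checked mechanically. The following is a minimal sketch (not a full linearizability checker) that flags exactly the violation above: a read that begins after another read has completed must not return an older value. Each read is modeled as `(start_time, end_time, value)`, and values are assumed to be written in ascending order; all times and values are made up for illustration.

```python
# A simplified check for one class of linearizability violation:
# reads must not go backward in real time.

def find_stale_read(reads):
    # Compare every pair of non-overlapping reads: if r2 starts after
    # r1 ends, r2 must return a value at least as new as r1's.
    for s1, e1, v1 in reads:
        for s2, e2, v2 in reads:
            if s2 > e1 and v2 < v1:
                return (v1, v2)  # violation: (newer value, later stale value)
    return None

# Client A reads 4 during [5, 6]; Client B reads 2 during [7, 8].
history = [(5, 6, 4), (7, 8, 2)]
print(find_stale_read(history))  # (4, 2) -- Client B's read is stale
```

Real checkers (such as Jepsen's Knossos) must consider all possible orderings of overlapping operations, which makes the problem far harder than this pairwise scan.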
Concurrent requests, which overlap in time, can be placed in a total order in more than one way. This gives us some leeway in how to order them, as shown in the illustration below.
In the following example, requests labeled as G, E, and H are concurrent with request F. Each of E, G, and H can return either the value 0 (the old one) or 1 (the new one). That leeway exists because the requests are concurrent. However, once someone sees the new value, no one should see the old value after that point. Note that once Client C is notified that its write of 1 has completed, every read that starts afterward must return 1.
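The leeway that concurrency provides can be stated as a small rule, sketched below with made-up timestamps: a read that overlaps a write may return either the old or the new value, while a read that starts strictly after the write completes must return the new one.

```python
# Which values may a single read return, given one concurrent write?
# Intervals are (start_time, end_time) on a shared real-time clock.

def allowed_values(read, write, old, new):
    rs, re = read
    ws, we = write
    if re < ws:           # read finished before the write began
        return {old}
    if rs > we:           # read began after the write completed
        return {new}
    return {old, new}     # concurrent: either value is linearizable

write_f = (2, 6)  # a write of 1 (the old value was 0)
print(allowed_values((3, 5), write_f, 0, 1))  # {0, 1} -- concurrent with the write
print(allowed_values((7, 9), write_f, 0, 1))  # {1} -- must see the new value
```

This rule only constrains individual reads against the write; the additional constraint from the text (once any client sees 1, no later read may see 0) links reads to each other, which is what makes full linearizability checking hard.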
We might think that if all the clients and the database are on the same server, sharing the same clock, providing linearizability will be easy. However, that is not the case. Modern multi-core systems employ per-core caches and store buffers, so a write by one core is not instantly visible to the others; in that sense, even a single server behaves like a small distributed system.
The next example shows how consistency models come into play in somewhat nonintuitive ways. A client uploads an image and then asks the service to resize it. First, the file is uploaded to the storage service, and then the resize request is put in the message queue. If the replicated file storage is not linearizable, the image resizer may dequeue a resize request but not find the image on the storage service.
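This race can be sketched in a few lines. The sketch below assumes a non-linearizable store with two replicas and lazy (asynchronous) replication; all names are illustrative. The upload lands on one replica, the resize job is enqueued immediately, and the resizer's read happens to be served by the replica that has not caught up yet.

```python
import queue

# Two replicas of a hypothetical file store, replicated lazily.
replica_a, replica_b = {}, {}
resize_jobs = queue.Queue()

def upload(name, data):
    replica_a[name] = data  # the write lands on replica A first
    # ...replication to replica B happens asynchronously, later...

def enqueue_resize(name):
    resize_jobs.put(name)

def resizer():
    name = resize_jobs.get()
    # The resizer's read is served by replica B, which has not
    # caught up yet, so the image appears to be missing.
    return replica_b.get(name)

upload("photo.jpg", b"...")
enqueue_resize("photo.jpg")
print(resizer())  # None -- the job arrived before the replica did
```

The message queue delivered the job faster than replication delivered the file; a linearizable store would guarantee that any read issued after the upload completed sees the file.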
This example highlights that, as humans, the chronological order of events is embedded in our thinking, but if the underlying system is not linearizable, the outcomes can conflict with our expectations.
We might assume that the system will be linearizable if we use quorums satisfying the quorum property (r + w > n, meaning that there is at least one node common between the read and write quorums). But that is not the case, as depicted in the following illustration. Reader A satisfies the quorum property (reading from two replicas, with 2 + 3 > 3) but gets one old and one new value. Reader B gets a stale value after A has seen the new value.
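The quorum scenario can be sketched as follows, using made-up replica contents: a write of the new value is still in flight and has reached only one of three replicas, so two readers with valid read quorums observe different things.

```python
# Three replicas of a single value (n = 3). The old value is 2.
replicas = [{"score": 2}, {"score": 2}, {"score": 2}]

# A write of the new value 4 is in flight and has reached replica 0 only.
replicas[0]["score"] = 4

def read_quorum(indices):
    # A real client would use version numbers to pick the newest value;
    # here we just show what each contacted replica returns.
    return [replicas[i]["score"] for i in indices]

print(read_quorum([0, 1]))  # [4, 2] -- Reader A sees both old and new values
print(read_quorum([1, 2]))  # [2, 2] -- Reader B, later, sees only the stale value
```

Satisfying r + w > n guarantees overlap between quorums, but without extra machinery (such as read repair before returning, or sloppier clients blocking on in-flight writes) it does not guarantee that reads respect real-time order.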
Note: Educative has a course for practitioners to learn about distributed systems concepts such as quorums and much more.
Note: There are many subtle ways in which implementations of linearizable systems can fail. Tools like Jepsen provide an automated way to check whether a database’s claimed consistency model actually holds.
Using consensus algorithms, such as Paxos and Raft, is one common way to implement linearizability.
We model each replica of the data as a state machine. Each state machine starts with a common initial state. As clients’ write and read requests come in to mutate or read the state, all replicas need a consensus on which request should be executed next so that all the state machines transition from one consistent state to the next. The consensus algorithms, such as Paxos and Raft, are used by these state machines to have a consensus on the next request to execute.
Due to the consensus algorithms, the sequence of execution of the commands is exactly the same on all replicas, which is why we get the same total order on each replica. Therefore, we get a linearizable system. However, we should understand that linearizability comes at a cost. For each client request (write or read), the consensus machinery first needs to do possibly multiple rounds of communication with the replicas to reserve a slot for the client’s request; only then is the request executed. So the first cost is excessive network bandwidth use, and the second is client-visible latency—the time a client must wait until consensus happens and the request executes. When many concurrent clients want their requests fulfilled, these costs are exacerbated.
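The replicated-state-machine idea can be sketched with consensus abstracted away: assume the consensus layer has already produced one agreed-upon log of commands, and observe that every replica applying that same log in order converges to the same state.

```python
# Replicated state machines: identical log + deterministic application
# function => identical final state on every replica.

def apply_log(log):
    state = {}
    for key, value in log:  # apply commands in the agreed total order
        state[key] = value
    return state

# The log that the (abstracted-away) consensus layer fixed for all replicas.
log = [("x", 1), ("y", 2), ("x", 3)]

replica_1 = apply_log(log)
replica_2 = apply_log(log)
print(replica_1 == replica_2)  # True -- same order, same final state
print(replica_1)               # {'x': 3, 'y': 2}
```

All of the real difficulty—and the bandwidth and latency costs described above—lives in producing that single agreed log despite failures, which is exactly what Paxos and Raft do.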
Note: The Spanner system provides linearizable operations and manages the associated challenges in a large distributed database system. Spanner uses Paxos for consensus and two-phase commit (2PC) for cross-shard transactions.
Our course on Grokking the Principles and Practices of Advanced System Design explains the Spanner system and consensus algorithms in detail. The consensus algorithms are subtle algorithms that are hard to understand. Our explanations go at length to highlight those subtle but important aspects of the algorithms.
The following slide deck explains a round of consensus among three replicas.
The CAP theorem highlights a tradeoff between data consistency (C), availability of the system (A), and tolerance to network partitions (P). If the network partitions at some instant, a large distributed system can provide either strong data consistency or total availability, but not both at the same time.
In the CAP theorem, the word consistency means linearizability. So now we know that if a network partition happens, we must choose between linearizability and availability (we can’t have both).
Note: If we don’t need real-time aspects, we can use sequential consistency. If we need support for multiple objects, we should use strict serializability. As we can see, the names of these consistency models resemble each other; however, they are fairly different. This is an unfortunate instance of terminology that we are stuck with.
Many consistency models originate from database systems and distributed systems. Many of these models are subtly different from each other, and this is one of the reasons for bugs in the implementations of these models. This blog is the first in a series of blogs to discuss frequently used models in real systems with examples. We started with linearizability, which is a strong consistency guarantee, and we discussed how hard it is to implement it right.
Note: The topic of distributed systems has many subtle concepts, such as consistency models. For an in-depth study of these concepts and to see them in action, we recommend Educative’s following courses:
Free Resources