We are on a journey to survey and understand commonly used consistency models in databases and distributed systems. In this blog, we will discuss the sequential consistency model.
Let’s remind ourselves of the problems consistency models solve. In a replicated data store over geographically dispersed nodes, concurrent writes and reads to the data happen at different times due to inherent challenges in a distributed system (such as delays due to propagation and queuing). Clients’ requests can interleave. There are many possible interleavings of requests with substantial implications on data mutations and their visibility to others. Consistency models enforce some rules on the possible ordering/visibility of requests/data for the applications so that they can interact with the data in the correct way. What constitutes correct for an application will change from one consistency model to another. For example, if a replicated store claims to provide linearizable writes and reads to its client, applications assume that the model abides by a certain contract. If the replicated store is
Note: Sequential consistency is a non-transactional, single-object consistency model.
The sequential consistency model is a level below linearizability. In linearizability, we require a total ordering on all operations consistent with the real clock. In sequential consistency, we drop the demand for ordering consistency with the real-time clock. The important aspects of sequential consistency can be summarized as follows:
For the operations originating from the same
Across processes, any one of the possible total ordering suffices if it conforms to the local program orders. However, each process needs to see the same total order. The following illustration highlights an example. Process
Real time plays no role in sequential consistency.
Let’s see sequential consistency in action with many examples.
In the following illustration, there are three clients and two data nodes. Initially, the value of
You might think that client
Let’s see another example in the context of four concurrently executing processes. We can consider them data nodes acting on some clients’ requests. This interleaving is sequentially consistent and both
Now let’s see an example that is not sequentially consistent using the following illustration. Note that process
Points to ponder
If a replicated data store provides linearizability as a consistency model, does it also provide sequential consistency?
We saw in an earlier blog post that to implement linearizability, each data replica needs to coordinate both writes and reads among themselves. One way to do so was via consensus algorithms such as Paxos and Raft. Communicating every read and write to everyone requires sending many messages to each other before they converge on a value. We expect to have a lightweight implementation for sequential consistency because it is a relaxed model compared to linearizability.
In sequential consistency, processes/replicas don’t care to tell about their reads to the other processes. However, they need to disseminate their writes to everyone. One way to achieve this is to publish a write to a pub-sub system that will disseminate the writes to all the subscribers in the order they were entered at the queue. Note that writes from across processes can interleave in any order, but once they are in the queue, each process will get and process the writes in the queue order.
Note: You can review how a distributed queue and pub-sub system work.
If each data item has a single custodian such that each mutation happens only at that custodian/primary node, then that node will provide an order to the mutations and everyone will get the same order. This scheme will provide sequential consistency. The following slides show how writes and reads happen in such a primary-secondary system.
If we need to honor the real-time happening of the events, we should use linearizability. If not, then sequential consistency is a better option. When we are okay with any ordering for concurrent events, causal ordering might be a better choice (that is a topic of our next blog).
Note that under sequential consistency, the output of different processes will not match. For example, when an update value of a variable hasn’t reached some replica, it will print an old local value). For some use cases, such an outcome might not be desirable; therefore, linearizability might be a better choice there.
Sequential consistency is a single-object model. If we use it for more than one object, the cumulative result will not be sequentially consistent. To understand this point, consider the following example. Here, if we take
See that getting
Ordering of Operations | Result |
W1(x) a; W1(y) a; W2(y) b; W2(x) b | R1(x) b ; R2(y) b |
W1(x) a; W2(y) b; W1(y) a ; W2(x) b | R1(x) b ; R2(y) a |
W1(x) a; W2(y) b; W2(x) b; W1(y) a | R1(x) b ; R2(y) a |
W2(y) b; W1(x) a; W1(y) a; W2(x) b | R1(x) b ; R2(y) a |
W2(y) b; W1(x) a; W2(x) b; W1(y) a | R1(x) b ; R2(y) a |
W2(y) b; W2(x) b; W1(x) a; W1(y) a; | R1(x) a ; R2(y) a |
No ordering is possible, giving the required result | R1(x) a ; R2(y) b |
If network partitioning happens such that the pub-sub system or custodian replica becomes unreachable for a few clients/secondary replicas, they can’t complete their writes. This means that sequential consistency-compliant nodes must seize their operations under certain network failures until the network heals.
Many real-world applications
Single-threaded replication of MySQL provides sequential consistency, while multi-threaded replication can be configured to provide sequential consistency.
Note: Most modern systems can be configured to choose one of the few consistency models they have implemented.
Git is a distributed version control system with no notion of any centralized entity in it. In many use cases, a git node is designated a custodian to provide a common reference point for everyone. All other git instances push their changes to that custodian node and fetch all the new stuff from everyone else via a pull from the custodian node. Because the custodian node is a single point to order all writes, such a git set up provides sequential consistency.
HDFS allows only one write at a time. If there are concurrent writers to a file, a centralized name node will decide which writer will get to write the file. Therefore, such a scheme provides sequential consistency. HDFS borrows many traits from GFS.
GFS allows concurrent writes to a file, and each file is divided into chunks of 64 MB. Each chunk has a primary node that decides the order to the writes. One might assume that GFS provides a sequential consistency model, though, in reality, it provides a custom, much more nuanced consistency model.
Sequential consistency is a relaxed model compared to linearizability. Sequential consistency does not care about the real times of the events while ordering them. As a consequence, implementing sequential consistency is easier than linearizability. For many real applications, sequential consistency is an adequate model.
Note: The topic of distributed systems has many subtle concepts such as sequential consistency. For an in-depth study of these concepts, we recommend Educative's following courses:
Free Resources