
Understanding the Causal Consistency Model

Abdul Qadeer
Oct 16, 2023
15 min read

Most of our daily life events are based on cause and effect: when we order a meal on an app, it gets delivered some time later. It can confuse people if such causality is not maintained in the digital world, for example, in a group chat application (WhatsApp or iMessage). More serious repercussions, such as data loss, can occur if, for example, a data store cannot track cause-and-effect relations among the read/write messages coming to it. However, as we will see later in this blog, while cause and effect comes naturally to humans, enforcing causality in digital systems is not trivial.

Causal consistency (Lloyd, Wyatt, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. "A Short Primer on Causal Consistency." ;login: 38, no. 4, August 2013) is a consistency model for distributed systems that ensures causally related events appear in a consistent order across distributed nodes while permitting flexibility in the ordering of unrelated events. It balances data consistency and system availability in distributed computing.

Let's consider an example:

  • Three friends (Alice, Sally, and Bob) are chatting with each other using an app similar to WhatsApp. Alice asks the other two about a recipe (the message tagged as A in the following illustration).

  • Sally receives this message and sends a subsequent message to Bob (message B).

  • Because Bob hasn't yet received message A (which caused message B), this message from Sally confuses Bob.

  • Later on, message C caused message D, but message D reached Alice earlier than C (the point marked as E), potentially ruining the recipe order.

It is clear from this simple example how a non-causal system can be a problem for its users.

Note: Chat apps like WhatsApp and iMessage work in a client-server model where chat participants' messages are sent to a common server, and then that server disseminates the messages to everyone. In the following illustration, we have omitted such a server for simplicity. We can assume, for example, that Alice sent message A to a server, and that server then sent two separate messages to Sally and Bob. Since it is a group chat, each message needs to go to everyone. However, to reduce clutter, we don't show some of these messages. For example, Sally's first message also goes to Alice, but we didn't show it.

Group chat among friends

Data consistency models are like a contract between the end programmers and the system. Such a contract sets the expectation of how data writes and reads by multiple entities can interleave to provide a legal ordering of events. Many real-world applications rely on the guarantees provided by a consistency model for their correctness. For example, a bank's storage system that provides strict serializability makes it possible for multiple (possibly concurrent) transactions to leave the account ledger in a consistent state.


We are surveying commonly used consistency models in databases and distributed systems. In our earlier blogs, we discussed linearizability and sequential consistency from the following hierarchy of consistency models. This blog is about causal consistency. Let's remind ourselves of some important aspects of this hierarchy:

  • Causal consistency is a weaker consistency model than sequential consistency and linearizability.

  • Both sequential consistency and linearizability also provide causal consistency.

The zoo of consistency models: As we walk up the hierarchy, consistency models provide stronger guarantees to programmers. However, they are usually more expensive to build and are prone to periods of unavailability. This illustration is adapted from Jepsen, which in turn adapted it from Bailis, Davidson, Fekete, et al., and Viotti and Vukolić.

Causality and Happened-Before Relation#

In a real system, inferring true causality is hard. However, finding potential causality, that is, deciding whether an event happened before another, is possible. In that respect, the term causal consistency is somewhat of a misnomer. Because the term is commonly used, we will keep using it, but we will mean potential causality whenever we say causality.

How to Implement Causality#

Logical clocks are the de facto mechanism to capture potential causality. Leslie Lamport introduced logical clocks in his seminal work (Lamport, Leslie. 1978. "Time, Clocks, and the Ordering of Events in a Distributed System." Communications of the ACM 21 (7): 558–65. https://doi.org/10.1145/359545.359563). We will look at Lamport clocks and a further improvement on them, called vector clocks (Brown, Russell. "Vector Clocks Revisited." Riak (blog), October 6, 2015. https://riak.com/posts/technical/vector-clocks-revisited/index.html?p=9545.html; also see: Brown, Russell. "Vector Clocks Revisited Part 2: Dotted Version Vectors." Riak (blog), November 10, 2015. https://riak.com/posts/technical/vector-clocks-revisited-part-2-dotted-version-vectors/index.html?p=9929.html).

Lamport Clocks#

If we need to tag each event in a system of processes with a timestamp that is coherent with potential causality, then time-of-day or monotonic clocks (commonly available in a computing device) are not appropriate due to imperfections such as clock skew and the inherent difficulty of synchronizing clocks across different nodes connected via a network. Lamport logical clocks are a way out of this situation.

A Lamport clock can be implemented on each participating process/node using a simple integer that is tagged with each event (a local event or a network send or receive).

A logical clock is associated with each process, and it starts with an initial value (say, 0). Whenever an event happens locally on a process, the clock value is incremented by 1 and assigned to the event. So the first rule is:

The clock value of event i should be less than the clock value of event j if event i happens before event j on the same process.

The processes can only communicate with each other via network messages. The reception of a message happens after the sending of the message. Therefore, when a process sends a network message, it assigns a local clock value to this sending event and sends it along with the message. On the reception of such a message, the receiving process takes the maximum of its local clock value and the clock value received in the incoming message. Then, the receiving process adds one to that maximum and assigns the result to the message-reception event. So our second rule is:

If process P sends a network message to process Q, then the clock value associated with the sending of the message should be less than the clock value associated with the reception of the same message.
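
As a concrete illustration, these two rules can be captured in a few lines of Python. This is a minimal sketch; the class and method names are ours, not from any standard library.

```python
class LamportClock:
    """A minimal sketch of a Lamport clock for one process."""

    def __init__(self):
        self.value = 0  # the clock starts with an initial value

    def local_event(self):
        """Rule 1: every local event increments the clock and is tagged with it."""
        self.value += 1
        return self.value

    def send_event(self):
        """Sending a message is a local event; the returned value travels with the message."""
        return self.local_event()

    def receive_event(self, received_value):
        """Rule 2: take the max of the local and received clock values, add one,
        and tag the receive event so that it is ordered after the matching send."""
        self.value = max(self.value, received_value) + 1
        return self.value
```

For example, if a process whose clock is at 2 receives a message stamped 5, receive_event(5) advances its clock to 6, so the reception is timestamped after the send.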

In the following illustration, we show three processes on three separate nodes that can only communicate via messages. The dots on the timeline represent the events happening in a process. The first part of the tag (such as P1) represents the process's unique identifier (P) followed by the local event number. The second part is the clock value assigned to the event. Two important observations are in order:

  • Whenever an event A happens before an event B, the clock value associated with B is always greater than the clock value associated with A.

  • Events that are not related by any potential causality (meaning they are neither on the same process nor connected through a network message) are called concurrent events. For such events, we can't say whether event A happened before B or B happened before A. In the following example, R1:1 and P3:3 are concurrent events.

Note: The clock value of event 1 on process R (1) is less than the clock value associated with event 3 on process P (3), but R1 still did not happen before P3.

Lamport clocks in action in a system of three processes

Points to ponder

Question 1

If a storage service receives two write requests from two different processes, P and Q, with clock values 1 and 10, respectively, can the service infer that P's event happened before Q's event? For this reason, would the service decide to do P's write first, then Q's?


Synchronizer Example From Lamport's Paper#

The above question's discussion might convince us that plain Lamport clocks are not very useful by themselves. However, we will now see an example in which we construct a distributed synchronizer, where processes send messages to all participants so that inferences can be drawn from clock values.

Algorithm Sketch#

Let's assume there are n processes in a distributed system that can only communicate via network messages. Each process maintains a state machine independently. We define a state machine as a relation C × S ⟹ S, where C is the set of permissible commands a process can execute (for example, a process requests a resource or releases a resource) and S is the set of permissible states a process can be in. Execution of a command transitions a state machine from one state to the next.

Each process also maintains a queue of requests. A command with logical timestamp T only executes when a process has learned about all the commands issued by all the processes in the system with a timestamp less than or equal to T. All requests issued by the processes in the system are totally ordered by their local logical clock value, with process identifiers as a tie-breaker. These restrictions ensure that each state machine executes the commands in the same order. While a process hasn't yet received some messages with smaller timestamps, the commands with newer timestamps wait in the queue.
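
The sketch below is one way to code this idea in Python, assuming reliable, in-order message delivery and no failures. The class name, the broadcast_fn and send_fn callbacks, and the explicit acknowledgement messages are our own illustrative choices; Lamport's paper describes the algorithm abstractly.

```python
import heapq

class SynchronizedStateMachine:
    """A failure-free sketch of Lamport's state-machine synchronizer.
    Every command is broadcast to all peers, buffered in a queue, and
    executed only in the agreed total order (timestamp, process ID)."""

    def __init__(self, my_id, all_ids, apply_fn, broadcast_fn, send_fn):
        self.my_id = my_id
        self.peers = [pid for pid in all_ids if pid != my_id]
        self.apply_fn = apply_fn          # the C x S -> S transition of the local state machine
        self.broadcast_fn = broadcast_fn  # broadcast_fn(sender, ts, cmd): ship a command to all peers
        self.send_fn = send_fn            # send_fn(sender, receiver, ts): ship an acknowledgement
        self.clock = 0                    # Lamport clock
        self.queue = []                   # min-heap of (timestamp, sender, command)
        self.latest_seen = {pid: 0 for pid in self.peers}  # largest timestamp heard from each peer

    def issue(self, command):
        """Called locally to request execution of a command."""
        self.clock += 1
        heapq.heappush(self.queue, (self.clock, self.my_id, command))
        self.broadcast_fn(self.my_id, self.clock, command)
        self._drain()

    def on_command(self, sender, ts, command):
        """A peer's command arrives: queue it and acknowledge it."""
        self.clock = max(self.clock, ts) + 1
        self.latest_seen[sender] = max(self.latest_seen[sender], ts)
        heapq.heappush(self.queue, (ts, sender, command))
        self.clock += 1
        self.send_fn(self.my_id, sender, self.clock)  # the ack tells the sender our clock passed ts
        self._drain()

    def on_ack(self, sender, ts):
        self.clock = max(self.clock, ts) + 1
        self.latest_seen[sender] = max(self.latest_seen[sender], ts)
        self._drain()

    def _drain(self):
        # The smallest queued command is safe to execute once every peer has been
        # heard from at an equal-or-larger timestamp: no peer can still send a
        # command that would be ordered before it.
        while self.queue:
            ts, sender, command = self.queue[0]
            if all(self.latest_seen[pid] >= ts for pid in self.peers):
                heapq.heappop(self.queue)
                self.apply_fn(command)
            else:
                return
```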

Our algorithm needs active participation from all processes. That means the failure of a single process or a persistent network failure can halt the whole system. More sophisticated algorithms are needed to deal with failures, but those algorithms are out of the scope of this blog. We refer interested readers to a tutorial on the state machine approach (Schneider, Fred. "The State Machine Approach: A Tutorial." Accessed October 13, 2023. https://www.cs.cornell.edu/fbs/publications/ibmFault.sm.pdf).

Let’s see a simple example involving just two processes.

A Replicated Key-Value Store#

Let's assume a key-value store is replicated at two remote sites. Processes P1 and P2 are at the two sites to serve the get and put requests on the key-value store. In the following illustration, both processes concurrently try to change the value of the same key k1. However, as per our synchronization algorithm, the order of mutations is ultimately the same at both sites, and the replicated key-value store is left in a consistent state with the final value of k1 as 5.

A replicated key-value store: operations are synchronized using logical clocks and state machines

Note that in a system of n processes that want to execute x commands, the algorithm will need to send out O(n × x) messages to achieve synchronization before executing the commands.
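
To tie this back to the replicated key-value store, here is a toy, single-threaded simulation (our own illustrative setup, reusing the SynchronizedStateMachine sketch above, with messages "delivered" as direct function calls). It shows both replicas converging on the same final value for k1:

```python
stores = {1: {}, 2: {}}   # replica ID -> its copy of the key-value store
machines = {}

def broadcast(sender, ts, cmd):
    # Deliver a command to every other replica immediately.
    for pid, machine in machines.items():
        if pid != sender:
            machine.on_command(sender, ts, cmd)

def ack(sender, receiver, ts):
    machines[receiver].on_ack(sender, ts)

def make_apply(pid):
    def apply_cmd(cmd):
        key, value = cmd
        stores[pid][key] = value
    return apply_cmd

for pid in (1, 2):
    machines[pid] = SynchronizedStateMachine(pid, [1, 2], make_apply(pid), broadcast, ack)

machines[1].issue(("k1", 3))  # P1 writes k1 = 3
machines[2].issue(("k1", 5))  # P2 writes k1 = 5
print(stores)                 # both replicas end with k1 = 5
```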

Note: Lamport clocks are further improved in vector clocks, where we are able to establish a two-way implication with the happened-before relationship. Each process maintains a vector of clock values, one entry for each process, and attaches that vector to each message. As the number of participants in the system increases, the length of the vector increases, and that data consumes network bandwidth. Further optimizations are possible in a variant of vector clocks called dotted version vectors.
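
A minimal sketch of a vector clock (with our own illustrative names) is shown below. Unlike a single Lamport counter, comparing two vectors can tell us not only that one event happened before another but also when two events are concurrent:

```python
class VectorClock:
    """One vector clock per process: one counter for every participant."""

    def __init__(self, my_id, all_ids):
        self.my_id = my_id
        self.clock = {pid: 0 for pid in all_ids}

    def local_event(self):
        """A local event (including a send) increments only our own entry."""
        self.clock[self.my_id] += 1
        return dict(self.clock)

    def receive_event(self, received):
        """On receive: take the element-wise max, then count the receive event."""
        for pid, value in received.items():
            self.clock[pid] = max(self.clock[pid], value)
        return self.local_event()

    @staticmethod
    def happened_before(a, b):
        """True if vector a is <= b in every entry and strictly less in at least one."""
        return all(a[p] <= b[p] for p in a) and any(a[p] < b[p] for p in a)

    @staticmethod
    def concurrent(a, b):
        return not VectorClock.happened_before(a, b) and not VectorClock.happened_before(b, a)
```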

Shortcomings of Logical Clocks#

There are a few shortcomings in the logical clocks that programmers should be aware of.

  • If our system cannot record out-of-band events, it cannot infer potential causality for them. For example, if we order food on an app but then call the restaurant to make a change in our order, the system is unable to see that the action taken by a person at the restaurant happened after the actions taken when the original order was placed.

  • Programmers need to devise a strategy to deal with the concurrent events.

  • Programmers also need to devise how to infer true causality from the happened-before relationship provided by logical clocks.

Recipe Sharing With Logical Clocks#

Let's return to our original example of recipe sharing between friends and see how using logical clocks helps us out. Here we will use vector clocks. In a vector like [0,0,0], the first index is Alice's clock value, the second one is Sally's, and the last one is Bob's. Now each message will be tagged with such a clock value. Additionally, each message should reach every participant (for example, dashed lines C and D were added for this purpose).

Note that at tag B, Bob's process will not display the message received from Sally because the received clock value ([1,1,0]) indicates that Sally has seen a message that Bob hasn't seen yet. When the message from Alice finally arrives at Bob's node with the clock value [1,0,0], Bob first sees Alice's message and then Sally's, because Alice's message's clock is smaller than Sally's. Similar things happen at point E, where out-of-order messages are queued and are displayed only after all causally preceding messages have arrived.

Using vector clocks to causally order group chat
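
A sketch of the delivery rule each chat client could apply is shown below (the structure, names, and message texts are our own illustration): a message is displayed only when it is the next expected message from its sender and everything it causally depends on has already been displayed; otherwise it waits in a buffer.

```python
class ChatClient:
    """Buffers incoming group-chat messages and displays them only in causal order."""

    def __init__(self, participants):
        self.seen = {p: 0 for p in participants}  # vector clock of messages already displayed
        self.pending = []                         # (sender, vector_clock, text) awaiting delivery

    def deliverable(self, sender, msg_vc):
        # The message must be the next one from its sender, and we must already have
        # displayed everything the sender had seen when it sent this message.
        return (msg_vc[sender] == self.seen[sender] + 1 and
                all(msg_vc[p] <= self.seen[p] for p in self.seen if p != sender))

    def on_receive(self, sender, msg_vc, text):
        self.pending.append((sender, msg_vc, text))
        delivered = True
        while delivered:
            delivered = False
            for entry in list(self.pending):
                entry_sender, entry_vc, entry_text = entry
                if self.deliverable(entry_sender, entry_vc):
                    print(entry_text)                              # display in causal order
                    self.seen[entry_sender] = entry_vc[entry_sender]
                    self.pending.remove(entry)
                    delivered = True

# Bob's client receives Sally's message before Alice's; it is buffered until
# Alice's message arrives, and then both are displayed in the right order.
bob = ChatClient(["alice", "sally", "bob"])
bob.on_receive("sally", {"alice": 1, "sally": 1, "bob": 0}, "Use a bit more salt")  # buffered
bob.on_receive("alice", {"alice": 1, "sally": 0, "bob": 0}, "Here is my recipe")    # prints both
```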

Conclusion#

Our real-world interactions are strongly connected with the time of occurrence and how we see one thing happening after the other, possibly in cause-effect relationships. Logical clocks are a simple but powerful way to capture such potential causality in a networked system of processes. Many data stores use the causal consistency model to provide an intuitive order of operations on data.

In this blog, the logical clocks and associated state machines assumed a non-faulty system. To learn how we deal with failing nodes and networks, see the "State Machine Replication" chapter in our course Grokking the Principles and Practices of Advanced System Design.
