Home/Blog/System Design/Understanding the Causal Consistency Model

Understanding the Causal Consistency Model

15 min read

Oct 16, 2023

content

Causality and Happened-Before Relation

How to Implement Causality

Lamport Clocks

Synchronizer Example From Lamport Paper

Algorithm Sketch

A Replicated Key-Value Store

Shortcomings of Logical Clocks

Recipe Sharing With Logical Clocks

Conclusion

Most of our daily life events are based on cause and effect—when we order a meal on our app, it gets delivered after some time. It can confuse people if such causality is not maintained in the digital world, such as a group chat application (WhatsApp or iMessage). More serious repercussions can happen, such as data loss, if, for example, a data store cannot find cause-and-effect relations in the read/write messages coming to it. However, as we will see later in this blog while cause-and-effect comes naturally to humans, enforcing causality in digital systems is not trivial.

Causal consistencyLloyd, Wyatt, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. “A Short Primer on Causal Consistency.” ;login: 38, no. 4, August 2013. is a distributed system model that ensures that causally related events appear in a consistent order across distributed nodes while permitting flexibility in the ordering of unrelated events. It balances data consistency and system availability in distributed computing.

Let's consider an example:

Three friends—Alice, Sally, and Bob—are chatting with each other using an app similar to WhatsApp. Alice asks the other two about a recipe (messages tagged as $A$ in the following illustration).
Sally receives this message and sends a subsequent message to Bob (message $B$ ).
Because Bob hasn’t yet received message $A$ (that caused message $B$ ), this message from Sally caused confusion for Bob.

Later on, message $C$ caused message $D$ , but message $D$ reached earlier than $C$ to Alice (point marked as $E$ ), potentially ruining the recipe order.

It is clear from this simple example how a non-causal system can be a problem for its users.

Note: Chat apps like WhatsApp and iMessage work in a client-server model where chat participants’ messages are sent to a common server, and then that server disseminates the messages to everyone. In the following illustration, we have omitted such a server for simplicity. We can assume, for example, that Alice sent message $A$ to a server, and that server then sent two separate messages to Sally and Bob. Since it is a group chat, each message needs to go to everyone. However, to reduce clutter, we don’t show some such message. For example, Sally’s first message also goes to Alice but we didn’t show it.

Data consistency models are like a contract between the end programmers and the system. Such a contract sets the expectation that how data writes and reads by multiple entities can interleave to provide a legal ordering of events. Many real-world applications rely on the guarantees provided by a consistency model for their correctness. For example, a bank’s storage system that provides strict serializability makes it possible for multiple (possibly concurrent) transactions to leave the account ledger in a consistent state.

We are surveying commonly used consistency models in databases and distributed systems. In our earlier blogs, we discussed the following hierarchy of consistency models. Linearizability and sequential consistency. This blog is about causal consistency. Let’s remind ourselves of some important aspects of this hierarchy.

The causal consistency is a weaker consistency model as compared to sequential consistency and linearizability.

Both sequential consistency and linearizability also have causal consistency.

Causality and Happened-Before Relation#

In a real system, inferring causality is hard. Though, finding potential causality or deciding if an event happened before another is possible. In that aspect, the term causal consistency is a misnomer. Because this term is commonly used, we will use it, but we will mean potential causality whenever we say causality.

How to Implement Causality#

Logical clocks are a de facto mechanism to capture potential causality. Leslie Lamport invented logical clocks in his seminal workLamport, Leslie. 1978. “Time, Clocks, and the Ordering of Events in a Distributed System.” Communications of the ACM 21 (7): 558–65. https://doi.org/10.1145/359545.359563.. We will look at Lamport clocks and a further improvement on them, called vector clocksBrown, Russell. “Vector Clocks Revisited.” Riak (blog), October 6, 2015. https://riak.com/posts/technical/vector-clocks-revisited/index.html?p=9545.html. Also see: Brown, Russell. “Vector Clocks Revisited Part 2: Dotted Version Vectors.” Riak (blog), November 10, 2015. https://riak.com/posts/technical/vector-clocks-revisited-part-2-dotted-version-vectors/index.html?p=9929.html..

Lamport Clocks#

If we need to tag each event on a system of processes with a timestamp that is coherent with potential causality, then day-of-time or monotonic clocks (commonly available in a computing device) are not appropriate due to imperfections such as clock skewFor this reason, would the service decide to do P’s write first, then Q’s? and the inherent difficulty to synchronize clocks across different nodes connected via a network. Lamport logical clocks are a way out of this situation.

Leslie Lamport invented Lamport clocks, which can be implemented on each participating process/node using a simple integer that will be tagged with each event (local event or network sends and receives).

A logical clock is associated with each process and it starts with an initial value (say, 0). As any event happens locally on a process, the clock value is incremented by 1 and assigned to the event. So the first rule is that:

The clock value of event $i$ should be less than the clock value of event $j$ if event $i$ happens before event $j$ on the same process.

The processes can only communicate with each other via network messages. The reception of a message happens after the sending of the message. Therefore, when a process sends a network message, it assigns a local clock value to this sending event and sends it along with the message. On the reception of such a message, the receiving process gets the max of the local clock value and the clock value received in the incoming message. Then, the receiving process adds one to the max value and assigns that value to the received message event. So our second condition is that:

If process $P$ sends a network message to process $Q$ , then the clock value associated with the sending of the message should be less than the reception of the same message.

In the following illustration, we show three processes on three separate nodes that can only communicate via messages. The dots on the timeline represent the events happening in a process. The first part of the tag (such as $P1$ ) represents the process’s unique identifier $(P)$ followed by the local even number. The second part is the clock value assigned to the event. Two important observations are in order:

Whenever an event $A$ happens before an event $B$ , the clock value associated with $B$ is always greater than event $A$ .

Those events that are not associated with any potential causality (meaning they are neither on the same process nor is there any connection due to a network message), such events are called concurrent events. For such cases we can’t say if event $A$ happened before $B$ or $B$ happened before A. In the following example, R1:1 and P3:3 are concurrent events.

Note: The clock value of event 1 on process R (1) is less than the clock value associated with event 3 on process P (3) but R1 still didn’t happen before P3.

Synchronizer Example From Lamport Paper#

The above questions' discussion might convince us that the plain Lamport clocks are not very useful by themselves. However, we will now see an example where we construct a distributed synchronizer where the processes send messages to all the participants so that clock value inference can be made.

Algorithm Sketch#

Let’s assume there are $n$ processes in a distributed system that can only communicate via network messages. Each process maintains a state machine independently. We define a state machine as a relation $C \times S \implies S$ where $C$ is a set of permissible commands a process can execute (for example, $P_i$ requests a resource, $P_i$ releases a resource, etc.). $S$ is a set of permissible states a process can be in. Execution of a command transitions a state machine from one state to the next.

Each process also maintains a queue of requests. A command with logical timestamp $T$ only executes when a process has learned about all the commands issued by all the processes in the system with a timestamp less than or equal to $T$ . All requests issued by the processes in the system are totally ordered by prepending process identifiers with their local logical clock value. These restrictions ensure that each state machine executes the commands in the same order. While a process hasn’t received some messages with a small timestamp, the newer timestamped commands will wait in the queue.

Our algorithm needs active participation from all processes. That means failure of a single process or persistent network failure can halt the whole system. More sophisticated algorithms are needed to deal with failures, though, those algorithms are out of the scope of our current blog. We will refer interested readers to a tutorial on the state machine.Schneider, Fred. n.d. “The State Machine Approach: * a Tutorial.” Accessed October 13, 2023. https://www.cs.cornell.edu/fbs/publications/ibmFault.sm.pdf.

Let’s see a simple example involving just two processes.

A Replicated Key-Value Store#

Let’s assume a key-value store is replicated at two remote sites. Process $P_1$ and $P_2$ are at the two sites to entertain the get and put requests on the key-value store. In the following illustration, both processes concurrently try to change the value of the same key $k1$ . Though, as per our synchronization algorithm, the order of mutations is ultimately the same at both sites and the replicated key-value store is left in a consistent state with the final value of $k1$ as 5.

Note that in a system of $n$ processes that want to execute $x$ commands, the algorithm will need to send out $O(n \times x)$ messages to achieve synchronization before execution of the commands.

Note: Lamport clocks are further improved in vector clocks, where we are able to establish two-way implication on the happened-before relationship. Each process maintains a vector of clock values—one entry for each process—and appends that vector in each message. As the number of participants increases in the system, the length of the vector increases, and such data consumes network bandwidth. Further optimizations are possible in a variant of vector clocks called dotted vector clocks.

Shortcomings of Logical Clocks#

There are a few shortcomings in the logical clocks that programmers should be aware of.

If our system cannot record out-of-band events, it can not infer potential causality for them. For example, if we order food on an app, but then call the restaurant to make a change in our order, the system is unable to see that the action taken by a person at the restaurant happened after the actions taken during the original order placing.
Programmers need to devise a strategy to deal with the concurrent events.
Programmers also need to devise how to infer true causality from the happened-before relationship provided by logical clocks.

Let’s return to our original example of recipe sharing between friends and see how using logical clocks helps us out. Here we will use vector clocks. A vector like [0,0,0] means that the first index is Alice’s clock value, the second one is Sally’s, and the last one is Bob’s. Now each message will be tagged with such clock values. Additionally, each message should reach every participant (for example, dashed lines $C$ and $D$ were added for this purpose).

Note that at tag $B,$ Bob’s process will not display the message received from Sally because the received clock value ([1,1,0]) tells that Sally has seen a message that Bob hasn’t seen yet. When a message from Alice finally arrives at Bob’s node with the clock value [1,0,0], because Alice’s message’s clock is smaller than Sally’s message, Bob will see first Alice’s message and then Sally’s. Similar things happen at point $E$ , where out-of-order messages will be queued and will only be displayed when all subsequent messages have arrived.

Conclusion#

Our real-world interactions are strongly connected with the time of occurrence and how we see one thing happening after the other, possibly in cause-effect relationships. Logical clocks are a simple but powerful way to capture such potential causality in a networked system of processes. Many data stores use the causal consistency model to provide an intuitive order of operations on data.

The logical clocks and associated state machines assumed a non-faulty system in this blog. To learn how we deal with failing nodes and networks, see the “State Machine Replication” chapter in our course Grokking the Principles and Practices of Advanced System Design.