Most of our daily life events are based on cause and effect: when we order a meal on an app, it gets delivered some time after we order it. It can confuse people if such causality is not maintained in the digital world, for example, in a group chat application such as WhatsApp or iMessage. More serious repercussions, such as data loss, can happen if, for example, a data store cannot track cause-and-effect relations in the read/write requests coming to it. However, as we will see later in this blog, while cause and effect comes naturally to humans, enforcing causality in digital systems is not trivial.
Causal consistency is a distributed system model that ensures that causally related events appear in a consistent order across distributed nodes while permitting flexibility in the ordering of unrelated events. It balances data consistency and system availability in distributed computing (Lloyd, Wyatt, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. “A Short Primer on Causal Consistency.” ;login: 38, no. 4, August 2013).
Let's consider an example:
Three friends—Alice, Sally, and Bob—are chatting with each other using an app similar to WhatsApp. Alice asks the other two about a recipe (message m1 in the illustration below).
Sally receives this message and sends a subsequent message to Bob (message m2) with her answer.
Because Bob hasn’t yet received message m1, Sally’s reply (m2) arrives without its context and makes no sense to him.
Later on, message m1 reaches Bob, and only then does the earlier reply fall into place.
It is clear from this simple example how a non-causal system can be a problem for its users.
Note: Chat apps like WhatsApp and iMessage work in a client-server model where chat participants’ messages are sent to a common server, and then that server disseminates the messages to everyone. In the following illustration, we have omitted such a server for simplicity. We can assume, for example, that Alice sent message m1 to a server, and that server then sent two separate messages to Sally and Bob. Since it is a group chat, each message needs to go to everyone. However, to reduce clutter, we don’t show some of these messages. For example, Sally’s first message also goes to Alice, but we didn’t show it.
Data consistency models are like a contract between the end programmers and the system. Such a contract sets the expectation of how data writes and reads by multiple entities can interleave to produce a legal ordering of events. Many real-world applications rely on the guarantees provided by a consistency model for their correctness. For example, a bank’s storage system that provides strict serializability makes it possible for multiple (possibly concurrent) transactions to leave the account ledger in a consistent state.
We are surveying commonly used consistency models in databases and distributed systems. In our earlier blogs, we discussed the stronger models in this hierarchy: linearizability and sequential consistency. This blog is about causal consistency. Let’s remind ourselves of some important aspects of this hierarchy.
Causal consistency is a weaker consistency model than sequential consistency and linearizability.
Any system that provides sequential consistency or linearizability also provides causal consistency.
In a real system, inferring true causality is hard. However, finding potential causality, meaning deciding whether one event happened before another, is possible. In that sense, the term causal consistency is a misnomer. Because this term is commonly used, we will use it too, but we will mean potential causality whenever we say causality.
Logical clocks are the de facto mechanism to capture potential causality. Leslie Lamport introduced logical clocks in his seminal 1978 paper, “Time, Clocks, and the Ordering of Events in a Distributed System.”
If we need to tag each event in a system of processes with a timestamp that is coherent with potential causality, then time-of-day or monotonic clocks (commonly available on a computing device) are not appropriate due to imperfections such as clock skew and clock drift.
Leslie Lamport invented Lamport clocks, which can be implemented on each participating process/node using a simple integer counter whose value is tagged to each event (a local event, or the sending or receiving of a network message).
A logical clock is associated with each process, and it starts with an initial value (say, 0). As any event happens locally on a process, the clock value is incremented by 1 and assigned to the event. So the first rule is that:
The clock value of event a should be less than the clock value of event b if event a happens before event b on the same process.
The processes can only communicate with each other via network messages. The reception of a message happens after the sending of that message. Therefore, when a process sends a network message, it assigns a local clock value to this sending event and sends that value along with the message. On the reception of such a message, the receiving process takes the max of its local clock value and the clock value carried in the incoming message. Then, the receiving process adds one to that max and assigns the result to the message-reception event. So our second rule is that:
If process P sends a network message to process Q, then the clock value associated with the sending of the message should be less than the clock value associated with the reception of the same message.
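These two rules translate into just a few lines of code. Below is a minimal sketch in Python; the class and method names are our own illustrative choices, not from any particular library.

```python
class LamportClock:
    """A per-process logical clock implemented as a simple integer."""

    def __init__(self):
        self.time = 0  # initial clock value

    def local_event(self):
        # Rule 1: every local event increments the clock.
        self.time += 1
        return self.time

    def send_event(self):
        # Sending is a local event; its clock value travels with the message.
        self.time += 1
        return self.time  # attach this timestamp to the outgoing message

    def receive_event(self, msg_time):
        # Rule 2: take the max of the local clock and the message's
        # timestamp, then add one, so reception > sending always holds.
        self.time = max(self.time, msg_time) + 1
        return self.time


# Usage: process P sends a message to process Q.
p, q = LamportClock(), LamportClock()
ts = p.send_event()          # P's clock: 1, attached to the message
q.local_event()              # Q's clock: 1 (an unrelated local event)
print(q.receive_event(ts))   # Q's clock: max(1, 1) + 1 = 2 > 1
```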
In the following illustration, we show three processes on three separate nodes that can only communicate via messages. The dots on the timeline represent the events happening in a process. The first part of the tag (such as P1) names the process and the event’s position on it, and the number in parentheses is the clock value assigned to that event.
Whenever an event a happens before an event b, the clock value associated with b is always greater than the clock value associated with a.
Events that are not connected by any potential causality (meaning they are neither on the same process nor linked by a network message) are called concurrent events. For such events, we can’t say that one happened before the other.
Note: The clock value of event 1 on process R (1) is less than the clock value associated with event 3 on process P (3), but R1 still didn’t happen before P3, because the two events are concurrent.
Points to ponder
If a storage service receives two write requests from two different processes, P and Q, with the clock values 1 and 10, respectively, then can the service infer that P’s event happened before Q’s event? Based on that inference, should the service apply P’s write first, then Q’s?
The discussion of the above questions might convince us that plain Lamport clocks are not very useful by themselves. However, we will now see an example where we construct a distributed synchronizer in which the processes send messages to all the participants so that clock-value inferences can be made.
Let’s assume there are N processes, and each process broadcasts its timestamped commands to all the participants.
Each process also maintains a queue of requests. A command with logical timestamp t is executed only once a process has received messages with timestamps greater than t from every other process; at that point, no command with a smaller timestamp can still arrive. Equal timestamps are ordered deterministically, for example, by process ID.
Our algorithm needs active participation from all processes. That means the failure of a single process or a persistent network failure can halt the whole system. More sophisticated algorithms are needed to deal with failures, though those algorithms are out of the scope of our current blog. We refer interested readers to the “State Machine Replication” chapter in our course Grokking the Principles and Practices of Advanced System Design.
Let’s see a simple example involving just two processes.
Let’s assume a key-value store is replicated at two remote sites. Processes issue get and put requests on the key-value store. In the following illustration, both processes concurrently try to change the value of the same key.
Note that in a system of N processes, a process must hear from every other process before it can execute a command, so each request incurs messages to and from all participants.
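To make the synchronizer concrete, here is a minimal, single-machine sketch in Python of the two-replica scenario above. The Replica class, the message shapes, and the plain acks used to advance clocks are our own illustrative choices, assuming reliable delivery and no failures; this is a sketch of the idea, not a production protocol.

```python
import heapq

class Replica:
    """One participant in the synchronizer; all replicas apply commands
    in the same (timestamp, process id) order."""

    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = peers                    # ids of all processes
        self.clock = 0                        # Lamport clock
        self.queue = []                       # pending: (ts, sender, cmd)
        self.latest = {p: 0 for p in peers}   # highest ts heard per process
        self.store = {}                       # replicated key-value state

    def on_message(self, ts, sender, cmd=None):
        # Every message (a command or a plain ack) advances the clock and
        # records how far the sender's clock has progressed.
        self.clock = max(self.clock, ts) + 1
        self.latest[sender] = max(self.latest[sender], ts)
        if cmd is not None:
            heapq.heappush(self.queue, (ts, sender, cmd))
        self.try_execute()

    def try_execute(self):
        # Execute the lowest-timestamped command once every other process
        # has been heard from with a larger timestamp: nothing earlier can
        # still arrive. Ties are broken by the sender's process id.
        while self.queue:
            ts, sender, (key, value) = self.queue[0]
            others = (p for p in self.peers if p != sender)
            if all(self.latest[p] > ts for p in others):
                heapq.heappop(self.queue)
                self.store[key] = value
            else:
                return


peers = [1, 2]
r1, r2 = Replica(1, peers), Replica(2, peers)
# Both processes concurrently put to the same key; each put is broadcast
# to every replica, including the issuer's own.
for r in (r1, r2): r.on_message(1, 1, ("recipe", "pasta"))
for r in (r1, r2): r.on_message(1, 2, ("recipe", "pizza"))
# Later messages (here, plain acks) reveal that no earlier command remains.
for r in (r1, r2): r.on_message(5, 1); r.on_message(5, 2)
print(r1.store == r2.store)  # True: both applied the puts in the same order
```

Because every replica sorts pending commands by (timestamp, process ID), even the two concurrent puts are applied in the same order everywhere, leaving both copies of the store identical.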
Note: Lamport clocks are further improved upon by vector clocks, which make the happened-before relationship a two-way implication. Each process maintains a vector of clock values—one entry for each process—and appends that vector to each message. As the number of participants in the system increases, the length of the vector grows, and this metadata consumes network bandwidth. Further optimizations are possible in a variant of vector clocks called dotted version vectors.
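As an illustration, here is a minimal vector-clock comparison sketch in Python; the function names and the three-entry vectors are our own choices. Unlike plain Lamport clock values, comparing two vectors tells us whether one event happened before the other or whether the two are concurrent.

```python
def happened_before(va, vb):
    """True if the event tagged va causally precedes the event tagged vb:
    va is component-wise <= vb and differs in at least one entry."""
    return all(a <= b for a, b in zip(va, vb)) and va != vb

def concurrent(va, vb):
    """True if neither event causally precedes the other."""
    return not happened_before(va, vb) and not happened_before(vb, va)

a = [1, 0, 0]   # e.g., a message sent by the first process
b = [1, 1, 0]   # sent by the second process after it saw a
c = [0, 0, 1]   # sent by the third process independently

print(happened_before(a, b))  # True: b's sender had already seen a
print(concurrent(a, c))       # True: no causal path links a and c
```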
There are a few shortcomings of logical clocks that programmers should be aware of.
If our system cannot record out-of-band events, it cannot infer potential causality for them. For example, if we order food on an app but then call the restaurant to change our order, the system is unable to see that the action taken by the person at the restaurant happened after the actions taken during the original order placement.
Programmers need to devise a strategy to deal with concurrent events.
Programmers also need to devise a way to infer true causality from the happened-before relationship provided by logical clocks.
Let’s return to our original example of recipe sharing between friends and see how using logical clocks helps us out. Here we will use vector clocks. In a vector like [0,0,0], the first index is Alice’s clock value, the second one is Sally’s, and the last one is Bob’s. Now each message will be tagged with such a vector. Additionally, each message should reach every participant (for example, the dashed lines in the following illustration show these deliveries).
Note that when Bob receives Sally’s reply tagged [1,1,0] while his own vector is still [0,0,0], Alice’s entry in the tag tells him that a message from Alice is missing. Bob’s client can therefore hold Sally’s message back and deliver it only after Alice’s message arrives, restoring the causal order.
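Here is a minimal sketch, in Python, of how Bob’s client could use these vectors to hold back an out-of-order message. The helper names and the message texts are hypothetical, and we assume each message carries its sender’s vector at send time.

```python
ALICE, SALLY, BOB = 0, 1, 2

def deliverable(local, vec, sender):
    # Deliver only the sender's next message, and only when every other
    # dependency the message carries has already been delivered locally.
    if vec[sender] != local[sender] + 1:
        return False
    return all(vec[i] <= local[i] for i in range(len(local)) if i != sender)

def receive(local, pending, vec, sender, text):
    pending.append((vec, sender, text))
    progressed = True
    while progressed:                     # flush everything now deliverable
        progressed = False
        for vec_, snd, txt in list(pending):
            if deliverable(local, vec_, snd):
                pending.remove((vec_, snd, txt))
                local[snd] = vec_[snd]    # record the delivered message
                print("delivered:", txt)
                progressed = True

bob, pending = [0, 0, 0], []
# Sally's reply arrives first; [1,1,0] needs Alice's entry to be 1, so it waits.
receive(bob, pending, [1, 1, 0], SALLY, "Reduce it on low heat.")
# Alice's question arrives; it is delivered, and Sally's reply follows it.
receive(bob, pending, [1, 0, 0], ALICE, "How do I thicken my soup?")
```

With this buffering in place, Bob always sees Alice’s question before Sally’s answer, which is exactly the causal order the group chat needed.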
Our real-world interactions are strongly connected with the time of occurrence and how we see one thing happening after the other, possibly in cause-effect relationships. Logical clocks are a simple but powerful way to capture such potential causality in a networked system of processes. Many data stores use the causal consistency model to provide an intuitive order of operations on data.
The logical clocks and associated state machines assumed a non-faulty system in this blog. To learn how we deal with failing nodes and networks, see the “State Machine Replication” chapter in our course Grokking the Principles and Practices of Advanced System Design.