The primary challenge in achieving consistency in System Design is to maintain consistency across multiple nodes while ensuring low latency, reliability, operations during network failures, and concurrent operations.
Imagine you’re shopping online for a product you’ve been eyeing for weeks. You find the item, add it to your cart, and proceed to checkout—only to be told it’s out of stock, even though the product page showed it was available moments ago. Frustrating, right? Imagine this scenario playing out across millions of users on a global e-commerce platform. Such inconsistencies could cause confusion, loss of sales, and damage to the company’s reputation.
This is where consistency in System Design comes into play. It’s the backbone of ensuring that the data remains accurate and up-to-date no matter where or when you access the system. Without it, businesses risk losing customer trust and revenue. This tech blog will explore how consistency models address these challenges and maintain harmony across distributed systems.
Key takeaways:
Why achieving consistency in System Design is important
Understand the following four types of consistencies and their use cases:
Eventual consistency
Causal consistency
Sequential consistency
Strict consistency
How Amazon DynamoDB and Google Spanner achieve consistency
The challenges involved in each consistency model and their mitigation strategies
Let’s see what consistency actually is in System Design:
A well-designed system employs consistency mechanisms to ensure data is updated and replicated consistently across all servers, providing a seamless user experience. Consistency is essential because it provides the following:
Data integrity: Consistency ensures that data is accurate and free from errors on all nodes.
Concurrency control: Consistency mechanisms are essential in distributed systems to manage concurrent access to shared data, prevent conflicts, and ensure that changes are applied in a coordinated manner.
Application correctness: Consistency also guarantees that applications can make correct decisions based on accurate and consistent data.
Data replication: Consistency ensures efficient data replication and distribution across multiple nodes.
User experience: Another crucial point is that it provides a consistent and reliable user experience.
The spectrum of consistency models is vast and includes many types of consistency, each with its guarantees and trade-offs. For the sake of this blog, we’ll consider the following consistency types:
Eventual consistency
Causal consistency
Sequential consistency
Strong or strict consistency
Let’s explore each consistency type!
Eventual consistency means that while data may be temporarily different across servers or replicas, it will become the same after some time. For example, a social media post might not appear instantly for everyone but eventually shows up everywhere.
As shown in the following illustrations, the primary node has updated the data and acknowledged it to the clients in the interest of eventual consistency. However, the changes have not yet been propagated to the replicas. In the meantime, the read operation served by the replicas A and B will result in stale data.
Following are a couple of real-world applications of eventual consistency in the System Design domain. Let’s explain each:
Social media post updates: Imagine a social media platform where users can post updates. When a user posts a new update, it might not be immediately visible to all other users. However, the update will propagate through the system after a short delay and become visible to everyone. This is an example of eventual consistency, as the data (the user’s post) eventually becomes consistent across all nodes.
Distributed file system: Similarly, in a distributed file system, when a file is updated on one node, it might take some time for the changes to be replicated to other nodes. During this replication process, there might be temporary inconsistencies between the different copies of the file. However, all nodes will eventually have the same updated version of the file, ensuring eventual consistency.
Let’s look at the advantages and disadvantages of eventual consistency in the following table:
Advantages | Disadvantages |
|
|
System Design case study: Amazon DynamoDB
Amazon DynamoDB is a NoSQL database known for its high availability, scalability, and low latency. A main feature of the DynamoDB is its use of eventual consistency which allows DynamoDB to achieve high throughput and minimize latency, especially in scenarios involving high write volumes or distributed systems spread across multiple regions. By allowing asynchronous replication and deferred synchronization, DynamoDB is able to handle workloads where immediate accuracy is less critical, such as recommendation engines, social media feeds, or IoT sensor data.
In distributed systems, data changes can occur concurrently, which can cause conflicting changes using eventual consistency, leading to data inconsistencies. To resolve conflicting changes, systems often use methods like last-write-wins (where the most recent change is kept) or version vectors (to track and resolve different changes).
Similarly, another issue could be that the user might read stale data before the updates propagate to other replicas or nodes. The stale data reads can be mitigated by employing techniques such as read repair, quorum reads, and client-side caching with expiration.
Another issue in eventual consistency is the slow convergence of the data. Propagating updates to different replicas takes time, which delays the system’s consistency. The convergence process can be accelerated using the gossip protocol and by employing efficient replication strategies.
As we have seen, the eventual consistency causes temporary inconsistencies in the data. In the following section, let’s discuss a stronger consistency than eventual consistency, ensuring that the nodes always see the operations with causal relationships in the same order.
Causal consistency ensures that actions that depend on each other are seen in the correct order. For example, if you comment on a friend’s post, others will see your friend’s post before they see your comment—because your comment relies on the post being there.
Let’s understand causally related operations with the help of an example. In the following illustration, Process 1 writes a value a
at location x
. For Process 2 to write the value b
at location y
, it first needs to calculate b
. Because b=x+5
, the read operation on x
should be performed before writing b
on location y
. That’s why read(x, a)
and write(y, b)
are causally related.
Let’s discuss some real-world applications of causal consistency in System Design.
A few real-world applications of causal consistency in System Design are:
Comments on a social media post: In a social media platform like Instagram, causal consistency ensures that if User A posts a status update and another User B comments on it, any other User C who sees the comment on the post will also see the original status update first.
Chat application: In a chat application like WhatsApp, messages are sent and received by different users. Causal consistency ensures that messages are delivered in the correct order based on the causal relationship between events. For example, if User A sends a message to User B, and then User B replies, User C should receive the messages in the same order (A’s message followed by B’s reply), regardless of the physical network delays. This ensures that the conversation flow is maintained correctly.
Let’s look at the advantages and disadvantages of causal consistency in the following table:
Advantages | Disadvantages |
|
|
A major challenge in causal consistency is determining causally related operations. This can be resolved using logical clocks, or version vectors across different nodes. The metadata attached to the operation should be used to record their causal history, which can determine the correct order of operations.
Further, in causal consistency, finding the causally related operations can cause additional overhead in terms of storage and network bandwidth. To minimize the overhead, the system needs to optimize the size and structure of operations metadata. Propagating only the necessary operations’ metadata can also reduce this overhead.
Similarly, another issue is ensuring that all the nodes in a large-scale distributed system see the operations in the same order can be difficult. Spreading the causal information quickly can be achieved by using the gossip protocol and implementing a causal broadcast mechanism that ensures messages are delivered in causal order across the system.
Another challenge in causal consistency is ordering operations, which can be complicated due to concurrent non-causal operations from different nodes. Operations can be processed concurrently as long as they do not interfere with the causally related operations. This challenge can be resolved by employing CRDTs (conflict-free replicated data types) to handle concurrent updates without conflicts.
Sequential consistency is stricter than causal consistency. It guarantees that all operations execute in the same order for every process. This execution order is consistent across all nodes or processes in the system, even if some updates take longer to reach a certain state. Think of it as everyone watching a live video on YouTube. Some may experience a small delay, yet they all see the live video in the same sequence.
Let’s consider the following scenario to understand sequential consistency:
Assume we have a shared variable x
among processes P1 and P2, which perform the following operations on x
.
Process P1 performs:
Write x = 10
Read x
Process P2 performs:
Write x = 20
Read x
The processes P1 and P2 execute the operations independently. In a sequential consistent system, all processes must agree on a single global order of these operations. So, one possible sequence that maintains sequential consistency is the following:
P1 writes x = 10
P2 writes x = 20
P1 reads x
and gets 20
P2 reads x
and gets 20
In the above sequence, the global order suggests that the write operations happened first, so both processes P1 and P2 agree that the final value of x
is 20
.
Point to Ponder
How is sequential consistency different than causal consistency?
Following are a couple of examples of real-world applications of sequential consistency in System Design.
Collaborative editing tools (Google Docs): Sequential consistency can be seen in applications of collaborative editing tools such as Google Docs. Sequential consistency in such applications ensures that changes made by multiple users in a document reflect in the same order across all devices to avoid inconsistency and maintain integrity.
Blob storage: Sequential consistency can also be used in distributed blob storage, such as Amazon S3. It ensures that all clients see file operations in the same order, even if the actual order of operations differs.
This course will help you confidently prepare for advanced system design interview questions—in less than 5 hours. As foundational systems underpin our modern software landscape, the case studies in this course will prepare you with actionable knowledge to design and evaluate any distributed system. This course streamlines system design interview prep by providing you an understanding of real-world distributed systems. You’ll learn about systems from hyperscalers, such as Google File System (GFS) and Amazon DynamoDB. Next, you will make the most out of your interview prep time by learning to build 9 seminal systems (with each design problem taking only 15–30 minutes). All the while, you’ll learn and apply principles of modern system design as you see them implemented in real-world case studies. After taking this course, you’ll be able to confidently approach design problems in your next system design interview.
Let’s look at the advantages and disadvantages of sequential consistency in the following table:
Advantages | Disadvantages |
|
|
An important challenge in sequential consistency is the global ordering of operations. Ensuring that all nodes see operations in the same global order in a large-scale distributed system is hard. To overcome this challenge, we can use a central coordinator to enforce a global order of operations or implement consensus algorithms like Paxos or Raft, which allow distributed nodes to agree on operation orders even during failures or partitions.
Another challenge is that sequential consistency impacts performance and latency due to the need for coordination across nodes. To improve performance, the system should allow parallel operations with validation before committing changes and use local caching with periodic synchronization to reduce coordination overhead. Similarly, for improving scalability, dividing databases into shards, where each shard maintains its sequential consistency, can reduce bottlenecks. Hierarchical coordinators can also manage consistency locally and across groups.
In sequential consistency, handling concurrent writes requires consistent ordering across all nodes. Assigning timestamps to operations or using quorum-based approaches can ensure nodes agree on the order before applying changes.
Strict consistency (or linearizability) is the strongest consistency model that ensures you always get the most up-to-date information, no matter where or when you ask for it. For example, after you withdraw money from an ATM, the balance you see will reflect that withdrawal immediately, no matter which bank’s ATM you check it on.
Strict consistency is challenging to achieve. Some of the challenges include variable network delays and failures.
A few real-world applications of strict consistency in System Design can be email password updates and financial transaction updates. Let’s expand on it:
Email password update: Updating an account’s password requires strict consistency. For example, if we suspect suspicious activity on our email account, we immediately change our password so that no unauthorized users can access our account. Changing passwords would be a useless security strategy if it were possible to access our account using an old password due to a lack of strict consistency.
Financial transaction update: The banking system requires strict consistency to ensure that transactions are instantly and accurately updated across all platforms. For example, if a customer withdraws cash via an ATM, the changes in account balance should immediately reflect them. This prevents overdrafts and maintains transaction integrity.
Let’s look at the advantages and disadvantages of strict consistency in the following table:
Advantages | Disadvantages |
|
|
System Design case study: Google Spanner
Google Spanner is a globally distributed database designed to provide strong consistency, horizontal scalability, and high availability across multiple regions. Unlike many distributed databases that choose availability over consistency (following the CAP theorem), Spanner takes a unique approach by offering true global consistency with low-latency performance through a combination of innovations like the TrueTime API and synchronized clocks. TrueTime is Google’s distributed clock synchronization mechanism that enables Spanner to globally timestamp transactions accurately, ensuring consistency across geographically distributed nodes. This allows Spanner to offer strong guarantees, such as serializability for transactions, making it suitable for applications requiring high consistency, like financial services, inventory systems, and complex transaction-based applications.
Strict consistency in the distributed system causes some challenges due to the strong guarantees it provides. For example, guaranteeing that all operations appear to occur simultaneously across all nodes can introduce latency, particularly in geographically distributed systems. The latency can be improved by deploying nodes closer together and utilizing edge computing. Similarly, requiring consensus among all nodes before committing operations can limit scalability, especially as the number of nodes grows. To resolve the issue, we can divide the system into different shards to reduce the load on the coordination node.
Maintaining strict consistency can lead to unavailability in network partitions because the system might need to wait for all partitions to be reachable before proceeding with operations. Implementing partition detection and recovery mechanisms preserves strict consistency during network partitions. Quorum-based techniques enable operations to continue even when a majority of nodes agree, ensuring system resilience.
Strict consistency can also reduce availability, particularly under the CAP theorem. To ensure availability during peak loads or network disruptions, we design the system for graceful degradation, where write operations can be paused. We also implement eventual consistency as a fallback mechanism, relax strict consistency during peak loads, and revert to it when conditions stabilize.
Note: There are many consistency models other than the four discussed in this blog, and there’s still room for new consistency models. Researchers have developed new consistency models. For example,
, proposed the “causal+consistency model” to speed up some specific types of transactions. Wyatt Lloyd, et al. https://www.cs.princeton.edu/~wlloyd/papers/cops-sosp11.pdf
Apart from the challenges in each consistency model, there are some trade-offs between them. Let’s explore them!
In System Design, the CAP theorem explains that in a distributed system, you can only have two out of these three features: consistency, availability, and partition tolerance (the system should work even if parts of it can’t communicate). You can’t have all three at the same time. Prioritizing strict consistency can lead to temporary unavailability, especially during network issues. This is crucial in applications like financial systems where data accuracy is paramount. However, it might impact system responsiveness. Conversely, choosing eventual consistency prioritizes availability, allowing the system to remain responsive even if data may not be immediately consistent. This is suitable for applications like social media or online gaming.
The choice between different consistencies depends on the system’s specific requirements, such as the tolerance for data inconsistencies and the need for high availability.
Let’s consider how to choose the right consistency models in different scenarios:
Prioritizing a consistency model depends on the application’s specific requirements, such as availability and consistency, and the trade-offs we’re willing to make. Let’s see which type of consistency is important in what kind of applications in the following table:
Consistency Type | Suitability and Use Cases |
Eventual consistency | It’s suitable for applications that can tolerate temporary inconsistencies, like social media platforms, where updates propagate gradually but eventually achieve consistency. |
Causal consistency | It ensures that causally related operations are seen in the same order by all nodes, making it a good fit for chat applications where the order of messages matters. Examples include WhatsApp, Facebook Messenger, and Telegram. |
Sequential consistency | It guarantees that operations appear in a single, consistent order across all nodes, which is useful for applications requiring a more strict ordering but can handle some latency, such as document editing applications like Google Docs. |
Strict consistency (linearizability) | It offers the strongest guarantee by ensuring that all operations appear instantaneously at some point between their start and end times. This is critical for real-time systems and databases where absolute correctness is essential, such as financial and real-time trading systems. |
Selecting the appropriate consistency model in distributed systems is essential for balancing performance, availability, and correctness. Each model, from eventual to strict consistency, offers different trade-offs that need to be balanced according to a use case. Eventual consistency prioritizes availability, while causal consistency adds order to dependent events. Sequential consistency ensures a consistent view across processes and all the replicas, and strict consistency provides the strongest guarantees with a cost of low latency and availability. Understanding these models allows us to make design choices that align with their specific use cases.
The following courses on the Educative platform will help you expand your knowledge of System Design and distributed systems concepts:
Free Resources