
A complete guide to System Design caching

Fahim ul Haq
Aug 26, 2024
17 min read
Contents
Caching and its benefits
Real-world applications of caching
Caching layers
Back-end layer
Load balancer caching
Web server caching
Application server caching
Database caching
Distributed caching
Network layer
DNS caching
Network proxy caching
Content delivery network (CDN) caching
ISP caching
Caching challenges and drawbacks
Conclusion

Caching in System Design can be one of the trickiest concepts for many software engineers to grasp. This is because most of us have limited exposure to working on large-scale systems that deploy caching strategies. Even as someone who does have first-hand experience working on large-scale storage systems at Microsoft and Meta, I understand better than most that caching strategies can vary substantially and are highly situation-dependent. That means it can still be hard to know which approaches to deploy for different use cases — even with hands-on experience.

The good news is there are actually a few strategies you can use to identify the optimal caching approach for your application. It's important to remember that mastering caching is not just about understanding the technologies and techniques — it's about knowing how to efficiently assess the problem, break it down into its component parts, and identify the optimal solution.

Did you know that over half of mobile users abandon websites that take more than 3 seconds to load?

In this blog, we will take a step-by-step approach to understand the utility of System Design caching at different layers — with plenty of real-world examples and battle-tested tips. We'll examine the case study of a naive streaming application and demonstrate how caching at different layers can solve different problems that you may face. By the end, you’ll be able to confidently evaluate which type of caching approach to deploy for any use case.

Note: Before we dive deeper into specific case studies, we'll start with a comprehensive refresher on caching in System Design. If you want to brush up on your System Design fundamentals more broadly, I encourage you to explore the Grokking Modern System Design Interview for Engineers & Managers course.

Caching and its benefits

Caching is the process of storing copies of data in a high-speed storage layer (the cache memory) to reduce the time it takes to access this data compared to fetching it directly from the primary storage (the database).

The illustration below depicts how a cache operates on an abstract level:

A server fetching frequently accessed data from cache memory, reducing response time

Caching is not just about storing data in fast storage; it’s about knowing when to update or remove it.
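To make this concrete, here is a minimal cache-aside sketch in Python. The fetch_user_from_db function and the 60-second TTL are hypothetical stand-ins; the point is that every cached entry carries an expiry, so stale data is eventually refreshed from the primary storage.

import time

CACHE_TTL_SECONDS = 60          # hypothetical expiry; tune per use case
_cache = {}                     # key -> (value, expires_at)

def fetch_user_from_db(user_id):
    # Placeholder for a slow primary-storage lookup (e.g., a SQL query).
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    entry = _cache.get(user_id)
    if entry and entry[1] > time.time():
        return entry[0]                        # cache hit: served from memory
    value = fetch_user_from_db(user_id)        # cache miss: go to primary storage
    _cache[user_id] = (value, time.time() + CACHE_TTL_SECONDS)
    return value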

Here are some key benefits of using caching in System Design:

  1. Reliability: Caching enhances system reliability by reducing dependency on the primary data source.

  2. Availability: Caching increases the system’s availability by providing data from the closest or least busy locations.

  3. Scalability: Caching is crucial for scaling systems by handling increased traffic and reducing the load on backend servers.

  4. Performance: By reducing the number of read/write operations performed on the database, caching improves overall performance.

  5. Cost reduction: A single cache instance can serve thousands of I/O operations per second at low cost, significantly reducing the number of operations the (more expensive) database must handle.

  6. Reduced load: By offloading the read operations, the cache protects the backend from slower performance and potential crashes during peak loads.

Note: While caching offers significant benefits in many scenarios, it’s important to recognize that caching might not be essential for all systems and certainly for all layers. The necessity of caching depends on various factors, including the system’s scale, performance requirements, and data access patterns. Therefore, when designing a system, it’s essential to carefully assess whether caching is appropriate and aligns with the system’s specific needs and constraints.

Real-world applications of caching

To describe the practical implications of caching in System Design, let’s examine two contrasting scenarios:

  1. Google Search vs. Medical Bracelet: Google’s search engine relies heavily on caching to deliver instant search results and optimize performance for billions of users worldwide. In contrast, a medical bracelet containing sensitive user information might not require caching for performance reasons. Instead, the focus is on ensuring data security and access control mechanisms, which might not align with traditional caching strategies.

  2. Social Media Platforms: Social media platforms like Facebook and X (formerly Twitter) leverage caching extensively to provide a seamless user experience. However, in specialized systems with strict timing constraints, caching might not be feasible or necessary.

Caching layers

To further explore the significance of caching in System Design, let me explain its practical application through a real-world scenario.

Let’s assume you’re working on a streaming application that is steadily becoming popular among users. As the number of users grows, so does the complexity of the application’s design. Ignoring other aspects, let’s see how caching plays a role in this design evolution. We’ll take a step-wise approach and implement caching solutions at different points in the larger System Design. To better understand the optimization problem, let’s break down cache placement in our system into two main layers:

  • Back-end layer

  • Network layer

I’m skipping front-end caching because most designs will mainly focus on the backend and network infrastructure.

Let’s start with the addition of caching solutions on the back-end layer.

Back-end layer

Back-end caching is generally essential, but its importance lies in strategic implementation across various components. As discussed below, multilayered caching within the backend infrastructure significantly improves application responsiveness and scalability. To keep it short, I’ll assume a simplified system and mention only crucial components that significantly impact the overall design.

Load balancer caching

Let’s assume traffic surges and more user requests reach and overwhelm our backend system, leading to slower response times and potential service disruptions. Distributing traffic efficiently and reducing server load is critical to maintaining performance and reliability.

To mitigate performance bottlenecks at scale, we start with caching solutions at the load balancing layer. This approach requires storing frequently accessed, static content (e.g., HTML, CSS, JavaScript) on the load balancers themselves. By serving cached content directly to users, we can significantly reduce the number of requests that reach the servers, decrease server load, and improve response times. This concept is depicted in the illustration below:

Load balancer cache serving user requested content directly from its cache
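In practice, load balancers such as NGINX or HAProxy expose this behavior through configuration rather than application code, but the decision they make can be sketched roughly as follows. The handle_request helper and the five-minute TTL are illustrative assumptions, not a real API.

import time

STATIC_EXTENSIONS = (".html", ".css", ".js")
CACHE_TTL = 300                     # hypothetical: cache static assets for 5 minutes
static_cache = {}                   # path -> (body, expires_at)

def handle_request(path, forward_to_backend):
    """forward_to_backend is whatever function proxies the request to an origin server."""
    if path.endswith(STATIC_EXTENSIONS):
        hit = static_cache.get(path)
        if hit and hit[1] > time.time():
            return hit[0]                           # served directly by the load balancer
        body = forward_to_backend(path)             # miss: ask a backend server once
        static_cache[path] = (body, time.time() + CACHE_TTL)
        return body
    return forward_to_backend(path)                 # dynamic requests always go upstream

# Usage: the backend is only consulted on the first request for /app.css.
print(handle_request("/app.css", lambda p: f"contents of {p}"))
print(handle_request("/app.css", lambda p: f"contents of {p}"))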

A system receives all sorts of requests, such as:

  • CPU-bound requests: These primarily depend on the processor of a node. An example is compressing 1 KB of data with snzip (a Snappy-based compressor).

  • Memory-bound requests: These are primarily bottlenecked by the memory subsystem. An example is reading 1 MB of data sequentially from the RAM of a node.

  • IO-bound requests: These are primarily bottlenecked by the IO subsystem (such as disks or the network). An example is reading 1 MB of data sequentially from a disk.

We can’t handle this variety of requests by installing a single caching solution. Therefore, we need to expand caching solutions to other components within our design.

Web server caching

As our streaming application’s user base grows, we need to make more content readily available to improve user experience and reduce the upstream burden. Considering the earlier scenario, the web servers would be the first component under strain after the load balancer. The web server is responsible for handling requests related to web pages. Our web page templates can be served at this layer along with other static content (CSS, JavaScript, etc.) for different clients.

By caching this content for different client devices, the web server will deliver it immediately when requested without needing to regenerate or fetch it from the back-end systems. This reduces server load, decreases response times, and enhances our platform’s overall performance and scalability. The illustration below depicts this concept:

Users' frequent requests are served directly from the web server cache

In general, whenever requests reach the web server, it prepares a web page with static content and forwards the requests to the application server for generating dynamic content. When the dynamic content is received from the application server, the web server populates the webpage with this dynamic content and sends it back to the user. Therefore, each server plays its role in handling a user query.
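As a rough sketch of this idea, a web server can cache the rendered shell of a page per client type and attach Cache-Control headers so that browsers and downstream caches can reuse the static parts as well. The render_template helper, the per-device key, and the one-hour max-age are assumptions for illustration only.

import time

TEMPLATE_TTL = 3600                                  # hypothetical: re-render page shells hourly
_template_cache = {}                                 # (page, device) -> (html, expires_at)

def render_template(page, device):
    # Placeholder for an expensive render: layout, static assets, device-specific markup, etc.
    return f"<html><body>{page} for {device}</body></html>"

def get_page_shell(page, device):
    key = (page, device)
    cached = _template_cache.get(key)
    if cached and cached[1] > time.time():
        html = cached[0]                             # reuse the shell without re-rendering
    else:
        html = render_template(page, device)
        _template_cache[key] = (html, time.time() + TEMPLATE_TTL)
    # Cache-Control lets downstream caches (browser, proxy, CDN) reuse the response too.
    headers = {"Cache-Control": "public, max-age=3600"}
    return html, headers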

Application server caching

Moving forward, the application server struggles to meet the demand when numerous users simultaneously interact with dynamic content, such as video metadata and thumbnails, requiring frequent updates and processing. This happens because the application server runs business logic and requires significant computations to handle user queries. This results in slower response times and degraded performance.

We can achieve shorter response times and a better user experience by caching frequently computed results. The following image illustrates this concept:

Application server reducing response time by fetching data from cached memory
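At the application layer, caching often takes the form of memoizing expensive business-logic results. The sketch below uses Python's built-in LRU cache; the video_summary function and its fields are hypothetical, and a real service would more likely push such results into a shared cache rather than per-process memory.

from functools import lru_cache

@lru_cache(maxsize=10_000)                 # keep the 10,000 most recently used results
def video_summary(video_id: str) -> dict:
    # Placeholder for business logic that is expensive to recompute:
    # joining metadata, aggregating view counts, selecting a thumbnail, etc.
    return {"id": video_id, "title": f"Video {video_id}", "views": 0}

summary = video_summary("abc123")          # first call computes and stores the result
summary_again = video_summary("abc123")    # repeated calls for popular videos are cache hits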

Database caching

Our online streaming platform handles many read and write operations, particularly for user data and video metadata, which can become a bottleneck, especially under high load.

To address this, we’ll implement database caching. By storing the results of frequently executed database queries, we can reduce latency and lighten the load on the database, ensuring smoother and faster access to data. The following illustration captures this concept:

The application server fetching frequently accessed data from the database cache
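In production this caching usually lives in a dedicated store such as Redis or Memcached (or in the database's own buffer and query caches), but the core pattern of keying cached rows by the query is simple to sketch. The SQLite table, the query, and the 30-second freshness window below are illustrative assumptions.

import sqlite3, time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE videos (id TEXT PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO videos VALUES ('v1', 'Intro to caching')")

QUERY_TTL = 30                              # hypothetical freshness window for query results
query_cache = {}                            # (sql, params) -> (rows, expires_at)

def cached_query(sql, params=()):
    key = (sql, params)
    hit = query_cache.get(key)
    if hit and hit[1] > time.time():
        return hit[0]                       # avoid touching the database entirely
    rows = db.execute(sql, params).fetchall()
    query_cache[key] = (rows, time.time() + QUERY_TTL)
    return rows

print(cached_query("SELECT title FROM videos WHERE id = ?", ("v1",)))  # miss: hits the database
print(cached_query("SELECT title FROM videos WHERE id = ?", ("v1",)))  # hit: served from cache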

Note: I included a blob store in the design because it’s an essential component for storing binary files such as images/thumbnails and videos, particularly for a streaming platform. Retrieval of binary blobs is a relatively slow operation. Let’s see how we can deal with this problem in the coming sections.

Distributed caching

The previous sections show that different types of components use caching to serve different data types. However, it’s important to understand that the primary purpose of these components is not to serve readily available data but to handle user queries and perform the actions requested. Even if they wanted to, these components have only limited cache space available. We therefore need a specialized component, the distributed cache, to store the frequently accessed data of all the other components. Those components can then ask the distributed cache to serve that data without worrying about caching it themselves.

A distributed caching system will store data across multiple nodes in a network, ensuring that data is available to every user regardless of location. By doing so, we reduce latency and improve consistency to provide a more reliable streaming experience. The illustration below depicts this concept:

The web, application, and database server fetching data from distributed cache
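Distributed caches typically partition keys across nodes, often with consistent hashing, so the fleet can grow without remapping every key. The node names below are hypothetical, and real clients (for Redis Cluster, Memcached, etc.) handle this internally; this is only a sketch of the key-to-node mapping idea.

import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Maps cache keys to nodes so that adding or removing a node only remaps a small share of keys."""

    def __init__(self, nodes, replicas=100):
        points = []
        for node in nodes:
            for i in range(replicas):                 # virtual nodes smooth the key distribution
                points.append((self._hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        idx = bisect(self._hashes, self._hash(key)) % len(self._nodes)
        return self._nodes[idx]

# Hypothetical cache fleet; every service asks the ring which node should hold a given key.
ring = ConsistentHashRing(["cache-a:6379", "cache-b:6379", "cache-c:6379"])
print(ring.node_for("video:abc123:metadata"))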

To learn how to design a distributed cache that optimizes performance and ensures scalability in modern systems, check out the comprehensive chapter on Distributed Cache from the course Grokking Modern System Design Interview for Engineers & Managers.

While we’ve meticulously fine-tuned various components of our back-end layer, it’s time to shift our focus to the networking layer now.

Network layer

On the network layer, considering the current needs of the platform, we need to implement four types of caching, which are discussed below:

DNS caching

I assume you already know about DNS, and its caching, for that matter. Honestly, for most smaller systems, the DNS caching that already happens at various layers is usually sufficient unless you’re a tech giant like Google serving users globally. Nevertheless, caching at this component is paramount because a request cannot be processed until the domain name is resolved to an IP address, that is, until we figure out where the servers that will serve our query are located.

Note: In my opinion, DNS is the best example of a large-scale distributed system—so much so that it’s almost a mystery how it works. It’s an inspiration and a marvel for any engineer looking to learn about distributed systems.

When a user’s device first resolves our server’s domain name, it stores the IP address for a certain period. The next time the user or anyone on the same network wants to connect, the device can quickly refer to this cached information instead of resolving the domain name repeatedly. This reduces the time required to establish a connection, resulting in faster load times and a smoother streaming experience for our users. This concept is shown in the illustration below:

User request for visiting a website is directly served from the DNS cache

Here’s an example demonstrating the time it takes to perform a DNS lookup without caching. Notice the query time reported near the end of the output (95 msec):

; <<>> DiG 9.10.6 <<>> educative.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37686
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;educative.io. IN A
;; ANSWER SECTION:
educative.io. 300 IN A 104.18.2.119
educative.io. 300 IN A 104.18.3.119
;; Query time: 95 msec
;; SERVER: 192.168.0.1#53(192.168.0.1)
;; WHEN: Wed Jul 03 07:01:03 PKT 2024
;; MSG SIZE rcvd: 73

Now, let’s analyze how much time is reduced after caching:

; <<>> DiG 9.10.6 <<>> educative.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42245
;; flags: qr aa rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;educative.io. IN A
;; ANSWER SECTION:
educative.io. 0 IN A 104.18.2.119
;; Query time: 9 msec
;; SERVER: 192.168.0.1#53(192.168.0.1)
;; WHEN: Wed Jul 03 07:01:14 PKT 2024
;; MSG SIZE rcvd: 46

Notice how the query time dropped after the initial lookup, from 95 msec to just 9 msec? This improvement is thanks to caching, which stores frequently accessed data locally.
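The operating system and stub resolvers already cache lookups like this for you, but the behavior is easy to sketch in application code as well. The fixed 300-second TTL below is an assumption; real resolvers honor the TTL carried by each DNS record.

import socket, time

DNS_TTL = 300                             # hypothetical TTL; real resolvers use the record's own TTL
_dns_cache = {}                           # hostname -> (ip, expires_at)

def resolve(hostname):
    hit = _dns_cache.get(hostname)
    if hit and hit[1] > time.time():
        return hit[0]                     # answered locally, no network round trip
    ip = socket.gethostbyname(hostname)   # recursive lookup through the OS resolver
    _dns_cache[hostname] = (ip, time.time() + DNS_TTL)
    return ip

print(resolve("educative.io"))   # first call pays the lookup cost (like the 95 msec above)
print(resolve("educative.io"))   # second call is a cache hit (like the 9 msec above)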

Network proxy caching

Network proxies sit in the middle or at the edge of client or server-side networks. Frequently requested static data stored at this layer reduces the need to fetch everything from the origin servers. Not only do they reduce the response time, but they also save outgoing and incoming network bandwidth, not to mention the burden on back-end servers. Imagine an educational institute serving its logo to thousands of users from a proxy server, reducing the load on the origin server to just handling dynamic textual data requests.

The illustration below depicts the concept:

Users request the content through any corporate or educational institution. The proxy server looks for the requested data in its cache memory and responds with the requested data.
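Caching proxies such as Squid or Varnish are configured rather than coded, but the core decision they make can be sketched as: cache a response for as long as the origin's Cache-Control header allows, and serve repeats locally. The fetch_from_origin callback and the example URL below are hypothetical.

import re, time

proxy_cache = {}                                    # url -> (body, expires_at)

def max_age(headers):
    """Honor the origin's Cache-Control: max-age=N directive (0 if absent or no-store)."""
    cc = headers.get("Cache-Control", "")
    if "no-store" in cc:
        return 0
    match = re.search(r"max-age=(\d+)", cc)
    return int(match.group(1)) if match else 0

def proxy_get(url, fetch_from_origin):
    hit = proxy_cache.get(url)
    if hit and hit[1] > time.time():
        return hit[0]                               # the origin never sees this request
    body, headers = fetch_from_origin(url)
    ttl = max_age(headers)
    if ttl > 0:
        proxy_cache[url] = (body, time.time() + ttl)
    return body

# Example: thousands of students fetching the same logo reach the origin only once per hour.
fetch = lambda u: (b"<png bytes>", {"Cache-Control": "public, max-age=3600"})
print(proxy_get("https://example.edu/logo.png", fetch))
print(proxy_get("https://example.edu/logo.png", fetch))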

Content delivery network (CDN) caching

Proxy servers help, but they’re a drop in the ocean when it comes to data delivery for streaming applications. Considering dynamic user base growth, efficiently fetching and delivering media content to a worldwide audience is crucial, especially when handling large media files. Imagine 1,000 viewers simultaneously streaming a viral video in a densely populated area. A CDN point of presence (PoP) can deliver this content to all users in high quality without any buffering issues. The origin servers cannot handle this load, especially on a global scale. For our streaming application, CDNs will be strategically placed in different regions of the world, depending on our users.

A content delivery network (CDN) caches media content in RAM at edge locations, bringing data closer to end users and reducing the distance the data must travel. This approach decreases latency and response time, ensuring a faster and more reliable media delivery experience. Here’s an illustration that represents this concept:

CDN caches media content at edge locations close to end users

Look at how CDNs bring data closer to the users by serving them directly instead of retrieving data from the (slower) blob stores.

Note: In reality, CDNs don't talk to blob stores directly. Usually, the application servers fetch data from blob stores and provide it to CDNs.
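From the origin's side, making content CDN-friendly mostly means pointing clients at edge hostnames and marking immutable media as long-lived so edges can keep it. The domain name, path scheme, and one-week max-age below are assumptions for illustration; real setups also involve cache keys, signed URLs, and per-segment headers.

CDN_HOST = "cdn.example-streaming.com"        # hypothetical CDN hostname

def cdn_url(video_id, segment):
    # Clients download segments from the CDN edge instead of the origin/blob store.
    return f"https://{CDN_HOST}/videos/{video_id}/{segment}.ts"

def segment_response_headers():
    # Encoded video segments never change, so edges may cache them for a long time.
    return {"Cache-Control": "public, max-age=604800, immutable"}

print(cdn_url("abc123", "segment-0001"))
print(segment_response_headers())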

ISP caching

Question: What if the number of streamers in a densely populated area jumps from 1,000 to 100,000?

Even CDNs may struggle to handle requests at such a scale. However, we can solve this case by implementing ISP caching, where popular content is cached locally within the ISPs’ networks. ISP caching brings data so close to users that their requests don’t need to leave their ISP’s network at all.

The illustration below depicts this concept:

ISP first checks the requested content in its local cache and responds if the content is found

Netflix and YouTube have leveraged ISP-level caching significantly. Under this approach, ISPs host Google’s and Netflix’s caching servers on their premises, connect them to their networks, and let Google and Netflix remotely update the content on these servers to better serve their users.

Point to Ponder

Question

We’ve added caching solutions at various points in our design. For a typical application, what do you think is the right proportion of cache sizes at these points?


Below, we compare caching types and their compatibility with various system architectures:

Caching Suitability Across System Architectures (summary of the original comparison table): the table compares five system types, namely large-scale web apps, enterprise software, real-time data processing, embedded systems, and IoT devices, against eight caching layers: web server, application server, load balancer, CDN, database, distributed, network proxy, and client-side caching. Some combinations are marked as clearly suitable or unsuitable, while others (marked ❓) are highly subjective and depend on the specific use case.

From the above comparison, it’s clear that caching is not universally applicable across all system types. For instance, caching is highly beneficial for large-scale web applications and enterprise software due to the need for high performance, scalability, and efficient resource usage. However, in embedded systems, caching might or might not be suitable at every layer because these systems often require up-to-the-second data accuracy, and cached data could introduce unacceptable staleness.

While caching is a powerful tool in many scenarios, evaluating its relevance and impact on a case-by-case basis is essential to avoid unnecessary complexity and ensure optimal system performance.

Caching challenges and drawbacks

We’ve now seen the role of caching at different System Design layers, and its benefits are clear. However, caching comes with its own challenges that must be addressed. Let’s look at some of them:

  1. One significant challenge in caching is ensuring that cached data remains fresh and up-to-date. If cached data is not periodically updated or expired correctly, it can lead to stale data being served to users. This can result in inconsistencies and inaccuracies in the user experience, especially if the underlying data or content has changed. Setting the right expiry time for cached data mitigates this risk. If the expiry time is too short, it can lead to frequent cache misses and increased server load. On the other hand, if the expiry time is too long, it increases the likelihood of serving stale data to users.

  2. Another challenge arises when cached data becomes invalid due to underlying data or content changes. For example, if the mapping of domain names to IP addresses changes or a media file is updated or replaced, the cached data might no longer be accurate. In such cases, it’s essential to implement cache invalidation and data synchronization mechanisms to ensure that users always receive the most current and accurate information.

  3. Cache consistency can be challenging, especially in distributed caching systems where data is cached across multiple nodes or layers. Maintaining consistency across caches and ensuring all nodes access the latest data version can be complex and require careful coordination.

Caching increases the complexity of the system and introduces new challenges. We need to develop strategies to mitigate them, such as setting appropriate expiry times and ensuring data consistency across caches.
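One common mitigation is to pair a TTL with explicit invalidation on writes (cache-aside with delete-on-update): the source of truth is updated first, and the corresponding cache entry is dropped so the next read refetches fresh data. A minimal sketch, with hypothetical keys and values:

import time

CACHE_TTL = 60
cache = {}                                     # key -> (value, expires_at)
database = {"video:v1:title": "Old title"}     # stand-in for the primary data store

def read(key):
    hit = cache.get(key)
    if hit and hit[1] > time.time():
        return hit[0]
    value = database.get(key)
    cache[key] = (value, time.time() + CACHE_TTL)
    return value

def write(key, value):
    database[key] = value                      # update the source of truth first...
    cache.pop(key, None)                       # ...then invalidate, so the next read refetches

print(read("video:v1:title"))                  # "Old title" is now cached
write("video:v1:title", "New title")           # without invalidation, the next read would be stale
print(read("video:v1:title"))                  # "New title"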

Next, let’s see some of the drawbacks that are introduced by caching:

  1. Cache poisoning occurs when malicious users manipulate cached data to serve incorrect or harmful information.

  2. Implementing caching strategies requires additional infrastructure, monitoring, and management, which can significantly increase the system’s overall complexity.

  3. Maintaining cache synchronization with the primary data source can be challenging when data changes frequently. If not handled correctly, this can result in users receiving outdated or incorrect information.

  4. High traffic on the cache layer can lead to resource contention, causing bottlenecks that can negate the performance benefits caching is supposed to provide.

  5. When underlying data or content changes, it becomes crucial to invalidate the affected cached data. However, implementing effective cache invalidation and data synchronization mechanisms can be complex and require careful planning.

By understanding and addressing these drawbacks, organizations can develop a more effective caching strategy that maximizes caching’s benefits while minimizing its potential downsides.

Conclusion

While caching has its pitfalls and challenges, its benefits usually far outweigh them when it is applied thoughtfully. Using caching effectively comes down to the engineer. The focus of this blog was to give you an idea of where caching is commonplace and when to use it. Questions like “How much caching is enough?” can be answered by prototyping and testing against specific use cases.

Finally, I must emphasize that caching is only effective if you know how to design distributed systems in the first place. If you don’t know your design well, knowing when and where to apply caching will remain a mystery for you. Caching is more of an optimization problem, and optimization applies to systems built on sound foundations. You can get hands-on practice building systems with the courses I've added below.

And as always, happy learning!