Background of Distributed Cache
Learn the fundamentals for designing a distributed cache.
The main goal of this chapter is to design a distributed cache. To achieve this goal, we first need substantial background knowledge, mainly about different reading and writing techniques. This lesson will help us build that background. Let’s look at the structure of this lesson in the table below.
Structure of This Lesson
| Section | Motivation |
|---------|------------|
| Writing policies | Data is written to the cache and the database. The order in which these writes happen has performance implications. We’ll discuss various writing policies to help decide which one is suitable for the distributed cache we want to design. |
| Eviction policies | Since the cache is built on limited storage (RAM), we ideally want to keep the most frequently accessed data in the cache. Therefore, we’ll discuss different eviction policies for replacing less frequently accessed data with more frequently accessed data. |
| Cache invalidation | Certain cached data may become outdated. We’ll discuss different invalidation methods to remove stale or outdated entries from the cache. |
| Storage mechanism | A distributed storage system has many servers. We’ll discuss important design considerations, such as which cache entry should be stored on which server and what data structure to use for storage. |
| Cache client | A cache server stores cache entries, while a cache client calls the cache server to request data. We’ll discuss the details of a cache client library. |
Writing policies
Often, a cache stores a copy (or a part) of the data that is persistently kept in a data store. When we store data, some important questions arise:
- Where do we store the data first? Database or cache?
- What are the implications of each strategy for consistency?
The short answer is that it depends on the application’s requirements. Let’s look at the details of the different writing policies to understand the trade-offs better:
- Write-through cache: The write-through mechanism writes data to both the cache and the database. Writing to the two stores can happen concurrently or one after the other. This increases write latency but ensures strong consistency between the database and the cache.
- Write-back cache: In the write-back mechanism, data is first written to the cache and asynchronously written to the database. Although the cache holds the updated data, inconsistency is inevitable in scenarios where a client reads stale data from the database. However, systems using this strategy have low write latency.
- Write-around cache: This strategy writes data to the database only. Later, when a read for that data triggers a cache miss, the data is brought into the cache. The database always has the updated data, but this strategy isn’t favorable when recently written data is read soon afterward.
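The three policies above differ only in *where* a write goes first and *when* the other store catches up. The following minimal sketch models them with plain Python dictionaries standing in for the cache (RAM) and the database; the class, method, and policy-string names are illustrative assumptions, not part of any real caching library.

```python
class ToyCache:
    """Toy model of write-through, write-back, and write-around policies.

    `cache` stands in for fast RAM storage; `database` stands in for the
    durable data store. A real write-back cache flushes asynchronously;
    here the flush is an explicit method call to keep the sketch simple.
    """

    def __init__(self, policy="write-through"):
        self.policy = policy
        self.cache = {}       # fast, limited storage
        self.database = {}    # durable storage
        self._dirty = set()   # keys written to cache but not yet to the DB

    def write(self, key, value):
        if self.policy == "write-through":
            # Write to both stores before acknowledging:
            # strong consistency, higher write latency.
            self.cache[key] = value
            self.database[key] = value
        elif self.policy == "write-back":
            # Acknowledge after the cache write; the DB is updated later,
            # so a reader hitting the DB directly may see stale data.
            self.cache[key] = value
            self._dirty.add(key)
        elif self.policy == "write-around":
            # Skip the cache entirely; the cache fills in on a read miss.
            self.database[key] = value

    def flush(self):
        # The asynchronous write-back step, modeled synchronously here.
        for key in self._dirty:
            self.database[key] = self.cache[key]
        self._dirty.clear()

    def read(self, key):
        if key in self.cache:
            return self.cache[key]          # cache hit
        value = self.database[key]          # cache miss: go to the DB
        self.cache[key] = value             # populate the cache for next time
        return value
```

For example, with `write-back` a freshly written key is visible in the cache but absent from the database until `flush()` runs, while with `write-around` the first read of a new key misses the cache and is served from the database.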
Quiz
A system wants to write data and promptly read it back. At the same time, we want consistency between the cache and the database. Which writing policy is the optimal choice?

- Write-through cache
- Write-around cache
- Write-back cache
Eviction policies
One of the main ...