In computing terms, a cache (pronounced "cash") is hardware or software that temporarily stores data so that it can be accessed quickly. This data is either frequently accessed by a client or likely to be accessed in the near future.
Caches are usually small so that they remain cost-effective and efficient. They are used by cache clients such as web browsers, CPUs, operating systems, and DNS resolvers, to name a few. Accessing data from a cache is much faster than accessing it from main memory or any other type of storage.
Let’s assume that a cache client wants to access some data. First, the client will check if the data is stored in the cache. If the requested data is found in the cache, it will immediately be returned to the client. This is known as a cache hit.
However, if the data is not stored in the cache, a cache miss occurs. In such instances, the client then fetches the data from the main memory and stores it in the cache. The mechanism for storing data in the cache depends on the caching algorithm and policies used.
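As a rough sketch of this read path, the snippet below models the cache and main memory as plain Python dictionaries. The names `cache`, `main_memory`, and `read`, and the address 25, are illustrative stand-ins, not part of any real system:

```python
# Illustrative only: a dictionary-backed cache in front of a slower store.
cache = {}                     # address -> data, the fast lookup layer
main_memory = {25: "payload"}  # stands in for the slower backing store

def read(address):
    if address in cache:          # cache hit: return immediately
        return cache[address]
    data = main_memory[address]   # cache miss: fetch from main memory
    cache[address] = data         # store it so future reads hit the cache
    return data

print(read(25))  # miss: fetched from main memory, then cached
print(read(25))  # hit: served directly from the cache
```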
The following illustration demonstrates how a cache hit in a system cache works:
If the data at address 25 were not cached, it would need to be fetched from main memory and then inserted into the cache. If the cache has free space, the data is simply inserted. However, if the cache is already full, some existing data must be evicted. What gets evicted, and why, depends on the eviction policy used (a minimal sketch of one such policy follows the list). Some commonly used cache eviction policies are:

- Least Recently Used (LRU)
- Most Recently Used (MRU)
- Least Frequently Used (LFU)
- First In First Out (FIFO)
- Random Replacement
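To make one of these policies concrete, here is a minimal sketch of an LRU cache built on Python's `collections.OrderedDict`, which remembers insertion order and can stand in for recency. The `LRUCache` class and its `capacity` parameter are illustrative choices, not a standard API:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self.entries:
            return None                    # cache miss
        self.entries.move_to_end(key)      # mark as most recently used
        return self.entries[key]           # cache hit

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)  # refresh recency on update
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used
cache.put("c", 3)      # cache is full, so "b" is evicted
print(cache.get("b"))  # None: evicted
print(cache.get("a"))  # 1: still cached
```

Moving a key to the end on every access keeps the least recently used entry at the front, so eviction is a simple `popitem(last=False)`.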
The illustration below shows how a cache miss works:
Accessing data from an L1 CPU cache is around 100 times faster than accessing it from RAM.