Evaluation of a Distributed Cache's Design

Let's evaluate our proposed design according to the design requirements.

High performance

Here are some design choices we made that will contribute to good overall performance:

  • We used consistent hashing. Finding a key under this algorithm takes O(log(N)) time, where N is the number of cache shards.
  • Inside a cache server, keys are located using hash tables that require constant time on average.
  • The LRU eviction approach takes constant time to access and update cache entries in a doubly linked list (see the sketch after this list).
  • The communication between cache clients and servers is done through the TCP and UDP protocols, which are also very fast.
  • Adding more replicas reduces the performance penalty we would otherwise face when there's a high request load on a single machine.
  • An important feature of the design is that data is added, retrieved, and served from RAM, so the latency of these operations is quite low.
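
To make the hash table and LRU bullets concrete, here is a minimal sketch of an LRU cache that pairs a hash table with a doubly linked list so that both lookups and updates run in constant time on average. The class and method names are illustrative, not part of the design:

```python
# A minimal LRU cache sketch: a hash table for O(1) lookup plus a
# doubly linked list for O(1) recency updates and eviction.
class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.map = {}            # key -> node (hash table)
        self.head = _Node()      # sentinel: most recently used end
        self.tail = _Node()      # sentinel: least recently used end
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.next, node.prev = self.head.next, self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None          # cache miss
        self._unlink(node)       # move to the most recently used position
        self._push_front(node)
        return node.value

    def put(self, key, value):
        node = self.map.get(key)
        if node is not None:
            node.value = value
            self._unlink(node)
        else:
            if len(self.map) >= self.capacity:
                lru = self.tail.prev      # least recently used entry
                self._unlink(lru)
                del self.map[lru.key]     # evict it
            node = _Node(key, value)
            self.map[key] = node
        self._push_front(node)
```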

Note: A critical parameter for high performance is the selection of the eviction algorithm because the number of cache hits and misses depends on it. The higher the cache hit rate, the better the performance.

To get an idea of how important the eviction algorithm is, let’s assume the following:

  • Cache hit service time (99.9th percentile): 5 ms
  • Cache miss service time (99.9th percentile): 30 ms (this includes the time to get the data from the database and set the cache)

Let’s assume we have a 10% cache miss rate using the most frequently used (MFU) algorithm, whereas we have a 5% cache miss rate using the LRU algorithm. Then, we use the following formula:

EAT = Ratio_{hit} x Time_{hit} + Ratio_{miss} x Time_{miss}

Here, this means the following:

EAT: Effective access time.

Ratio_{hit}: The percentage of times a cache hit will occur.

Ratio_{miss}: The percentage of times a cache miss will occur.

Time_{hit}: The time required to serve a cache hit.

Time_{miss}: The time required to serve a cache miss.

For MFU, we see the following:

EAT = 0.90 x 5 milliseconds + 0.10 x 30 milliseconds = 4.5 milliseconds + 3 milliseconds = 7.5 milliseconds

For LRU, we see the following:

EAT = 0.95 x 5 milliseconds + 0.05 x 30 milliseconds = 4.75 milliseconds + 1.5 milliseconds = 6.25 milliseconds
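
Here's a quick sketch that reproduces these EAT figures; the hit/miss ratios and service times are the assumed values from this example, not measurements:

```python
# Compute effective access time (EAT) from assumed hit/miss ratios
# and per-request service times, all in milliseconds.
def effective_access_time(hit_ratio, miss_ratio, hit_ms, miss_ms):
    # EAT = Ratio_hit x Time_hit + Ratio_miss x Time_miss
    return hit_ratio * hit_ms + miss_ratio * miss_ms

print(effective_access_time(0.90, 0.10, 5, 30))  # MFU: 7.5 ms
print(effective_access_time(0.95, 0.05, 5, 30))  # LRU: 6.25 ms
```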

The numbers above highlight the importance of the eviction algorithm in increasing the cache hit rate. Each application should conduct an empirical study to determine which eviction algorithm gives the best results for its specific workload.

Scalability

We can create shards based on requirements and changing server loads. When we add new cache servers to the cluster, we only have to do a limited number of rehash computations, thanks to consistent hashing.
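
To illustrate why only a small fraction of keys moves when a server is added, here is a minimal sketch of a consistent-hashing ring with virtual nodes. The hash function (MD5), the number of virtual nodes, and the server names are all illustrative assumptions, not part of the design:

```python
# A minimal consistent-hashing ring: binary search gives O(log N) lookups,
# and adding a server only remaps the keys that land on its ring segments.
import bisect
import hashlib


def _hash(text: str) -> int:
    return int(hashlib.md5(text.encode()).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        self._points = []   # sorted hash points on the ring
        self._owners = []   # server that owns each point
        for server in servers:
            self.add_server(server, vnodes)

    def add_server(self, server, vnodes=100):
        for i in range(vnodes):
            point = _hash(f"{server}#{i}")
            idx = bisect.bisect(self._points, point)
            self._points.insert(idx, point)
            self._owners.insert(idx, server)

    def get_server(self, key: str) -> str:
        # First ring point clockwise from hash(key), found by binary search.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[idx]


ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
before = {f"user:{i}": ring.get_server(f"user:{i}") for i in range(10_000)}
ring.add_server("cache-4")
moved = sum(ring.get_server(k) != v for k, v in before.items())
print(f"{moved / len(before):.0%} of keys moved")  # roughly 25%, not 100%
```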

Adding replicas reduces the load on hot shards. Another way to handle the hotkeys problem is to do further sharding within the range of those keys. Although it's rare for a single key to become hot, cache clients can devise solutions to avoid contention on a single hotkey. For example, a cache client can intelligently detect such keys and keep a single key from becoming a bottleneck, or we can use dynamic replication for specific keys. Nonetheless, these solutions are complex and beyond the scope of this lesson.