Draft Lesson
What are tiny objects?
Where might they be seen? Why do we need them?
With Kangaroo, we're focusing on caching tiny objects, because tiny objects are prevalent and an underserved use case. So where do we see tiny objects? They appear in the social graph: in Facebook's social graph, the edges that connect friends or posts average around 100 bytes. We also see tiny objects in IoT metadata: at Microsoft Azure, sensor metadata averages around 300 bytes. And we see them in text data such as tweets: tweets average less than 33 characters.
Why do we need to cache tiny objects?
With all these tiny objects existing at massive scale, we want to serve them at scale to various applications, and one way to do this is by caching massive amounts of data. When an application wants a tiny object, it typically sends a request to a caching layer. The caching layer returns the object if it has a hit; otherwise, on a cache miss, the request is forwarded to the database layer. The caches here have two main goals: to lower the average latency of the entire service, and to keep load off the backend services. To build a caching layer that's effective at scale, these caches need to be really big, and one way to make them big is to use flash, which is roughly 100 times cheaper per bit than DRAM, so you can have much larger caches for the same cost. This is why, at scale, many companies deploy flash caches.
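The request flow above is the classic cache-aside pattern. Here is a minimal sketch of it; the class name and the `db` interface are hypothetical illustrations, not part of CacheLib or any real caching layer.

```python
# Minimal cache-aside sketch: try the cache first, fall back to the
# database on a miss, and populate the cache for future hits.
class CacheAsideStore:
    def __init__(self, db):
        self.cache = {}  # stands in for the (flash) caching layer
        self.db = db     # backend database (here just a dict)

    def get(self, key):
        # Cache hit: return the object directly, keeping load off the backend.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fetch from the database and cache the result.
        value = self.db[key]
        self.cache[key] = value
        return value

db = {"edge:42": b"alice->bob"}        # a ~100-byte social-graph edge
store = CacheAsideStore(db)
store.get("edge:42")                   # miss: fetched from db, now cached
store.get("edge:42")                   # hit: served from the cache
```

Every hit served from the cache is a request the backend never sees, which is how the cache both lowers average latency and sheds backend load.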
With Kangaroo, we're trying to solve this problem of caching billions of tiny objects on flash. If you look at prior work on caching tiny objects on flash, it either incurs too many flash writes or a large memory overhead; either way, you're going to waste money on your cache. Kangaroo solves this problem while reducing misses: it reduces misses by 29% over prior approaches, while keeping writes and memory use within production constraints. Kangaroo is open source and is integrated into CacheLib, Facebook's caching engine that they use in production, which is also open source and can be found at cachelib.org.
Caching on flash
Now that I've introduced the problem Kangaroo is trying to solve, and a little bit about what Kangaroo does, I'll talk about the challenges of caching on flash. Then I'll move on to how we can cache on flash while minimizing DRAM overhead, which will lead us into Kangaroo's design and, finally, the results.
When we think about caching on flash, we have all the challenges of caching in DRAM, plus additional ones. Flash allows us to build cheaper caches, but flash devices have limited write endurance: there are only so many times you can write data to flash before the device wears out and no longer works. Since caches are constantly changing what data they hold, write endurance becomes a really significant factor that we have to take into account when building flash caches. In addition, flash caches have to write in at least 4 KB blocks, because this is the minimum read/write granularity of a flash device, and this write granularity is much bigger than the objects we're actually looking at. To solve both of these problems in conjunction, most flash caches use a log-structured cache.
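A quick back-of-the-envelope calculation shows why the write granularity matters for tiny objects. Using the sizes mentioned in this lesson (4 KB minimum write, ~100-byte social-graph edges), writing each object in its own flash page would wear the device out dozens of times faster than the data volume alone would suggest:

```python
# Back-of-the-envelope sketch with the numbers from the text (assumed
# average sizes, not measurements): one flash page per tiny object.
PAGE_BYTES = 4096     # minimum flash write granularity (4 KB)
OBJECT_BYTES = 100    # average social-graph edge size

# Bytes written to flash per byte of useful data stored.
write_amplification = PAGE_BYTES / OBJECT_BYTES
print(write_amplification)  # prints 40.96
```

Roughly 41 bytes hit the flash for every byte of object data, burning through the device's limited write endurance. This is the pressure that pushes flash caches toward batching many objects into one large write.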
Log structured cache
A log-structured cache is similar to a log-structured file system or another log-structured store. We take an object and buffer it in DRAM so that we can issue one really large write to flash. Once the buffer is full, we write the entire group of objects to a segment in the log and place enough metadata in a log index to be able to find them again. Since objects can end up anywhere on flash in our circular log, eventually these objects will have to be evicted, and usually they'll be evicted all together as a segment, just as they were written.
The big advantage of using a log-structured cache is that we buffer our writes in DRAM, so we have a really small overhead when writing them to the log: the buffered writes minimize the number of writes to flash, and we don't wear through our flash devices quickly. The downside is the full in-memory index: since an object can end up anywhere in the circular log on flash, every object needs its own index entry, and this full in-memory index becomes a large problem when you have tiny objects.