Draft Lesson
What are tiny objects?
Where might they be seen? Why do we need them?
With Kangaroo, we're focusing on caching tiny objects, because tiny objects are prevalent and an underserved use case. So where do we see tiny objects? They appear in the social graph: in Facebook's social graph, the edges that connect friends or posts average around 100 bytes. We also see tiny objects in IoT metadata: at Microsoft Azure, sensor metadata averages around 300 bytes. And we see them in text data such as tweets: tweets average less than 33 characters.
Why do we need to cache tiny objects?
With all these tiny objects existing at massive scale, we want to serve them at scale to various applications, and one way to do this is by caching massive amounts of data. When an application wants a tiny object, it typically sends a request to a caching layer. The caching layer returns the object if it has a hit; otherwise, on a cache miss, the request is forwarded to the database layer. The caches here have two main goals: to lower the average latency of the entire service, and to keep load off the backend services. To build a caching layer that's effective at scale, these caches need to be really big, and one way to make them big is to use flash, which is roughly 100 times cheaper per bit than DRAM, so you can have much larger caches for the same cost. This is why, at scale, many companies deploy flash caches.
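The request flow above is the classic cache-aside pattern. Here is a minimal sketch of it; the class name and the `db` interface are hypothetical illustrations, not part of CacheLib or any real caching layer.

```python
# Minimal cache-aside sketch: try the cache first, fall back to the
# database on a miss, and populate the cache for future hits.
class CacheAsideStore:
    def __init__(self, db):
        self.cache = {}  # stands in for the (flash) caching layer
        self.db = db     # backend database (here just a dict)

    def get(self, key):
        # Cache hit: return the object directly, keeping load off the backend.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fetch from the database and cache the result.
        value = self.db[key]
        self.cache[key] = value
        return value

db = {"edge:42": b"alice->bob"}        # a ~100-byte social-graph edge
store = CacheAsideStore(db)
store.get("edge:42")                   # miss: fetched from db, now cached
store.get("edge:42")                   # hit: served from the cache
```

Every hit served from the cache is a request the backend never sees, which is how the cache both lowers average latency and sheds backend load.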
With Kangaroo, we're trying to solve this problem of caching billions of tiny objects on flash. If you look at prior work on caching tiny objects on flash, it either incurs too many flash writes or a large memory overhead; either way, you're going to waste money on your cache. Kangaroo solves this problem while reducing misses: it reduces misses by 29% over prior approaches, while keeping writes and memory use within production constraints. Kangaroo is open source and is integrated into CacheLib, Facebook's caching engine that they use in production, which is also open source and can be found at cachelib.org.
Caching on flash
Now that I've introduced the problem Kangaroo is trying to solve, and a little bit about what Kangaroo does, I'll talk about the challenges of caching on flash. Then I'll move on to how we can cache on flash while minimizing DRAM overhead, which will lead us into Kangaroo's design and, finally, the results.
When we think about caching on flash, we have all the challenges of caching in DRAM, plus additional ones. Flash allows us to build cheaper caches, but flash devices have limited write endurance: there are only so many times you can write data to flash before the device wears out and no longer works. Since caches are constantly changing what data they hold, write endurance becomes a really significant factor that we have to take into account when building flash caches. In addition, flash caches have to write in at least 4 KB blocks, because this is the minimum read/write granularity of a flash device, and this write granularity is much bigger than the objects we're actually looking at. To solve both of these problems in conjunction, most flash caches use a log-structured cache.
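A quick back-of-the-envelope calculation shows why the write granularity matters for tiny objects. Using the sizes mentioned in this lesson (4 KB minimum write, ~100-byte social-graph edges), writing each object in its own flash page would wear the device out dozens of times faster than the data volume alone would suggest:

```python
# Back-of-the-envelope sketch with the numbers from the text (assumed
# average sizes, not measurements): one flash page per tiny object.
PAGE_BYTES = 4096     # minimum flash write granularity (4 KB)
OBJECT_BYTES = 100    # average social-graph edge size

# Bytes written to flash per byte of useful data stored.
write_amplification = PAGE_BYTES / OBJECT_BYTES
print(write_amplification)  # prints 40.96
```

Roughly 41 bytes hit the flash for every byte of object data, burning through the device's limited write endurance. This is the pressure that pushes flash caches toward batching many objects into one large write.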
Log structured cache
A log-structured cache is similar to a log-structured file system or another log-structured store. We take an object and buffer it in DRAM so that we can issue one really large write to flash. Once the buffer is full, we write the entire group of objects to a segment in the log and place enough metadata in a log index to be able to find them again. Since objects can end up anywhere on flash in our circular log, eventually these objects will have to be evicted, and usually they'll be evicted all together as a segment, just as they were written.
The big advantage of using a log-structured cache is that we buffer our writes in DRAM, so we have a really small overhead when writing them to the log: the buffered writes minimize the number of writes to flash, and we don't wear through our flash devices quickly. The downside is the full in-memory index: since an object can end up anywhere in the circular log on flash, every object needs its own index entry, and this full in-memory index becomes a large problem when you have tiny objects.