Introduction

Even though we discussed the design and its major components in detail in the previous lesson, a number of interesting questions still need answering. For example, how do we store large blobs: on the same disk, on the same machine, or divided into chunks? How many replicas of a blob should we make to ensure reliability and availability? How do we search for and retrieve blobs quickly?

This lesson answers these design questions. The table below summarizes the goals of this lesson.

Summary of the Lesson

Section	Purpose
Blob metadata	What metadata is maintained to ensure efficient storage and retrieval of blobs
Partitioning	How blobs are partitioned among different data nodes
Blob indexing	How to efficiently search for blobs
Pagination	How to retrieve a limited number of blobs at a time to improve readability and loading time
Replication	How to replicate and how many copies to maintain to improve availability
Garbage collection	How to delete blobs without sacrificing performance
Streaming	How to stream large files chunk by chunk to support interactivity for users
Caching	How to improve response time and throughput

Before we answer the questions above, let’s look at how we create layers of abstractions for the user to hide the internal complexity of the blob store. These abstraction layers help us with design decisions as well.

We have three layers:

User account: Users are uniquely identified on this layer by their account_ID. Blobs uploaded by a user are maintained in that user's containers.
Container: Each user has a set of containers, each uniquely identified by a container_ID. These containers hold blobs.
Blob: This layer contains information about blobs, each uniquely identified by a blob_ID. It maintains the blob metadata, which is vital for achieving the system's availability and reliability.
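
The three layers behave like nested lookups: an account owns containers, and a container owns blobs. The following is a minimal Python sketch of that hierarchy under stated assumptions, not the blob store's actual implementation. The class names, the `resolve` helper, and the `chunk_ids` field are hypothetical, and real blob metadata would hold more than a list of chunk IDs.

```python
from dataclasses import dataclass, field


@dataclass
class Blob:
    blob_id: str
    # Hypothetical stand-in for blob metadata: the blob's chunk IDs, in order.
    chunk_ids: list[str] = field(default_factory=list)


@dataclass
class Container:
    container_id: str
    blobs: dict[str, Blob] = field(default_factory=dict)  # blob_ID -> Blob


@dataclass
class Account:
    account_id: str
    containers: dict[str, Container] = field(default_factory=dict)  # container_ID -> Container


def resolve(accounts: dict[str, Account], account_id: str,
            container_id: str, blob_id: str) -> Blob:
    """Walk account -> container -> blob; raises KeyError if any level is missing."""
    return accounts[account_id].containers[container_id].blobs[blob_id]


if __name__ == "__main__":
    accounts = {"acc-1": Account("acc-1")}
    photos = Container("photos")
    photos.blobs["cat.jpg"] = Blob("cat.jpg", ["chunk-1", "chunk-2"])
    accounts["acc-1"].containers["photos"] = photos
    print(resolve(accounts, "acc-1", "photos", "cat.jpg").chunk_ids)  # ['chunk-1', 'chunk-2']
```

Because every lookup goes through the account and then the container, the user never needs to know where a blob's chunks physically live. That detail stays inside the blob layer.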

Blob metadata

When a user uploads a blob, it is split into small chunks (a chunk is the minimum unit of data for writing and reading). Chunking lets the system store large files that couldn't fit in one contiguous location, on one data node, or in one block of a disk attached to a data node. The chunks of a single blob are then stored on different data nodes that have storage space available for them. Billions of blobs are stored in the system. The master node has to ...
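
The split-then-place step can be illustrated with a minimal Python sketch. It assumes a fixed chunk size, and the 4 MiB value is hypothetical. It also assumes a hypothetical greedy placement policy that puts each chunk on the data node with the most free space. A production system may also weigh replication, load, and failure domains when choosing nodes.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical chunk size: 4 MiB


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a blob into fixed-size chunks; the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(chunks: list[bytes], free_space: dict[str, int]) -> list[str]:
    """Assign each chunk to the data node with the most free space.

    This greedy policy is an illustrative assumption. Returns the chosen
    data node ID for each chunk, in order, and updates free_space in place.
    """
    placement = []
    for chunk in chunks:
        node = max(free_space, key=free_space.get)
        if free_space[node] < len(chunk):
            raise RuntimeError("no data node has room for this chunk")
        free_space[node] -= len(chunk)
        placement.append(node)
    return placement


if __name__ == "__main__":
    chunks = split_into_chunks(b"x" * 10, chunk_size=4)
    print([len(c) for c in chunks])                      # [4, 4, 2]
    print(place_chunks(chunks, {"dn-1": 8, "dn-2": 6}))  # ['dn-1', 'dn-2', 'dn-1']
```

In the example, the 10-byte blob is larger than the free space on either node (8 and 6 bytes). It is still stored, because its chunks are spread across both nodes.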

Level	Uniquely identified by	Information	Sharded by	Mapping
User Blob Store Account	account_ID	list of container_IDs	account_ID	account -> list of containers
Container	container_ID	list of blob_IDs	container_ID	container -> list of blobs
Blob	blob_ID	{list of chunks, chunkInfo: data node IDs, ...}	blob_ID	blob -> list of chunks
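
Each level in the table is sharded by its own ID. The following is a minimal Python sketch of that idea. It assumes hash-based sharding with a stable hash and a hypothetical shard count of 8, since the lesson does not say how IDs are mapped to shards. `ShardedMap`, `shard_for`, and `chunks_of` are hypothetical names used only for illustration.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an ID to a shard index using a stable hash.

    Python's built-in hash() is salted per process for strings, so SHA-256 is
    used instead to keep the mapping identical across processes and restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedMap:
    """One metadata level, split across shards by that level's own ID."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.num_shards = num_shards
        self.shards: list[dict] = [{} for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, self.num_shards)][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, self.num_shards)].get(key)


# One sharded map per level, mirroring the Mapping column of the table.
accounts = ShardedMap()    # account_ID   -> list of container_IDs
containers = ShardedMap()  # container_ID -> list of blob_IDs
blobs = ShardedMap()       # blob_ID      -> list of (chunk_ID, data_node_ID)


def chunks_of(account_id: str, container_id: str, blob_id: str):
    """Follow account -> container -> blob; return None if any link is missing."""
    if container_id not in (accounts.get(account_id) or []):
        return None
    if blob_id not in (containers.get(container_id) or []):
        return None
    return blobs.get(blob_id)


if __name__ == "__main__":
    accounts.put("acc-1", ["ctr-1"])
    containers.put("ctr-1", ["blob-1"])
    blobs.put("blob-1", [("chunk-1", "dn-3"), ("chunk-2", "dn-7")])
    print(chunks_of("acc-1", "ctr-1", "blob-1"))
```

Resolving a blob's chunks touches at most one shard per level. Plain modulo hashing remaps most keys when the shard count changes, so consistent hashing is a common alternative when shards are added or removed.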