Case Study: S3

How do we design a modern service like S3?

What is a blob store?

A blob store is a binary object store that lets developers store unstructured data as key-value pairs in the cloud. Blobs are grouped into containers (buckets) that are tied to user accounts. Each bucket is like a new database, with keys being folder-like paths and values being the binary objects (files). The data can be accessed from anywhere in the world and can include audio, video, and text.
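A minimal in-memory sketch of this model (all names here are illustrative, not part of any real API): buckets are containers, keys look like folder paths, and values are raw bytes.

```python
# In-memory model of a blob store: bucket name -> {key -> binary object}.
store = {}

def create_bucket(bucket):
    # Each bucket is an independent namespace, like a new database.
    store.setdefault(bucket, {})

def put_object(bucket, key, data):
    # The key is a folder-like path; the value is the binary object (file).
    store[bucket][key] = data

def get_object(bucket, key):
    return store[bucket][key]

create_bucket("photos")
put_object("photos", "2024/vacation/beach.jpg", b"\x89PNG...")
print(get_object("photos", "2024/vacation/beach.jpg"))
```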

Example: Amazon S3

We will focus on the design of Amazon S3.

Amazon S3: Requirements & challenges

Functional requirements

  • Multitenancy
    • Multiple people can create multiple accounts and upload their files into the system
    • We don’t want our system to be separately deployed for every person or every account. The same system should be available for every customer.
    • The users should be able to view all their files on their respective consoles.
    • A single system should handle all the customers.
  • Virtual-hosted-style access to files or data (the bucket name appears in the URL's hostname)
  • Path-style access to files or data (the bucket name appears in the URL's path)
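These two access styles differ only in where the bucket name appears in the request URL. A small sketch (the helper names are illustrative; the URL formats follow Amazon's documented conventions):

```python
def virtual_hosted_url(bucket, region, key):
    # Virtual-hosted style: the bucket name is part of the hostname.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

def path_style_url(bucket, region, key):
    # Path style: the bucket name is part of the URL path.
    return f"https://s3.{region}.amazonaws.com/{bucket}/{key}"

print(virtual_hosted_url("my-bucket", "us-east-1", "photos/cat.jpg"))
# https://my-bucket.s3.us-east-1.amazonaws.com/photos/cat.jpg
print(path_style_url("my-bucket", "us-east-1", "photos/cat.jpg"))
# https://s3.us-east-1.amazonaws.com/my-bucket/photos/cat.jpg
```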

Non-functional requirements

  • Durability (99.999%): the system should be highly durable and should not lose users’ files.
  • Availability (99.99%): the system should be highly available.
  • Scalability: The system should scale with the increasing amount of data uploads so that customers can keep on uploading the data.
  • Region-specific buckets
    • Allow the users to create a bucket in a specific region and upload files to that bucket.
    • Allow the users to access the bucket from the same region.
  • Security: provide a secure layer of access (SSL/TLS)

Distributed systems design principles used to meet S3 requirements

  • Decentralization

    • to remove scaling bottlenecks
    • to avoid single points of failure
  • Asynchrony

    • To let the system make progress under all circumstances
  • Autonomy

    • To give each system component the independence to make decisions based on its local information.
  • Local responsibility

    • Each component is responsible for achieving its consistency; this is never the burden of its peers.
  • Controlled concurrency

    • Operations are designed such that no or limited concurrency control is required.
  • Failure tolerance

    • The system considers the failure of components to be a normal mode of operation and continues operation with no or minimal interruption.
  • Controlled parallelism

    • Abstractions used in the system are of such granularity that parallelism can be used to improve the performance and robustness of recovery or the introduction of new nodes.
  • Decompose into small well-understood building blocks

    • Do not try to provide a single service that does everything for everyone, but instead build small components that can be used as building blocks for other services.
  • Symmetry

    • Nodes in the system are identical in terms of functionality and require no or minimal node-specific configuration to function.
  • Simplicity

    • The system should be made as simple as possible (but no simpler).

How does S3 store unstructured data?

To store your data in Amazon S3, you work with resources known as buckets and objects. A bucket is a container for objects. An object is a file and any metadata that describes that file.

To store an object in Amazon S3, you create a bucket and then upload the object to a bucket. When the object is in the bucket, you can open it, download it, and move it. When you no longer need an object or a bucket, you can clean up your resources.

The simplest blob store

A single server with a few terabytes of storage (say, 30 TB) and a set of APIs exposed to the client to create a bucket, upload files to a bucket, read files from a bucket, list the files in a bucket, and so on.

The client can create the bucket using the create bucket API, which makes a folder for that bucket on the hard disk.

After creating the bucket, the client can upload files to it using the upload files API. The files are written to the folder created on the hard disk.

Similarly, clients can access files using the read files API and list the files inside a bucket using the list files API.
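The single-server design above maps directly onto the filesystem: a bucket is a folder on the server's disk, and each object is a file inside it. A minimal sketch (function names are illustrative, not a real API):

```python
import os
import tempfile

ROOT = tempfile.mkdtemp()  # stands in for the server's hard disk

def create_bucket(bucket):
    # create bucket API: make a folder for the bucket on the hard disk
    os.makedirs(os.path.join(ROOT, bucket), exist_ok=True)

def upload_file(bucket, key, data):
    # upload files API: write the file into the bucket's folder
    with open(os.path.join(ROOT, bucket, key), "wb") as f:
        f.write(data)

def read_file(bucket, key):
    # read files API
    with open(os.path.join(ROOT, bucket, key), "rb") as f:
        return f.read()

def list_files(bucket):
    # list files API
    return sorted(os.listdir(os.path.join(ROOT, bucket)))

create_bucket("photos")
upload_file("photos", "cat.jpg", b"\xff\xd8...")
print(list_files("photos"))  # ['cat.jpg']
```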

Problem with the simplest blob store

  • The APIs can handle only a limited number of client requests because the server’s hard disk limits the I/O speed of data transfer between the disk and RAM.

  • A single server can handle only a limited number of API calls.

  • A single server can’t handle heavy write traffic because the hard disk becomes a bottleneck.

This simplest blob store system is not scalable.

Brute-force solution: horizontal scaling

One possible solution to scale the simplest blob store is to add one more server, which doubles the

  • traffic-handling capacity
  • storage

Now we have a total of 60 TB of storage.

Problem with the horizontally scaled blob store

Horizontal scaling of the simplest blob store is not a reliable solution because a file is not necessarily accessed through the same server it was uploaded to.

For example, a client’s create bucket request might be handled by server 1 while the upload files request is handled by server 2. Server 2 would not find the bucket in its own storage and could not upload the files. Similarly, if the files were successfully uploaded in case ...
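The mismatch can be sketched with two servers that keep independent local storage behind a round-robin load balancer (all names here are hypothetical):

```python
import itertools

servers = [{}, {}]  # each server has its own independent "disk"
rr = itertools.cycle(range(len(servers)))  # round-robin load balancer

def handle_create_bucket(bucket):
    i = next(rr)
    servers[i][bucket] = {}
    return i

def handle_upload(bucket, key, data):
    i = next(rr)
    if bucket not in servers[i]:
        # The bucket was created on a different server, so this upload fails.
        return i, "error: no such bucket"
    servers[i][bucket][key] = data
    return i, "ok"

print(handle_create_bucket("photos"))            # handled by server 0
print(handle_upload("photos", "a.jpg", b"..."))  # server 1 rejects: bucket only exists on server 0
```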
