
Introduction to Blob Store clone


Get an introduction to the blob store, identify the requirements and make estimations.

A blob store is a storage solution for unstructured data. We can store photos, audio, videos, binary executables, or other multimedia items in a blob store. Every type of data is stored as a blob (binary large object), which is a collection of binary data stored as a single unit. A blob store follows a flat data organization pattern with no hierarchies (directories, subdirectories, etc.).

It is mostly used by applications with a particular business requirement called WORM (write once, read many): data can be written only once, and no one can change it afterward. In Microsoft Azure, for example, such blobs are created once and read many times. They can’t be deleted until a specified interval has passed, and they can’t be modified, which protects critical data.

A blob store storing and streaming large unstructured files like audio, video, images, and documents

Not every application has a WORM requirement. However, we assume that blobs can’t be modified once they are written. Instead, a new version of a blob can be uploaded if needed.
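The no-modification assumption can be illustrated with a minimal in-memory sketch. The `VersionedBlobStore` and `BlobVersion` classes are hypothetical and not part of any real product: each write appends a new immutable version instead of overwriting the stored bytes, and a read returns the latest version unless a specific one is requested.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class BlobVersion:
    version: int
    data: bytes  # bytes objects are immutable, so a stored version cannot change


@dataclass
class VersionedBlobStore:
    # Hypothetical in-memory store: blob name -> append-only list of versions.
    _versions: dict[str, list[BlobVersion]] = field(default_factory=dict)

    def put(self, name: str, data: bytes) -> int:
        """Upload data as a new version of `name`; earlier versions are never overwritten."""
        versions = self._versions.setdefault(name, [])
        new_version = BlobVersion(version=len(versions) + 1, data=bytes(data))
        versions.append(new_version)
        return new_version.version

    def get(self, name: str, version: int | None = None) -> bytes:
        """Return the latest version, or a specific 1-based version if one is given."""
        versions = self._versions[name]
        if version is None:
            return versions[-1].data
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name!r} has no version {version}")
        return versions[version - 1].data
```

Keeping versions in an append-only list means every earlier upload stays readable, which matches the WORM-style assumption. A real blob store would persist and replicate these versions rather than hold them in process memory.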

Why blob store?

A blob store is an important component of many data-intensive applications, such as YouTube, Netflix, and Facebook. The table below displays the blob storage used by some well-known applications. These applications generate a huge amount of unstructured data every day, so they need a storage solution for large media files that is easily scalable, reliable, and highly available. Because the data grows continuously, these applications need to store a practically unlimited number of blobs. According to some estimates, YouTube requires more than a petabyte of additional storage per day. In a realistic system like YouTube, a video is stored in multiple resolutions, and the video in every resolution is replicated many times across data centers and regions for availability. That’s why the total storage required per video is greater than the size of the uploaded video.

System      Blob store
Netflix     S3
YouTube     Google Cloud Storage
Facebook    Tectonic

Blobs range from a few kilobytes (small images) to a few terabytes (an archive of famous movies). A blob store is optimized for large files; for small data, a key-value store may be a better fit.

According to the AWS FAQ (2022), S3 allows a single blob (object) of up to 5 TB.

Requirements

Let’s list the functional and non-functional requirements of the system.

Functional

Following are the functional requirements of the blob store:

  • Create a container: A container is like a folder in a file system and is used to group blobs (not to be confused with a Docker container). Users should be able to create containers to group blobs. For example, if an application wants to store user-specific data, it can store the blobs for different user accounts in different containers. A user may also want to keep a group of video blobs separate from a group of image blobs. A single blob store user can create many containers, and each container can hold many blobs, as shown in the following illustration. For simplicity, we assume that a container can’t be created inside another container.

  • PUT data: The blob store should allow the users to upload blobs to the created containers.
  • GET data: The system should generate a URL for the uploaded blob so that the user can access that blob later through this URL.
  • DELETE data: Users should be able to delete a blob. If a user wants to keep the data for a specified period (retention time), the system should support it.

  • LIST blobs: The user should be able to get a list of blobs inside a specific container.

  • DELETE a container: The users should be able to delete a container and all the blobs inside it.

  • LIST containers: The system should allow the users to list all the containers under a specific account.
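The functional requirements above can be sketched as a small in-memory API. This is a hypothetical illustration, not the interface of any real blob store: the `BlobStore` class, its method names, the `BASE_URL` endpoint, and the retention rule (a blob can’t be deleted until `retention_seconds` have passed since upload) are all assumptions. Writing to an existing blob name is rejected to respect the no-modification assumption, and the optional `now` parameter lets callers supply a timestamp for testing.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field

# Hypothetical public endpoint used to build blob URLs (example.com is a placeholder).
BASE_URL = "https://blobstore.example.com"


@dataclass
class Blob:
    data: bytes
    created_at: float
    retention_seconds: float = 0.0  # 0 means the blob can be deleted at any time

    def is_locked(self, now: float) -> bool:
        # A blob stays undeletable until its retention period has elapsed.
        return now < self.created_at + self.retention_seconds


@dataclass
class BlobStore:
    account: str
    # Flat layout: container name -> {blob name -> Blob}; containers cannot be nested.
    containers: dict[str, dict[str, Blob]] = field(default_factory=dict)

    def create_container(self, container: str) -> None:
        if container in self.containers:
            raise ValueError(f"container {container!r} already exists")
        self.containers[container] = {}

    def put_blob(self, container: str, name: str, data: bytes,
                 retention_seconds: float = 0.0, now: float | None = None) -> str:
        blobs = self.containers[container]
        if name in blobs:
            # Blobs are immutable under our assumption; upload under a new name instead.
            raise ValueError(f"blob {name!r} already exists")
        created = time.time() if now is None else now
        blobs[name] = Blob(bytes(data), created, retention_seconds)
        return self.url_for(container, name)  # URL the user can later GET from

    def url_for(self, container: str, name: str) -> str:
        return f"{BASE_URL}/{self.account}/{container}/{name}"

    def get_blob(self, container: str, name: str) -> bytes:
        return self.containers[container][name].data

    def delete_blob(self, container: str, name: str, now: float | None = None) -> None:
        current = time.time() if now is None else now
        if self.containers[container][name].is_locked(current):
            raise PermissionError(f"blob {name!r} is still under retention")
        del self.containers[container][name]

    def list_blobs(self, container: str) -> list[str]:
        return sorted(self.containers[container])

    def delete_container(self, container: str, now: float | None = None) -> None:
        current = time.time() if now is None else now
        # Check every blob first so a locked blob leaves the container untouched.
        if any(b.is_locked(current) for b in self.containers[container].values()):
            raise PermissionError(f"container {container!r} holds blobs under retention")
        del self.containers[container]

    def list_containers(self) -> list[str]:
        return sorted(self.containers)
```

For example, `BlobStore("alice").put_blob(...)` on a `"videos"` container returns a URL of the form `https://blobstore.example.com/alice/videos/<blob name>`. Deleting a container checks all of its blobs before removing anything, so a single blob under retention blocks the whole operation instead of leaving a half-deleted container.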

Non-functional

Following are the non-functional requirements of our system.

  • Availability: Our system should be highly available.

  • Durability: The data, once uploaded, shouldn’t be lost unless users delete that data explicitly.

  • Scalability: The system should be capable of handling billions of blobs.

  • Throughput: For transferring gigabytes of data, we should ensure high data throughput.

  • Reliability: Since failures are the norm in distributed systems, our design should detect and recover from failures promptly.

  • Consistency: The system should be strongly consistent. That is, different users should see the same view of a blob.

Estimations

Let’s estimate the total number of servers, storage, and bandwidth required by blob storage. Because blobs can have all sorts of data, mentioning all those types of data in our estimation may not be practical. Therefore, we will take the example of YouTube, which stores videos and thumbnails on the blob store. Furthermore, we will make the following assumptions to complete our estimations.

Assumptions:

  • Daily active users who upload or watch videos: 5 million
  • Number of requests per second that a single blob store server can handle: 500 (this number can be significantly higher, depending on the blob store; for example, Microsoft Azure can handle a maximum of 20,000 IOPS)
  • Average size of a video: 50 MB
  • Average size of a thumbnail: 20 KB
  • Number of videos uploaded per day: 250,000
  • Number of read requests by a single user per day: 20
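As a back-of-the-envelope illustration of how these assumptions combine, the following sketch computes the raw storage added per day and the average read rate. It assumes decimal units (1 TB = 1,000,000 MB and 1 MB = 1,000 KB) and reads spread evenly over the day, and it ignores the multiple resolutions and replication mentioned earlier, so the real storage requirement would be several times higher. The constant and function names are illustrative only.

```python
# Back-of-the-envelope estimates from the assumptions above.
# Decimal units: 1 TB = 1,000,000 MB and 1 MB = 1,000 KB.
# Replication and multiple resolutions are deliberately ignored.

DAILY_ACTIVE_USERS = 5_000_000
READS_PER_USER_PER_DAY = 20
VIDEOS_PER_DAY = 250_000
AVG_VIDEO_MB = 50
AVG_THUMBNAIL_KB = 20
SECONDS_PER_DAY = 24 * 60 * 60


def daily_storage_tb() -> float:
    """Raw storage added per day for uploaded videos plus their thumbnails, in TB."""
    video_mb = VIDEOS_PER_DAY * AVG_VIDEO_MB
    thumbnail_mb = VIDEOS_PER_DAY * AVG_THUMBNAIL_KB / 1_000
    return (video_mb + thumbnail_mb) / 1_000_000


def average_reads_per_second() -> float:
    """Average read requests per second, assuming reads are spread evenly over the day."""
    return DAILY_ACTIVE_USERS * READS_PER_USER_PER_DAY / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"Storage added per day: {daily_storage_tb():.3f} TB")  # 12.505 TB
    print(f"Average read rate: {average_reads_per_second():.0f} req/s")  # 1157 req/s
```

Thumbnails contribute only about 5 GB of the roughly 12.5 TB per day, so video size dominates the storage estimate.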

Servers

From our assumptions, we use the number of daily active users (DAUs) and the number of queries a single blob store server can handle per second. The number of servers required is calculated using the following formula:

Number of active  ...
