Requirements of a Distributed Search System's Design

Let's identify the requirements of a distributed search system and outline the resources we need.

We'll cover the following...

Requirements
- Functional requirements
- Non-functional requirements
Resource estimation
Building blocks we will use

Non-functional requirements

Here are the non-functional requirements of a distributed search system:

Availability: The system should be highly available to users.
Scalability: The system should scale as the amount of data grows. In other words, it should be able to index a large amount of data.
Fast search on big data: The user should get the results quickly, no matter how much content they are searching.
Reduced cost: The overall cost of building the search system should be kept low.

Resource estimation

Let’s estimate the number of servers, the storage, and the bandwidth that the distributed search system requires. We’ll calculate these numbers using YouTube search as an example.

Number of servers estimation

To estimate the number of servers, we need to know the number of daily active users of the YouTube search feature. Let’s assume that 150 million daily active users of YouTube use the search feature. To size the system for peak load, we use the number of daily active users as a proxy for the number of requests per second, which gives us 150 million requests per second. Then, we use the following formula to calculate the number of servers:
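
A common form of this calculation divides the peak request rate by the number of requests a single server can handle per second, rounding up. The sketch below assumes that form. The function name `servers_needed` and the per-server capacity `RPS_PER_SERVER` are hypothetical placeholders chosen for illustration, not figures from this lesson. Only the 150 million requests per second comes from the text.

```python
def servers_needed(peak_rps: int, rps_per_server: int) -> int:
    """Return how many servers are needed to absorb peak_rps requests per second."""
    if rps_per_server <= 0:
        raise ValueError("rps_per_server must be positive")
    # Integer ceiling division: a partially loaded server is still a whole server.
    return (peak_rps + rps_per_server - 1) // rps_per_server


# From the text: daily active users used as a proxy for peak requests per second.
PEAK_RPS = 150_000_000

# Hypothetical per-server capacity, assumed only to make the example concrete.
RPS_PER_SERVER = 50_000

if __name__ == "__main__":
    print(servers_needed(PEAK_RPS, RPS_PER_SERVER))
```

The result depends entirely on the capacity assumed for one server, so the per-server figure should be replaced with a measured or agreed value before the estimate is used.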

Storage estimation

Each video’s metadata is stored in a separate JSON document. Each document is uniquely identified by the video ID. This metadata contains the title of the video, its description, the channel name, and a transcript. We assume the following numbers for estimating the storage required to index one video:

The size of a single JSON document is 200 KB.
The number of unique terms or keys extracted from a single JSON document is 1,000.
The amount of storage space required to add one term to the index table is 100 bytes.

The following formula is used to compute the storage required to index one video:

Total_{storage/video} = Storage_{/doc} + (Terms_{/doc} \times Storage_{/term})
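
Plugging the assumed numbers into this formula gives a feel for the result. The following is a minimal sketch: the function name `storage_per_video_bytes` is hypothetical, and it assumes decimal units (1 KB = 1,000 bytes), which the text does not specify. Under that assumption, one video needs 200 KB for the document plus 1,000 × 100 bytes = 100 KB for its index entries, or 300 KB in total.

```python
# Assumption: decimal kilobytes (1 KB = 1,000 bytes); the text does not say which unit is meant.
BYTES_PER_KB = 1_000


def storage_per_video_bytes(doc_bytes: int, terms_per_doc: int, bytes_per_term: int) -> int:
    """Storage to index one video: the JSON document plus one index entry per term."""
    return doc_bytes + terms_per_doc * bytes_per_term


# Figures taken from the assumptions listed in the text.
DOC_BYTES = 200 * BYTES_PER_KB   # size of a single JSON document
TERMS_PER_DOC = 1_000            # unique terms extracted from one document
BYTES_PER_TERM = 100             # index-table storage per term

total_bytes = storage_per_video_bytes(DOC_BYTES, TERMS_PER_DOC, BYTES_PER_TERM)

if __name__ == "__main__":
    print(f"{total_bytes // BYTES_PER_KB} KB per video")
```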
