Put Back-of-the-Envelope Numbers in Perspective

Using realistic numbers in back-of-the-envelope calculations.

A distributed system has compute nodes connected via a network. There is a wide variety of available compute nodes, and they can be connected in many different ways. Back-of-the-envelope calculations help us ignore the nitty-gritty details of the system (at least at the design level) and focus on its more important aspects.

Some examples of back-of-the-envelope calculations are:

  • Number of concurrent TCP connections a server can support
  • Latency of a cross-continental link connecting two data centers

Choosing unrealistic numbers for such calculations can irk the interviewer and send the wrong message: that we are not aware of how real systems actually perform.
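To see why realistic numbers matter, we can sanity-check the second example above against physics. The sketch below is only an illustration: the ~8,800 km California-to-Netherlands distance and the ~200,000 km/s speed of light in fiber are assumed values, not numbers from this lesson.

```python
# Hypothetical sanity check for a cross-continental round trip.
# Assumptions (for illustration only): ~8,800 km great-circle distance between
# California and the Netherlands, and light propagating through optical fiber
# at roughly 200,000 km/s (about two-thirds of its speed in a vacuum).

DISTANCE_KM = 8_800
FIBER_SPEED_KM_PER_S = 200_000

one_way_ms = DISTANCE_KM / FIBER_SPEED_KM_PER_S * 1000
round_trip_ms = 2 * one_way_ms

print(f"Theoretical minimum one-way latency: ~{one_way_ms:.0f} ms")    # ~44 ms
print(f"Theoretical minimum round trip:      ~{round_trip_ms:.0f} ms") # ~88 ms
```

A measured round trip of ~150 ms (see the latency table below) is therefore plausible: real routes are longer than great-circle paths and add switching and queuing delays.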

Overview of the system

Let’s consider an example of an online classifieds platform similar to OLX, the world’s leading classifieds platform in growth markets, available in more than 40 countries and over 50 languages.

Our platform will allow people to buy, sell, or exchange a wide variety of used goods and services by letting them post a listing from their mobile phone or on the web in a fast and convenient manner.

Server requirements

Before we dig deep into the server requirements and assess different design options, let’s first look at some numbers to get an idea of the time taken by some of the routine computer operations.

The following tables show typical latency and throughput numbers, as well as server capabilities, to guide our back-of-the-envelope calculations.

Note: Some of these numbers may be a little outdated given the advancements in modern-day computers. However, they are enough to give us a sense of the time taken by routine computer operations.

Latency Numbers

| Component | Time (ns) | Comment |
| --- | --- | --- |
| L1 cache reference | 0.5 | |
| Branch mispredict | 5 | |
| L2 cache reference | 7 | 14x L1 cache |
| Mutex lock/unlock | 25 | |
| Main memory reference | 100 | 20x L2 cache, 200x L1 cache |
| Compress 1K bytes with Snzip | 3,000 (3 µs) | |
| Send 1K bytes over 1 Gbps network | 10,000 (10 µs) | |
| Read 4K randomly from SSD (~1 GB/s SSD) | 150,000 (150 µs) | |
| Read 1 MB sequentially from memory | 250,000 (250 µs) | |
| Round trip within same datacenter | 500,000 (500 µs) | |
| Read 1 MB sequentially from SSD (~1 GB/s SSD) | 1,000,000 (1 ms) | 4x memory |
| Disk seek | 10,000,000 (10 ms) | 20x datacenter round trip |
| Read 1 MB sequentially from disk | 20,000,000 (20 ms) | 80x memory, 20x SSD |
| Send packet CA -> Netherlands -> CA | 150,000,000 (150 ms) | |
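To put these numbers to work, here is a minimal sketch that estimates how long it takes to read 1 GB sequentially from memory, SSD, and disk, using only the per-MB figures from the table above.

```python
# Per-MB sequential read times taken directly from the latency table above.
READ_1MB_MS = {
    "memory": 0.25,  # 250 microseconds per MB
    "SSD": 1.0,      # 1 ms per MB
    "disk": 20.0,    # 20 ms per MB
}

SIZE_MB = 1024  # 1 GB

for medium, per_mb_ms in READ_1MB_MS.items():
    total_s = SIZE_MB * per_mb_ms / 1000
    print(f"Read 1 GB sequentially from {medium}: ~{total_s:.2f} s")
```

This prints roughly 0.26 s for memory, 1.02 s for SSD, and 20.48 s for disk, matching the table's 80x-memory and 20x-SSD comments for disk reads.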


Throughput Numbers


| Link | Throughput (Mbps) | Comment |
| --- | --- | --- |
| Across virtual machines on the same physical server | 0.5 | |
| Out of local server | 5 | |
| Between two systems on the same rack | 7 | |
| Between two systems on different racks | 25 | |
| Typical cross-sectional bandwidth | 10,000 (10 Gbps) | 40 to 100 Gbps becoming common; Google's Jupiter-fabric-based network can support 1 Petabit/sec of bisection bandwidth |
| Typical aggregate bandwidth between two datacenters | 10,000 (10 Gbps) and onwards | Nearby datacenters usually have higher bandwidth compared to long-haul links |
| SSD read/write | | |
| Disk read/write | | |
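Using the aggregate inter-datacenter bandwidth from the table above, we can estimate how long a bulk copy between datacenters takes. The 1 TB payload below is an assumed size chosen for illustration.

```python
# Estimate bulk-transfer time between datacenters over a 10 Gbps link
# (the aggregate bandwidth from the table above). The 1 TB payload size
# is an assumption for illustration.

LINK_GBPS = 10
PAYLOAD_TB = 1

payload_bits = PAYLOAD_TB * 8 * 10**12        # terabytes -> bits
seconds = payload_bits / (LINK_GBPS * 10**9)  # bits / bits-per-second

print(f"Copying {PAYLOAD_TB} TB over a {LINK_GBPS} Gbps link takes "
      f"~{seconds:.0f} s (~{seconds / 60:.1f} minutes) at full utilization.")
```

This comes out to about 800 seconds (roughly 13 minutes), ignoring protocol overhead and contention from other traffic.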



Server Capabilities

| Component | Count | Comment |
| --- | --- | --- |
| Number of processors | 2 | |
| Number of cores | 160 | |
| RAM | Up to 8 TB | |
| Number of disks | Up to 24 | |
| Network interfaces | LOM: OCP 3.0 NIC, PCIe x16 Gen4; Management: 1x 1 GbE from BMC to rear RJ45 connector | |

Note: Lights-out management (LOM) is a form of out-of-band management that enables system administrators to monitor and maintain servers and other data center infrastructure by connecting to them remotely.

Number of servers required

Let’s assume we have a total of 350M active users and that a single user makes 20 requests per day on average.

  • This gives us a total of 350M * 20 = 7B requests per day, or 7B / 86,400 ≈ 81K requests per second, which is our queries per second (QPS).
  • Assuming an average request takes 50 milliseconds of compute time on a single core, each core will be able to process 20 requests per second. Since our server has 160 cores, it will be able to process 160 * 20 = 3,200 requests per second.
  • Putting these numbers together, we will need approximately 81K / 3,200 ≈ 26 servers to handle all requests (a quick sketch of this arithmetic follows the list).
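The following sketch restates the arithmetic above in code, using the same assumptions (350M active users, 20 requests per user per day, 50 ms of single-core compute per request, 160 cores per server).

```python
import math

# Inputs from the assumptions stated above.
ACTIVE_USERS = 350_000_000
REQUESTS_PER_USER_PER_DAY = 20
COMPUTE_MS_PER_REQUEST = 50
CORES_PER_SERVER = 160
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_requests = ACTIVE_USERS * REQUESTS_PER_USER_PER_DAY   # 7 billion
qps = daily_requests / SECONDS_PER_DAY                      # ~81K

requests_per_core = 1000 / COMPUTE_MS_PER_REQUEST           # 20 requests/sec
requests_per_server = requests_per_core * CORES_PER_SERVER  # 3,200 requests/sec

servers_needed = math.ceil(qps / requests_per_server)       # round up
print(f"QPS: ~{qps:,.0f}, servers needed: ~{servers_needed}")
```

This prints roughly 81K QPS and 26 servers. In practice we would provision extra capacity for traffic peaks, failures, and rollouts.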

Cache

Caching refers to storing data so that future requests for that data can be served faster. The cached data may be a copy of data stored elsewhere or the result of an earlier operation.
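As a quick illustration of the effect (the 80% hit ratio below is an assumption for illustration, not a number from this lesson), a cache in front of our servers reduces the load that reaches the backend in proportion to its hit ratio:

```python
# How a cache reduces backend load. The hit ratio is an assumed value.
TOTAL_QPS = 81_000       # system-wide QPS from the estimate above
ASSUMED_HIT_RATIO = 0.8  # fraction of requests served from cache

backend_qps = TOTAL_QPS * (1 - ASSUMED_HIT_RATIO)
print(f"Backend QPS at an {ASSUMED_HIT_RATIO:.0%} hit ratio: ~{backend_qps:,.0f}")
```

At an 80% hit ratio, only about 16,200 requests per second reach the backend servers, shrinking the server estimate above accordingly.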

For our system, we can use a number of ...
