Put Back-of-the-Envelope Numbers in Perspective
Learn to use appropriate numbers in back-of-the-envelope calculations.

Why back-of-the-envelope calculations?

A distributed system has compute nodes connected via a network. There is a wide variety of available compute nodes, and they can be connected in many different ways. Back-of-the-envelope calculations help us ignore nitty-gritty details of the system (at least at the design level) and focus on more important aspects.

Examples of back-of-the-envelope calculations include:

  • Number of concurrent TCP connections a server can support
  • Number of Requests Per Second (RPS) a web, database, or cache server can handle
  • Storage requirements of a service

Choosing unreasonable numbers for such calculations can lead to a flawed design. Since we will need good estimations in many design problems, we discuss the related concepts in detail here, including:

  • Types of data center servers
  • Realistic access latencies of different components
  • Estimation of RPS a server can handle
  • Examples of bandwidth, servers, and storage estimation

Types of data center servers

Data centers do not have a single type of server. Enterprises use commodity hardware to save cost and build scalable solutions. Below we discuss the types of servers that are commonly used within a data center to handle different workloads.

Web servers

For scalability, web servers are decoupled from the application servers (which we will discuss next). Web servers are the first point of contact after load balancers, and data centers have racks full of them, usually handling API calls from clients. Depending on the service offered, web servers need only small-to-medium memory and storage resources, but they do require good computational resources. For example, Facebook has used web servers with 32 GB of RAM and 500 GB of storage space, and for its high-end computational needs it partnered with Intel to build a custom 16-core processor.

Note: Many numbers quoted in this lesson come from the data center design that Facebook open-sourced in 2011. Because the performance gains from Moore’s law have been slowing since around 2004, these numbers are not as stale as they might seem.

Application servers

These types of servers run the core application software and business logic. Although the difference between web and application servers is somewhat fuzzy, application servers primarily provide dynamic content, whereas web servers mostly serve static content to the client, which is usually a web browser. Application servers can require extensive computational and storage (both volatile and non-volatile) resources. Facebook has used application servers with up to 256 GB of RAM and two types of storage (traditional rotating disks and flash) with a capacity of up to 6.5 TB.

Storage servers

With the explosive growth of Internet users, the amount of data stored by giant services has multiplied. Also, various types of data are being stored in different storage units. For instance, YouTube uses the following datastores:

  1. Blob storage for its encoded videos.
  2. A temporary processing queue storage that can hold a few hundred hours of video content uploaded daily to YouTube for processing.
  3. Specialized storage called Bigtable for storing a large number of thumbnails of videos.
  4. A Relational Database Management System (RDBMS) for user and video metadata (comments, likes, user channels), etc.

Still other datastores are used for analytics purposes (for example, Hadoop’s HDFS). Storage servers mainly run structured (for example, SQL) and unstructured (NoSQL) data management systems.

Coming back to the example of Facebook, they have used storage servers with a capacity of up to 120 TB. With the number of servers in use, Facebook is able to house exabytes of storage. (One exabyte is 10^18 bytes. By convention, we measure storage and network bandwidth in base 10, not base 2.) However, the RAM of these machines is only 32 GB.
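As a quick sanity check on that scale (an illustration, not a figure from the source): at 120 TB per storage server, housing a single exabyte requires roughly 10^18 / (120 × 10^12) ≈ 8,300 such servers, so exabyte-scale storage implies fleets of thousands of storage machines.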

These are not the only types of servers in a data center. Organizations also require servers for services such as configuration, monitoring, load balancing, analytics, accounting, caching, etc.

The numbers open-sourced by Facebook are now dated. Below, we list the specifications of a server that could be used in today’s data centers:

Typical Server Specifications

Component            Specification
Number of sockets    2
Processor            Intel Xeon X2686
Number of cores      36 cores (72 threads)
RAM                  256 GB
Cache (L3)           45 MB
Storage capacity     15 TB

The numbers above are inspired by the Amazon bare-metal server, but machines can be more or less powerful; high-end servers support much higher RAM (up to 8 TB), disk storage (up to 24 disks with up to 20 TB each, circa 2021), and cache memory (up to 120 MB).

Standard numbers to remember

A lot of effort goes into planning and implementing a service, but planning isn’t possible without basic knowledge of the kinds of workloads machines can handle. Latencies play an important role in deciding how much workload a machine can handle. The table below lists some of the important numbers system designers should know in order to perform resource estimation.

Important Latencies

Component                                      Time (nanoseconds)
L1 cache reference                             0.9
L2 cache reference                             2.8
L3 cache reference                             12.9
Main memory reference                          100
Compress 1 KB with Snzip                       3,000 (3 microseconds)
Read 1 MB sequentially from memory             9,000 (9 microseconds)
Read 1 MB sequentially from SSD                200,000 (200 microseconds)
Round trip within the same data center         500,000 (500 microseconds)
Read 1 MB sequentially from a ~1 GB/sec SSD    1,000,000 (1 millisecond)
Disk seek                                      4,000,000 (4 milliseconds)
Read 1 MB sequentially from disk               2,000,000 (2 milliseconds)
Send packet SF -> NYC                          71,000,000 (71 milliseconds)
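As a quick illustration of how these latencies feed into estimates, the sketch below (not from the source; Python is used only for the arithmetic) converts the per-MB figures above into rough times for reading 1 GB sequentially from each medium. It uses the ~1 GB/sec SSD row for the SSD figure.

```python
# Rough sequential-read estimates derived from the latency table above.
READ_1MB_NS = {
    "memory": 9_000,      # 9 microseconds per MB
    "ssd": 1_000_000,     # ~1 GB/sec SSD: 1 millisecond per MB
    "disk": 2_000_000,    # 2 milliseconds per MB
}

def sequential_read_time_ms(size_mb: int, medium: str) -> float:
    """Estimate sequential read time in milliseconds for a given medium."""
    return size_mb * READ_1MB_NS[medium] / 1_000_000  # ns -> ms

if __name__ == "__main__":
    for medium in READ_1MB_NS:
        # Reading 1 GB (1,000 MB, base 10) sequentially:
        print(f"1 GB from {medium}: ~{sequential_read_time_ms(1_000, medium):,.0f} ms")
    # memory: ~9 ms, SSD: ~1,000 ms, disk: ~2,000 ms
```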

Apart from the latencies above, there are also throughput numbers, measured in Queries Per Second (QPS), that a typical single-server datastore can handle.

Important Rates

Datastore (single server)    Approximate QPS
MySQL                        1,000
Key-value store              10,000
Cache server                 100,000 - 1 M

The numbers above are approximations and vary greatly depending on a number of factors, such as the type of query (point versus range), the specifications of the machine, the design of the database, indexing, and so on.
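These rates are what make quick server-count estimates possible. The sketch below is illustrative only: the peak load of 50,000 QPS is an assumed number, while the per-server rates are the approximations from the table above.

```python
import math

def servers_needed(peak_qps: int, qps_per_server: int) -> int:
    """Number of servers needed to absorb a given peak load."""
    return math.ceil(peak_qps / qps_per_server)

# Example (assumed load): 50,000 peak QPS against MySQL at ~1,000 QPS per server.
print(servers_needed(50_000, 1_000))    # 50 servers
# The same load against a cache layer at ~100,000 QPS per server.
print(servers_needed(50_000, 100_000))  # 1 server
```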

Requests estimation

This section will discuss how many requests a typical server can handle in a second. Within a server, there are limited resources, and depending on the type of client requests, different resources can become a bottleneck. Let’s understand two types of requests.

  • CPU bound requests: The type of requests where the limiting factor is the CPU.
  • Memory bound requests: These types of requests are limited by the amount of memory a machine has.

Let’s approximate the Requests Per Second (RPS) for each type of request. Before that, we need to assume the following:

  • Our server has the specifications of the typical server that we defined in the table above.
  • The operating system and other auxiliary processes consume a total of 16 GB of RAM.
  • Each worker consumes 300 MB of RAM to complete a request.
  • For simplicity, we assume that the CPU obtains data from RAM. That is, a caching system ensures the required content is available for serving without going to the storage layer.
  • Each CPU-bound request takes 200 milliseconds, whereas a memory-bound request takes 50 milliseconds to complete.

Let’s do the computation for each type of request.

CPU bound: A simple formula to estimate the RPS for CPU-bound requests divides the number of available CPU threads by the time each request occupies a thread:

RPS_{CPU} = \frac{\text{Number of CPU threads}}{\text{Task time}} = \frac{72}{200\ \text{ms}} = 360\ \text{RPS}
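Below is a minimal sketch, under the assumptions above, that computes both estimates. The CPU-bound formula matches the one given in the text; the memory-bound estimate applies the same style of reasoning (available RAM bounds the number of concurrent workers) and is an extrapolation from the stated assumptions, not a figure quoted from the source.

```python
# Back-of-the-envelope RPS estimates for the typical server defined above.
CPU_THREADS = 72        # 36 cores, 2 hardware threads each
TOTAL_RAM_GB = 256
OS_RAM_GB = 16          # consumed by the OS and auxiliary processes
WORKER_RAM_MB = 300     # RAM used by one worker per request
CPU_TASK_SEC = 0.200    # 200 ms per CPU-bound request
MEM_TASK_SEC = 0.050    # 50 ms per memory-bound request

# CPU bound: every hardware thread completes 1 / task_time requests per second.
rps_cpu = CPU_THREADS * (1 / CPU_TASK_SEC)

# Memory bound: the RAM left for workers bounds how many can run concurrently;
# each concurrent worker completes 1 / task_time requests per second.
workers = (TOTAL_RAM_GB - OS_RAM_GB) * 1000 // WORKER_RAM_MB  # base-10 GB -> MB
rps_memory = workers * (1 / MEM_TASK_SEC)

print(f"CPU-bound:    ~{rps_cpu:,.0f} RPS")      # ~360 RPS
print(f"Memory-bound: ~{rps_memory:,.0f} RPS")   # ~16,000 RPS
```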
