Put Back-of-the-Envelope Numbers in Perspective
Learn to use appropriate numbers in back-of-the-envelope calculations.

Why back-of-the-envelope calculations?

A distributed system has compute nodes connected via a network. There is a wide variety of available compute nodes, and they can be connected in many different ways. Back-of-the-envelope calculations help us ignore nitty-gritty details of the system (at least at the design level) and focus on more important aspects.

Examples of back-of-the-envelope calculations include:

  • Number of concurrent TCP connections a server can support
  • Number of Requests Per Second (RPS) a web, database, or cache server can handle
  • Storage requirements of a service

Choosing unreasonable numbers for such calculations can lead to a flawed design. Since we will need good estimations in many design problems, we discuss the related concepts in detail here, including:

  • Types of data center servers
  • Realistic access latencies of different components
  • Estimation of RPS a server can handle
  • Examples of bandwidth, servers, and storage estimation

Types of data center servers

Data centers do not have a single type of server. Enterprises use commodity hardware to save cost and build scalable solutions. Below we discuss the types of servers that are commonly used within a data center to handle different workloads.

Web servers

For scalability, web servers are decoupled from the application servers (which we will discuss next). Web servers are the first point of contact after load balancers, and data centers have racks full of them, usually handling API calls from clients. Depending on the service offered, web servers need only small-to-medium memory and storage resources, but they do require good computational resources. For example, Facebook has used web servers with 32 GB of RAM and 500 GB of storage space, and for its high-end computational needs it partnered with Intel to build a custom 16-core processor.

Note: Many numbers quoted in this lesson come from the data center design that Facebook open-sourced in 2011. Because the performance gains from Moore’s law have been slowing since around 2004, these numbers are not as stale as they might seem.

Application servers

These types of servers run the core application software and business logic. Although the difference between web and application servers is somewhat fuzzy, application servers primarily provide dynamic content, whereas web servers mostly serve static content to the client, which is usually a web browser. Application servers can require extensive computational and storage (both volatile and non-volatile) resources. Facebook has used application servers with up to 256 GB of RAM and two types of storage (traditional rotating disks and flash) with a capacity of up to 6.5 TB.

Storage servers

With the explosive growth of Internet users, the amount of data stored by giant services has multiplied. Also, various types of data are being stored in different storage units. For instance, YouTube uses the following datastores:

  1. Blob storage for its encoded videos.
  2. A temporary processing queue storage that can hold a few hundred hours of video content uploaded daily to YouTube for processing.
  3. Specialized storage called Bigtable for storing a large number of thumbnails of videos.
  4. A Relational Database Management System (RDBMS) for user and video metadata (comments, likes, user channels), etc.

Still other datastores are used for analytics purposes (for example, Hadoop’s HDFS). Storage servers mainly run structured (for example, SQL) and unstructured (NoSQL) data management systems.

Coming back to the example of Facebook, they have used storage servers with a capacity of up to 120 TB. With the number of servers in use, Facebook is able to house exabytes of storage. (One exabyte is 10^18 bytes. By convention, we measure storage and network bandwidth in base 10, not base 2.) However, the RAM of these machines is only 32 GB.
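As a quick sanity check on that scale (an illustration, not a figure from the source): at 120 TB per storage server, housing a single exabyte requires roughly 10^18 / (120 × 10^12) ≈ 8,300 such servers, so exabyte-scale storage implies fleets of thousands of storage machines.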

These are not the only types of servers in a data center. Organizations also require servers for services such as configuration, monitoring, load balancing, analytics, accounting, caching, etc.

The numbers open-sourced by Facebook are now dated. Below, we list the specifications of a server that could be used in today’s data centers:

Typical Server Specifications

Component            Specification
Number of sockets    2
Processor            Intel Xeon X2686
Number of cores      36 cores (72 threads)
RAM                  256 GB
Cache (L3)           45 MB
Storage capacity     15 TB

The numbers above are inspired by the Amazon bare-metal server, but machines can be more or less powerful; high-end servers support much higher RAM (up to 8 TB), disk storage (up to 24 disks with up to 20 TB each, circa 2021), and cache memory (up to 120 MB).

Standard numbers to remember

A lot of effort goes into planning and implementing a service, but planning isn’t possible without basic knowledge of the kinds of workloads machines can handle. Latencies play an important role in deciding how much workload a machine can handle. The table below lists some of the important numbers system designers should know in order to perform resource estimation.

Important Latencies

Component                                      Time (nanoseconds)
L1 cache reference                             0.9
L2 cache reference                             2.8
L3 cache reference                             12.9
Main memory reference                          100
Compress 1 KB with Snzip                       3,000 (3 microseconds)
Read 1 MB sequentially from memory             9,000 (9 microseconds)
Read 1 MB sequentially from SSD                200,000 (200 microseconds)
Round trip within the same data center         500,000 (500 microseconds)
Read 1 MB sequentially from a ~1 GB/sec SSD    1,000,000 (1 millisecond)
Disk seek                                      4,000,000 (4 milliseconds)
Read 1 MB sequentially from disk               2,000,000 (2 milliseconds)
Send packet SF -> NYC                          71,000,000 (71 milliseconds)
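As a quick illustration of how these latencies feed into estimates, the sketch below (not from the source; Python is used only for the arithmetic) converts the per-MB figures above into rough times for reading 1 GB sequentially from each medium. It uses the ~1 GB/sec SSD row for the SSD figure.

```python
# Rough sequential-read estimates derived from the latency table above.
READ_1MB_NS = {
    "memory": 9_000,      # 9 microseconds per MB
    "ssd": 1_000_000,     # ~1 GB/sec SSD: 1 millisecond per MB
    "disk": 2_000_000,    # 2 milliseconds per MB
}

def sequential_read_time_ms(size_mb: int, medium: str) -> float:
    """Estimate sequential read time in milliseconds for a given medium."""
    return size_mb * READ_1MB_NS[medium] / 1_000_000  # ns -> ms

if __name__ == "__main__":
    for medium in READ_1MB_NS:
        # Reading 1 GB (1,000 MB, base 10) sequentially:
        print(f"1 GB from {medium}: ~{sequential_read_time_ms(1_000, medium):,.0f} ms")
    # memory: ~9 ms, SSD: ~1,000 ms, disk: ~2,000 ms
```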

Apart from the latencies above, there are also throughput numbers, measured in Queries Per Second (QPS), that a typical single-server datastore can handle.

Important Rates

Datastore (single server)    Approximate QPS
MySQL                        1,000
Key-value store              10,000
Cache server                 100,000 - 1 M

The numbers above are approximations and vary greatly depending on a number of factors, such as the type of query (point versus range), the specifications of the machine, the design of the database, indexing, and so on.
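These rates are what make quick server-count estimates possible. The sketch below is illustrative only: the peak load of 50,000 QPS is an assumed number, while the per-server rates are the approximations from the table above.

```python
import math

def servers_needed(peak_qps: int, qps_per_server: int) -> int:
    """Number of servers needed to absorb a given peak load."""
    return math.ceil(peak_qps / qps_per_server)

# Example (assumed load): 50,000 peak QPS against MySQL at ~1,000 QPS per server.
print(servers_needed(50_000, 1_000))    # 50 servers
# The same load against a cache layer at ~100,000 QPS per server.
print(servers_needed(50_000, 100_000))  # 1 server
```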

Requests estimation

This section will discuss how many requests a typical server can handle in a second. Within a server, there are limited resources, and depending on the type of client requests, different resources can become a bottleneck. Let’s understand two types of requests.

  • CPU bound requests: The type of requests where the limiting factor is the CPU.
  • Memory bound requests: These types of requests are limited by the amount of memory a machine has.

Let’s approximate the Requests Per Second (RPS) for each type of request. Before that, we need to assume the following:

  • Our server has the specifications of the typical server that we defined in the table above.
  • The operating system and other auxiliary processes consume a total of 16 GB of RAM.
  • Each worker consumes 300 MB of RAM to complete a request.
  • For simplicity, we assume that the CPU obtains data from RAM. That is, a caching system ensures the required content is available for serving without going to the storage layer.
  • Each CPU-bound request takes 200 milliseconds, whereas a memory-bound request takes 50 milliseconds to complete.

Let’s do the computation for each type of request.

CPU bound: A simple formula to estimate the RPS for CPU-bound requests divides the number of available CPU threads by the time each request occupies a thread:

RPS_{CPU} = \frac{\text{Number of CPU threads}}{\text{Task time}} = \frac{72}{200\ \text{ms}} = 360\ \text{RPS}
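Below is a minimal sketch, under the assumptions above, that computes both estimates. The CPU-bound formula matches the one given in the text; the memory-bound estimate applies the same style of reasoning (available RAM bounds the number of concurrent workers) and is an extrapolation from the stated assumptions, not a figure quoted from the source.

```python
# Back-of-the-envelope RPS estimates for the typical server defined above.
CPU_THREADS = 72        # 36 cores, 2 hardware threads each
TOTAL_RAM_GB = 256
OS_RAM_GB = 16          # consumed by the OS and auxiliary processes
WORKER_RAM_MB = 300     # RAM used by one worker per request
CPU_TASK_SEC = 0.200    # 200 ms per CPU-bound request
MEM_TASK_SEC = 0.050    # 50 ms per memory-bound request

# CPU bound: every hardware thread completes 1 / task_time requests per second.
rps_cpu = CPU_THREADS * (1 / CPU_TASK_SEC)

# Memory bound: the RAM left for workers bounds how many can run concurrently;
# each concurrent worker completes 1 / task_time requests per second.
workers = (TOTAL_RAM_GB - OS_RAM_GB) * 1000 // WORKER_RAM_MB  # base-10 GB -> MB
rps_memory = workers * (1 / MEM_TASK_SEC)

print(f"CPU-bound:    ~{rps_cpu:,.0f} RPS")      # ~360 RPS
print(f"Memory-bound: ~{rps_memory:,.0f} RPS")   # ~16,000 RPS
```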
