
Back-of-the-envelope estimations in System Design Interviews

Fahim ul Haq
Jul 05, 2024
13 min read

When I was conducting System Design Interviews at Microsoft and Facebook (now Meta), I saw a lot of candidates struggle with back-of-the-envelope calculations.

This is not only because back-of-the-envelope estimation can be a difficult process to master (especially in a pressure-filled interview setting), but also because most candidates underestimate its impact on the final design.

Today, I will walk you through the essentials of back-of-the-envelope calculations and their role in System Design Interviews. I will discuss the best ways to approach these questions, and share a few tricks I have developed over the years.

What are back-of-the-envelope calculations?#

Back-of-the-envelope calculations (BOTEC) refer to rough, simplified, and quick mathematical or numerical estimations to obtain a reasonable answer to a problem or question. These calculations typically use readily available information, basic mathematical principles, and assumptions rather than precise data and complex computations.

Tip: The term back-of-the-envelope originates from the idea that these calculations can be done on the back of an envelope or a scrap of paper, emphasizing their simplicity and informal nature.

Here is an interesting question where we can apply our BOTEC concepts:

Food for thought!

Question

How many liters of water flow out of the Mississippi River in a day?

Consider a strategy for resolving such a problem for back-of-the-envelope style calculations and covering your unknowns with safety factors.
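One way to attack a question like this is to break the river into quantities you can guess (width, depth, flow speed), multiply them out, and then apply a safety factor. The sketch below does exactly that; every input is a rough guess made for illustration, not a measured value, and the function name and the 2x safety factor are assumptions of this example.

```python
SECONDS_PER_DAY = 86_400
LITERS_PER_CUBIC_METER = 1_000


def river_liters_per_day(width_m: float, depth_m: float, speed_m_per_s: float) -> float:
    """Approximate daily outflow as cross-section area x flow speed x time."""
    flow_m3_per_s = width_m * depth_m * speed_m_per_s
    return flow_m3_per_s * LITERS_PER_CUBIC_METER * SECONDS_PER_DAY


# Rough guesses, NOT measured values: ~1 km wide, ~10 m deep, ~1 m/s.
estimate = river_liters_per_day(width_m=1_000, depth_m=10, speed_m_per_s=1.0)

# Safety factor (assumed 2x) to cover whatever the guesses may have missed.
SAFETY_FACTOR = 2
upper_bound = estimate * SAFETY_FACTOR

print(f"estimate:    {estimate:.2e} liters/day")
print(f"with margin: {upper_bound:.2e} liters/day")
```

The point is the order of magnitude and the stated assumptions, not the exact figure.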


In the System Design domain, back-of-the-envelope estimation is a rough calculation of the amount of computational resources required to build a system. As designers, we want confidence that our design can meet the functional and nonfunctional requirements with reasonable resource requirements, which is often referred to as the feasibility of the design. If our initial estimate for a resource turns out higher than is reasonable, we either need to re-engineer the design or tweak it to meet the necessary constraints.

For System Design Interviews, some examples are listed below:

  • Find the number of concurrent WebSocket connections a server maintains in a real-time chat application.

  • Calculate the storage requirement for Instagram for one day.

You can check out the following course on Educative to learn and apply the use of BOTEC:

Grokking the Modern System Design Interview


Role of BOTEC in System Design Interviews#

Back-of-the-envelope calculations are critical in System Design Interviews, especially in the early stages. This is a period where the candidates make intelligent approximations about the different types of resources required to design systems. Typically, resources can include the number of servers, bandwidth, system memory, compute machines, cost, etc.

BOTEC estimations are tricky but essential to the System Design Interview because they enable the interviewee to make effective design decisions, identify bottlenecks, and meet different SLAs. Since these calculations are performed in an early stage of the interview, they represent an important avenue to impress your interviewer in the following ways:

  • You get to showcase your analytical and problem-solving skills by breaking down the complexity of the problem into smaller manageable pieces.

  • You can depict your expertise by arriving at rough yet practical estimations.

  • Even if rough, your estimations are based on meaningful assumptions you communicate beforehand.

Remember: BOTEC aims to create an effective System Design, not an accurate estimate of the required resources!

In System Design Interviews, interviewers can artificially reduce some resources. This might require us to tweak the design to meet the new constraint. For example, suppose we want to reduce the use of storage. In that case, we should think:

  • Are we compressing data?

  • Can we reduce the number of replicas?

  • Should we move to Reed-Solomon encoding-based replication instead of plain data replication?

  • Can we archive old data using high-compression algorithms (that come at the cost of higher computing use for compression and decompression) or

  • Should we have a data retention policy that allows us to delete data older than a few years?

Adding a new restriction to a resource opens up discussion of alternate solutions—a hallmark of System Design’s richness.
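To make the replication-versus-encoding question concrete, here is a minimal sketch that compares raw storage under plain replication and under an erasure code with k data shards and m parity shards. The 3-replica and 10+4 configurations and the 10.5 TB input are illustrative assumptions, not numbers from any particular system.

```python
def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per byte of user data with plain replication."""
    return float(replicas)


def erasure_coding_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per byte of user data with k data + m parity shards."""
    return (data_shards + parity_shards) / data_shards


logical_tb = 10.5  # user data to protect, in TB (illustrative)

replicated_tb = logical_tb * replication_overhead(3)
encoded_tb = logical_tb * erasure_coding_overhead(10, 4)

print(f"3 replicas:     {replicated_tb:.1f} TB raw")
print(f"10+4 encoding:  {encoded_tb:.1f} TB raw")
```

The encoded layout stores far less raw data, at the cost of extra compute for encoding and reconstruction, which is the kind of tradeoff worth saying out loud in the interview.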

Point to ponder!

Question

Why do System Design Interviews prioritize back-of-the-envelope approximations over detailed resource estimations?


How to approach BOTEC estimation#

While BOTEC estimation is tricky, I will give you an approach to solving different estimation problems. I will also share some tips that have made estimation exercises a breeze over the years. Let’s get started!

Storage estimation#

This is one of the most asked types of estimation questions.

How do you estimate storage for xyz problem?

Here is how you should start approaching the solution:

  1. First, consider the data that you want to include in the estimation. For example, for xyz problem, how much storage space is needed for the abc data? Remember, it’s good to tell the interviewer which data you are leaving out of the estimate and why.

  2. Put your consideration into numbers. Assume the size of the abc data from the previous step.

  3. This is where you will perform the actual estimation. You have the data and its size but need a multiplicative factor. Say the abc data is uploaded by n users in a day/month/year. The product abc × n is then the storage space required per day/month/year.

  4. There can always be special cases in storage estimation. Take the example of social media. Some users upload videos as part of their posts while others upload pictures and others do none. You have to handle each case separately when approximating the final storage space in this step.

  5. It’s always important to discuss any missing cases with your interviewer. The interviewer may consider some edge cases important in storage. This also allows you an opportunity to revise your work.
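The steps above can be sketched as a small helper that multiplies size by count for each case and sums the results. The class name, function name, and the sample numbers are hypothetical and only illustrate the shape of the calculation.

```python
from dataclasses import dataclass


@dataclass
class UploadCase:
    name: str            # which kind of data this case covers (step 1)
    bytes_per_item: int  # assumed size of one item (step 2)
    items_per_day: int   # multiplicative factor n (step 3)


def daily_storage_bytes(cases: list[UploadCase]) -> int:
    """Handle each special case separately, then add them up (step 4)."""
    return sum(c.bytes_per_item * c.items_per_day for c in cases)


# Hypothetical inputs: 2,000 large uploads and 8,000 small ones per day.
cases = [
    UploadCase("large upload", bytes_per_item=5_000_000, items_per_day=2_000),
    UploadCase("small upload", bytes_per_item=100_000, items_per_day=8_000),
]

total = daily_storage_bytes(cases)
print(f"{total / 10**9:.1f} GB per day")
```

Keeping each case as its own entry makes it easy to add or drop one when the interviewer raises a missing case (step 5).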

Bandwidth estimation#

Let's talk about bandwidth estimation now. There are two key steps here:

  1. Estimate the bandwidth required for incoming and outgoing traffic. You can derive bandwidth requirements from the storage space requirement. Intuitively, any data that will be part of a post on social media has to be uploaded before it’s stored in the database.

  2. Calculate the outgoing and incoming traffic sum to determine the total bandwidth requirement.

I will share real examples of storage and bandwidth estimation in the coming section, but another commonly asked question is the number of servers required to provide a service. This requires us to derive and use the number of requests a server can handle per second, typically known as requests per second (RPS). You can read here about how to derive and estimate the number of servers.
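As a rough sketch of both ideas, the helpers below turn daily traffic into bits per second and divide a request load by a per-server RPS. The function names and the sample request numbers are assumptions for illustration, and dividing total load by per-server RPS is only one simple way to size a fleet.

```python
import math

SECONDS_PER_DAY = 86_400


def bandwidth_bps(bytes_per_day: float) -> float:
    """Average bandwidth in bits per second for a daily traffic volume."""
    return bytes_per_day * 8 / SECONDS_PER_DAY


def total_bandwidth_bps(incoming_bytes_per_day: float,
                        outgoing_bytes_per_day: float) -> float:
    """Step 2: the total is the sum of incoming and outgoing bandwidth."""
    return bandwidth_bps(incoming_bytes_per_day) + bandwidth_bps(outgoing_bytes_per_day)


def servers_needed(total_requests_per_second: float, rps_per_server: float) -> int:
    """Round up, because a fraction of a server still means one more machine."""
    return math.ceil(total_requests_per_second / rps_per_server)


# Hypothetical load: 100,000 requests/s against servers handling 8,000 RPS each.
print(servers_needed(100_000, 8_000))
```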

Tips and tricks on BOTEC estimation#

Before we do any calculations, here are some tips that you will find effective during the estimation phase in an interview:

  • Always communicate your thoughts, assumptions, and considerations during BOTEC estimations. Write them down so you can quickly refer to them during the interview.

  • You don’t need to be 100% accurate with any assumptions you make as long as the assumptions are meaningful and lie within an acceptable range.

  • During an interview, you can’t do arithmetic quickly in your head unless you round numbers off. Take every opportunity to round for your convenience.

  • Estimate for either peak load or average load, whichever you agree on with your interviewer. Doing both is time-consuming, but doing one while clearly explaining how you would handle the other sends a strong signal to the interviewer.

  • Adapt your RPS based on the application and type of requests. A server receiving many IO-bound requests will have a lower RPS than a system with many memory-bound requests.

  • Practice, practice, practice! Do BOTEC estimation for various resources for numerous design problems. There is no alternative to the experience you gain from doing.

Examples of BOTEC estimations#

Let’s walk the talk now. Here are a couple of working examples based on the discussion above.

Storage estimation#

A lot of storage is required on social media platforms. Let’s take Instagram as an example.

Assume the following:

  • The total number of users uploading a post per day is 1 million.

  • The size of an image uploaded to Instagram, on average, is 1 MB.

  • The size of a video uploaded on average is 20 MB.

  • The textual content size per post is 2 KB.

Also assume that half of the daily uploaded posts have an image attached while the other half have a video. Under these assumptions, images take 500,000 × 1 MB = 0.5 TB, videos take 500,000 × 20 MB = 10 TB, and text takes 1,000,000 × 2 KB = 2 GB, for a total of roughly 10.5 TB per day.

Figure: Estimated storage space required for Instagram in a day

The above calculations are the storage requirements for a single day only. But for a content-heavy service like Instagram, this storage is negligible. Keep in mind that we haven't yet considered any user or application data.
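The daily storage figure can be reproduced with a few lines of arithmetic. This sketch uses the assumptions listed above with decimal units (1 MB = 10^6 bytes); the variable names are mine.

```python
KB = 10**3
MB = 10**6
TB = 10**12

posts_per_day = 1_000_000
image_bytes = 1 * MB
video_bytes = 20 * MB
text_bytes = 2 * KB

# Half of the posts carry an image, the other half a video.
image_posts = posts_per_day // 2
video_posts = posts_per_day - image_posts

image_total = image_posts * image_bytes   # 0.5 TB
video_total = video_posts * video_bytes   # 10 TB
text_total = posts_per_day * text_bytes   # 2 GB, every post has text

daily_storage = image_total + video_total + text_total
print(f"{daily_storage / TB:.3f} TB per day")  # prints "10.502 TB per day"
```

Text barely moves the total, which is why rounding to 10.5 TB is reasonable.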

Bandwidth calculations#

Let’s use the example of Instagram for estimating bandwidth requirements as well. Bandwidth estimation will be done in two steps:

  1. Estimate daily incoming and outgoing traffic to and from the Instagram service.

  2. Calculate the bandwidth requirement by dividing the incoming and outgoing traffic by the number of seconds in a day.

I will discuss incoming and outgoing traffic separately for convenience. Since this is a BOTEC estimation, I will treat the daily storage requirement, i.e., 10.5 TB, as the incoming traffic and divide it by 86,400, the number of seconds in a day. That gives roughly 0.97 Gbps of incoming bandwidth.

For outgoing traffic, you again need to make an assumption, this time about the read-to-write ratio. Assuming it is 10:1, ten times as many users view posts as upload them, so we will need 9.72 Gbps of outgoing bandwidth. The total bandwidth required is approximately 0.97 + 9.72 ≈ 10.7 Gbps:

Figure: The total amount of bandwidth estimated for Instagram

The above are two of many types of BOTEC estimations during system design interviews. I have assumed several numbers (size of resources, read-to-write ratio, etc.) to complete only one bandwidth estimation. You are good if your assumptions are intelligent and don’t raise concerns!
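The bandwidth numbers follow directly from the storage estimate. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second) and the 10:1 read-to-write ratio from the text.

```python
SECONDS_PER_DAY = 86_400
GBPS = 10**9

daily_incoming_bytes = 10.5e12   # 10.5 TB written per day
read_to_write_ratio = 10         # assumed 10:1

incoming_bps = daily_incoming_bytes * 8 / SECONDS_PER_DAY
outgoing_bps = incoming_bps * read_to_write_ratio
total_bps = incoming_bps + outgoing_bps

print(f"incoming: {incoming_bps / GBPS:.2f} Gbps")  # 0.97
print(f"outgoing: {outgoing_bps / GBPS:.2f} Gbps")  # 9.72
print(f"total:    {total_bps / GBPS:.2f} Gbps")     # 10.69
```

These are averages over the day; a peak-load estimate would scale them by whatever peak factor you agree on with the interviewer.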

Standard numbers every developer must know#

As I said, some of our calculations will be based on meaningful assumptions, but you must remember some essential numbers as a developer. Planning, prototyping, and developing services require a lot of effort, but these efforts will be inadequate without basic knowledge of how machines handle different workloads. For example, the amount of storage space a video takes on a YouTube server can be an assumption, but the amount of data that can be read from a random access memory (RAM) in a second will have to be remembered since that is usually within a specified range.

As system designers, remember these important numbers during resource estimations:

Important Latencies

| Component | Time (nanoseconds) |
|---|---|
| L1 cache reference | 0.9 |
| L2 cache reference | 2.8 |
| L3 cache reference | 12.9 |
| Main memory reference | 100 |
| Compress 1 KB with Snzip | 3,000 (3 microseconds) |
| Read 1 MB sequentially from memory | 9,000 (9 microseconds) |
| Read 1 MB sequentially from SSD | 200,000 (200 microseconds) |
| Round trip within the same data center | 500,000 (500 microseconds) |
| Read 1 MB sequentially from an SSD with ~1 GB/sec speed | 1,000,000 (1 millisecond) |
| Disk seek | 4,000,000 (4 milliseconds) |
| Read 1 MB sequentially from disk | 2,000,000 (2 milliseconds) |
| Send packet SF->NYC | 71,000,000 (71 milliseconds) |
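These numbers become useful once you scale them. As a minimal sketch, the snippet below uses three rows of the table to estimate how long a sequential read of 1 GB (taken as 1,000 MB) would take on each medium; it ignores seeks and other overheads, and the names are mine.

```python
# Nanoseconds to read 1 MB sequentially, taken from the table above.
NS_PER_MB = {
    "memory": 9_000,
    "SSD": 200_000,
    "disk": 2_000_000,
}


def sequential_read_ms(megabytes: float, medium: str) -> float:
    """Scale the per-MB latency linearly and convert nanoseconds to ms."""
    return megabytes * NS_PER_MB[medium] / 1_000_000


for medium in NS_PER_MB:
    print(f"1 GB from {medium}: {sequential_read_ms(1_000, medium):,.0f} ms")
```

The gap of roughly two orders of magnitude between memory and disk is the kind of fact that justifies a cache in a design discussion.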

Another important design goal is high availability, which refers to the application’s uptime even in failure. As an engineer, you must be aware of the five 9s of availability given in the table below:

Five 9s of Availability

| Availability | Percentage | Downtime per Year | Loss in USD (assuming 1 USD/sec) |
|---|---|---|---|
| 1 nine | 90% | 36.5 days | $3,153,600 |
| 2 nines | 99.0% | 3.65 days | $315,360 |
| 3 nines | 99.9% | 8.76 hours | $31,536 |
| 4 nines | 99.99% | 52.56 minutes | $3,153.60 |
| 5 nines | 99.999% | 5.26 minutes | $315.36 |
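The downtime column can be derived rather than memorized. This sketch assumes a 365-day year and treats unavailability as 10 to the power of minus the number of nines; the function name is mine.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year for a given number of nines."""
    unavailability = 10 ** -nines  # e.g. 3 nines -> 0.001
    return SECONDS_PER_YEAR * unavailability


for n in range(1, 6):
    seconds = downtime_seconds_per_year(n)
    print(f"{n} nines: {seconds:,.2f} s/year ({seconds / 60:,.2f} min)")
```

At 1 USD per second of downtime, the seconds figure is also the yearly loss in dollars.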

Intuitively, you want as many 9s as possible. However, the cost increases exponentially as you try to increase the number of 9s of availability. Take, for example, a service that aims to achieve 5 nines of availability. Assuming the loss per second is $1, the total cost can be calculated as:

  • $315.36 plus the cost of achieving 5 nines (say $10,000)

  • Total cost (in USD) = 315.36 + 10,000 = 10,315.36

Comparing that to the cost of staying at 4 nines:

  • $3,153.60 + no additional cost

  • Total cost (in USD) = 3,153.60 + 0 = 3,153.60

If cost is a concern, it makes sense to tolerate an extra 47.3 (52.56 − 5.26) minutes of downtime per year in exchange for a saving of $7,161.76. Depending on your application and cost constraints, you must find an optimal number of 9s.

Note: The above are back-of-the-envelope cost approximations for achieving five nines of availability. In reality, achieving five nines is an extremely difficult process.
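The same comparison can be written as a tiny cost model. This sketch takes the downtime figures from the availability table, the 1 USD/sec loss rate, and the assumed $10,000 price of reaching five nines; the function name is hypothetical.

```python
def yearly_cost(downtime_seconds: float, loss_per_second: float,
                engineering_cost: float) -> float:
    """Downtime loss plus whatever it costs to reach that availability."""
    return downtime_seconds * loss_per_second + engineering_cost


LOSS_PER_SECOND = 1.0  # USD, as assumed in the table

five_nines = yearly_cost(315.36, LOSS_PER_SECOND, engineering_cost=10_000)
four_nines = yearly_cost(3_153.60, LOSS_PER_SECOND, engineering_cost=0)

print(f"5 nines: ${five_nines:,.2f}")
print(f"4 nines: ${four_nines:,.2f}")
print(f"saving by staying at 4 nines: ${five_nines - four_nines:,.2f}")
```

Changing the loss rate or the engineering cost shifts the break-even point, which is why the optimal number of 9s depends on the application.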

Other important numbers include unit conversions. For example, giga is represented by the symbol G and corresponds to a factor of 10^9.
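A short lookup of the common decimal prefixes is handy for quick conversions. The sketch below assumes powers of ten throughout (binary units such as KiB use powers of two instead); the helper name is mine.

```python
# Decimal prefixes as powers of ten.
PREFIXES = {
    "K": 10**3,   # kilo
    "M": 10**6,   # mega
    "G": 10**9,   # giga
    "T": 10**12,  # tera
    "P": 10**15,  # peta
}


def convert(value: float, from_prefix: str, to_prefix: str) -> float:
    """Convert a quantity between two decimal prefixes."""
    return value * PREFIXES[from_prefix] / PREFIXES[to_prefix]


print(convert(10_500, "G", "T"))  # 10,500 GB expressed in TB
```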

What's next#

You have now learned the essentials of back-of-the-envelope estimations for System Design Interviews. However, there are still several improvements you need to make to ace this phase of the interview, such as:

  • How do you justify the number of requests a server can handle per second? Note that different types of servers will have different requests per second (RPS) rates.

  • What’s a typical server specification for executing the assumed RPS for BOTEC calculations?

  • How do you deal with different types of queries? For example, you cannot handle memory and IO-bound queries similarly.

  • How do changing constraints allow you to envision the tradeoffs involved in designing scalable systems?

These and other interesting aspects are covered in detail in our courses, which prepare you for System Design Interview questions and teach you the art of designing for interviews and beyond!
