When I was conducting System Design Interviews at Microsoft and Facebook (now Meta), I saw a lot of candidates struggle with back-of-the-envelope calculations.
This is not only because back-of-the-envelope estimation can be a difficult process to master (especially in a pressure-filled interview setting), but also because most candidates underestimate its impact on the final design.
Today, I will walk you through the essentials of back-of-the-envelope calculations and their role in System Design Interviews. I will discuss the best ways to approach these questions, and share a few tricks I have developed over the years.
Back-of-the-envelope calculations (BOTEC) refer to rough, simplified, and quick mathematical or numerical estimations to obtain a reasonable answer to a problem or question. These calculations typically use readily available information, basic mathematical principles, and assumptions rather than precise data and complex computations.
Tip: The term back-of-the-envelope originates from the idea that these calculations can be done on the back of an envelope or a scrap of paper, emphasizing their simplicity and informal nature.
Here is an interesting question where we can apply our BOTEC concepts:
Food for thought!
How many liters of water flow out of the Mississippi River in a day?
Think about how you would break such a problem into quantities you can roughly estimate, and how you would cover your unknowns with safety factors.
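One possible way to attack the river question is sketched below. Every input (width, depth, flow speed, and the safety factor) is a rough guess made up for illustration, not a measured value; the point is the decomposition, not the final number.

```python
# Fermi estimate of the Mississippi's daily outflow.
# All inputs are illustrative guesses, not measurements.
WIDTH_M = 1_000          # assumed average width near the mouth
DEPTH_M = 10             # assumed average depth
SPEED_M_PER_S = 1        # assumed average flow speed
SECONDS_PER_DAY = 86_400
LITERS_PER_CUBIC_METER = 1_000
SAFETY_FACTOR = 2        # cushion for whatever the guesses miss

# Volume passing a cross-section each second, then scaled to a day.
flow_m3_per_s = WIDTH_M * DEPTH_M * SPEED_M_PER_S
liters_per_day = flow_m3_per_s * SECONDS_PER_DAY * LITERS_PER_CUBIC_METER
upper_bound = liters_per_day * SAFETY_FACTOR

print(f"Between {liters_per_day:.1e} and {upper_bound:.1e} liters per day")
```

Under these guesses the answer lands on the order of a trillion liters per day. In an interview, stating the guesses out loud matters more than the exact figure.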
In the System Design domain, back-of-the-envelope estimation is a rough calculation of the computational resources a system requires. As designers, we want confidence that our design can meet the functional and nonfunctional requirements with reasonable resource requirements, which is often referred to as the feasibility of the design. If our initial estimate for a resource turns out higher than expected, we either need to re-engineer the design or tweak it to meet the necessary constraints.
For System Design Interviews, some examples are listed below:
Find the number of concurrent WebSocket connections a server maintains in a real-time chat application.
Calculate the storage requirement for Instagram for one day.
Back-of-the-envelope calculations are critical in System Design Interviews, especially in the early stages. This is a period where the candidates make intelligent approximations about the different types of resources required to design systems. Typically, resources can include the number of servers, bandwidth, system memory, compute machines, cost, etc.
BOTEC estimations are tricky but essential to the System Design Interview because they enable the interviewee to make effective design decisions, identify bottlenecks, and meet different SLAs. Since these calculations are performed in an early stage of the interview, they represent an important avenue to impress your interviewer in the following ways:
You get to showcase your analytical and problem-solving skills by breaking down the complexity of the problem into smaller manageable pieces.
You can depict your expertise by arriving at rough yet practical estimations.
Even if rough, your estimations are based on meaningful assumptions you communicate beforehand.
Remember: BOTEC aims to create an effective System Design, not an accurate estimate of the required resources!
In System Design Interviews, interviewers can artificially reduce some resources. This might require us to tweak the design to meet the new constraint. For example, suppose we want to reduce the use of storage. In that case, we should think:
Are we compressing data?
Can we reduce the number of replicas?
Should we move to Reed-Solomon encoding-based replication instead of plain data replication?
Can we archive old data using high-compression algorithms (which come at the cost of higher compute use for compression and decompression)?
Should we have a data retention policy that allows us to delete data older than a few years?
Adding a new restriction to a resource opens up discussion of alternate solutions—a hallmark of System Design’s richness.
Point to ponder!
Why do System Design Interviews prioritize back-of-the-envelope approximations over detailed resource estimations?
While BOTEC estimation is tricky, I will give you an approach to solving different estimation problems. I will also share some tips that have made estimation exercises a breeze over the years. Let’s get started!
This is one of the most asked types of estimation questions.
How do you estimate storage for xyz problem?
Here is how you should start approaching the solution:
First, consider the data that you want to include in the estimation. For example, for xyz problem, how much storage space is needed for the abc data? Remember, it’s good to tell the interviewer explicitly which data you are leaving out of the estimate.
Put your consideration into numbers. Assume the size of the abc data from the previous step.
This is where you will perform the actual estimation. You have the data and its size but need a multiplicative factor. Say the abc data is uploaded by n users in a day/month/year. The estimate is then: total storage = n × (size of the abc data) for that day/month/year.
There can always be special cases in storage estimation. Take the example of social media. Some users upload videos as part of their posts while others upload pictures and others do none. You have to handle each case separately when approximating the final storage space in this step.
It’s always important to discuss any missing cases with your interviewer. The interviewer may consider some edge cases important in storage. This also allows you an opportunity to revise your work.
Let's talk about bandwidth estimation now. There are two key steps here:
Estimation of the bandwidth required for outgoing and incoming traffic. You can derive bandwidth requirements from the storage space requirement. Intuitively, any data that will be a part of a post on social media will have to be uploaded before it’s stored on the database.
Calculate the outgoing and incoming traffic sum to determine the total bandwidth requirement.
I will share real examples of storage and bandwidth estimation in the coming section, but another commonly asked question is the number of servers required to provide a service. This requires us to derive and use the number of requests a server can handle per second, typically known as requests per second (RPS). You can read here about how to derive and estimate the number of servers.
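As a minimal sketch of the server-count question, the calculation can be reduced to dividing the request rate by what one server can handle. The function name `servers_needed`, the `peak_factor` parameter, and the sample numbers are all hypothetical, chosen only to illustrate the shape of the estimate.

```python
import math

SECONDS_PER_DAY = 86_400

def servers_needed(daily_requests, rps_per_server, peak_factor=1.0):
    """Rough server count: (average RPS x peak factor) / per-server RPS."""
    average_rps = daily_requests / SECONDS_PER_DAY
    # Round up: a fraction of a server still means one more machine.
    return math.ceil(average_rps * peak_factor / rps_per_server)

# Assumed workload: 500 million requests/day, 1,000 RPS per server,
# and peak traffic at twice the daily average.
servers = servers_needed(500_000_000, rps_per_server=1_000, peak_factor=2)
print(servers)
```

With these assumed inputs the sketch yields 12 servers. Changing the per-server RPS (for example, for IO-bound versus memory-bound requests) shifts the answer proportionally.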
Before we do any calculations, here are some tips that you will find effective during the estimation phase in an interview:
Always communicate your thoughts, assumptions, and considerations during BOTEC estimations. Write them down so you can quickly refer to them during the interview.
You don’t need to be 100% accurate with any assumptions you make as long as the assumptions are meaningful and lie within an acceptable range.
During an interview, you can’t do quick mental arithmetic with awkward numbers. Round numbers off whenever it makes the calculation easier.
Either estimate for peak loads or average load, whatever you agree with your interviewer. Doing both will be time-consuming, but doing one while effectively communicating about the other will send hireable signals to the interviewer about you.
Adapt your RPS based on the application and type of requests. A server receiving many IO-bound requests will have a lower RPS than a system with many memory-bound requests.
Practice, practice, practice! Do BOTEC estimation for various resources for numerous design problems. There is no alternative to the experience you gain from doing.
Let’s walk the talk now. Here are a couple of working examples based on the discussion above.
A lot of storage is required on social media platforms. Let’s take Instagram as an example.
Assume the following:
The total number of users uploading a post per day is 1 million.
The size of an image uploaded to Instagram, on average, is 1 MB.
The size of a video uploaded on average is 20 MB.
The textual content size per post is 2 KB.
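Consider that half of the daily uploaded posts have an image attached while the other half have a video. The daily storage then works out as follows:
Images: 500,000 × 1 MB = 500 GB
Videos: 500,000 × 20 MB = 10 TB
Text: 1,000,000 × 2 KB = 2 GB
Total: 500 GB + 10 TB + 2 GB ≈ 10.5 TB per day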
The above calculations are the storage requirements for a single day only. For a content-heavy service like Instagram, one day of storage is small compared with what accumulates over months and years. Keep in mind that we haven't yet considered any user or application data.
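The daily storage arithmetic under the assumptions listed above can be sketched as a short script. It uses decimal units (1 MB = 10^6 bytes), which is itself a simplifying assumption; the constant names are illustrative.

```python
# Daily storage estimate for the Instagram example (decimal units).
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12

POSTS_PER_DAY = 1_000_000
IMAGE_SIZE = 1 * MB
VIDEO_SIZE = 20 * MB
TEXT_SIZE = 2 * KB

# Half of the posts carry an image, the other half a video.
image_posts = POSTS_PER_DAY // 2
video_posts = POSTS_PER_DAY - image_posts

image_bytes = image_posts * IMAGE_SIZE    # 500 GB
video_bytes = video_posts * VIDEO_SIZE    # 10 TB
text_bytes = POSTS_PER_DAY * TEXT_SIZE    # 2 GB
total_bytes = image_bytes + video_bytes + text_bytes

print(f"Daily storage: {total_bytes / TB:.3f} TB")
```

Videos dominate the total, so in an interview the video-size assumption is the one most worth discussing.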
Let’s use the example of Instagram for estimating bandwidth requirements as well. Bandwidth estimation will be done in two steps:
Estimate daily incoming and outgoing traffic to and from the Instagram service.
Calculate the bandwidth requirement by dividing the incoming and outgoing traffic by the number of seconds in a day (86,400).
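I will discuss incoming and outgoing traffic separately for convenience. Since this is a BOTEC estimation, I will treat the daily storage requirement as the incoming traffic. Under the assumptions above, that is about 10.5 TB per day, i.e., 10.5 TB ÷ 86,400 seconds ≈ 121.5 MB/s, or roughly 1 Gbps.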
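For outgoing traffic, you again need to make an assumption, which is the read-to-write ratio. Assuming it is 10:1, the number of users viewing posts is ten times the number uploading them. We will therefore need about ten times the incoming bandwidth for outgoing traffic: roughly 105 TB per day ÷ 86,400 seconds ≈ 1.2 GB/s, or about 10 Gbps. Adding incoming and outgoing traffic gives a total of roughly 11 Gbps.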
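The bandwidth steps can be sketched the same way. The daily upload volume is recomputed here from the storage assumptions (1 million posts, half with a 1 MB image, half with a 20 MB video, 2 KB of text each); the helper name `to_gbps` is hypothetical.

```python
SECONDS_PER_DAY = 86_400
READ_TO_WRITE_RATIO = 10   # assumed 10:1, as in the text

# Bytes uploaded per day: images + videos + text (decimal units).
daily_upload_bytes = (
    500_000 * 1_000_000        # image posts, 1 MB each
    + 500_000 * 20_000_000     # video posts, 20 MB each
    + 1_000_000 * 2_000        # text, 2 KB per post
)

def to_gbps(bytes_per_day):
    """Convert a daily byte volume into an average rate in gigabits/second."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e9

incoming_gbps = to_gbps(daily_upload_bytes)
outgoing_gbps = incoming_gbps * READ_TO_WRITE_RATIO
total_gbps = incoming_gbps + outgoing_gbps

print(f"in: {incoming_gbps:.2f} Gbps, out: {outgoing_gbps:.2f} Gbps, "
      f"total: {total_gbps:.2f} Gbps")
```

These are averages over a full day. If you agreed with the interviewer to estimate for peak load instead, multiply by whatever peak factor you both consider reasonable.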
The above are two of many types of BOTEC estimations during system design interviews. I have assumed several numbers (size of resources, read-to-write ratio, etc.) to complete only one bandwidth estimation. You are good if your assumptions are intelligent and don’t raise concerns!
As I said, some of our calculations will be based on meaningful assumptions, but you must remember some essential numbers as a developer. Planning, prototyping, and developing services require a lot of effort, but these efforts will be inadequate without basic knowledge of how machines handle different workloads. For example, the amount of storage space a video takes on a YouTube server can be an assumption, but the amount of data that can be read from a random access memory (RAM) in a second will have to be remembered since that is usually within a specified range.
As system designers, remember these important numbers during resource estimations:
Component | Time (Nanoseconds) |
L1 cache reference | 0.9 |
L2 cache reference | 2.8 |
L3 cache reference | 12.9 |
Main memory reference | 100 |
Compress 1 KB with Snzip | 3,000 (3 microseconds) |
Read 1 MB sequentially from memory | 9,000 (9 microseconds) |
Read 1 MB sequentially from SSD | 200,000 (200 microseconds) |
Round trip within the same data center | 500,000 (500 microseconds) |
Read 1 MB sequentially from an SSD with ~1 GB/sec throughput | 1,000,000 (1 millisecond) |
Disk seek | 4,000,000 (4 milliseconds) |
Read 1 MB sequentially from disk | 2,000,000 (2 milliseconds) |
Send packet SF->NYC | 71,000,000 (71 milliseconds) |
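To show how these numbers feed an estimate, here is a small sketch that scales the per-megabyte read times from the table to a larger read. It uses the 200-microsecond SSD row, assumes purely sequential access, and ignores seek time; the dictionary and function names are illustrative.

```python
# Sequential read cost per MB, in nanoseconds, taken from the table above.
READ_NS_PER_MB = {
    "memory": 9_000,
    "ssd": 200_000,
    "disk": 2_000_000,
}

def read_time_ms(size_mb, medium):
    """Estimated time in milliseconds to read size_mb sequentially."""
    return size_mb * READ_NS_PER_MB[medium] / 1_000_000

# Compare reading 1 GB (taken as 1,000 MB) from each medium.
for medium in READ_NS_PER_MB:
    print(f"{medium:>6}: {read_time_ms(1_000, medium):,.0f} ms")
```

Reading 1 GB comes out at about 9 ms from memory, 200 ms from SSD, and 2 seconds from disk, which is the kind of gap that justifies a cache in front of storage.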
Another important design goal is high availability, which means the application stays up even when components fail. As an engineer, you must be aware of the "nines" of availability, from one nine to five nines, given in the table below:
Availability | Percentage | Downtime per Year | Loss in USD (assuming 1 USD/sec) |
1 nine | 90% | 36.5 days | $3,153,600 |
2 nines | 99.0% | 3.65 days | $315,360 |
3 nines | 99.9% | 8.76 hours | $31,536 |
4 nines | 99.99% | 52.56 minutes | $3,153.60 |
5 nines | 99.999% | 5.26 minutes | $315.36 |
Intuitively, you want as many 9s as possible. However, the cost increases exponentially as you try to increase the number of 9s of availability. Take, for example, a service that aims to achieve 5 nines of availability. Assuming the loss per second is $1, the total cost can be calculated as:
Total cost (in USD) = downtime per year × loss per second = 5.26 minutes × 60 seconds/minute × $1/second ≈ $315
Comparing that to the cost of staying at 4 nines:
Total cost (in USD) = downtime per year × loss per second = 52.56 minutes × 60 seconds/minute × $1/second ≈ $3,154
If cost is a concern, it makes sense to tolerate a
Note: The above are back-of-the-envelope cost approximations for achieving five nines of availability. In reality, achieving five nines is an extremely difficult process.
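The downtime and loss columns in the availability table can be reproduced with a short sketch. It assumes a 365-day year and the $1-per-second loss used in the table; the function name is illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000
LOSS_USD_PER_SECOND = 1                 # assumption from the table

def downtime_seconds_per_year(nines):
    """Allowed downtime per year for an availability of `nines` nines."""
    # Each extra nine cuts the unavailable fraction by a factor of ten.
    return SECONDS_PER_YEAR / 10**nines

for nines in range(1, 6):
    downtime = downtime_seconds_per_year(nines)
    loss = downtime * LOSS_USD_PER_SECOND
    print(f"{nines} nine(s): {downtime / 60:,.2f} minutes down, ${loss:,.2f} lost")
```

Each additional nine reduces the downtime loss tenfold, so the savings from going from four to five nines are far smaller in absolute terms than from one to two.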
Other important numbers include unit conversions. For example, giga is represented by the symbol G, equivalent to an order of magnitude of 10^9 (one billion).
You have now learned the essentials of back-of-the-envelope estimations for System Design Interviews. However, there are still several improvements you need to make to ace this phase of the interview, such as:
How do you justify the number of requests a server can handle per second? Note that different types of servers will have different requests per second (RPS) rates.
What’s a typical server specification for executing the assumed RPS for BOTEC calculations?
How do you deal with different types of queries? For example, you cannot handle memory and IO-bound queries similarly.
How do changing constraints allow you to envision the tradeoffs involved in designing scalable systems?
These and other interesting aspects are covered in detail in our courses, which prepare you for System Design Interview questions and teach you the art of designing for interviews and beyond!