
Guide to nonfunctional requirements for System Design Interviews

Fahim ul Haq
14 min read

As a software developer who has focused on building distributed systems for much of my career, I have always been fascinated by the unique challenges of designing enterprise-scale applications. This passion led me to fulfill my dream of working in Big Tech, including helping to build Microsoft Azure and working on distributed storage at Facebook. During my tenure there, I led hundreds of candidates through System Design Interviews.

When I moved on from FAANG to launch Educative, I wanted to make System Design skills training a focus for two reasons:

  1. I wanted to help give engineers a leg up in their career journey

  2. I wanted to stay connected to the discipline of System Design

To continue sharing System Design best practices and supporting the developer community, I contributed to the design and development of Educative’s now very popular flagship System Design course.

From interviewing candidates at Meta and Microsoft to building tech skills courses for software developers, I have seen one constant: even the best engineers often struggle with understanding and designing systems to meet nonfunctional requirements (NFRs). I get it—it can be difficult managing critical trade-offs around nonfunctional requirements like scalability, availability, performance, security, and more—especially in a stressful interview setting.

Nonfunctional requirements discussed in this article

Take this key System Design Interview question for example:

  • How can you design a scalable and performant e-commerce website that can handle millions of requests per second?

While most engineers can design a system that meets all of the functional requirements, making that design scalable while still achieving low latency on requests remains a challenge.

Today, through this blog, I’ll share a few essential strategies for how to meet nonfunctional requirements in your designs. These strategies will prepare you to confidently navigate System Design interviews at top tech companies.

Let's get started!

Please note that identifying and achieving an NFR are two different things. In this blog, our focus will be on achieving, not identifying nonfunctional requirements. To learn how to identify NFRs and distinguish them from the functional requirements for various System Designs, I recommend exploring our comprehensive Grokking Modern System Design Interview course, where we discuss NFRs of the various design problems in detail.

Grokking the Modern System Design Interview

System Design interviews are now part of every Engineering and Product Management Interview. Interviewers want candidates to exhibit their technical knowledge of core building blocks and the rationale of their design approach. This course presents carefully selected system design problems with detailed solutions that will enable you to handle complex scalability scenarios during an interview or designing new products. You will start with learning a bottom-up approach to designing scalable systems. First, you’ll learn about the building blocks of modern systems, with each component being a completely scalable application in itself. You'll then explore the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process. Finally, you'll design several popular services by using these modular building blocks in unique combinations, and learn how to evaluate your design.

26hrs
Intermediate
5 Playgrounds
18 Quizzes

Common nonfunctional requirements#

Let's discuss common nonfunctional requirements that interviewers focus on and learn how to meet them effectively during System Design interviews. The common nonfunctional requirements that I will address in this blog are:

  1. Performance

  2. Availability

  3. Scalability

1) Performance#

Performance is a core NFR that reflects the system's ability to respond to user requests and process data efficiently. For example, when designing a messaging service, the interviewer might ask: How do you deliver messages with low latency (i.e., minimal time delay)? To achieve low latency, candidates need to choose an efficient two-way communication protocol, such as WebSocket. This is just one example of achieving performance; we will see more examples in the approaches below.

Let's look at different approaches to achieve performance.

Approaches to achieve performance#

Caching: Implementing a good cache mechanism is one of the methods to achieve performance. It stores frequently accessed data and reduces the need for repeated computations, which minimizes user-perceived latency.

Web service uses service cache to access frequently accessed data to ensure low latency

Let's assume an X (formerly Twitter)-like system where a service is dedicated to generating the timeline (a stream of posts and recommendations based on the user's interests and the activity of accounts they follow, such as posts, reposts, and likes). Let's call it the timeline service. A question that interviewers commonly ask about the timeline is: Does the timeline service regenerate the timeline of every follower when a celebrity posts something? Because celebrities have millions of followers, creating timelines for all of them impacts the performance of the system.

To address this question, we first need to analyze the followers. Not every follower uses X all the time, so as a first step we can divide followers into active users (those who use their X accounts frequently) and inactive users (those who last used their accounts a long time ago, say more than three months). For inactive users, the timeline service does not generate the timeline instantly. For active users, we introduce a cache. Let's call it the feed cache: a distributed cache (such as Redis) that prepopulates the timeline for active users. When an active user requests a timeline, the timeline service immediately retrieves it from the feed cache, appends any new celebrity posts, and returns it to the client with minimal latency.

Fetching a timeline for active users from the feed cache

Additionally, a cache mechanism is usually implemented in each system layer to ensure decoupling and low latency.
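The cache-aside pattern described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a plain dict as a stand-in for a distributed feed cache like Redis; all names and data are hypothetical.

```python
import time

feed_cache = {}          # user_id -> precomputed timeline (stand-in for Redis)
database = {             # slow source of truth (illustrative data)
    "alice": ["post-1", "post-2", "post-3"],
}

def generate_timeline(user_id):
    """Expensive fallback: build the timeline from the database."""
    time.sleep(0.01)  # simulate slow fan-in from followed accounts
    return database.get(user_id, [])

def get_timeline(user_id, celebrity_posts=()):
    """Cache-aside read: serve active users from the feed cache,
    then append fresh celebrity posts at read time."""
    timeline = feed_cache.get(user_id)
    if timeline is None:                      # cache miss -> slow path
        timeline = generate_timeline(user_id)
        feed_cache[user_id] = timeline        # prepopulate for the next read
    return list(celebrity_posts) + timeline   # merge celebrity posts lazily

print(get_timeline("alice", ["celeb-post-9"]))
```

Note how the celebrity's post is merged at read time rather than fanned out to millions of cached timelines at write time, which is exactly the optimization the feed cache enables.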

Algorithm/Data structure selection: Choosing efficient algorithms and data structures is another way to increase system performance. An efficient algorithm or data structure minimizes processing time and improves overall system performance. For example, I once asked a candidate which data structure would be suitable for storing a driver's position, updated frequently (say, every four seconds), in a ride-hailing system.

The candidate replied that a Quadtree was a suitable option to ensure performance: the system receives data from the driver every four seconds and relocates the driver within the Quadtree based on the new location.

But my follow-up question to the candidate was: if we update the Quadtree every four seconds, the computational overhead increases and ultimately leads to latency. So is using a Quadtree to store the driver's position every four seconds really the right choice?

So here's my advice for tackling such a situation: compare different ways of designing the system, and consider whether combining some of the approaches yields a more optimized solution.
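One such combination, sketched below under assumed parameters, is to absorb every four-second ping into a cheap hash map and only touch the expensive Quadtree when the driver has actually moved a meaningful distance. The 100 m threshold and all names are illustrative, not part of any specific design.

```python
import math

UPDATE_THRESHOLD_M = 100   # assumed: re-index only after ~100 m of movement

last_indexed = {}      # driver_id -> (lat, lon) currently in the Quadtree
quadtree_updates = []  # stand-in for actual Quadtree re-insertions

def distance_m(a, b):
    # rough equirectangular approximation; fine for short distances
    dlat = (a[0] - b[0]) * 111_000
    dlon = (a[1] - b[1]) * 111_000 * math.cos(math.radians(a[0]))
    return math.hypot(dlat, dlon)

def report_position(driver_id, lat, lon):
    """Every ping updates cheap state; the Quadtree is only updated
    when the driver has moved far enough to matter."""
    pos = (lat, lon)
    prev = last_indexed.get(driver_id)
    if prev is None or distance_m(prev, pos) > UPDATE_THRESHOLD_M:
        quadtree_updates.append((driver_id, pos))  # expensive index update
        last_indexed[driver_id] = pos
    # otherwise: skip the Quadtree entirely for this ping

report_position("d1", 37.7749, -122.4194)   # first ping -> indexed
report_position("d1", 37.7749, -122.41941)  # ~1 m move -> Quadtree untouched
```

The trade-off is staleness in the spatial index versus update cost, which is exactly the kind of reasoning interviewers want to hear out loud.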

You may encounter more such challenging questions in the interview regarding performance of the ride hailing system. For more details, explore our Uber System Design lesson.

Load balancing: Distributing incoming traffic evenly among different servers (load balancing) is another strategy for achieving high performance. For example, with millions of users on an e-commerce website, many requests can arrive each second. Load balancing reduces the load on any single server by giving it only as much load as it can handle. Load balancers distribute user requests across multiple servers to prevent bottlenecks and degraded server performance.

Distributing user requests across multiple web servers
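As a minimal sketch of the idea, here is a round-robin balancer that spreads requests evenly across a fixed pool. The server names are illustrative; real load balancers (NGINX, HAProxy, cloud LBs) also add health checks and other distribution policies.

```python
import itertools

class RoundRobinBalancer:
    """Minimal round-robin load balancer: hands each incoming
    request to the next server in a fixed rotation."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def route(self, request):
        server = next(self._cycle)
        return server  # a real balancer would forward `request` here

lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
targets = [lb.route(f"req-{i}") for i in range(6)]
print(targets)  # each server receives exactly two of the six requests
```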

2) Availability#

Availability is another nonfunctional requirement; it describes how effectively the system maintains accessibility and uptime for its users. Generally, a system with 99.999% uptime is considered highly available—that figure is equivalent to less than 6 minutes of downtime per year. Achieving this level of availability is very challenging, but providing it helps retain a large number of users.
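The downtime figure follows directly from the availability percentage; the quick calculation below shows where "less than 6 minutes per year" comes from for five nines.

```python
# Allowed downtime per year for a given availability target.
def downtime_minutes_per_year(availability_pct):
    minutes_per_year = 365.25 * 24 * 60        # ~525,960 minutes
    return minutes_per_year * (1 - availability_pct / 100)

for nines in (99.9, 99.99, 99.999):
    print(f"{nines}% -> {downtime_minutes_per_year(nines):.2f} min/year")
# 99.999% allows roughly 5.26 minutes of downtime per year
```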

For example, when designing an online shopping website, availability is critical since customers frequently use the site to browse products, make purchases, and track orders. Any downtime might lead to lower sales and disappointed customers.

Let's discuss some general approaches to achieve availability.

Approaches to achieve availability#

Redundancy: One way to meet availability is by replicating key components and data across numerous servers and data centers. By doing this, we ensure that if one server fails or traffic is high, the load balancer can automatically reroute requests to an alternate backup server. Additionally, implementing redundant components across multiple layers (servers, databases, and networks) can prevent a single point of failure.

Replicating key components to eliminate single point of failure

Fault tolerance: During a discount sale on a shopping website, one of the key database nodes in a specific region suffers a hardware breakdown. This node handles a considerable amount of the user's activities in this region. In such scenarios, our system must be fault-tolerant, which means it will continue to work even if one or more components fail. We can achieve this tolerance by using redundant components and failover methods that automatically switch traffic from the failed component to the backup component.
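The failover logic above can be sketched as a simple "try the primary, fall back to the replica" loop. This is an illustrative toy, assuming a `query` stand-in for a network call; real failover also involves health checks and promotion of the replica.

```python
class ServerDown(Exception):
    pass

def query(server, request):
    """Stand-in for a network call; raises ServerDown on failure."""
    if server["healthy"]:
        return f"{server['name']} handled {request}"
    raise ServerDown(server["name"])

def query_with_failover(replicas, request):
    """Try each replica in order; fail over to the next on error."""
    for server in replicas:
        try:
            return query(server, request)
        except ServerDown:
            continue  # automatic failover to the backup component
    raise RuntimeError("all replicas are down")

replicas = [{"name": "db-primary", "healthy": False},  # failed node
            {"name": "db-replica", "healthy": True}]
result = query_with_failover(replicas, "order-lookup")
print(result)  # the replica transparently absorbs the request
```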

Rate limiting: Another approach to maintaining availability is rate limiting. A rate limiter restricts the number of requests a service will handle, controlling how many requests a user can make and preventing system overload. For example, on a social media platform, an overload could occur when users like posts, play videos, and follow others at a much higher rate than usual. Without rate limiting, this sudden increase in activity can overwhelm the system and lead to failure.

Rate limiting to prevent web server overload
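A common way to implement such a limiter is the token-bucket algorithm, sketched below with assumed numbers (5 tokens/second refill, bursts of up to 3): each request spends a token, and requests that find the bucket empty are rejected.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: allows bursts up to `capacity`,
    refilled continuously at `rate` tokens per second."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: reject (or queue) the request

bucket = TokenBucket(rate=5, capacity=3)
results = [bucket.allow() for _ in range(5)]  # burst of 5 back-to-back requests
print(results)  # only the first 3 fit within the burst capacity
```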

CDNs: CDNs are cache servers distributed across different regions; they not only improve performance but also increase system availability by putting less load on origin servers. Deploying these servers in multiple geographical locations ensures that regional outages do not affect overall system availability, and it also reduces latency for users in those locations.

Stress testing and monitoring: Another way to ensure availability is stress testing, which determines how the system behaves under peak load conditions, allowing us to identify breaking points and confirm that the system can handle sudden traffic spikes. Additionally, monitoring lets us track system performance and detect anomalies in real time.

3) Scalability#

System scalability describes how a system expands to handle increasing numbers of users while maintaining performance. For example, an interviewer might ask questions about how to design a service like YouTube that can accommodate millions of users uploading and watching videos simultaneously—or designing a URL shortening service capable of handling billions of queries every day.

To address these questions about scalability in interviews, let's look at different approaches.

Approaches to achieve scalability#

Manual scaling: One approach to scale applications is manual scaling. It involves either upgrading hardware on existing machines (vertical) or adding more machines (horizontal).

  • Vertical scaling (hardware upgrades): Add more resources (RAM, CPU, storage) to existing machines for smaller demands. It’s easier to manage since we aren’t adding to the total number of machines.

  • Horizontal scaling (adding machines): Increase the number of machines to distribute the workload for larger demands. Horizontal scaling is generally preferred for large-scale applications because, unlike vertical scaling, it avoids a single point of failure and naturally supports load balancing.

Vertical vs. horizontal scaling

Automatic scaling: Dynamically adjust resources (storage, processing power) based on demand to handle traffic spikes. This can be achieved using a cloud computing technique called Auto Scaling.

Sharding: Another approach to achieving scalability is dividing the database into shards to distribute the data load across multiple servers. Key-range sharding (which distributes data based on specific ranges of keys) and hash-based sharding (which applies a hash function to the keys, ensuring an even distribution across shards) are common techniques for sharding databases.
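The two techniques can be contrasted in a few lines. This is an illustrative sketch with an assumed shard count of 4 and made-up key ranges; real systems typically use consistent hashing or managed range splits instead of fixed buckets.

```python
import hashlib

NUM_SHARDS = 4

def hash_shard(key):
    """Hash-based sharding: even spread across shards,
    but range queries must fan out to every shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def key_range_shard(key):
    """Key-range sharding (illustrative alphabetical ranges):
    keeps adjacent keys together, but hot ranges can skew load."""
    first = key[0].lower()
    if first <= "f":
        return 0
    if first <= "m":
        return 1
    if first <= "s":
        return 2
    return 3

print(hash_shard("user:42"), key_range_shard("alice"), key_range_shard("zoe"))
```

Mentioning this trade-off (even load vs. efficient range scans) is usually worth a sentence in the interview.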

Modular design: Break down the system into smaller, independent components so that each service can scale independently according to demands without affecting the performance of other services.

Monolithic vs. modular designs

Cache and CDNs: Caches store frequently accessed data in memory to reduce response time and database load. CDNs, on the other hand, distribute static content to users rather than retrieving it from the origin, further reducing the load on the server. By using caching and CDNs, the system can manage a high number of user requests without performance degradation.

If you’re interested in exploring other nonfunctional requirements like reliability, maintainability, and security in greater depth, check out our Grokking the Modern System Design Interview course.


Now that we’ve covered different approaches to achieving NFRs, including performance, availability, and scalability, let's practice them by taking a deeper look at the Google Maps and YouTube System Designs.

Acing NFRs: Google Maps and YouTube#

Let’s explore nonfunctional requirements for Google Maps and YouTube System Design problems.

Design Google Maps#

Designing a navigation system like Google Maps involves allowing users to identify their current location, find optimal routes based on specified destinations, and provide detailed turn-by-turn directions for seamless navigation.

Considering the following nonfunctional requirements of Google Maps, let’s describe the strategies to achieve them:

  • High availability: The design of Google Maps includes a large road network graph. If we hosted this graph on a server, it would definitely crash due to its large size and high user demands. To ensure availability, we divide the graph into smaller graphs or segments and host them on different servers. By replicating these servers, we eliminate single points of failure and use a load balancer to offload incoming user requests to multiple segment servers.

  • Scalability: To scale our Google Maps system, we use a distributed system where we host each segment on a different server to serve user requests for different routes from different segment servers. Thus, we can serve millions of user requests. As we are using a modular design here, we can easily add more segments to handle more data.

Let’s summarize the strategies we used to achieve Google Maps NFRs:

Nonfunctional requirements

Strategies

Availability

  • Divide the road network graph into small graphs (segments) to process user queries.
  • Replicate the small segment servers.
  • Request load balancing across different segment servers.

Scalability

  • Partition the large graphs into smaller graphs to ease segment addition.
  • Host the graphs on different servers to handle an increased number of queries per second.

Are these the only nonfunctional requirements for Google Maps? What about the performance? How does our design ensure minimal response times? Think of a solution, and then explore the nonfunctional requirements of Google Maps in detail to deepen your understanding.

Design YouTube#

Designing a video streaming platform like YouTube involves enabling users to stream videos, upload videos, search for videos by their titles, and like/dislike videos.

Considering the following nonfunctional requirements of YouTube, let’s describe the strategies to achieve them:

  • Minimal response times: To ensure the performance of YouTube's design, we use caching servers at the ISP level and CDN level to serve the most-viewed content with the fastest response times. At the same time, choosing an appropriate storage system for each type of data, such as Bigtable for thumbnails and Blob storage for videos, reduces latency. We prefer a Lighttpd-based web server to serve videos and static content, as it handles such content faster and provides a smoother user experience.

  • Reliability: To make the system highly reliable, we use data sharding to ensure that if one type of data is unavailable, it does not affect the others. We replicate critical components to achieve fault tolerance, and we eliminate faulty servers by monitoring their health with heartbeat messages: each node in a distributed system sends a regularly spaced message indicating it's healthy and active, and if a node stops sending heartbeats, other nodes can assume it has failed.
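Heartbeat-based failure detection can be sketched with a timestamp map and a timeout. The 3-second timeout and node names below are assumptions for illustration; production systems tune the timeout against network jitter to avoid false positives.

```python
import time

HEARTBEAT_TIMEOUT = 3.0  # assumed: seconds of silence before a node is suspected

last_heartbeat = {}  # node_id -> timestamp of the most recent heartbeat

def receive_heartbeat(node_id, now=None):
    last_heartbeat[node_id] = now if now is not None else time.monotonic()

def failed_nodes(now=None):
    """Nodes whose last heartbeat is older than the timeout are assumed dead
    and can be removed from the serving pool."""
    now = now if now is not None else time.monotonic()
    return [n for n, t in last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT]

receive_heartbeat("video-node-1", now=100.0)
receive_heartbeat("video-node-2", now=100.0)
receive_heartbeat("video-node-1", now=105.0)  # node-2 went silent
print(failed_nodes(now=105.5))  # only the silent node is flagged
```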

The next question that comes to mind is how YouTube manages an increasing number of users and data storage. Which strategies would make our YouTube design scalable and available? Think about the solution and then explore how to meet YouTube's nonfunctional requirements in detail to enhance your understanding.

Nonfunctional requirements

Strategies

Minimal response times

  • Cache at different layers
  • CDNs
  • Choose appropriate storage systems (e.g., blob storage to store videos, Bigtable to store thumbnails)
  • Serve videos and static content with Lighttpd


Reliability

  • Data sharding
  • Replicate critical components
  • Heartbeat protocol

Quick tips for NFR interview questions#

  • Proactively ask questions to clarify nonfunctional requirements during the interview. For example:

    • Expected user traffic

    • Expected data load

    • Expected downtime tolerance

  • Evaluate trade-offs between different techniques, such as system complexity, cost, and maintainability.

  • Prepare a list of commonly asked questions with their solutions. For example:

    • For reliable transaction processing—choose ACID-compliant relational databases. 

    • For large data applications—choose NoSQL databases like MongoDB or Cassandra to achieve scalability. 

    • For real-time data processing and analytics—choose platforms like Apache Kafka, Amazon Kinesis, etc.

Remember, there is no one-size-fits-all solution. Every design decision involves trade-offs. As a designer of scalable systems, your ability to weigh these trade-offs is critical. Ask the interviewer clarifying questions, consider the NFRs carefully, and make informed choices to create a robust system design.

What’s next?#

In this blog, I have attempted to demonstrate the importance of nonfunctional requirements and how to address them in System Design interviews. By understanding common NFRs and practical strategies for solving them, you will be better prepared to address NFR-related questions during your interview.

I highly recommend the following courses for hands-on practice achieving nonfunctional requirements and preparing for a challenging interview at FAANG/MAANG companies.

These are the resources I wish I’d had back when I was interviewing at Big Tech.

Happy learning!
