Home/Blog/Interview Prep/3 challenging System Design problems for your interview prep
Home/Blog/Interview Prep/3 challenging System Design problems for your interview prep

3 challenging System Design problems for your interview prep

Fahim ul Haq
10 min read

Before I co-founded Educative, I was a distributed systems engineer at Facebook and Microsoft for a little over 10 years. While working with web-scale distributed systems, I was party to hundreds of Software Engineering interview loops covering both Coding and System Design Interviews (SDIs).

The System Design Interview can be daunting. Understandably, engineers with no real-world experience designing large-scale systems find SDIs intimidating. Beyond that, SDIs differ greatly from most technical interviews. With no fixed right and wrong answers, the open-ended nature of System Design interview questions leaves many interviewees lost. These three questions we will cover are some of the most open-ended and complex I’ve encountered.

From my experience conducting System Design Interviews, there are always questions that tend to trip up even the most prepared candidates. Today, I want to help you make sure you’re not caught off-guard during your interview. In that spirit, I’ll talk about three of the most challenging System Design questions I’ve asked and how I would approach them.

1. Designing Uber#

Designing Uber or other rideshare services is a complicated process. Between our two main subjects, the static requester (the rider) and the dynamic receiver (the driver), a lot of tasks must be completed for every single interaction on the Uber app. The customer and driver connection must include the following:

  • Drivers being able to notify the service regarding their current location and availability frequently
  • Passengers are able to see all nearby drivers in real-time
  • Customers requesting rides using a destination and pickup time
  • Nearby drivers are notified when a customer requests a pick-up
  • Once a ride is accepted, the driver and customer must see the other’s current location for the entire duration of the ride
  • Once the drive is complete, the driver should then be available for another customer
Detailed design of Uber
Detailed design of Uber

Obviously (as seen above), there are multiple parts of this system that need to be put in place for the entire operation to run smoothly. To simplify your approach, I suggest taking a step back and focusing on the core problem.

I view the core problem as the part of the system that is most difficult to execute. The most advanced portion of Uber’s system is not how to store and fetch the data being collected, as that kind of task is done by many large-scale systems.

Our core problem that is unique to the design of Uber is matching a (relatively) static rider with a (potentially) moving driver. The platform must also tell both the driver and the rider of the car’s current location while the ride is in progress.

I’d suggest tackling this problem before any other. Without the core problem being met, the rest of the system’s components are rendered obsolete.

A design problem similar to Uber’s is that of designing Yelp. I consider this an excellent prerequisite because it experiences many of the same obstacles as Uber. They both store massive amounts of location data and aim to achieve quick match results based on the location of two entities.

Both designs benefit from the use of a QuadTree data structure to connect customers to their destinations. However, for Uber, the QuadTree must be adapted for frequent location updates. For the most part, Yelp’s system connects two static points (the customer and the restaurant), while Uber must connect a moving point to a standing one in the same amount of search time. Once you’ve gotten comfortable with Yelp’s design and the QuadTree, you can tackle Uber’s design, as it is more complicated.

Side note: Identifying and solving a “core problem” first and then branching off to its related system components is a technique that works for any SDI question. That said, I particularly recommend this approach for the type of design question with a unique and challenging core problem. Think of it as eating your least favorite thing on your plate so you can enjoy the rest of your meal pain-free!

Assuming that we’re working with the data structure from our Yelp design problem, Quadtrees help to divide the map into segments. If the number of drivers exceeds a certain limit, then we split that segment into four more child nodes and section the drivers into them. Each child node in our QuadTrees contains segments that can’t be divided further. We can use the same QuadTree segments for connecting drivers. The most significant difference between the QuadTree used in the Yelp and Uber designs is that our QuadTree wasn’t designed with regular upgrades in mind. It takes a longer amount of time to modify the QuadTree whenever a driver’s position changes. To identify the driver’s new location, we must first find a proper grid depending on the driver’s previous position. If the new location doesn’t match the current grid, we should remove the driver from the current grid and shift it to the correct grid. To fulfill our system requirements and solve the problem at hand, we must upgrade our data structures to update driver locations every four seconds.

To pull off driver updates every four seconds, I’d suggest starting with a hash table to store the latest position of the drivers and updating our QuadTree occasionally, say after 10–15 seconds. We can update the driver’s location in the QuadTree around every 15 seconds instead of four seconds, and we use a hash table that updates every four seconds to reflect the drivers’ latest location. By doing this, you are achieving your main aim while using minimal resources and time.

widget

This step is merely the beginning. Uber’s design is a complicated and vast issue. To continue learning about how to solve this design problem, check out this page from our Grokking Modern System Design Interview the breaks down all parts of Uber’s design.

Get hands-on with System Desin interview preperation today.#

Try one of our 500+ courses and learning paths: Grokking Modern System Design Interview for Engineers & Managers.


2. Twitter’s heavy-hitter problem#

This problem is one that affects many notable companies that you probably interact with on a daily basis. More often than not, it is also a problem that I have seen spark confusion among qualified candidates.

So, what is the heavy-hitter problem? For example, imagine we are designing Twitter. A problem Twitter’s system faces every day is when a single account with millions of followers, such as Justin Beiber, sends a tweet. This single tweet from a single account may generate millions of comments, views, and likes per second. As seconds tick on, the tweet will continue to increase in the amount of traffic it’s generating. The system as a whole has to be built to deal with this sudden surge of user requests and spikes in traffic from a single post. This problem is known as the heavy-hitter problem for companies with a significant celebrity presence, such as Twitter, Facebook, and YouTube.

So, how do you approach problems such as the heavy hitter? It’s not easy to simply count the number of views by keeping a counter on a single machine. So we use sharded counters. A sharded counter, also known as a distributed counter, is where multiple counters are responsible for a specific number of shards. These shards run parallel on different computational units. We can improve performance and limit contention by balancing the millions of write requests across the shards. The shard counter is a critical design element when scalable counting is necessary. Being built for the scalability of large-scale counting makes this solution a perfect match for our heavy-hitter problem.

Counters and their shards working on different computational units
Counters and their shards working on different computational units

To get more practice with sharded counters, check out our Complete guide to System Design!

3. Capacity estimation problems#

Capacity planning is not the most frequently asked type of SDI question, but it is a vital process for large-scale FAANG systems. First off, what is a pure capacity problem? The core problem you will face is the estimation of resources required to cater to user requests for a specific design problem. For example, consider the amount of storage space YouTube requires to store videos uploaded each day or the servers and bandwidth required to help stream videos to users across the world.

So, how would I like to see candidates solve this problem? Since the capacity of each company and system are different, there really is no one go-to solution for this kind of question. A tip that helped me immensely is to research the bandwidth and capacity required for the company you’re interviewing with.

For example, in YouTube’s storage estimate, we have to gauge the total number of videos and the length of each video uploaded to YouTube per minute. Let’s assume that 500 hours worth of content is uploaded to YouTube in one minute. Since each video of 30 MB is 5 minutes long, we require 30/5 = 6 MB to store 1 minute of video.

The following formula then looks like this:

widget
Visual of YouTube's storage estimation
Visual of YouTube's storage estimation

Of course, not every upload is the same. YouTube requires more storage than we calculated and demands more flexibility. More often than not, capacity planning estimates from the design stage don’t hold up in reality. That’s why being prepared to be flexible and plan continuously is essential for capacity planning.

Calculations like the example above help us ignore the nitty-gritty details of the system (at least at the design level) and focus on more critical aspects. These calculations are referred to as back-of-the-envelope calculations and do not represent the capacity reality on their own. In addition, we have to make some underlying assumptions, which will be the basis of our capacity calculations.

For instance, we can estimate the storage capacity of each server and then the number of storage servers required to store all the videos. We may also assume the replication number of the chunks/segments of a video, the latency time for 1 KB of data to travel from one data center to another, and so on.

To keep learning about this flexible process for solving capacity problems, including resource estimation and back-of-the-envelope calculations, check out the rest of the YouTube lesson from our Grokking Modern System Design course.

Capacity estimation problems should be discussed more in System Design Interview preparation. In my opinion, because the design doesn’t always transfer to reality, capacity estimation is one of the more critical and challenging aspects of modern large-scale System Design.


Continue practicing with more SDI questions today!#

System Design is a vital part of today’s world, and I always love seeing more people enter the field each year. The System Design Interview can be tricky no matter who you are. That’s why I want to share as much expertise as I can for the questions that I’ve seen result in confusion among candidates.

Good luck on your journey! Now that you are aware of these problematic questions, make sure you practice them before the real deal! Our System Design Interview courses are excellent places to find solutions to these problems and get hands-on practice. You now know where you might be vulnerable in your interview. Make sure you’re preparing where others might be underprepared.

Continue your System Design Interview preparation with Grokking Modern System Design Interviews for more practice with the problems we discussed in this article and many more.

As always, good luck and happy learning!

Continue learning about how to conquer the System Design interview#


  

Free Resources