...

System Design: Newsfeed System

Learn to design a scalable newsfeed system by defining clear functional and non-functional requirements, estimating resources to handle large-scale traffic, architecting core components for efficiency and reliability, and evaluating the final design’s trade-offs across performance, latency, and scalability.

What is a newsfeed?

A newsfeed on any social media platform (Twitter, Facebook, Instagram) is a list of stories generated by the entities that a user follows. An entity can be a page, a group, a friend, or a follower of the user. The newsfeed contains text, images, videos, and other activities such as likes, comments, shares, and advertisements. This list is continuously updated and presented on the user’s home page. A newsfeed system therefore displays content from friends, followers, groups, and other pages, along with the user’s own posts.

A newsfeed is essential for social media platform users because it keeps them informed about the latest industry developments, current affairs, and relevant information. It also provides them with additional reasons to return and connect with a platform on a regular basis. Billions of users use such platforms. The challenging task is to provide a personalized newsfeed in real-time while keeping the system scalable and highly available.

This lesson discusses the high-level and detailed design of a newsfeed system for a social platform such as Facebook, Twitter, or Instagram.

Newsfeeds on a mobile application

Now that we understand what a newsfeed is and the challenges it presents, we will begin by defining the system's requirements.

Requirements

To limit the scope of the problem, we’ll focus on the following functional and non-functional requirements:

Functional requirements

  • Newsfeed generation: The system will generate newsfeeds from the pages, groups, and people that a user follows. A user may have many friends and followers, so the system should be capable of generating feeds from all of them. The challenge is that the amount of candidate content is potentially huge. Our system needs to decide which content to pick for the user and then rank it to decide what to show first.

  • Newsfeed contents: The newsfeed may contain text, images, and videos.

  • Newsfeed display: The system should add new incoming posts to the newsfeed of every active user according to some ranking mechanism. Once the posts are ranked, we show them to the user with the higher-ranked content first.
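
The display requirement can be illustrated with a minimal sketch. The lesson has not specified a ranking mechanism at this point, so the `score` field below is a hypothetical relevance score, and `Post` and `top_ranked` are illustrative names rather than parts of the actual design. The sketch only shows the "higher-ranked first" ordering.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: int
    author: str
    score: float  # hypothetical relevance score; the ranking model is an assumption


def top_ranked(posts, limit):
    """Return at most `limit` posts, ordered from highest to lowest score."""
    return heapq.nlargest(limit, posts, key=lambda p: p.score)


candidates = [
    Post(1, "page_a", 0.42),
    Post(2, "friend_b", 0.91),
    Post(3, "group_c", 0.67),
]
feed = top_ranked(candidates, limit=2)
print([p.post_id for p in feed])
```

Selecting only the top `limit` posts with a heap avoids fully sorting every candidate, which matters when the pool of candidate posts is much larger than the feed shown to the user.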

Non-functional requirements

  • Scalability: Our proposed system should be highly scalable to support the ever-increasing number of users on any platform, such as Twitter, Facebook, and Instagram.

  • Fault tolerance: Because the system handles a large amount of data, partition tolerance (system availability in the event of network failure between the system’s components) is necessary.

  • Availability: The service must be highly available to keep the users engaged with the platform. The system can compromise strong consistency for availability and fault tolerance, according to the PACELC theorem. (The PACELC theorem is an extension of the CAP theorem. It states that in the event of a network partition, one must choose between availability and consistency; otherwise, one must choose between latency and consistency.)

  • Low latency: The system should provide newsfeeds in real-time. The maximum latency should therefore not exceed 2 seconds.

These requirements, particularly scalability, need to be quantified. The process of resource estimation will help us understand the magnitude of traffic, storage, and server power needed.

Resource estimation

Let’s assume the platform for which the newsfeed system is designed has 1 billion users, of which, on average, 500 million are daily active users. Also, each user has 300 friends and follows 250 pages on average. Based on these assumed statistics, let’s look at the traffic, storage, and server estimation.

Traffic estimation

Let’s assume that each daily active user opens the application (or social media page) 10 times a day. The total number of requests per day would be:

$500\,M \times 10 = 5$ billion requests per day $\approx 58K$ requests per second.
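
The traffic arithmetic can be checked with a short sketch. It uses only the figures assumed in this lesson (500 million daily active users, 10 opens per user per day); the function and constant names are illustrative, and the result is an average that assumes requests are spread evenly over the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def estimate_traffic(daily_active_users, opens_per_user_per_day):
    """Return (requests per day, average requests per second)."""
    requests_per_day = daily_active_users * opens_per_user_per_day
    return requests_per_day, requests_per_day / SECONDS_PER_DAY


per_day, per_second = estimate_traffic(500_000_000, 10)
print(f"{per_day:,} requests per day")
print(f"{per_second:,.0f} requests per second on average")
```

Because this is an average, real traffic at peak hours would likely be higher than the roughly 58K requests per second computed here.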

Storage estimation

Let’s assume that the feed will be generated offline and rendered upon a request. Also, we’ll precompute the top 200 posts for each user. Let’s calculate storage estimates for users’ metadata, posts containing text, and media content.

  1. Users’ metadata storage estimation: Suppose the storage required for one user’s metadata is 50 KB. For 1 billion users, we would need $1B \times 50\,KB = 50\,TB$ ...