Design Considerations of Yelp
Learn about the different design aspects of the Yelp system.
Introduction
We discussed the design, building blocks, components, and the entire workflow of our system in the previous lesson, which brought up a number of interesting questions that now need answering. For example, what approach do we use to find the places, or how do we rank places based on their popularity? This lesson addresses important design concerns like the ones we just mentioned. The table given below summarizes the goals of this lesson.
Summary of the Lesson
Section/Sub-section | Purpose |
Searching | This process divides the world into segments to optimize the search, so that all nearby sites in a given location and radius can be identified. |
Storing Indexes | We index the places to improve query performance and estimate the storage required for all of the indexes. |
Searching Using Segments | We search all the desired places by combining the segments. |
Dynamic Segments | We solve the static segment limitations using dynamic segments. |
Searching Using a QuadTree | We explore the searching functionality using a QuadTree. |
Space Estimations for a QuadTree | We estimate the storage required for a QuadTree. |
Data Partitioning | We look into the options we can use to partition data. |
Ensuring Availability | We look into how we’ll ensure the availability of the system. |
Inserting a New Place | We look into how we’ll insert the place in a QuadTree. |
Ranking Popular Places | We rank the places of the system. |
Evaluation | We evaluate how our system fulfills the non-functional requirements. |
Searching
From Google Maps, we were able to connect segments and meet the scalability challenge to process a large graph efficiently. The graph of the world had numerous nodes and vertices, and traversing them was time-consuming. Therefore, we divided the whole world into smaller segments/subgraphs to process and query them simultaneously. The segmentation helps us improve the scalability of the system.
Each segment will be of the size and will contain a list of places that exist within it. We only search a few segments to locate destinations that are close by. We can use a given location and defined radius to find all the nearby segments and identify sites that are close.
Points to Ponder
How will we find nearby places if we create a table for storing all places?
We can store all the places in a table and uniquely identify a segment by having a segment_ID
. We can index each segment in the database. Now, we have limited the number of segments we need to search, so the query will be optimized and return results quickly.
Improve data fetching
We use a key-value store for quick access to places in the segments. The key is the segment_ID
, while the value
contains the list of places in that segment. Let’s estimate how much space we need to store the indexes.
We can usually store a few MBs in the value of the key-value store. If we assume that each segment has 500 places then it will take up , and we can easily store it as a value.
The total area of the ...