Document Selection
Explore the process of document selection within search ranking systems by understanding how relevant documents are retrieved from billions of data points. Learn key concepts such as information retrieval, inverted indexes, selection criteria, and relevance scoring factors including term match, document popularity, query intent, and personalization. This lesson helps you grasp the foundation of filtering and ranking results effectively before passing them to the ranker.
Previously, you saw the layered model approach, which we will adopt for search ranking. Let’s zoom in on its first step: document selection.
Of the 100 billion documents on the internet, we want to retrieve the top 100,000 that are relevant to the searcher’s query, using information retrieval techniques.
Let’s get some terminology out of the way before we start.
📝 Information retrieval is the science of searching for information in a document. It focuses on comparing the query text with the document text and determining what is a good match.
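To make "comparing the query text with the document text" concrete, here is a minimal sketch of term-match selection in Python. It is an illustration only, not how a production search engine works: the function names (`tokenize`, `term_match_score`, `select_documents`), the tiny corpus, and the scoring rule (the fraction of query terms found in the document) are all assumptions made for this example.

```python
import heapq
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_match_score(query, document):
    """Fraction of distinct query terms that also appear in the document."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    doc_terms = set(tokenize(document))
    return len(query_terms & doc_terms) / len(query_terms)


def select_documents(query, documents, k):
    """Return up to k (score, doc_id) pairs with a non-zero score, best first."""
    scored = [
        (term_match_score(query, text), doc_id)
        for doc_id, text in documents.items()
    ]
    matches = [pair for pair in scored if pair[0] > 0]
    return heapq.nlargest(k, matches)


# Hypothetical toy corpus standing in for the documents on the internet.
documents = {
    "d1": "Italian restaurants in Seattle",
    "d2": "Best hiking trails near Seattle",
    "d3": "How to cook Italian pasta at home",
}

top = select_documents("italian restaurants", documents, k=2)
print(top)
```

Here `k` plays the role of the 100,000-document cutoff: only the best-matching documents move on to the later stages. Scanning every document like this would not scale to billions of pages, which is why real systems rely on index structures such as the inverted index.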
Documents
Document types are as follows:
- Web pages
- Emails
- Books
- News stories
- Scholarly papers
- Text