Document Selection
From the one hundred billion documents on the internet, let's retrieve the top one hundred thousand that are relevant to the searcher's query.
Previously, you saw the layered model approach. We will adopt this approach to perform search ranking. Let’s zoom in on its first step: document selection.
From the one hundred billion documents on the internet, we want to retrieve the top one hundred thousand that are relevant to the searcher’s query by using information retrieval techniques.
Let’s get some terminology out of the way before we start.
📝 Information retrieval is the science of searching for information in a document. It focuses on comparing the query text with the document text and determining what is a good match.
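To make "comparing the query text with the document text" concrete, here is a minimal sketch of one very simple way to do it: count how many query terms appear in each document and keep the best-scoring ones. The function names (`tokenize`, `select_documents`), the term-overlap score, and the tiny sample corpus are all assumptions made for illustration. A real search engine would rely on an inverted index and richer relevance signals rather than scanning every document.

```python
import heapq
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_documents(query: str, documents: dict[str, str], k: int) -> list[str]:
    """Return the ids of up to k documents that share the most terms with the query.

    Illustrative only: the score is the number of distinct query terms
    found in the document, and documents with no shared terms are dropped.
    """
    query_terms = tokenize(query)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_terms & tokenize(text))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Keep only the k highest-scoring documents.
    top = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [doc_id for _, doc_id in top]


# Hypothetical corpus standing in for "the documents on the internet".
documents = {
    "d1": "Italian restaurants in downtown Seattle",
    "d2": "Best Italian pasta recipes",
    "d3": "Weather forecast for Seattle",
    "d4": "History of the printing press",
}

best = select_documents("italian restaurants seattle", documents, k=1)
matches = select_documents("italian restaurants seattle", documents, k=10)
```

Here `d1` matches all three query terms, `d2` and `d3` match one each, and `d4` matches none, so it is never selected. Document selection at web scale follows the same idea: cheaply narrow a huge collection down to a much smaller set of plausible matches, which later stages can then rank more carefully.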
Documents
Document types are as follows:
- Web pages
- Emails
- Books
- News stories
- Scholarly papers
- Text messages
- Word™ documents
- PowerPoint™ presentations