Ranking
Let's see how to design the search ranking model.
We'll cover the following...
Objective
You want to build a model that will display the most relevant results for a searcher’s query. Extra emphasis should be applied to the relevance of the top few results since a user generally focuses on them more on the SERP.
📝 Learning to Rank (LTR): A class of techniques that applies supervised machine learning (ML) to solve ranking problems. The pointwise and pairwise techniques that we will apply fall under this class.
Stage-wise approach
As discussed in the architectural components section, the number of documents matching a single query can be very large. So, for a large scale search engine, it makes sense to adopt a multi-layer funnel approach. The top layer of the funnel looks at a large number of documents and uses simpler and faster algorithms for ranking. The bottom layer ranks a small number of documents with complex machine-learned models.
Let’s assume here that you will use two layers for ranking with one-hundred thousand and five-hundred documents ranked at each layer, as shown below. Though, the choice of layers and documents ranked on each layer depends highly on the available capacity.
The configuration shown above assumes that the first stage will receive one-hundred thousand relevant documents from the document selection component. You then reduce this number to five-hundred after ranking in this layer, ensuring that the topmost relevant results are forwarded to the second stage (also referred to as the recall of the documents).
It will then be the responsibility of the second stage to rank the documents such that topmost relevant results are placed in the correct order (also referred to as the precision of the documents).
📝 First stage model will focus on the recall of the top five to ten relevant documents in the first five-hundred results while the second stage will ensure precision of the top five to ten relevant documents.
Stage 1
As we try to limit documents in this stage from a large set to a relatively smaller set, it’s important that we don’t ...