Metrics

Let's explore some metrics that will help you define the “success” scenarios for a search session.

Choosing a metric for a machine learning model is of paramount importance. Machine learning models learn directly from the data, and no human intuition is encoded into the model. Hence, selecting the wrong metric results in the model becoming optimized for a completely wrong criterion.

There are two types of metrics to evaluate the success of a search query:

  1. Online metrics
  2. Offline metrics

We refer to metrics that are computed as part of user interaction in a live system as online metrics. Meanwhile, offline metrics use offline data to measure the quality of your search engine and don’t rely on getting direct feedback from the users of the system.

Online metrics

In an online setting, you can base the success of a search session on user actions. On a per-query level, you can define success as the user action of clicking on a result.

A simple click-based metric is click-through rate.

Click-through rate

The click-through rate measures the ratio of clicks to impressions.

📝 Click-through rate = $\frac{\text{Number of clicks}}{\text{Number of impressions}}$

In the above definition, an impression means a view. For example, when a search engine result page loads and the user has seen the result, you will consider that as an impression. A click on that result is your success.
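As a minimal sketch of this computation (the function and variable names here are illustrative, not from the source):

```python
def click_through_rate(num_clicks: int, num_impressions: int) -> float:
    """Ratio of clicks to impressions; 0.0 when there are no impressions yet."""
    if num_impressions == 0:
        return 0.0
    return num_clicks / num_impressions

# e.g., 12 clicks over 100 impressions
print(click_through_rate(12, 100))  # 0.12
```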

Successful session rate

One problem with click-through rate is that unsuccessful clicks also count toward search success. For example, this includes short clicks, where the searcher only glanced at the resultant document and clicked back immediately. You can solve this issue by filtering your data to successful clicks only, i.e., by considering only clicks that have a long dwell time.

📝 Dwell time is the length of time a searcher spends viewing a webpage after they’ve clicked a link on a search engine result page (SERP).

Therefore, successful sessions can be defined as the ones that have a click with a ten-second or longer dwell time.

📝 Session success rate = $\frac{\text{no. of successful sessions}}{\text{no. of total sessions}}$
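The ten-second dwell-time filter can be sketched as follows (the input format, a list of click dwell times per session, is an assumption for illustration):

```python
DWELL_TIME_THRESHOLD = 10.0  # seconds; the ten-second rule described above

def session_success_rate(sessions: list[list[float]]) -> float:
    """sessions: one list of click dwell times (in seconds) per session.

    A session counts as successful if any of its clicks meets the
    dwell-time threshold; sessions with only short clicks (or no clicks)
    count as unsuccessful.
    """
    if not sessions:
        return 0.0
    successful = sum(
        1 for dwell_times in sessions
        if any(t >= DWELL_TIME_THRESHOLD for t in dwell_times)
    )
    return successful / len(sessions)

# Three sessions: one with a long click, one with only a short click, one with none
print(session_success_rate([[3.2, 45.0], [2.0], []]))  # ~0.33
```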

A session can also be successful without a click as explained next.

Caveat

Another aspect to consider is zero-click searches.

📝 Zero-click searches: A SERP may answer the searcher’s query right at the top such that the searcher doesn’t need any further clicks to complete the search.

For example, a searcher queries “einstein’s age”, and the SERP shows an excerpt from a website in response, directly on the results page.

The searcher has found what they were looking for without a single click! The click-through rate would not capture this, but your definition of a successful session should definitely include it. A time-based metric can account for such sessions.

Time to success

Until now, we have been considering a search session based on a single query. However, a session may span several queries. For example, the searcher initially queries “italian food”, finds that the results are not what they are looking for, and makes a more specific query: “italian restaurants”. At times, the searcher might also have to go over multiple results to find the one they are looking for.

Ideally, you want the searcher to go to the result that answers their question in the minimal number of queries and as high on the results page as possible. So, time to success is an important metric to track and measure search engine success.

📝 Note: For sessions spanning multiple queries, a low number of queries per session means that your system was good at guessing what the searcher actually wanted despite a poorly worded query. So, a low number of queries per session should also be part of your definition of a successful search session.
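Both signals can be tracked from session logs. A minimal sketch, assuming a session is logged as a time-ordered list of `(timestamp, event_type)` pairs (this log format is my assumption, not from the source):

```python
from datetime import datetime

def time_to_success(session_events):
    """Seconds from the first query to the first successful click,
    or None if the session never succeeded."""
    first_query = None
    for ts, event in session_events:
        if event == "query" and first_query is None:
            first_query = ts
        elif event == "successful_click" and first_query is not None:
            return (ts - first_query).total_seconds()
    return None

def queries_per_session(session_events):
    """Number of queries the searcher issued before the session ended."""
    return sum(1 for _, event in session_events if event == "query")

events = [
    (datetime(2024, 1, 1, 12, 0, 0), "query"),
    (datetime(2024, 1, 1, 12, 0, 20), "query"),
    (datetime(2024, 1, 1, 12, 0, 35), "successful_click"),
]
print(time_to_success(events))      # 35.0
print(queries_per_session(events))  # 2
```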

Offline metrics

The offline methods to measure a successful search session make use of trained human raters. They are asked to rate the relevance of the query results objectively, following well-defined guidelines. These ratings are then aggregated across a query sample to serve as the ground truth.

📝 Ground truth refers to the actual output that is desired of the system. In this case, it is the ranking or rating information provided by the human raters.

Let’s see normalized discounted cumulative gain (NDCG) in detail as it’s a critical evaluation metric for any ranking problem.

NDCG

You will be looking at NDCG as a common measure of the quality of search ranking results.

NDCG is an improvement on cumulative gain (CG).

📝 $CG_p = \sum_{i=1}^{p} rel_i$ ...
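A sketch of CG together with its discounted (DCG) and normalized (NDCG) variants, using the conventional $\log_2$ position discount (the discount choice is the standard one, not stated explicitly above):

```python
import math

def cg(relevances, p=None):
    """Cumulative gain: sum of relevance ratings of the top-p results."""
    p = len(relevances) if p is None else p
    return sum(relevances[:p])

def dcg(relevances, p=None):
    """Discounted cumulative gain: position i (1-based) is discounted by log2(i + 1)."""
    p = len(relevances) if p is None else p
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:p]))

def ndcg(relevances, p=None):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), p)
    return dcg(relevances, p) / ideal if ideal > 0 else 0.0

# A perfect ranking (relevance already in descending order) scores 1.0
print(ndcg([3, 2, 1]))  # 1.0
```

Normalizing by the ideal ordering is what makes NDCG comparable across queries with different numbers of relevant results.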