Metrics

Let's look at the online and offline metrics used to judge the performance of the recommendation system.

We'll cover the following...

Types of metrics
Online metrics
Offline metrics

In this lesson, you will look at different metrics that you can use to gauge the performance of the movie/show recommendation system.

Types of metrics

Like any other optimization problem, there are two types of metrics to measure the success of a movie/show recommendation system:

Online metrics

Online metrics are used to see the system’s performance through online evaluations on live data during an A/B test.

Offline metrics

Offline metrics are used in offline evaluations, which simulate the model’s performance in the production environment.

We might train multiple models, then tune and test them offline on held-out test data (historical interactions of users with recommended media). If the best-performing model's gain is worth the engineering effort of bringing it into a production environment, that model will then be selected for an online A/B test on live data.

Online metrics

The following are some options for online metrics that we have for the system. Let’s go over each of them and discuss which one makes the most sense to be used as the key online success indicator.

Engagement rate

The more recommendations the user engages with, the more successful the recommendation system is. So, the engagement rate ( $\frac{sessions\;with\;clicks}{total\;number\;of\;sessions}$ ) can help us measure it. However, the user might click on a recommended movie but not find it interesting enough to finish watching it. Therefore, measuring only the engagement rate with the recommendations provides an incomplete picture.
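As a rough illustration of the formula above, the engagement rate could be computed from session logs along the following lines. This is a minimal sketch: the `Session` record and its `recommendation_clicks` field are hypothetical names chosen for the example, not part of any real logging schema.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical log record: one row per user session.
    session_id: str
    recommendation_clicks: int  # clicks on recommended movies/shows


def engagement_rate(sessions):
    """Return sessions with at least one click / total number of sessions."""
    if not sessions:
        return 0.0  # assumption: define the rate as 0 when there is no traffic
    sessions_with_clicks = sum(1 for s in sessions if s.recommendation_clicks > 0)
    return sessions_with_clicks / len(sessions)


# Made-up example data: two of the four sessions contain a click.
sessions = [
    Session("s1", 2),
    Session("s2", 0),
    Session("s3", 1),
    Session("s4", 0),
]
rate = engagement_rate(sessions)
print(f"Engagement rate: {rate:.2f}")
```

Note that a session with several clicks counts only once, which matches the per-session definition in the formula. The sketch also shows the limitation discussed above: a click is counted whether or not the user goes on to watch the title.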

Videos watched

To take into account the unsuccessful clicks on the movie/show recommendations, we can also consider the average number of videos that the user has watched. We should only count videos that the user has watched for a significant amount of time (e.g., more than two minutes).

However, this metric can be problematic when the user starts watching a recommended movie or series but does not find it interesting enough to finish it.

Series generally have several seasons and episodes, so watching one episode and then not continuing is also an indication of the user not finding the content interesting. So, just measuring the average number of videos watched might miss out on overall user satisfaction with the recommended content.
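The videos-watched metric, and the blind spot described above, can be sketched as follows. This is an illustrative example only: the `(user_id, video_id, seconds_watched)` tuple layout is an assumed log format with one row per user-video pair, and the 120-second threshold mirrors the "more than two minutes" example from the text.

```python
MIN_WATCH_SECONDS = 120  # "more than two minutes", as in the example above


def average_videos_watched(watch_events, min_seconds=MIN_WATCH_SECONDS):
    """Average number of videos per user watched for longer than min_seconds.

    watch_events: iterable of (user_id, video_id, seconds_watched) tuples,
    assumed to be already aggregated to one row per user-video pair.
    """
    users = set()
    counted_videos = 0
    for user_id, _video_id, seconds_watched in watch_events:
        users.add(user_id)  # every active user counts in the denominator
        if seconds_watched > min_seconds:
            counted_videos += 1
    if not users:
        return 0.0  # assumption: define the metric as 0 with no users
    return counted_videos / len(users)


# Made-up example data.
watch_events = [
    ("u1", "movie_a", 5400),       # watched a full movie
    ("u1", "movie_b", 45),         # abandoned quickly, not counted
    ("u2", "series_x_s1e1", 1500), # watched one episode, then stopped
    ("u2", "movie_c", 90),         # below the threshold, not counted
]
avg = average_videos_watched(watch_events)
print(f"Average videos watched per user: {avg:.1f}")
```

In this sketch, the user who finished a movie and the user who dropped a series after one episode each contribute exactly one counted video, so the metric cannot tell the two outcomes apart.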
