Metrics

Let's look at the online and offline metrics used to judge the performance of the recommendation system.

In this lesson, you will look at different metrics that you can use to gauge the performance of the movie/show recommendation system.

Types of metrics

As with any other optimization problem, two types of metrics measure the success of a movie/show recommendation system:

  1. Online metrics

    Online metrics are used to see the system’s performance through online evaluations on live data during an A/B test.

  2. Offline metrics

    Offline metrics are used in offline evaluations, which simulate the model’s performance in the production environment.

We might train multiple models, then tune and test them offline on held-out test data (historical interactions of users with recommended media). The best-performing model is then selected for an online A/B test on live data, provided its performance gain is worth the engineering effort of bringing it into the production environment.


📝 If a model performs well in an offline test but not in the online test, we need to think about where we went wrong. For instance, we need to consider whether our data was biased or whether we split the data appropriately into training and test sets.

Have a look at the lesson about online experimentation.

Driving online metrics in the right direction is the ultimate goal of the recommendation system.

Online metrics

The following are some options for online metrics that we have for the system. Let’s go over each of them and discuss which one makes the most sense to be used as the key online success indicator.

Engagement rate

The success of the recommendation system is directly proportional to the number of recommendations that the user engages with. So, the engagement rate ($\frac{\text{sessions with clicks}}{\text{total number of sessions}}$) can help us measure it. However, the user might click on a recommended movie but not find it interesting enough to finish watching it. Therefore, measuring only the engagement rate with the recommendations provides an incomplete picture.
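As a minimal sketch of the engagement rate defined above, the snippet below counts the sessions that contain at least one click on a recommendation and divides by the total number of sessions. The `Session` record and its field names are hypothetical and only serve the illustration; a real clickstream log would look different.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical session record, assumed for this illustration only.
    session_id: str
    clicked_recommendations: int  # clicks on recommended titles in this session


def engagement_rate(sessions):
    """Return sessions with clicks divided by total number of sessions."""
    if not sessions:
        return 0.0  # assumption: report 0 when there are no sessions at all
    sessions_with_clicks = sum(1 for s in sessions if s.clicked_recommendations > 0)
    return sessions_with_clicks / len(sessions)


sessions = [
    Session("s1", 2),
    Session("s2", 0),
    Session("s3", 1),
    Session("s4", 0),
]
print(engagement_rate(sessions))  # 0.5
```

Note that a session with two clicks counts once, exactly like a session with one click, which is why this metric says nothing about whether the clicked title was actually watched.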

Videos watched

To take into account the unsuccessful clicks on the movie/show recommendations, we can also consider the average number of videos that the user has watched. We should only count videos that the user has spent a significant amount of time watching (e.g., more than two minutes).

However, this metric can be problematic when the user starts watching recommended movies or series but does not find them interesting enough to finish.

Series generally have several seasons and episodes, so watching one episode and then not continuing is also an indication of the user not finding the content interesting. So, just measuring the average number of videos watched might miss out on overall user satisfaction with the recommended content.
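The videos-watched metric can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: each event is assumed to be a `(user_id, video_id, seconds_watched)` tuple holding the total time that user spent on that video, and the 120-second threshold simply mirrors the "more than two minutes" example above.

```python
from collections import defaultdict

# Threshold taken from the "more than two minutes" example in the text.
MIN_WATCH_SECONDS = 120


def average_videos_watched(watch_events, user_ids):
    """Average number of videos per user watched beyond the threshold.

    watch_events: iterable of (user_id, video_id, seconds_watched) tuples
    (a hypothetical log format assumed for this sketch).
    """
    if not user_ids:
        return 0.0
    counted = defaultdict(set)
    for user_id, video_id, seconds_watched in watch_events:
        if seconds_watched > MIN_WATCH_SECONDS:
            counted[user_id].add(video_id)
    total = sum(len(counted.get(user_id, ())) for user_id in user_ids)
    return total / len(user_ids)


events = [
    ("u1", "m1", 3600),  # counted
    ("u1", "m2", 45),    # too short, treated as an unsuccessful click
    ("u2", "m3", 130),   # counted, although the user quit after ~2 minutes
    ("u2", "m4", 900),   # counted
    ("u3", "m5", 10),    # too short
]
print(average_videos_watched(events, ["u1", "u2", "u3"]))  # 1.0
```

The `("u2", "m3", 130)` event shows the weakness discussed above: it counts as a watched video even though the user abandoned it almost immediately.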

Session watch time

Session watch time measures the overall time a user spends watching content based on recommendations in a session. The key measurement aspect here is that the user is able to find a meaningful recommendation in a session such that they spend significant time watching it.
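A rough sketch of session watch time, under the assumption that each watch event is a `(session_id, seconds_watched, from_recommendation)` tuple (a hypothetical format chosen for illustration). Only time spent on recommended content is accumulated per session, and the per-session totals are then averaged.

```python
from collections import defaultdict
from statistics import mean


def session_watch_times(watch_events):
    """Total seconds spent on recommended content, keyed by session.

    watch_events: iterable of (session_id, seconds_watched, from_recommendation)
    tuples (hypothetical log format assumed for this sketch).
    """
    totals = defaultdict(float)
    for session_id, seconds_watched, from_recommendation in watch_events:
        # Sessions without recommended viewing still appear, with a total of 0.
        totals[session_id] += seconds_watched if from_recommendation else 0.0
    return dict(totals)


def average_session_watch_time(watch_events):
    """Mean per-session watch time on recommended content, in seconds."""
    totals = session_watch_times(watch_events)
    return mean(totals.values()) if totals else 0.0


events = [
    ("s1", 1800, True),
    ("s1", 600, True),
    ("s2", 90, True),
    ("s2", 1200, False),  # not driven by a recommendation, so ignored
]
print(session_watch_times(events))         # {'s1': 2400.0, 's2': 90.0}
print(average_session_watch_time(events))  # 1245.0
```

Unlike a click or video count, the total for `s2` stays low because the user gave up on the recommended title quickly.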

To illustrate intuitively why session watch time is ...
