Candidate Generation

The purpose of candidate generation is to select the top k (say, one thousand) movies that you would consider showing as recommendations to the end user. The task, therefore, is to select these movies from a corpus of more than a million available movies.

In this lesson, we will be looking at a few techniques to generate media candidates that will match user interests based on the user’s historical interaction with the system.

Candidate generation techniques

The candidate generation techniques are as follows:

  1. Collaborative filtering
  2. Content-based filtering
  3. Embedding-based similarity

Each method has its own strengths for selecting good candidates, and we will combine the candidates from all of them into a complete list before passing it on to the ranker (this will be explained in the ranking lesson).

Collaborative filtering

In this technique, you find users similar to the active user based on the intersection of their historical watches. You then collaborate with these similar users to generate candidate media for the active user, as shown below.

[Figure: Collaborative filtering]

There are two methods to perform collaborative filtering:

  1. Nearest neighborhood
  2. Matrix factorization

Method 1: Nearest neighborhood


User A is similar to user B and user C because all three have watched the movies Inception and Interstellar. So, you can say that user A’s nearest neighbors are user B and user C. You will look at other movies liked by users B and C as candidates for user A’s recommendations.

[Figure: Nearest neighborhood]
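
The user A example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lesson does not say how overlap is scored, so Jaccard similarity on watch sets is used here as one plausible choice, and "watched" stands in for "liked". The names `watch_history`, `jaccard`, `nearest_neighbors`, and `candidates` are hypothetical, as are user D and every movie other than Inception and Interstellar.

```python
# Hypothetical watch histories. Only Inception and Interstellar come from
# the lesson's example; the other movies and user D are made up.
watch_history = {
    "A": {"Inception", "Interstellar"},
    "B": {"Inception", "Interstellar", "Tenet"},
    "C": {"Inception", "Interstellar", "Dunkirk"},
    "D": {"Frozen"},
}


def jaccard(a, b):
    """Overlap of two watch sets: |a & b| / |a | b| (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbors(active, histories, k=2):
    """Return up to k users with the highest non-zero overlap with the active user."""
    scored = [
        (jaccard(histories[active], history), user)
        for user, history in histories.items()
        if user != active
    ]
    scored = [(score, user) for score, user in scored if score > 0]
    # Highest similarity first; ties are broken by user name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for _, user in scored[:k]]


def candidates(active, histories, k=2):
    """Movies watched by the nearest neighbors that the active user has not watched."""
    seen = histories[active]
    result = set()
    for user in nearest_neighbors(active, histories, k):
        result |= histories[user] - seen
    return sorted(result)


neighbors = nearest_neighbors("A", watch_history)
movie_candidates = candidates("A", watch_history)
print(neighbors)
print(movie_candidates)
```

With this toy data, users B and C come out as user A's neighbors, and the movies only they have watched become user A's candidates. User D shares no watches with user A and is ignored.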

Let’s see how this concept is realized. You have an $n \times m$ matrix of users $u_i$ $(i = 1 \text{ to } n)$ and movies $m_j$ $(j = 1 \text{ to } m)$. Each matrix element represents the feedback that user i has given to movie j. An empty cell means that user i has not watched movie j.


To generate recommendations for user i, you need to predict their feedback for all the movies they haven’t watched. You will collaborate with users similar to user i for this process. Their ratings for a movie that user i has not seen give you a good idea of how much user i would like it.
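
One way to turn this idea into a prediction is a similarity-weighted average of the neighbors' feedback. The sketch below is an assumption and not necessarily the method this lesson goes on to describe: it scores user similarity with cosine similarity over the movies both users have rated, and the `feedback` matrix, `cosine_on_shared`, and `predict` are hypothetical names with made-up values.

```python
from math import sqrt

# Hypothetical feedback matrix: rows are users, columns are movies.
# None marks an empty cell (the user has not watched that movie).
feedback = [
    [5, 4, None, 1],     # user 0 (the active user)
    [5, 5, 4, 1],        # user 1
    [4, 5, 5, None],     # user 2
    [1, None, 2, 5],     # user 3
]


def cosine_on_shared(u, v):
    """Cosine similarity computed only over movies that both users have rated."""
    shared = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not shared:
        return 0.0
    dot = sum(a * b for a, b in shared)
    norm_u = sqrt(sum(a * a for a, _ in shared))
    norm_v = sqrt(sum(b * b for _, b in shared))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(matrix, i, j):
    """Predict user i's feedback for movie j as a similarity-weighted average
    of the feedback given to movie j by every other user who has rated it."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, row in enumerate(matrix):
        if other == i or row[j] is None:
            continue
        sim = cosine_on_shared(matrix[i], row)
        weighted_sum += sim * row[j]
        weight_total += abs(sim)
    # No usable neighbor means no prediction can be made.
    return weighted_sum / weight_total if weight_total else None


# User 0 has not watched movie 2, so predict that empty cell.
predicted = predict(feedback, 0, 2)
print(predicted)
```

Users 1 and 2 rate the shared movies much like user 0 does, so their high feedback for movie 2 dominates the prediction, while user 3's low feedback carries little weight. Movies with the highest predicted feedback would then be the natural candidates for user 0.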

So, you ...
