Candidate Generation
The purpose of candidate generation is to select the top k (say, one thousand) movies that you would consider showing as recommendations to the end user. The task, therefore, is to select these movies from a corpus of more than a million available movies.
In this lesson, we will be looking at a few techniques to generate media candidates that will match user interests based on the user’s historical interaction with the system.
Candidate generation techniques
The candidate generation techniques are as follows:
- Collaborative filtering
- Content-based filtering
- Embedding-based similarity
Each method has its own strengths for selecting good candidates, and we will combine their results into a complete list before passing it on to the ranker (this will be explained in the ranking lesson).
Collaborative filtering
In this technique, you find users similar to the active user based on the intersection of their historical watches. You then collaborate with these similar users to generate candidate media for the active user, as shown below.
There are two methods to perform collaborative filtering:
- Nearest neighborhood
- Matrix factorization
Method 1: Nearest neighborhood
User A is similar to user B and user C because all three have watched the movies Inception and Interstellar. So, you can say that user A’s nearest neighbors are user B and user C. You will look at other movies liked by users B and C as candidates for user A’s recommendations.
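The neighbor lookup described above can be sketched in a few lines. This is only a minimal illustration, not the method a production system would use: the watch histories, the extra titles (Tenet, Dunkirk, Frozen), user D, the `min_overlap` threshold, and both function names are hypothetical, and "watched" is used here as a stand-in for "liked".

```python
# Hypothetical watch histories. Only Inception and Interstellar come from the
# example above; the other titles and user D are made up for illustration.
watched = {
    "A": {"Inception", "Interstellar"},
    "B": {"Inception", "Interstellar", "Tenet"},
    "C": {"Inception", "Interstellar", "Dunkirk"},
    "D": {"Frozen"},
}


def nearest_neighbors(active, watched, min_overlap=2):
    """Users sharing at least `min_overlap` watched movies with `active`."""
    overlaps = {
        other: len(watched[active] & movies)
        for other, movies in watched.items()
        if other != active
    }
    neighbors = [user for user, n in overlaps.items() if n >= min_overlap]
    # Largest overlap first; ties are broken alphabetically for stable output.
    return sorted(neighbors, key=lambda user: (-overlaps[user], user))


def candidates_for(active, watched, min_overlap=2):
    """Movies watched by the neighbors that `active` has not watched yet."""
    candidates = set()
    for user in nearest_neighbors(active, watched, min_overlap):
        candidates |= watched[user] - watched[active]
    return candidates


neighbors = nearest_neighbors("A", watched)
candidates = candidates_for("A", watched)
print(neighbors)           # ['B', 'C']
print(sorted(candidates))  # ['Dunkirk', 'Tenet']
```

User D shares no movies with user A, so D is not a neighbor and contributes no candidates.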
Let’s see how this concept is realized. You have an (n x m) matrix of users and movies, with one row per user i and one column per movie j. Each matrix element represents the feedback that user i has given to movie j. An empty cell means that user i has not watched movie j.
To generate recommendations for user i, you need to predict their feedback for all the movies they haven’t watched. You will collaborate with users similar to user i for this process. Their ratings for a movie that user i has not seen give you a good idea of how much user i would like it.
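One possible way to make this concrete is sketched below. It assumes, beyond what the lesson states, that feedback is a numeric rating, that user similarity is the cosine similarity over movies both users have rated, and that the prediction is a similarity-weighted average over the k most similar users who rated the movie. The matrix values and the names `similarity`, `predict`, and `k` are hypothetical.

```python
import numpy as np

# Hypothetical (n x m) feedback matrix: rows are users, columns are movies.
# np.nan marks an empty cell, i.e. the user has not watched that movie.
R = np.array([
    [5.0, 4.0, np.nan, np.nan],  # user 0, the active user
    [5.0, 5.0, 4.0, 1.0],        # user 1
    [4.0, 5.0, 5.0, np.nan],     # user 2
    [1.0, 2.0, 2.0, 5.0],        # user 3
])


def similarity(a, b):
    """Cosine similarity restricted to movies that both users have rated."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return 0.0
    x, y = a[both], b[both]
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def predict(R, i, j, k=2):
    """Predict user i's feedback for movie j from the k most similar raters."""
    raters = [u for u in range(R.shape[0]) if u != i and not np.isnan(R[u, j])]
    top = sorted(((similarity(R[i], R[u]), u) for u in raters), reverse=True)[:k]
    total = sum(s for s, _ in top)
    if total <= 0:
        return float("nan")  # no usable neighbors for this movie
    # Similarity-weighted average of the neighbors' ratings for movie j.
    return sum(s * R[u, j] for s, u in top) / total


# Predict feedback for every movie the active user has not watched.
unseen = [int(j) for j in np.flatnonzero(np.isnan(R[0]))]
predictions = {j: predict(R, 0, j) for j in unseen}
print(predictions)
```

With this toy data, movie 2 gets a higher predicted rating than movie 3, because the users most similar to user 0 rated movie 2 highly. Cosine similarity on raw ratings is a simplification; mean-centering each user's ratings first is a common refinement.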
So, you ...