Feature Engineering
Let's engineer features for the candidate generation and ranking model.
To start the feature engineering process, we will first identify the main actors in the movie/show recommendation process:
- Logged-in user
- Movie/show
- Context (e.g., season, time, etc.)
Features
Now it’s time to generate features based on these actors. The features fall into the following categories:
- User-based features
- Context-based features
- Media-based features
- Media-user cross features
A subset of the features is shown below.
User-based features
Let’s look at various aspects of the user that can serve as useful features for the recommendation model.
- age: This feature allows the model to learn what kind of content is appropriate for different age groups and recommend media accordingly.
- gender: The model will learn about gender-based preferences and recommend media accordingly.
- language: This feature records the language of the user. The model may use it to check whether a movie is in a language that the user speaks.
- country: This feature records the country of the user. Users from different geographical regions have different content preferences, so this feature can help the model learn geographic preferences and tune recommendations accordingly.
- average_session_time: The user’s average session time can indicate whether the user prefers to watch long or short movies/shows.
- last_genre_watched: The genre of the last movie that a user watched may serve as a hint for what they might like to watch next. For example, the model may discover that a user tends to watch thrillers or romantic movies.
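As a rough illustration, the two behavioral features in this list, average_session_time and last_genre_watched, could be derived from a user's watch log along the following lines. This is only a sketch: the `Session` record, its field names, the use of minutes as the unit, and the `behavioral_user_features` helper are all assumptions made for the example, not a description of any real logging schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Session:
    # Hypothetical watch-log record; the field names are assumptions.
    started_at: datetime
    minutes_watched: float
    genre: str


def behavioral_user_features(sessions: List[Session]) -> dict:
    """Derive average_session_time and last_genre_watched from a user's sessions."""
    if not sessions:
        # A new user has no history; fall back to neutral defaults.
        return {"average_session_time": 0.0, "last_genre_watched": None}

    # Mean viewing time per session, in minutes.
    average = sum(s.minutes_watched for s in sessions) / len(sessions)

    # The most recent session decides last_genre_watched.
    latest = max(sessions, key=lambda s: s.started_at)
    last_genre: Optional[str] = latest.genre

    return {"average_session_time": average, "last_genre_watched": last_genre}


# Toy data, deliberately out of chronological order.
sessions = [
    Session(datetime(2024, 1, 1, 20, 0), 90.0, "thriller"),
    Session(datetime(2024, 1, 3, 21, 0), 30.0, "comedy"),
    Session(datetime(2024, 1, 2, 19, 0), 60.0, "romance"),
]
features = behavioral_user_features(sessions)
print(features)
```

The static features (age, gender, language, country) would typically come straight from the user profile, so only the two log-derived features are computed here.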
The following are some user-based features (derived from historical interaction patterns) that have a sparse representation. The model can use these features to figure out user preferences.
- user_actor_histogram: This feature would be a vector based on the histogram that shows the historical interaction between the active user and all actors in the media on Netflix. It will record the percentage of media that the user watched with each actor cast in it.
- user_genre_histogram: This feature would be a vector based on the histogram that shows historical interaction between the active user and all the genres present on Netflix. It will record the percentage of media that the user watched belonging to each genre.
- user_language_histogram: This feature would be a vector based on the histogram that shows historical interaction between the active user and all the languages in the media on Netflix. ...
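The histogram features above all follow the same pattern: for each entry in a fixed vocabulary (actors, genres, or languages), record the fraction of the user's watched media that contains that entry. A minimal sketch of this idea is shown below. The `watch_histogram` helper, the dictionary layout of a watched title, and the toy vocabularies are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, List


def watch_histogram(watched_media: List[Dict[str, List[str]]],
                    vocabulary: List[str],
                    key: str) -> List[float]:
    """For each vocabulary entry, return the fraction of watched media that has it."""
    if not watched_media:
        return [0.0] * len(vocabulary)

    counts: Counter = Counter()
    for media in watched_media:
        # set() ensures a title counts at most once per value.
        counts.update(set(media[key]))

    total = len(watched_media)
    # Counter returns 0 for values the user never watched.
    return [counts[value] / total for value in vocabulary]


# Toy watch history; each title lists its genres and cast (hypothetical data).
watched = [
    {"genres": ["thriller", "drama"], "actors": ["A", "B"]},
    {"genres": ["thriller"], "actors": ["A"]},
    {"genres": ["comedy"], "actors": ["C"]},
    {"genres": ["drama"], "actors": ["B", "C"]},
]

genre_vocab = ["comedy", "drama", "romance", "thriller"]
actor_vocab = ["A", "B", "C", "D"]

user_genre_histogram = watch_histogram(watched, genre_vocab, "genres")
user_actor_histogram = watch_histogram(watched, actor_vocab, "actors")

print(user_genre_histogram)  # [0.25, 0.5, 0.0, 0.5]
print(user_actor_histogram)  # [0.5, 0.5, 0.5, 0.0]
```

Because a single title can have several genres or actors, the entries need not sum to 1. With a realistic vocabulary, most entries would be zero, so in practice one would likely store only the non-zero entries, which is what makes these features sparse.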