Feature Engineering

Let's engineer features for the candidate generation and ranking models.

To start the feature engineering process, we will first identify the main actors in the movie/show recommendation process:

  1. Logged-in user
  2. Movie/show
  3. Context (e.g., season, time, etc.)

[Figure: Main actors in media recommendation]

Features

Now it’s time to generate features based on these actors. The features would fall into the following categories:

  1. User-based features
  2. Context-based features
  3. Media-based features
  4. Media-user cross features

A subset of the features is shown below.

[Figure: Features in the training data row]

User-based features

Let’s look at various aspects of the user that can serve as useful features for the recommendation model.

  • age

    This feature will allow the model to learn the kind of content that is appropriate for different age groups and recommend media accordingly.

  • gender

    The model will learn about gender-based preferences and recommend media accordingly.

  • language

    This feature will record the language of the user. The model may use it to check whether a movie/show is in a language that the user speaks.

  • country

    This feature will record the country of the user. Users from different geographical regions have different content preferences. This feature can help the model learn geographic preferences and tune recommendations accordingly.

  • average_session_time

    The user’s average session time can indicate whether the user prefers to watch lengthy or short movies/shows.

  • last_genre_watched

    The genre of the last movie that a user has watched may serve as a hint for what they might like to watch next. For example, the model may discover a pattern that a user likes to watch thrillers or romantic movies.
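
To make the user-based features above concrete, here is a minimal sketch of how they might be assembled into one feature dictionary. It assumes a hypothetical Session record (start time, end time, genre) and a hypothetical profile dictionary; neither comes from an actual Netflix schema, and measuring session time in minutes is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Session:
    """Hypothetical viewing-session record (an assumption for illustration)."""
    start: datetime
    end: datetime
    genre: str  # genre of the movie/show watched in this session


def user_features(profile: dict, sessions: list[Session]) -> dict:
    """Build the dense user-based features from a profile and session log."""
    # Order sessions by end time so the last element is the most recent one.
    ordered = sorted(sessions, key=lambda s: s.end)
    # Session lengths in minutes (the unit is an assumption).
    durations = [(s.end - s.start).total_seconds() / 60 for s in ordered]
    return {
        "age": profile["age"],
        "gender": profile["gender"],
        "language": profile["language"],
        "country": profile["country"],
        "average_session_time": mean(durations) if durations else 0.0,
        "last_genre_watched": ordered[-1].genre if ordered else None,
    }


# Usage with made-up data: a 30-minute and a 90-minute session.
sessions = [
    Session(datetime(2024, 1, 1, 20, 0), datetime(2024, 1, 1, 20, 30), "thriller"),
    Session(datetime(2024, 1, 2, 21, 0), datetime(2024, 1, 2, 22, 30), "romance"),
]
profile = {"age": 30, "gender": "female", "language": "en", "country": "US"}
features = user_features(profile, sessions)
print(features["average_session_time"])  # 60.0
print(features["last_genre_watched"])    # romance
```

In practice, categorical values such as country or last_genre_watched would still need to be encoded (for example, one-hot or via embeddings) before being fed to a model; that step is left out of this sketch.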

The following are some user-based features (derived from historical interaction patterns) that have a sparse representation. The model can use these features to figure out user preferences.

  • user_actor_histogram

    This feature would be a histogram vector that captures the historical interaction between the active user and all the actors appearing in media on Netflix. Each entry will record the percentage of the media watched by the user that has the corresponding actor in its cast.

    [Figure: User-actor histogram vector as a feature for the model]

  • user_genre_histogram

    This feature would be a histogram vector that captures the historical interaction between the active user and all the genres present on Netflix. Each entry will record the percentage of the media watched by the user that belongs to the corresponding genre.

  • user_language_histogram

    This feature would be a vector based on the histogram that shows historical interaction between the active user and all the languages in the media on Netflix. ...
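
The histogram features described above share one computation, which can be sketched as follows. This is a minimal illustration under stated assumptions: the function names user_histogram and to_sparse are hypothetical, values are stored as fractions in [0, 1] rather than percentages, and each watched title is represented simply as a list of labels (actors, genres, or languages).

```python
from collections import Counter


def user_histogram(watched: list[list[str]], vocabulary: list[str]) -> list[float]:
    """Fraction of the user's watched titles that carry each label.

    watched    -- one list of labels (e.g., genres) per watched title
    vocabulary -- fixed, ordered list of all labels on the platform
    """
    total = len(watched)
    counts = Counter()
    for labels in watched:
        # Count each label at most once per title.
        counts.update(set(labels))
    return [counts[label] / total if total else 0.0 for label in vocabulary]


def to_sparse(vector: list[float]) -> dict[int, float]:
    """Keep only the non-zero entries, keyed by their index in the vocabulary."""
    return {i: v for i, v in enumerate(vector) if v != 0.0}


# Usage with made-up data: four watched titles and a four-genre vocabulary.
genres = ["comedy", "drama", "romance", "thriller"]
watched = [["thriller"], ["thriller", "romance"], ["comedy"], ["thriller"]]
dense = user_histogram(watched, genres)
print(dense)             # [0.25, 0.0, 0.25, 0.75]
print(to_sparse(dense))  # {0: 0.25, 2: 0.25, 3: 0.75}
```

Because a title can carry several labels, the entries need not sum to 1. With a vocabulary as large as the set of all actors, most entries are zero, which is why a sparse representation such as the index-to-value mapping here is a natural fit.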