Training Data Generation
Let's collect and label training data for the feed ranking ML model.
Your user engagement prediction model’s performance will depend largely on the quality and quantity of the training data. So, let’s see how you can generate training data for your model.
📝 Note that the terms training data row and training example will be used interchangeably.
Training data generation through online user engagement
The users’ online engagement with Tweets can give us positive and negative training examples. For instance, if you are training a single model to predict user engagement, then all the Tweets that received user engagement would be labeled as positive training examples, whereas the Tweets that only have impressions would be labeled as negative training examples.
📝 Impression: If a Tweet is displayed on a user’s Twitter feed, it counts as an impression. The user does not need to read or engage with it; scrolling past it also counts as an impression.
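To make this labeling rule concrete, here is a minimal Python sketch, assuming a simplified impression log; the record format and field names (`user_id`, `tweet_id`, `actions`) are illustrative assumptions, not a real logging schema.

```python
# Minimal sketch: label impressions for a single engagement-prediction model.
# The impression records below are hypothetical; a real pipeline would read
# them from feed/engagement logs.

impressions = [
    {"user_id": 1, "tweet_id": 101, "actions": ["like"]},
    {"user_id": 1, "tweet_id": 102, "actions": []},          # impression only
    {"user_id": 1, "tweet_id": 103, "actions": ["comment"]},
]

def label_single_model(impressions):
    """Any engagement (like, comment, etc.) -> positive (1);
    an impression with no engagement -> negative (0)."""
    rows = []
    for imp in impressions:
        rows.append({
            "user_id": imp["user_id"],
            "tweet_id": imp["tweet_id"],
            "label": 1 if imp["actions"] else 0,
        })
    return rows

print(label_single_model(impressions))
# [{'user_id': 1, 'tweet_id': 101, 'label': 1},
#  {'user_id': 1, 'tweet_id': 102, 'label': 0},
#  {'user_id': 1, 'tweet_id': 103, 'label': 1}]
```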
However, as you saw in the architectural components lesson, you can also train separate models, each predicting the probability of a different user action on a Tweet. The following illustration shows how the same user engagement (as above) can be used to generate training data for separate engagement prediction models.
When you generate data for the “Like” prediction model, all Tweets that the user has liked would be positive examples, and all the Tweets that they did not like would be negative examples.
📝 Note how the Tweet that the user commented on is still a negative example for the “Like” prediction model.
Similarly, for the “Comment” prediction model, all Tweets that the user commented on would be positive examples, and all the ones they did not comment on would be negative examples.
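The same engagement data can be turned into one label column per action, as in the sketch below; again, the record format and the set of actions are illustrative assumptions. Notice that the commented-on Tweet (103) comes out as a negative example for the “Like” model, matching the note above.

```python
# Minimal sketch: derive per-action labels (one per prediction model)
# from the same hypothetical impression records used earlier.

impressions = [
    {"user_id": 1, "tweet_id": 101, "actions": ["like"]},
    {"user_id": 1, "tweet_id": 102, "actions": []},            # impression only
    {"user_id": 1, "tweet_id": 103, "actions": ["comment"]},
]

def label_per_action(impressions, actions=("like", "comment")):
    """For each action, label 1 if that action occurred on the Tweet, else 0."""
    rows = []
    for imp in impressions:
        row = {"user_id": imp["user_id"], "tweet_id": imp["tweet_id"]}
        for action in actions:
            row[f"{action}_label"] = 1 if action in imp["actions"] else 0
        rows.append(row)
    return rows

for row in label_per_action(impressions):
    print(row)
# {'user_id': 1, 'tweet_id': 101, 'like_label': 1, 'comment_label': 0}
# {'user_id': 1, 'tweet_id': 102, 'like_label': 0, 'comment_label': 0}
# {'user_id': 1, 'tweet_id': 103, 'like_label': 0, 'comment_label': 1}
```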