Grokking the Machine Learning Interview

Training Data Generation

Let's learn about the techniques for generating training data for the ad prediction system.

We'll cover the following...

Training data generation through online user engagement
Balancing positive and negative training examples
Model recalibration
Train test split

The performance of the user engagement prediction model depends heavily on the quality and quantity of its training data, so let's see how that training data can be generated.

Training data generation through online user engagement

When we show an ad to the user, they can engage with it or ignore it. Positive examples result from users engaging with ads, e.g., clicking or adding an item to their cart. Negative examples result from users ignoring the ads or providing negative feedback on the ad.
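The labeling rule above can be sketched as a small function over impression logs. This is a minimal illustration under stated assumptions, not the course's actual pipeline: the `Impression` record, the action names (`click`, `add_to_cart`, `hide_ad`, `report_ad`), and the `label` and `build_examples` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical action names. Clicks and add-to-cart count as engagement
# (positives); ignoring the ad or giving negative feedback counts as a negative.
POSITIVE_ACTIONS = {"click", "add_to_cart"}
NEGATIVE_ACTIONS = {"hide_ad", "report_ad"}


@dataclass
class Impression:
    user_id: str
    ad_id: str
    action: Optional[str]  # None means the ad was shown but ignored


def label(impression: Impression) -> int:
    """Return 1 for a positive example and 0 for a negative example."""
    if impression.action in POSITIVE_ACTIONS:
        return 1
    if impression.action is None or impression.action in NEGATIVE_ACTIONS:
        return 0
    # Refuse to guess a label for an action the rule does not cover.
    raise ValueError(f"unknown action: {impression.action!r}")


def build_examples(impressions: List[Impression]) -> List[Tuple[str, str, int]]:
    """Turn logged impressions into (user_id, ad_id, label) training rows."""
    return [(imp.user_id, imp.ad_id, label(imp)) for imp in impressions]


logs = [
    Impression("u1", "ad42", "click"),
    Impression("u1", "ad7", None),
    Impression("u2", "ad42", "add_to_cart"),
    Impression("u3", "ad9", "hide_ad"),
]
print(build_examples(logs))
# [('u1', 'ad42', 1), ('u1', 'ad7', 0), ('u2', 'ad42', 1), ('u3', 'ad9', 0)]
```

Raising on unrecognized actions, rather than defaulting them to negatives, keeps ambiguous engagement types from silently polluting either class.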


Balancing positive and negative training examples

Depending on the platform, user engagement with ads can be quite low. For example, in a feed system, where people mostly browse and engage with only a small fraction of the content, the engagement rate can be as low as 2-3%.

How would this percentage affect the ratio of positive and negative examples on a larger scale?

Let's look at an extreme example: assume that users collectively view one hundred million ads in a day, with a 2% engagement rate. This will result in roughly 2 million positive examples (where people engage with the ad) and 98 million negative examples (where people ignore the ...
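The arithmetic behind these numbers can be checked directly. The sketch below uses a hypothetical `class_counts` helper that turns a daily impression count and an engagement rate into positive and negative example counts, then computes the resulting class imbalance.

```python
from typing import Tuple


def class_counts(impressions: int, engagement_rate: float) -> Tuple[int, int]:
    """Split impressions into (positives, negatives) for a given engagement rate."""
    positives = round(impressions * engagement_rate)
    negatives = impressions - positives
    return positives, negatives


pos, neg = class_counts(100_000_000, 0.02)
print(pos, neg)   # 2000000 98000000
print(neg / pos)  # 49.0
```

At a 2% engagement rate there are 49 negative examples for every positive one, which is the imbalance this section is concerned with.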
