Training Data Generation
Let's learn about the techniques for generating training data for the ad prediction system.
The performance of the user engagement prediction model depends heavily on the quality and quantity of its training data. So let’s see how the training data for our model can be generated.
Training data generation through online user engagement
When we show an ad to the user, they can engage with it or ignore it. Positive examples result from users engaging with ads, e.g., clicking or adding an item to their cart. Negative examples result from users ignoring the ads or providing negative feedback on the ad.
Suppose the advertiser specifies “click” to be counted as a positive action on the ad. In this scenario, a user click on the ad is treated as a positive training example, and a user ignoring the ad is treated as a negative example.
Suppose the ad is for an online shopping platform and the advertiser specifies the action “add to cart” to be counted as positive user engagement. Here, if the user clicks to view the ad but does not add any items to the cart, it is counted as a negative training example.
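The two scenarios above can be sketched in code. This is a minimal illustration, not the system's actual logging pipeline: the Impression record, the action names ("click", "add_to_cart", "hide_ad"), and the label_impression helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    """One ad shown to one user (hypothetical record layout)."""
    ad_id: str
    user_actions: frozenset  # actions the user took on this impression


def label_impression(impression: Impression, positive_action: str) -> int:
    """Return 1 if the user performed the advertiser's positive action, else 0."""
    return 1 if positive_action in impression.user_actions else 0


# Scenario 1: the advertiser counts "click" as the positive action.
clicked = Impression("ad_1", frozenset({"click"}))
ignored = Impression("ad_1", frozenset())
hidden = Impression("ad_1", frozenset({"hide_ad"}))  # negative feedback

# Scenario 2: the advertiser counts "add_to_cart" as the positive action.
clicked_only = Impression("ad_2", frozenset({"click"}))
added_to_cart = Impression("ad_2", frozenset({"click", "add_to_cart"}))

labels_click = [label_impression(i, "click") for i in (clicked, ignored, hidden)]
labels_cart = [label_impression(i, "add_to_cart") for i in (clicked_only, added_to_cart)]
print(labels_click)
print(labels_cart)
```

Note that the same click produces a positive label under the first advertiser's definition and a negative label under the second, so the label depends on the advertiser's chosen action and not on the raw user behavior alone.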
Balancing positive and negative training examples
Users’ engagement with an ad can be fairly low, depending on the platform. For example, in a feed system where people generally browse content and engage with only a small fraction of it, the engagement rate can be as low as 2-3%.
How would this percentage affect the ratio of positive and negative examples on a larger scale?
Let’s look at an extreme example by assuming that one hundred million ads are viewed collectively by the users in a day with a 2% engagement rate. This will result in roughly 2 million positive examples (where people engage with the ad) and 98 million ...
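The arithmetic behind this example can be checked with a short sketch. The function name count_examples is hypothetical, and the inputs are the figures assumed in the text (one hundred million views, 2% engagement).

```python
def count_examples(total_views: int, engagement_rate: float) -> tuple[int, int]:
    """Split ad views into (positive, negative) example counts."""
    positives = round(total_views * engagement_rate)
    negatives = total_views - positives
    return positives, negatives


positives, negatives = count_examples(100_000_000, 0.02)
negatives_per_positive = negatives / positives

print(f"positive examples: {positives:,}")
print(f"negative examples: {negatives:,}")
print(f"negatives per positive: {negatives_per_positive:.0f}")
```

Under these assumptions there are 49 negative examples for every positive one, which is the imbalance this section is concerned with.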