Ad Click Prediction Model
Learn about the architecture of the ad click prediction model.
3. Model
Feature engineering
Features | Feature engineering | Description |
---|---|---|
AdvertiserID | Use an embedding or feature hashing | There can easily be millions of advertisers, so one-hot encoding is impractical |
User’s historical behavior, e.g., number of ad clicks over a period of time | Feature scaling, e.g., normalization | |
Temporal: time_of_day, day_of_week, etc. | One-hot encoding | |
Cross features | Combine multiple features | See the example in the Machine Learning System Design Primer |
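The transformations in the table can be sketched in a few lines of plain Python. This is a minimal illustration, not the lesson's implementation: the event fields (`advertiser_id`, `clicks_7d`, `hour_of_day`, `day_of_week`), the bucket counts, and the 0 to 50 click range are all assumptions made for the example, and CRC32 stands in for whatever hash function a production system would use.

```python
import zlib

NUM_ADVERTISER_BUCKETS = 2 ** 20  # assumed hash-space size for advertiser IDs
NUM_CROSS_BUCKETS = 2 ** 20       # assumed hash-space size for cross features


def hash_bucket(value: str, num_buckets: int) -> int:
    """Feature hashing: map an arbitrary string to a stable bucket index."""
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def min_max_scale(value: float, lo: float, hi: float) -> float:
    """Normalize a numeric feature to [0, 1] given an assumed value range."""
    if hi == lo:
        return 0.0
    clipped = min(max(value, lo), hi)  # keep outliers inside the range
    return (clipped - lo) / (hi - lo)


def one_hot(index: int, size: int) -> list:
    """One-hot encode a categorical index such as the hour or the weekday."""
    vec = [0] * size
    vec[index] = 1
    return vec


def featurize(event: dict) -> dict:
    """Apply each transformation from the table to one (hypothetical) event."""
    cross_key = f'{event["advertiser_id"]}_x_{event["day_of_week"]}'
    return {
        "advertiser_bucket": hash_bucket(event["advertiser_id"], NUM_ADVERTISER_BUCKETS),
        "clicks_7d_scaled": min_max_scale(event["clicks_7d"], 0, 50),
        "hour_one_hot": one_hot(event["hour_of_day"], 24),
        "day_one_hot": one_hot(event["day_of_week"], 7),
        # Cross feature: combine two features, then hash the combination.
        "advertiser_x_day": hash_bucket(cross_key, NUM_CROSS_BUCKETS),
    }


example = {"advertiser_id": "adv_123", "clicks_7d": 5, "hour_of_day": 14, "day_of_week": 2}
features = featurize(example)
```

Hashing keeps the input dimension fixed no matter how many advertisers appear, at the cost of occasional collisions. An embedding layer could then be indexed by `advertiser_bucket`.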
Training data
Before building any ML model, we need to collect training data. The goal is to collect data across different types of posts while still improving the user experience. As you may recall from the previous lesson about the waterfall model, we can collect a lot of data about ad clicks, and we can use this data to train the ad click model.
We can start by selecting a time window of data for training: the last month, the last six months, etc. A longer window provides more examples but takes longer to train on, so in practice we look for a balance between training time and model accuracy. We also downsample the negative examples (impressions that did not lead to a click) to handle the class imbalance.
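As a rough sketch of these two steps, the snippet below selects a recent window from a synthetic click log and then keeps every click while keeping only a fraction of the non-clicks. The row layout (`day`, `clicked`), the 30-day window, and the 10% keep rate are assumptions chosen for illustration, not values from the lesson.

```python
import random


def select_window(rows, latest_day, num_days):
    """Keep only rows from the last `num_days` days, ending at `latest_day`."""
    start = latest_day - num_days + 1
    return [r for r in rows if start <= r["day"] <= latest_day]


def downsample_negatives(rows, keep_rate, seed=0):
    """Keep every positive row and roughly `keep_rate` of the negative rows."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [r for r in rows if r["clicked"] == 1 or rng.random() < keep_rate]


def click_rate(rows):
    """Fraction of rows that are clicks."""
    return sum(r["clicked"] for r in rows) / len(rows)


# Synthetic log: 180 days, 200 impressions per day, 1% of them clicked.
rows = [
    {"day": day, "clicked": 1 if i % 100 == 0 else 0}
    for day in range(180)
    for i in range(200)
]

window = select_window(rows, latest_day=179, num_days=30)
train = downsample_negatives(window, keep_rate=0.1)

print(f"click rate before: {click_rate(window):.3f}, after: {click_rate(train):.3f}")
```

Shrinking `num_days` or `keep_rate` reduces training time; whether accuracy suffers has to be checked empirically on data that has not been resampled.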
Model
Selection
- We can use deep learning in a distributed setting. A reasonable starting point is a stack of fully connected layers with a sigmoid activation on the final layer, so that the output can be read as a click probability. Because the CTR is usually very small (less than 1%), we need to resample the training set to make it less imbalanced. It’s important to leave the validation and test sets intact so that the estimates of model performance stay accurate.
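To make the shape of such a model concrete, here is a minimal forward pass for a small fully connected network with a sigmoid on the final layer, written in plain Python. It is only a sketch under assumptions: the layer sizes, the random initialization, and the ReLU activation on hidden layers are illustrative choices that the lesson does not specify, and a real system would use a deep learning framework and train the weights.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function; maps a logit to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relu(values):
    """Hidden-layer activation (an assumption; the lesson names only sigmoid)."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Randomly initialized (untrained) weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_ctr(features, layers):
    """Forward pass: hidden layers with ReLU, then one sigmoid output unit."""
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    (logit,) = dense(hidden, weights, biases)  # final layer has a single unit
    return sigmoid(logit)


rng = random.Random(0)
layers = [init_layer(4, 8, rng), init_layer(8, 4, rng), init_layer(4, 1, rng)]
p_click = predict_ctr([0.2, 1.0, 0.0, 0.5], layers)
print(f"predicted click probability (untrained): {p_click:.3f}")
```

The sigmoid output pairs naturally with a log loss objective during training, and any resampling would be applied to the training rows only, with validation and test rows left as collected.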