Ad Click Prediction Model

Learn about the Ad prediction model architecture.

3. Model

Feature engineering

Features | Feature engineering | Description
--- | --- | ---
AdvertiserID | Embedding or feature hashing | There can easily be millions of advertisers
User’s historical behavior, e.g., number of ad clicks over a period of time | Feature scaling, e.g., normalization |
Temporal: time_of_day, day_of_week, etc. | One-hot encoding |
Cross features | Combine multiple features | See the example in the Machine Learning System Design Primer
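
The feature treatments listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the lesson's implementation: the bucket count, the helper names (`hash_bucket`, `min_max_scale`, `one_hot`, `cross_feature`), and the sample values are all assumptions made for the example.

```python
import hashlib
from typing import List

NUM_BUCKETS = 2 ** 20  # assumed size of the hashed ID space


def hash_bucket(value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Feature hashing: map a high-cardinality ID to a fixed bucket index."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def min_max_scale(x: float, lo: float, hi: float) -> float:
    """Feature scaling: normalize x into [0, 1] given the observed range."""
    if hi == lo:
        return 0.0
    return (x - lo) / (hi - lo)


def one_hot(index: int, size: int) -> List[int]:
    """One-hot encoding for a categorical value such as day_of_week."""
    vec = [0] * size
    vec[index] = 1
    return vec


def cross_feature(a: str, b: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Cross feature: combine two raw values, then hash the combination."""
    return hash_bucket(a + "_x_" + b, num_buckets)


# Hypothetical raw values, for illustration only.
example = {
    "advertiser_bucket": hash_bucket("advertiser_42"),
    "clicks_last_7d": min_max_scale(3, lo=0, hi=20),
    "day_of_week": one_hot(2, size=7),
    "advertiser_x_hour": cross_feature("advertiser_42", "hour_13"),
}
```

Hashing keeps the input dimension fixed no matter how many advertisers exist, at the cost of occasional collisions. A learned embedding would typically be looked up by the same bucket index.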

Training data

Before building any ML model, we need to collect training data. The goal here is to collect data across different types of ads while still improving the user experience. As you may recall from the previous lesson about the waterfall model, we can collect a lot of data about ad clicks, and we can use this data to train the Ad Click model.

We can start by selecting a window of data for training: the last month, the last six months, etc. In practice, we want to strike a balance between training time and model accuracy: a longer window provides more examples but takes longer to train on. Because clicks are rare, we also downsample the negative examples (non-clicks) to reduce the class imbalance.
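
As a rough sketch of negative downsampling, the snippet below keeps every click and only a random fraction of non-clicks. The row layout (a dict with a `clicked` field), the function name `downsample_negatives`, and the keep rate are assumptions for illustration, not details from the lesson.

```python
import random


def downsample_negatives(rows, keep_rate, seed=0):
    """Keep all positive rows and roughly keep_rate of the negative rows."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [
        row for row in rows
        if row["clicked"] == 1 or rng.random() < keep_rate
    ]


# Hypothetical log: 10 clicks among 1,000 impressions (1% CTR).
rows = [{"ad_id": i, "clicked": 1 if i < 10 else 0} for i in range(1000)]
sampled = downsample_negatives(rows, keep_rate=0.1)
```

With a keep rate of 0.1, the positives make up a much larger share of `sampled` than of `rows`, which is the point of the step.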

Model

Selection

  • We can use deep learning in distributed settings. We can start with fully connected layers, with a sigmoid activation function applied to the final layer so that the output can be read as a click probability. Because the CTR is usually very small (less than 1%), we need to resample the training data set to make it less imbalanced. It’s important to leave the validation and test sets intact so that estimates of model performance remain accurate.
