Review of Modeling Results
To develop a binary classification model that meets our client's business requirements, we have now tried several modeling techniques with varying degrees of success. In the end, we'd like to choose the best-performing model for further analysis and for presentation to our client. However, it is also good to communicate the other options we explored, to demonstrate a thoroughly researched project.
Here, we review the different models we tried for the case study problem, the hyperparameters we needed to tune, and the results from cross-validation (or, in the case of XGBoost, from the validation set); a brief code sketch of the tuning pattern follows the table. We include only the work done using all possible features, not the earlier exploratory models that used just one or two features:
Summary of modeling activities with case study data
| Model | Location in course | Tuned hyperparameters | Validation ROC AUC |
| --- | --- | --- | --- |
| Logistic regression with L1 regularization | Section: The Bias-variance Trade-off, Challenge: Cross-validation and Feature Engineering | Regularization parameter C | 0.719 |
| Logistic regression with L1 regularization and interaction features | Section: The Bias-variance Trade-off, Exercise: Cross-validation and Feature Engineering | Regularization parameter C | 0.739 |
| Decision tree | Section: Decision Trees, Challenge: Finding Optimal Hyperparameters for Decision Tree | Maximum depth | 0.746 |
| Random forest | Section: Decision Trees, Challenge: Cross-validation Grid Search with Random Forest | Maximum depth and number of trees | 0.776 |
| XGBoost | Section: Gradient Boosting, XGBoost, and SHAP Values, Challenge: XGBoost and SHAP Explanation for Case Study Data | Maximum leaves | 0.779 |
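As a reminder of how these validation scores were obtained, here is a minimal sketch of the cross-validated grid search pattern for one of the models (logistic regression with L1 regularization). The variable names `X_train` and `y_train` are assumed to hold the case study features and labels, and the grid of `C` values is purely illustrative; the exact grids used in each exercise are in the sections listed above.

```python
# Minimal sketch of hyperparameter tuning with ROC AUC as the scoring metric.
# Assumes X_train and y_train (case study features and labels) are already defined.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Logistic regression with L1 regularization: tune the regularization parameter C
lr = LogisticRegression(penalty='l1', solver='liblinear')
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}  # illustrative grid of C values

cv = GridSearchCV(lr, param_grid, scoring='roc_auc', cv=4)
cv.fit(X_train, y_train)

print(cv.best_params_)  # best value of C found by the search
print(cv.best_score_)   # mean validation ROC AUC for that value of C
```

The same pattern applies to the tree-based models, substituting the relevant estimator and hyperparameters (maximum depth, number of trees, or maximum leaves); XGBoost was instead evaluated on a held-out validation set.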
When presenting results to the client, you should be prepared to interpret them for business partners at all levels of technical familiarity, including those with very little technical background. For example, business partners may not understand the derivation of the ROC AUC measure; however, it's an important concept to convey because it's the main performance metric we used to assess models. You may need to explain that it's a metric that varies between 0.5 and 1 and give intuitive explanations for these limits: 0.5 is no better than a coin flip, while 1 represents perfection, which is essentially unattainable in practice.
Our results are somewhere in between, getting close to 0.78 with the best model we built, the XGBoost classifier.
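To make the coin-flip and perfection intuition concrete for a non-technical audience, a tiny synthetic demonstration like the following can help. The data here is made up purely for illustration and is not part of the case study.

```python
# Illustration of the ROC AUC limits using synthetic labels and scores.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=10_000)   # synthetic binary labels

random_scores = rng.random(size=10_000)    # "coin flip": scores unrelated to the labels
perfect_scores = y_true.astype(float)      # perfect scores: positives always ranked above negatives

print(roc_auc_score(y_true, random_scores))   # approximately 0.5
print(roc_auc_score(y_true, perfect_scores))  # exactly 1.0
```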