...

Distribution of Predicted Probability and Decile Chart

Learn about visualizing model performance with the distribution of predicted probabilities and a decile chart.


The histogram of predicted probabilities for the test set shows that most predictions are clustered in the range [0, 0.2]. In other words, according to the model, most borrowers have between a 0% and 20% chance of default. However, there appears to be a small cluster of higher-risk borrowers, with predicted probabilities centered near 0.7.
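As a rough illustration, the sketch below summarizes such a distribution with NumPy instead of drawing a plot. It uses synthetic probabilities (an assumption, shaped to mimic the pattern described above: a large low-risk group plus a small cluster near 0.7), not the model's actual predictions, and the name y_test_pred_proba is hypothetical.

```python
import numpy as np

# Hypothetical synthetic stand-in for the test-set predicted probabilities:
# 900 low-risk borrowers (Beta(1, 9), mean 0.1) and 100 higher-risk ones
# (Beta(14, 6), mean 0.7).
rng = np.random.default_rng(seed=1)
y_test_pred_proba = np.concatenate([
    rng.beta(1, 9, size=900),
    rng.beta(14, 6, size=100),
])

# Count predictions in ten equal-width bins between 0 and 1
# (np.histogram closes only the last bin on the right).
counts, bin_edges = np.histogram(y_test_pred_proba, bins=10, range=(0, 1))
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.1f}-{right:.1f}: {count}")

# Share of borrowers the model places in the 0-20% risk range.
low_risk_share = np.mean(y_test_pred_proba < 0.2)
print(f"Fraction of predictions below 0.2: {low_risk_share:.2f}")
```

Passing the same array to matplotlib's plt.hist would draw the histogram itself; the text counts are just a dependency-light way to see the same shape.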

A visually intuitive way to examine model performance in different regions of predicted default risk is a decile chart, which groups borrowers by the decile of their predicted probability. Within each decile, we can compute the true default rate. If the model ranks borrowers well by risk, we would expect the default rate to increase steadily from the lowest prediction decile to the highest.
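A minimal sketch of the table behind a decile chart, assuming one row per test-set borrower. The column names, the synthetic probabilities, and the labels drawn from them are all hypothetical. Because each synthetic label is drawn with probability equal to its prediction, this toy model is calibrated by construction, so the default rate should rise across deciles apart from sampling noise in the sparse low-risk bins.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic predictions and outcomes (not the case-study data).
rng = np.random.default_rng(seed=2)
y_test_pred_proba = np.concatenate([
    rng.beta(1, 9, size=900),
    rng.beta(14, 6, size=100),
])
# Each borrower defaults (1) with probability equal to the predicted probability.
y_test = rng.binomial(n=1, p=y_test_pred_proba)

df = pd.DataFrame({"pred_proba": y_test_pred_proba, "default": y_test})

# labels=False returns integer decile numbers: 0 (lowest risk) to 9 (highest).
df["decile"] = pd.qcut(df["pred_proba"], q=10, labels=False)

# True default rate per decile, alongside the mean predicted probability.
decile_summary = df.groupby("decile").agg(
    mean_pred=("pred_proba", "mean"),
    default_rate=("default", "mean"),
    n_borrowers=("default", "count"),
)
print(decile_summary)
```

Plotting default_rate by decile (for example with decile_summary["default_rate"].plot.bar(), which requires matplotlib) gives the decile chart; comparing default_rate with mean_pred in each row is also a quick calibration check.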

We can compute deciles with pandas' qcut function, as we did in Exercise: Randomized Grid Search to Tune XGBoost Hyperparameters.
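A minimal sketch of such a qcut call, with synthetic probabilities standing in for the test-set predictions. The Series name y_test_pred_proba and the data are hypothetical; deciles and decile_bin_edges match the variable names used in the text.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic stand-in for the test-set predicted probabilities.
rng = np.random.default_rng(seed=3)
y_test_pred_proba = pd.Series(
    np.concatenate([rng.beta(1, 9, size=900), rng.beta(14, 6, size=100)])
)

# q=10 requests ten equal-sized bins (deciles); retbins=True also returns the edges.
deciles, decile_bin_edges = pd.qcut(x=y_test_pred_proba, q=10, retbins=True)

# Each label is an interval of predicted probability; each bin holds 10% of the rows.
print(deciles.value_counts().sort_index())
```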

With qcut, we split the predicted probabilities for the test set, supplied with the x keyword argument. We want ten equal-sized bins, with the bottom 10% of predicted probabilities in the first bin and so on, so we specify q=10 quantiles. However, you can split into any number of bins you want, such as 20 (ventiles) or 5 (quintiles). Because we set retbins=True, the bin edges are returned in the decile_bin_edges variable, while the series of decile labels is in deciles. Ten bins require 11 bin edges: the lowest edge, nine interior cut points, and the highest edge.
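The sketch below inspects the returned edges and shows that changing q yields other bin counts, here quintiles. As before, the probabilities are synthetic and the name y_test_pred_proba is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic stand-in for the test-set predicted probabilities.
rng = np.random.default_rng(seed=4)
y_test_pred_proba = pd.Series(
    np.concatenate([rng.beta(1, 9, size=900), rng.beta(14, 6, size=100)])
)

deciles, decile_bin_edges = pd.qcut(x=y_test_pred_proba, q=10, retbins=True)

# Ten bins need 11 edges: the minimum, nine interior cut points, and the maximum.
print(len(decile_bin_edges))
print(np.round(decile_bin_edges, 3))

# The same call with q=5 produces quintiles, which need 6 edges.
quintiles, quintile_bin_edges = pd.qcut(x=y_test_pred_proba, q=5, retbins=True)
print(len(quintile_bin_edges))
```

The first and last decile edges coincide with the smallest and largest predicted probabilities, so every prediction falls into exactly one bin.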
