
The Receiver Operating Characteristic (ROC) Curve

Learn about the significance of the ROC curve and the area under it.

We'll cover the following...

Understanding the ROC curve
Interpreting the ROC curve
The area under the curve (AUC) of the ROC curve
Interpreting the AUC of ROC
Try it yourself

Deciding on a threshold for a classifier is a matter of finding the “sweet spot” where we recover enough true positives without incurring too many false positives. As the threshold is lowered, more samples are predicted positive, so the counts of true positives and false positives can only stay the same or grow. A good classifier captures many additional true positives at the cost of only a few additional false positives. What would be the effect of lowering the threshold even further, using the predicted probabilities from the previous exercise? There is a classic visualization in machine learning, along with a corresponding metric, that can help answer this kind of question.
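To make the trade-off concrete, here is a minimal sketch that counts true and false positives at a few thresholds. The labels and probabilities are hypothetical values invented for illustration (they are not the predictions from the previous exercise), and the helper `count_tp_fp` is likewise hypothetical. It assumes the convention that a sample is predicted positive when its probability is at or above the threshold.

```python
# Hypothetical labels (1 = positive) and predicted probabilities,
# invented for illustration only.
y_true = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.95, 0.85, 0.70, 0.65, 0.55, 0.45, 0.35, 0.30, 0.20, 0.10]

def count_tp_fp(y_true, y_prob, threshold):
    """Count true and false positives, predicting positive when prob >= threshold."""
    tp = fp = 0
    for label, prob in zip(y_true, y_prob):
        if prob >= threshold:
            if label == 1:
                tp += 1
            else:
                fp += 1
    return tp, fp

# Lowering the threshold step by step: both counts can only grow.
for threshold in [0.9, 0.6, 0.4, 0.25]:
    tp, fp = count_tp_fp(y_true, y_prob, threshold)
    print(f"threshold={threshold:.2f}  TP={tp}  FP={fp}")
# threshold=0.90  TP=1  FP=0
# threshold=0.60  TP=3  FP=1
# threshold=0.40  TP=4  FP=2
# threshold=0.25  TP=5  FP=3
```

Each step down picks up more true positives, but also more false positives; the question is how quickly each one grows.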

Understanding the ROC curve

The receiver operating characteristic (ROC) curve is a plot of the pairs of TPRs (y-axis) and FPRs (x-axis) that result from lowering the threshold from 1 all the way down to 0. If the threshold is 1, there are no positive predictions: a sample is predicted positive only when its predicted probability reaches the threshold, and logistic regression only predicts probabilities strictly between 0 and 1. Because there are no positive predictions, the TPR and the FPR are both 0, so the ROC curve starts at (0, 0).
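The construction just described can be sketched as a threshold sweep. This is a minimal, standard-library illustration, not the course's own code: the data are hypothetical, `roc_points` is a hypothetical helper, and it assumes every probability lies strictly between 0 and 1 and that "predicted positive" means probability at or above the threshold. Instead of scanning every real number between 1 and 0, it only tries 1.0 plus each distinct predicted probability, since the TPR and FPR can only change at those values.

```python
# Hypothetical labels (1 = positive) and predicted probabilities,
# invented for illustration; probabilities assumed strictly inside (0, 1).
y_true = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.95, 0.85, 0.70, 0.65, 0.55, 0.45, 0.35, 0.30, 0.20, 0.10]

def roc_points(y_true, y_prob):
    """Return (FPR, TPR) pairs as the threshold is lowered from 1 toward 0.

    A sample is predicted positive when prob >= threshold. Thresholds tried
    are 1.0 (no positive predictions) followed by each distinct predicted
    probability in descending order; at the smallest probability every
    sample is predicted positive, which matches a threshold of 0.
    """
    n_pos = sum(1 for label in y_true if label == 1)
    n_neg = len(y_true) - n_pos
    thresholds = [1.0] + sorted({p for p in y_prob if p < 1.0}, reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for label, p in zip(y_true, y_prob) if p >= t and label == 1)
        fp = sum(1 for label, p in zip(y_true, y_prob) if p >= t and label == 0)
        points.append((fp / n_neg, tp / n_pos))  # x = FPR, y = TPR
    return points

points = roc_points(y_true, y_prob)
for fpr, tpr in points:
    print(f"FPR={fpr:.1f}  TPR={tpr:.1f}")
```

The first point is (0.0, 0.0) and the last is (1.0, 1.0). In practice, scikit-learn's `sklearn.metrics.roc_curve(y_true, y_score)` computes these points for you and returns `fpr`, `tpr`, and `thresholds` arrays that can be passed straight to a plotting function.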

As the threshold is lowered, the TPR will start to increase, hopefully faster than the FPR if it’s a good classifier. Eventually, when the threshold is lowered all the way to 0, every sample is predicted to be positive, including all the samples that are, in fact, positive, but also all the samples that are actually negative. This means the TPR is 1 but the FPR is also 1. In between these two extremes are the reasonable options for where you may want to set the threshold, ...
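The two extremes can be checked directly. The sketch below uses hypothetical data and a hypothetical helper, `tpr_fpr`, again assuming a sample is predicted positive when its probability is at or above the threshold. A threshold of 1 yields (TPR, FPR) = (0, 0), a threshold of 0 yields (1, 1), and an intermediate threshold lands somewhere in between.

```python
# Hypothetical labels (1 = positive) and predicted probabilities,
# invented for illustration only.
y_true = [0, 1, 1, 0, 1, 0]
y_prob = [0.20, 0.90, 0.60, 0.40, 0.35, 0.70]

def tpr_fpr(y_true, y_prob, threshold):
    """Return (TPR, FPR) when predicting positive for prob >= threshold."""
    pred = [p >= threshold for p in y_prob]
    tp = sum(1 for label, yhat in zip(y_true, pred) if yhat and label == 1)
    fp = sum(1 for label, yhat in zip(y_true, pred) if yhat and label == 0)
    n_pos = sum(y_true)            # labels are 0/1, so the sum counts positives
    n_neg = len(y_true) - n_pos
    return tp / n_pos, fp / n_neg

print(tpr_fpr(y_true, y_prob, 1.0))  # (0.0, 0.0): nothing predicted positive
print(tpr_fpr(y_true, y_prob, 0.0))  # (1.0, 1.0): everything predicted positive
print(tpr_fpr(y_true, y_prob, 0.5))  # an intermediate trade-off
```

At the 0.5 threshold in this toy example, two of the three positives are caught (TPR of 2/3) at the cost of one of the three negatives (FPR of 1/3).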
