Discovering predicted probabilities

How does logistic regression make predictions? Now that we’re familiar with accuracy, true and false positives and negatives, and the confusion matrix, we can explore new ways of using logistic regression to learn about more advanced binary classification metrics. So far, we’ve only considered logistic regression as a “black box” that can learn from labeled training data and then make binary predictions on new features. While we will learn about the workings of logistic regression in detail later in the course, we can begin to peek inside the black box now.

One thing to understand about how logistic regression works is that the raw predictions—in other words, the direct outputs from the mathematical equation that defines logistic regression—are not binary labels. They are actually probabilities on a scale from 0 to 1 (although, technically, the equation never allows the probabilities to be exactly equal to 0 or 1, as we’ll see later). These probabilities are only transformed into binary predictions through the use of a threshold. The threshold is the probability above which a prediction is declared to be positive, and below which it is negative. The threshold in scikit-learn is 0.5. This means any sample with a predicted probability of at least 0.5 is identified as positive, and any with a predicted probability < 0.5 is decided to be negative. However, we are free to use any threshold we want. In fact, choosing the threshold is one of the key flexibilities of logistic regression, as well as other machine learning classification algorithms that estimate probabilities of class membership.

Predicted probabilities from logistic regression

In the following exercise, we will get familiar with the predicted probabilities of logistic regression and how to obtain them from a scikit-learn model.

We can begin to discover predicted probabilities by further examining the methods available to us on the logistic regression model object that we trained earlier in this section. Recall that before, once we trained the model, we could then make binary ...

Introduction

Data Exploration and Cleaning

(Challenge) Exploring Remaining Financial Features in Dataset

Introduction to scikit-learn and Model Evaluation

Fake News Detection Using Scikit-learn

(Challenge) Logistic Regression and Precision-Recall Curve

Details of Logistic Regression and Feature Extraction

(Challenge) Logistic Regression Model and Coefficients

The Bias-Variance Trade-Off

(Challenge) Cross-Validation and Feature Engineering

Decision Trees and Random Forests

(Challenge) Cross-Validation Grid Search with Random Forest

Gradient Boosting, XGBoost, and SHAP Values

(Challenge) XGBoost and SHAP Explanation for Case Study Data

Predict Frog Toxicity with Python and XGBoost

Test Set Analysis, Financial Insights, and Delivery to the Client

(Challenge) Deriving Financial Insights

Appendix

Exercise: Obtaining Probabilities from Logistic Regression Model

Discovering predicted probabilities

Predicted probabilities from logistic regression