Assumptions of Logistic Regression

Because it is a classical statistical model, similar to the F-test and Pearson correlation we already examined, logistic regression makes certain assumptions about the data. While it’s not necessary to satisfy every one of these assumptions in the strictest possible sense, it’s good to be aware of them. That way, if a logistic regression model is not performing well, you can investigate why, using your knowledge of the ideal situation that logistic regression is designed for. Different resources may list the specific assumptions slightly differently, but those given here are widely accepted.

The four assumptions of logistic regression

Each of these four assumptions is described below.

Features are linear in the log odds

Logistic regression is a linear model, so it will only work well as long as the features effectively describe a linear trend in the log odds of the response. In particular, logistic regression won’t capture interactions, polynomial terms, or discretized versions of features on its own. You can, however, supply any of these as new features, even though they are engineered from existing ones.
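As a minimal sketch of this idea, here is how interaction, polynomial, and discretized terms can be added as new columns before fitting. The column names and data are hypothetical stand-ins, not the case study data:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical raw features and labels; real data would come from the dataset.
X = pd.DataFrame({
    'LIMIT_BAL': [20000, 120000, 90000, 50000, 60000, 200000],
    'AGE': [24, 26, 34, 57, 37, 29],
})
y = [1, 1, 0, 0, 1, 0]

# Engineer nonlinear terms as new columns, since logistic regression
# will not discover them on its own.
X['LIMIT_BAL_x_AGE'] = X['LIMIT_BAL'] * X['AGE']          # interaction
X['AGE_SQUARED'] = X['AGE'] ** 2                          # polynomial term
X['AGE_BINNED'] = pd.cut(X['AGE'], bins=3, labels=False)  # discretization

# The model is still linear in the log odds, but now in the
# engineered feature space.
lr = LogisticRegression(solver='liblinear')
lr.fit(X, y)
```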

Remember from the previous section that PAY_1, the most important feature identified during univariate feature exploration, was not found to be linear in the log odds.
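One way to check this assumption is to compute the empirical log odds of the response at each value of a feature and plot them. Here is a minimal sketch using synthetic stand-in data; the binary target column name 'default' is an assumption, and in practice you would load the case study data instead:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic stand-in for the case study data: PAY_1 takes small integer
# values, and values of 1 or more jump to a much higher default rate,
# mimicking a nonlinear pattern like the one observed for PAY_1.
pay_1 = rng.integers(-1, 6, size=5000)
prob_default = np.where(pay_1 >= 1, 0.55, 0.15)
df = pd.DataFrame({'PAY_1': pay_1,
                   'default': rng.binomial(1, prob_default)})

# Empirical default rate p at each value of PAY_1, converted to
# log odds: log(p / (1 - p)). Clipping avoids division by zero.
p = df.groupby('PAY_1')['default'].mean().clip(1e-4, 1 - 1e-4)
log_odds = np.log(p / (1 - p))

# If the assumption held, these points would fall close to a straight line.
plt.plot(log_odds.index, log_odds.values, marker='o')
plt.xlabel('PAY_1')
plt.ylabel('Empirical log odds of default')
plt.show()
```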

No multicollinearity of features

Multicollinearity means that features are correlated with each other. The worst violation of this assumption occurs when features are perfectly correlated, for example, when one feature is identical to another, or when one feature equals another multiplied by a constant. We can investigate the correlation of features using the correlation plot that we’re already familiar with from univariate feature selection, as shown in the previous section.
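For reference, here is a minimal sketch of how such a correlation plot can be produced with pandas and matplotlib. The features are synthetic, with one deliberately perfect correlation built in, and the actual plotting code from the previous section may differ:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Synthetic features, including one deliberate multicollinearity:
# FEATURE_C is just FEATURE_A multiplied by a constant.
X = pd.DataFrame({'FEATURE_A': rng.normal(size=200),
                  'FEATURE_B': rng.normal(size=200)})
X['FEATURE_C'] = 2.0 * X['FEATURE_A']

corr = X.corr()  # pairwise Pearson correlations among the features

# Display the correlation matrix as a heatmap; a value of 1 or -1
# off the diagonal indicates perfectly correlated features.
fig, ax = plt.subplots()
im = ax.imshow(corr, vmin=-1, vmax=1, cmap='coolwarm')
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax, label='Pearson correlation')
plt.tight_layout()
plt.show()
```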
