Exercise: Finding Appropriate Features for Logistic Regression

Learn how to examine the appropriateness of features for logistic regression.

We'll cover the following...

Examining the log odds of default within groups
Try it yourself

In the Visualizing Features and Response Variable Relationship exercise, we grouped the samples by the values of PAY_1 and plotted the mean of the response variable within each group. According to our exploration so far, PAY_1 may be the most important feature of the model. Because the response variable is binary, its mean within a group is the fraction of samples in that group that defaulted, so we are effectively looking at the probability, p, of default within each of these groups.
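As a reminder of why the group mean can be read as a probability, here is a minimal sketch on made-up data. The DataFrame contents and the response column name `default` are assumptions for illustration only; they are not the case study's actual data or column names.

```python
import pandas as pd

# Toy data, invented for illustration: PAY_1 values and a 0/1 default flag.
# The column name "default" is hypothetical.
df = pd.DataFrame({
    "PAY_1":   [-1, -1, 0, 0, 0, 0, 1, 1, 2, 2],
    "default": [ 0,  0, 0, 0, 0, 1, 0, 1, 1, 1],
})

# The mean of a 0/1 response within a group is the fraction of ones,
# which serves as an estimate of the default probability p for that group.
p_by_group = df.groupby("PAY_1")["default"].mean()
print(p_by_group)
```

In this toy data, one of the four samples with PAY_1 equal to 0 defaulted, so the estimated p for that group is 0.25.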

Examining the log odds of default within groups

In this exercise, we will evaluate the appropriateness of PAY_1 for logistic regression. We will do this by examining the log odds of default within these groups to see whether the log odds of the response variable are linear in the feature, as logistic regression formally assumes. Perform the following steps to complete the exercise:

In the following code, reviewing the ...
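Separately from the exercise's own code, the idea behind the check can be sketched as follows. The log odds of a probability p are log(p / (1 - p)), so given a default probability per group, the log odds per group follow directly. The probabilities below are made-up numbers, not results from the case study, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical default probabilities per PAY_1 group (invented numbers).
p = pd.Series([0.2, 0.1, 0.3, 0.5, 0.7], index=[-1, 0, 1, 2, 3], name="p")
p.index.name = "PAY_1"

# Odds are p / (1 - p); the log odds are their natural logarithm.
q = 1 - p
odds = p / q
log_odds = np.log(odds)
print(log_odds)

# If PAY_1 were well suited to logistic regression, plotting log_odds
# against the PAY_1 values would show points close to a straight line.
```

A group whose estimated probability is exactly 0 or 1 yields infinite log odds, so such groups need care when plotting.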
