Exercise: Finding Appropriate Features for Logistic Regression

Learn how to examine the appropriateness of features for logistic regression.

In the Visualizing Features and Response Variable Relationship exercise, we plotted a groupby/mean of what might be the most important feature of the model, according to our exploration so far: the PAY_1 feature. By grouping samples by the values of PAY_1, and then looking at the mean of the response variable, we are effectively looking at the probability, p, of default within each of these groups.
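To see why a group mean can be read as a probability, here is a minimal sketch on a small made-up DataFrame. The `toy` data and its values are hypothetical and are not taken from the case study dataset; only the column names match the exercise.

```python
import pandas as pd

# Hypothetical toy data: PAY_1 codes and a binary default flag (1 = default).
toy = pd.DataFrame({
    'PAY_1': [0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
    'default payment next month': [0, 0, 0, 1, 0, 1, 1, 1, 1, 0],
})

# The mean of a 0/1 column within a group is the fraction of 1s in that group,
# which is the empirical probability of default for that PAY_1 value.
p_by_group = toy.groupby('PAY_1')['default payment next month'].mean()
print(p_by_group)
```

In this toy example, one of the four samples with `PAY_1 == 0` defaulted, so the group mean for that value is 0.25.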

Examining the log odds of default within groups

In this exercise, we will evaluate the appropriateness of PAY_1 for logistic regression. We will do this by examining the log odds of default within these groups to see whether the log odds of the response variable are approximately linear in the values of PAY_1, as logistic regression formally assumes. Perform the following steps to complete the exercise:
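As a quick reference before the steps, the log odds of a probability p is the natural logarithm of p / (1 - p). The sketch below computes it with NumPy for a few hypothetical probabilities. The values in `p` are illustrative assumptions, not results from the dataset.

```python
import numpy as np

# Hypothetical default probabilities for three groups (not from the dataset).
p = np.array([0.25, 0.5, 0.75])

# Odds compare the probability of default to the probability of no default.
odds = p / (1 - p)

# The log odds (also called the logit) is the natural log of the odds.
log_odds = np.log(odds)
print(log_odds)
```

A probability of 0.5 corresponds to odds of 1 and therefore log odds of 0. Probabilities below 0.5 give negative log odds and probabilities above 0.5 give positive log odds.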

  1. Review the DataFrame of the average value of the response variable for different values of PAY_1 with this code:

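    # Mean of the binary response within each PAY_1 group.
    # The string 'mean' avoids the FutureWarning that newer pandas raises for np.mean.
    group_by_pay_mean_y = df.groupby('PAY_1').agg({'default payment next month': 'mean'})
    group_by_pay_mean_y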
    

    The output should be as follows:
