Exercise: Finding Appropriate Features for Logistic Regression
Learn how to examine the appropriateness of features for logistic regression.
In the Visualizing Features and Response Variable Relationship exercise, we plotted a groupby/mean of what might be the most important feature of the model, according to our exploration so far: the PAY_1 feature. By grouping samples by the values of PAY_1, and then looking at the mean of the response variable, we are effectively looking at the probability, p, of default within each of these groups.
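To see why a group mean acts as a probability here, consider a minimal sketch on a small made-up DataFrame. The values below are invented for illustration and are not the case study data; only the column names follow the exercise. Because the response is coded 0/1, its mean within a group is simply the fraction of samples in that group that defaulted.

```python
import pandas as pd

# Toy data, invented for illustration; not the case study data.
toy = pd.DataFrame({
    'PAY_1': [-1, -1, 0, 0, 0, 0, 2, 2],
    'default payment next month': [0, 0, 0, 0, 0, 1, 1, 0],
})

# The response is 0/1, so its mean within each group is the fraction of
# defaults, which estimates the probability p of default for that group.
p_by_group = toy.groupby('PAY_1')['default payment next month'].mean()
print(p_by_group)
```

In this toy example, one of the four samples with PAY_1 equal to 0 defaulted, so the estimate of p for that group is 0.25.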
Examining the log odds of default within groups
In this exercise, we will evaluate the appropriateness of PAY_1 for logistic regression. We will do this by examining the log odds of default within these groups to see whether the log odds of the response variable are linear in the feature, as logistic regression formally assumes. Perform the following steps to complete the exercise:
- Review the DataFrame of the average value of the response variable for different values of PAY_1 with this code:
  group_by_pay_mean_y = df.groupby('PAY_1').agg({'default payment next month': np.mean})
  group_by_pay_mean_y
The output should be as follows:
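The exact values depend on the case study data. The quantity this exercise goes on to examine is the log odds, log(p / (1 - p)), computed from each group's default rate p. As a minimal sketch, assuming a hypothetical set of group-wise probabilities (the numbers are invented, not taken from the dataset), the conversion might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical group-wise default rates, invented for illustration.
p = pd.Series(
    [0.1, 0.2, 0.5, 0.7],
    index=pd.Index([-1, 0, 1, 2], name='PAY_1'),
)

# Odds are p / (1 - p); the log odds are their natural logarithm.
log_odds = np.log(p / (1 - p))
print(log_odds)
```

A group with p = 0.5 has odds of 1 and therefore log odds of 0. Note that a group with p exactly 0 or 1 would give infinite log odds, so such groups need separate handling before checking for a linear trend.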