Exercise: Visualizing the Feature and Response Variable Relationship

Learn how to visualize the relationship between the features and response variable.

Relationship between features and response variable

In this exercise, you will build on the Matplotlib plotting functions you used earlier in this course and learn how to customize graphics to answer specific questions about the data. Along the way, you will create visualizations of how the PAY_1 and LIMIT_BAL features relate to the response variable, which may support the hypotheses you formed about these features. To do this, you will become more familiar with the Matplotlib Application Programming Interface (API), that is, the syntax you use to interact with Matplotlib. Perform the following steps to complete the exercise:

  1. Calculate a baseline for the response variable, the default rate across the whole dataset, using pandas’ mean():

    overall_default_rate = df['default payment next month'].mean()
    overall_default_rate 
    

    The output of this should be the following:

    # 0.2217971797179718 
    

    What would be a good way to visualize default rates for different values of the PAY_1 feature?

    Recall our observation that this feature is something of a hybrid between a categorical and a numerical feature. Because it has a relatively small number of unique values, we’ll plot it in the way that is typical for categorical features. In the chapter “Data Exploration and Cleaning,” we used value_counts on this feature as part of data exploration, and later we learned about groupby/mean when looking at the EDUCATION feature. A groupby/mean aggregation is a good way to obtain the default rate for each payment status here, which we can then visualize.

  2. Use this code to create a groupby/mean aggregation:

    # The string 'mean' gives the same result as np.mean here and does not require NumPy to be imported
    group_by_pay_mean_y = df.groupby('PAY_1').agg({'default payment next month': 'mean'})
    group_by_pay_mean_y
    

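    The output should be a DataFrame indexed by the unique values of PAY_1, with a single column holding the mean of the response variable (the default rate) for each value.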
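
The exact values depend on the dataset. As a rough illustration of where this aggregation is heading, the following sketch builds a small made-up DataFrame, computes the baseline and the groupby/mean result, and plots the per-group default rate against the baseline with Matplotlib. The data values and the names `toy_df`, `toy_rate`, and `toy_group` are assumptions introduced here for illustration; they are not part of the case study data or the exercise code.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration only; not the case study dataset.
toy_df = pd.DataFrame({
    "PAY_1": [-1, -1, 0, 0, 0, 0, 1, 1, 2, 2],
    "default payment next month": [0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
})

# The response is coded 0/1, so its mean is the fraction of defaults.
toy_rate = toy_df["default payment next month"].mean()

# Default rate within each PAY_1 value.
toy_group = toy_df.groupby("PAY_1").agg({"default payment next month": "mean"})

# Plot the per-group rate and draw the overall rate as a horizontal reference line.
fig, ax = plt.subplots()
toy_group.plot(marker="x", legend=False, ax=ax)
ax.axhline(toy_rate, color="red", linestyle="--")
ax.set_xlabel("PAY_1")
ax.set_ylabel("Proportion of defaults")
ax.legend(["Per PAY_1 value", "Entire dataset"])
```

Drawing the baseline on the same axes makes it easy to see which payment statuses sit above or below the overall rate. In a script, calling `plt.show()` afterward displays the figure.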
