The Response Variable and Concluding the Initial Exploration
Learn about binary classification and how to prepare data for it.
We'll cover the following...
We have now looked through all the features to see whether any data is missing, as well as to examine them generally. The features are important because they constitute the inputs to our machine learning algorithm. On the other side of the model lies the output, which is a prediction of the response variable. For our problem, this is a binary flag indicating whether or not a credit account will default next month.
Binary classification and proportions of classes
The key task for the case study project is to come up with a predictive model for this target. Because the response variable is a yes/no flag, this problem is called a binary classification task. In our labeled data, the samples (accounts) that defaulted (that is, 'default payment next month' = 1) are said to belong to the positive class, while those that didn’t belong to the negative class.
The main piece of information to examine regarding the response of a binary classification problem is this: what is the proportion of the positive class? This is an easy check.
Before we perform this check, we load the packages we need with the following code:
import numpy as np #numerical computation
import pandas as pd #data wrangling
import