Solution: Undersampling
Learn how to solve the exercise posed in the previous lesson.
Let’s get more familiar with the NearMiss undersampling strategy through practice.
Task
Here, we deal with severely imbalanced training data for a binary classification problem. By default, NearMiss will balance out the training data so that we have (roughly) a 1:1 ratio of classes. We want you to apply a slightly different sampling strategy to end up with a 1:10 ratio.
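To make the target concrete, the short sketch below works out how many majority samples must remain for a given ratio. The helper name `majority_target` and the class counts are hypothetical, chosen only for illustration. The calculation reflects how imbalanced-learn interprets a float `sampling_strategy` for a binary problem: as the minority-to-majority ratio after resampling.

```python
def majority_target(n_minority: int, minority_to_majority: float) -> int:
    """Number of majority samples to keep so that
    n_minority / n_majority equals the requested ratio."""
    return int(n_minority / minority_to_majority)


# Hypothetical training set: 50 minority and 5000 majority samples.
n_minority = 50

balanced = majority_target(n_minority, 1.0)    # 1:1 ratio (the default behaviour)
one_to_ten = majority_target(n_minority, 0.1)  # 1:10 ratio (what the task asks for)

print(balanced, one_to_ten)
```

In other words, a 1:10 ratio corresponds to a ratio value of 0.1, and the minority class is left untouched while the majority class is reduced to ten times its size.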
Configure NearMiss version 3 and choose parameters to meet a 1:10 imbalance ratio after undersampling the majority class.
Apply the sampling strategy to X_train and y_train to create undersampled training data.
Verify that the undersampled data meets the 1:10 imbalance ratio.
Coding workspace
The following workspace has the code solution for the task mentioned above: