Exercise: Undersampling
Learn how to reduce the imbalance to a fixed ratio with NearMiss.
Let’s get more familiar with the NearMiss undersampling strategy through practice.
Task
Here, we deal with severely imbalanced training data for a binary classification problem. By default, NearMiss balances the training data so that we have (roughly) a 1:1 ratio of classes. We want you to apply a slightly different sampling strategy and end up with a 1:10 ratio instead, that is, one minority-class sample for every ten majority-class samples.
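To make the target concrete, the arithmetic behind a 1:10 ratio can be sketched in plain Python. The helper names `imbalance_summary` and `majority_target` and the toy labels are hypothetical and only serve as illustration; they are not part of the exercise workspace.

```python
from collections import Counter


def imbalance_summary(y):
    """Return (minority_count, majority_count, minority/majority) for binary labels."""
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    minority, majority = sorted(counts.values())
    return minority, majority, minority / majority


def majority_target(n_minority, ratio):
    """Majority samples to keep so that n_minority / n_majority is about `ratio`."""
    return round(n_minority / ratio)


# Toy labels (made up for illustration): 20 positives, 1980 negatives.
y_toy = [1] * 20 + [0] * 1980
n_min, n_maj, ratio = imbalance_summary(y_toy)
print(n_min, n_maj, round(ratio, 4))   # 20 1980 0.0101
print(majority_target(n_min, 1 / 10))  # 200
```

In this toy case, undersampling leaves the 20 minority samples untouched and keeps about 200 of the 1980 majority samples. The same summary helper can later be used to check the ratio of any resampled label vector.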
Configure NearMiss version 3 and choose parameters to meet a 1:10 imbalance ratio after undersampling the majority class.
Apply the sampling strategy to X_train and y_train to create undersampled training data.
Verify that the undersampled data meets the 1:10 imbalance ratio.
Coding workspace
The X_train and y_train training data is available in memory in the workspace. Let’s try to code the solution.