Imbalanced Datasets and Techniques to Handle Them
Learn about class imbalance and how to deal with it, and get an overview of the data we'll use going forward.
Class imbalance is a common problem in classification datasets: the number of data points (observations) is not the same across all the classes in the target column. Small differences are usually not a problem. However, some datasets have an extreme class imbalance. For example:
Disease screening: Suppose we are given a dataset to develop a machine learning model that screens patients for COVID-19. The positive class is heavily outnumbered; a sample of 100 observations, for instance, might contain only five COVID-19 positive cases against 95 COVID-19 negative cases. For the discussion that follows, say we have 1,000 observations (100 positive and 900 negative cases).
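To make the imbalance in this example concrete, here is a minimal sketch that counts the observations per class and computes each class's share and the majority-to-minority ratio. The label list and the helper name `class_distribution` are assumptions made for illustration (1 stands for positive, 0 for negative); they are not part of the lesson's actual dataset.

```python
from collections import Counter


def class_distribution(labels):
    """Return per-class counts, per-class shares, and the majority/minority ratio."""
    counts = Counter(labels)
    total = len(labels)
    shares = {cls: n / total for cls, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return counts, shares, ratio


# Hypothetical labels mirroring the example: 100 positive (1), 900 negative (0).
labels = [1] * 100 + [0] * 900

counts, shares, imbalance_ratio = class_distribution(labels)

print("Counts per class:", dict(counts))
print("Share per class:", shares)
print("Majority-to-minority ratio:", imbalance_ratio)
```

With these assumed labels, the negative class makes up 90% of the data, so there are nine negative cases for every positive one.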
Suppose we train our model on this COVID-19 dataset, and we are happy to see the ...