Imbalanced Datasets and Techniques to Handle Them

Learn about class imbalance and how to deal with it, and get an overview of the data we'll use going forward.

We'll cover the following...

BioAssay dataset

Class imbalance is a common problem in classification datasets: the number of data points (observations) is not the same across all classes in the target column. Small differences are usually not a problem. However, some datasets have an extreme class imbalance. For example:

Disease screening: Suppose we are given a dataset to develop a machine learning model that can screen COVID-19 patients. The dataset might contain only five COVID-19 positive cases against 95 COVID-19 negative cases. Or, at a larger scale, say we have 1,000 observations, of which 100 are positive and 900 are negative.
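
To make the imbalance concrete, here is a minimal sketch that counts the classes for the five positive and 95 negative cases in the example above. It also computes the accuracy of a trivial predictor that always outputs the majority class. The label encoding (1 = positive, 0 = negative) and the helper names class_distribution and majority_baseline_accuracy are assumptions made for illustration only; they are not part of the lesson.

```python
from collections import Counter


def class_distribution(labels):
    """Return {class: (count, fraction)} for a sequence of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: (n, n / total) for cls, n in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy of a predictor that always outputs the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical labels matching the example counts:
# 1 = COVID-19 positive, 0 = COVID-19 negative (assumed encoding)
labels = [1] * 5 + [0] * 95

dist = class_distribution(labels)
baseline = majority_baseline_accuracy(labels)

for cls, (count, fraction) in dist.items():
    print(f"class {cls}: {count} cases ({fraction:.0%})")
print(f"always-predict-majority accuracy: {baseline:.0%}")
```

With these counts, the predictor that never flags a positive case is still right on 95 of 100 observations, which is why accuracy alone can be misleading on imbalanced data.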

Suppose we train our model on this COVID-19 dataset, and we are happy to see the ...
