Unbiased Mislabeling in Image Classification Using CNNs

Explore how an unbiased mislabeled dataset affects the performance of a CNN model.

We'll cover the following...

Implementing unbiased mislabeling
Summary

In this lesson, we’ll learn about the impact of a small amount of unbiased mislabeling in a dataset. To understand the consequences of poor-quality data, we’ll train a CNN model on two versions of the same dataset: a clean one and a mislabeled one. We’ll then compare the accuracy of the two models to gauge the impact of mislabeling.
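Accuracy is the fraction of predictions that match the true labels. As a minimal sketch of that comparison, the snippet below computes it with NumPy. The helper `accuracy` and the two prediction lists are hypothetical and made up for illustration; they are not results from this lesson.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Hypothetical labels and predictions, for illustration only
y_true = [3, 1, 4, 1, 5, 9, 2, 6]
pred_clean_model = [3, 1, 4, 1, 5, 9, 2, 0]  # 7 of 8 correct
pred_noisy_model = [3, 1, 4, 7, 5, 9, 8, 0]  # 5 of 8 correct

print(accuracy(y_true, pred_clean_model))  # 0.875
print(accuracy(y_true, pred_noisy_model))  # 0.625
```

Both models should be scored on the same clean test set so that any difference in accuracy can be attributed to the training labels.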

Implementing unbiased mislabeling

To assess how label quality affects the performance of a CNN model, we’ll work through several steps and compare the results obtained with a clean dataset and a mislabeled one.

Step 1: Importing libraries

The following code imports the libraries necessary to implement unbiased mislabeling:

# NumPy for array operations and random sampling
import numpy as np
# MNIST loader (assumes the Keras API bundled with TensorFlow)
from tensorflow.keras.datasets import mnist

Step 2: Loading and creating an unbiased mislabeled dataset

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Define the fractions of the training and testing data to keep
train_percentage = 0.25  # 15,000 of the 60,000 training images
test_percentage = 0.2    # 2,000 of the 10,000 test images
# Calculate the number of samples based on these fractions
total_train_samples = len(x_train)
total_test_samples = len(x_test)
train_samples = int(train_percentage * total_train_samples)
test_samples = int(test_percentage * total_test_samples)
# Keep only the first train_samples and test_samples examples
x_train = x_train[:train_samples]
y_train = y_train[:train_samples]
x_test = x_test[:test_samples]
y_test = y_test[:test_samples]
# Define the percentage of training labels to replace
mislabel_percentage = 10
# Compute the number of images to mislabel
num_mislabeled = int(len(y_train) * mislabel_percentage / 100)
# Randomly select distinct images to mislabel
index = np.random.choice(len(y_train), size=num_mislabeled, replace=False)
# Draw new labels uniformly from all 10 digit classes (0-9); the upper bound of randint is exclusive
# A drawn label can coincide with the original one, so slightly fewer than 10% of labels end up wrong
new_labels = np.random.randint(0, 10, size=num_mislabeled)
# Copy the training set and overwrite only the selected labels; the images themselves are unchanged
x_train_mislabeled = np.copy(x_train)
y_train_mislabeled = np.copy(y_train)
y_train_mislabeled[index] = new_labels
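
Because the replacement labels are drawn at random, some of them coincide with the original label, so the share of labels that are actually wrong is lower than `mislabel_percentage`. The sketch below measures that effective rate. It uses synthetic labels in place of MNIST to stay fast, which is an assumption, and it also shows a hypothetical variant (not part of this lesson's code) that shifts each selected label by a random nonzero offset so that every selected label changes while the noise stays uniform over the other nine classes.

```python
import numpy as np

# Synthetic stand-in for y_train: 15,000 labels from the 10 digit classes.
# (Assumption: random labels replace real MNIST labels to keep the sketch fast.)
rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility
num_classes = 10
y_clean = rng.integers(0, num_classes, size=15_000)

mislabel_percentage = 10
num_mislabeled = int(len(y_clean) * mislabel_percentage / 100)

# Pick distinct positions and overwrite them with uniformly random labels.
index = rng.choice(len(y_clean), size=num_mislabeled, replace=False)
y_noisy = y_clean.copy()
y_noisy[index] = rng.integers(0, num_classes, size=num_mislabeled)

# Fraction of labels that actually changed (expected to be near 9%, not 10%).
effective_rate = np.mean(y_noisy != y_clean)

# Hypothetical variant: shift each selected label by 1..9 (mod 10) so it always changes.
offsets = rng.integers(1, num_classes, size=num_mislabeled)
y_forced = y_clean.copy()
y_forced[index] = (y_clean[index] + offsets) % num_classes
forced_rate = np.mean(y_forced != y_clean)

print(f"selected:                    {num_mislabeled / len(y_clean):.3f}")
print(f"effective (uniform relabel): {effective_rate:.3f}")
print(f"effective (forced change):   {forced_rate:.3f}")
```

With ten classes, a uniformly drawn label matches the original about one time in ten, which is why the first rate sits below the selected 10%. Which behavior is preferable depends on whether the stated percentage is meant to count selected labels or labels that are actually wrong.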
