Simulating Unbiased Mislabeling Using Python Programming

Learn how to simulate unbiased mislabeling in the MNIST digit dataset using Python.

The main objective of this lesson is to simulate unbiased mislabeling noise in a dataset and to visualize its impact. The lesson is structured into the following three steps:

  • Step 1: We’ll examine the MNIST digit dataset and analyze its characteristics in order to understand the dataset thoroughly before introducing mislabeling.

  • Step 2: We’ll simulate unbiased mislabeling in the MNIST dataset. By intentionally introducing mislabeled data points, we’ll simulate the effects of label noise on the dataset.

  • Step 3: We’ll create visualizations that show how mislabeling affects each digit in the MNIST dataset, so that we can observe the effect of unbiased mislabeling digit by digit.
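Before working with the real dataset, the idea behind Step 2 can be sketched on synthetic labels. The snippet below is a minimal illustration, not the lesson's own implementation. It assumes that "unbiased" means a mislabeled example is equally likely to receive any of the other nine labels. The helper name add_uniform_label_noise, the noise_rate parameter, and the seed values are hypothetical and chosen only for this example.

```python
import numpy as np

def add_uniform_label_noise(labels, noise_rate, num_classes=10, seed=0):
    """Hypothetical helper: flip a fraction of labels to a different class,
    chosen uniformly at random among the other classes."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()
    # Number of labels to corrupt (assumed noise model: a fixed fraction)
    n_flip = int(round(noise_rate * labels.size))
    # Pick distinct positions to corrupt
    flip_idx = rng.choice(labels.size, size=n_flip, replace=False)
    # An offset in 1..num_classes-1 (mod num_classes) always yields a different label
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes
    return noisy, flip_idx

# Synthetic stand-in for MNIST labels (digits 0 to 9), so no download is needed
clean = np.random.default_rng(42).integers(0, 10, size=1000)
noisy, flipped = add_uniform_label_noise(clean, noise_rate=0.2)

print("Fraction of labels changed:", np.mean(noisy != clean))
print("Flipped examples per original digit:", np.bincount(clean[flipped], minlength=10))
```

The same function could be applied to the train_y array loaded in Step 1, and the per-digit counts from np.bincount are the kind of quantity Step 3 would plot.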

Step 1: Visualizing the MNIST digit dataset

We chose the MNIST digit dataset, which contains 60,000 training images and 10,000 test images of handwritten digits, to observe the impact of unbiased mislabeling on image classification performance. The code below plots the training set as a bar chart. Each bar corresponds to one digit class (0 to 9), its height is the number of training examples of that digit, and the exact count is printed on top of the bar. The digit labels appear along the x-axis. This visualization shows how the training examples are distributed across the ten digits.

Click the “Run” button to visualize the number of training examples for each digit in the MNIST dataset.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # Suppress TensorFlow info and warning logs
from keras.datasets import mnist  # Importing the MNIST digit dataset
import matplotlib.pyplot as plt  # Importing the data visualization library

# Loading the MNIST dataset
(train_X, train_y), (test_X, test_y) = mnist.load_data()

# Counting the number of instances of each digit in the training set
digit_counts = [0] * 10
for i in train_y:
    digit_counts[i] += 1

# Plotting the number of training examples of each digit
figure, axis = plt.subplots()
bars = axis.bar(range(10), digit_counts)
axis.set_xlabel("Digits")
axis.set_ylabel("Counts")
axis.set_title("Number of Training Examples for Each Digit in MNIST Dataset")

# Adding the count labels to the bars
for bar, count in zip(bars, digit_counts):
    height = bar.get_height()
    axis.text(bar.get_x() + bar.get_width() / 2, height, count,
              ha='center', va='bottom')

plt.show()

  • Line 3: We import the mnist dataset from the Keras library.

  • Line 4: We import the pyplot module from the matplotlib library, which is commonly used for data visualization.

  • Line 7: We load the training and testing datasets using the load_data() function.

    • train_X contains the training images of the MNIST dataset.

    • train_y ...