...
Simulating Unbiased Mislabeling Using Python Programming
Learn how to simulate unbiased mislabeling in the MNIST digit dataset using Python.
The main objective of this lesson is to simulate unbiased mislabeling noise in a dataset and to visualize its impact. The lesson is structured into the following three steps:
Step 1: We’ll examine the MNIST digit dataset and analyze its characteristics in order to understand the dataset thoroughly before introducing mislabeling.
Step 2: We’ll simulate unbiased mislabeling in the MNIST dataset. By intentionally introducing mislabeled data points, we’ll simulate the effects of label noise on the dataset.
Step 3: We’ll create visualizations that show, for each digit in the MNIST dataset, how unbiased mislabeling changes its labels.
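As a preview of Step 2, the idea can be sketched in a few lines of NumPy. This is a minimal sketch, not the lesson's own implementation. It assumes that "unbiased" means every sample is equally likely to be mislabeled and that the wrong label is drawn uniformly from the other nine classes. The function name add_unbiased_label_noise, its parameters, and the stand-in label array are hypothetical and chosen for illustration only.

```python
import numpy as np


def add_unbiased_label_noise(labels, noise_rate, num_classes=10, seed=0):
    """Flip a random fraction of labels to a different, uniformly chosen class.

    Hypothetical helper for illustration; the name and signature are assumptions.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()

    # Choose which samples to corrupt, independently of their true class.
    num_flips = int(round(noise_rate * labels.size))
    flip_idx = rng.choice(labels.size, size=num_flips, replace=False)

    # An offset in 1..num_classes-1 (mod num_classes) guarantees that the new
    # label differs from the original and is uniform over the other classes.
    offsets = rng.integers(1, num_classes, size=num_flips)
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes
    return noisy


# Stand-in labels (synthetic, not MNIST): 1,000 samples for each digit 0-9.
clean = np.repeat(np.arange(10), 1000)
noisy = add_unbiased_label_noise(clean, noise_rate=0.2)

# Fraction of labels that no longer match the originals.
flipped_fraction = np.mean(noisy != clean)
print(flipped_fraction)
```

With real data, the same function could be applied to the training labels once they are loaded. Sampling the corrupted indices without replacement keeps the realized noise rate equal to the requested one, rather than only matching it on average.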
Step 1: Visualizing the MNIST digit dataset
We chose the MNIST digit dataset, which contains 60,000 training images and 10,000 test images of handwritten digits, to observe the impact of unbiased mislabeling on image classification performance. The code below draws a bar chart of the training set. Each bar represents one digit class (0 through 9), and the number of training examples for that digit is displayed on top of the bar. The digit labels appear along the x-axis below the bars. This visualization shows how the training examples are distributed across the ten digits.
Run the following code to visualize the number of training examples for each digit in the MNIST dataset.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' # Disable warnings
from keras.datasets import mnist # Importing the MNIST digit dataset
import matplotlib.pyplot as plt # Importing the data visualization library

# Loading the MNIST dataset
(train_X, train_y), (test_X, test_y) = mnist.load_data()

# Counting the number of instances of each digit in the training set
digit_counts = [0] * 10
for i in train_y:
    digit_counts[i] += 1

# Plotting the number of training examples of each digit
figure, axis = plt.subplots()
bars = axis.bar(range(10), digit_counts)
axis.set_xlabel("Digits")
axis.set_ylabel("Counts")
axis.set_title("Number of Training Examples for Each Digit in MNIST Dataset")

# Adding the count labels to the bars
for bar, count in zip(bars, digit_counts):
    height = bar.get_height()
    axis.text(bar.get_x() + bar.get_width() / 2, height, count,
              ha='center', va='bottom')

plt.show()
Line 3: We import the mnist dataset from the Keras library.
Line 4: We import the pyplot module from the matplotlib library, which is commonly used for data visualization.
Line 7: We load the training and testing datasets using the load_data() function.
train_X contains the training images of the MNIST dataset.
train_y
...