Introduction to the Letter Classification Data Set
This lesson focuses on exploring and preprocessing the letter classification dataset.
The letter classification dataset
The dataset consists of pixel values that represent the letters A, B, and C, along with their labels. This is a multiclass classification problem because, given a pixel configuration, we have to predict the probability that it represents the letter A, B, or C.
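To make "predict the probability of the letter A, B, or C" concrete, here is a minimal sketch, assuming the model outputs one raw score per letter and converts those scores to probabilities with a softmax (a common choice for multiclass output layers, not something this lesson specifies). The names `softmax`, `predict_letter`, and the example scores are hypothetical.

```python
import math

LETTERS = ["A", "B", "C"]  # the three classes in this dataset

def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_letter(scores):
    """Return the most probable letter and the full probability vector."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
    return LETTERS[best], probs

# Hypothetical raw scores a model might output for one pixel configuration.
letter, probs = predict_letter([2.0, 0.5, -1.0])
print(letter, [round(p, 3) for p in probs])
```

The prediction is a probability for each of the three letters; the letter with the highest probability is taken as the predicted class.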
Note: In a multiclass classification problem like this one, the labels are typically one-hot encoded.
One-hot encoding is used to represent categorical data, i.e., data that falls into one of several categories, as numbers. It generates a vector whose length equals the number of categories in the dataset. For each data point, every index of this vector is set to 0 except the index of the category that the data point belongs to, which is set to 1. For example, with the categories A, B, and C, the label B becomes [0, 1, 0]. This represents the categories numerically without implying any order among them.
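The encoding described above can be sketched in plain Python. This is a minimal illustration, assuming the labels are the strings "A", "B", and "C"; the `one_hot` helper and the sample labels are hypothetical and not taken from the lesson's dataset.

```python
# Assumed label set for this dataset: the three letters A, B, and C.
CATEGORIES = ["A", "B", "C"]

def one_hot(label, categories=CATEGORIES):
    """Return a one-hot vector for `label`: all zeros except a 1 at the label's index."""
    vector = [0] * len(categories)       # one slot per category, all set to 0
    vector[categories.index(label)] = 1  # raises ValueError for an unknown label
    return vector

# Encode a few example labels.
labels = ["A", "C", "B", "A"]
encoded = [one_hot(y) for y in labels]
for y, vec in zip(labels, encoded):
    print(y, vec)
```

Because every vector has exactly one 1, each label maps to a distinct point with no numeric ordering between A, B, and C, unlike encoding them as 0, 1, and 2.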
Dataset exploration
Explore the dataset.
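A first exploration pass usually checks how many samples there are, how many pixel values each sample has, the range of those values, and how the labels are distributed. The sketch below assumes the data is already loaded as a list of pixel rows plus a parallel list of letter labels; the `summarize` helper is hypothetical, and the toy data is made up for illustration and is not the real dataset.

```python
from collections import Counter

def summarize(pixels, labels):
    """Print and return basic statistics for pixel rows and their letter labels."""
    if len(pixels) != len(labels):
        raise ValueError("each pixel row needs exactly one label")
    row_lengths = {len(row) for row in pixels}  # a single value if all rows match
    flat = [v for row in pixels for v in row]   # every pixel value in one list
    summary = {
        "n_samples": len(pixels),
        "pixels_per_sample": sorted(row_lengths),
        "pixel_min": min(flat),
        "pixel_max": max(flat),
        "label_counts": dict(Counter(labels)),  # how balanced the classes are
    }
    for key, value in summary.items():
        print(f"{key}: {value}")
    return summary

# Toy stand-in data (NOT the real dataset): four samples of 4 binary pixels each.
toy_pixels = [[0, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 1], [1, 0, 0, 1]]
toy_labels = ["A", "B", "C", "A"]
stats = summarize(toy_pixels, toy_labels)
```

If `pixels_per_sample` contains more than one value, some rows are malformed; a skewed `label_counts` suggests the classes are imbalanced.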