The MNIST Database

Learn about the MNIST database, which we'll use as our primary dataset.

In developing the DBN model, we'll use a dataset that we have discussed before: the MNIST database, which contains digital images of hand-drawn digits from 0 to 9 [LeCun, Yann; Léon Bottou; Yoshua Bengio; Patrick Haffner (1998). Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE. 86 (11): 2278–2324]. This database is a combination of two sets of earlier images from the National Institute of Standards and Technology (NIST): Special Database 1 (digits written by US high school students) and Special Database 3 (digits written by US Census Bureau employees) [LeCun, Yann; Corinna Cortes; Christopher J.C. Burges. MNIST handwritten digit database]. Together, the two sets are split into 60,000 training images and 10,000 test images.

The original images in the dataset were all black and white, while the modified dataset normalized them to fit into a 20 x 20 pixel bounding box and removed jagged edges using anti-aliasing. This led to intermediate grayscale values in the cleaned images, which are padded to a final resolution of 28 x 28 pixels.
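The padding step can be illustrated with a minimal sketch in plain Python. The helper name `pad_to_canvas` and the all-white 20 x 20 stand-in image are hypothetical, and the symmetric 4-pixel border is a simplifying assumption: the original MNIST preparation positioned each digit by its center of mass rather than by its bounding box.

```python
def pad_to_canvas(image, size=28, fill=0):
    """Center a rectangular 2-D list of pixel values on a size x size canvas."""
    height = len(image)
    width = len(image[0])
    if height > size or width > size:
        raise ValueError("image is larger than the target canvas")

    # Split the leftover space evenly between the two sides (assumption:
    # bounding-box centering, not the center-of-mass centering MNIST used).
    top = (size - height) // 2
    left = (size - width) // 2

    canvas = [[fill] * size for _ in range(size)]
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            canvas[top + r][left + c] = value
    return canvas


# Hypothetical 20 x 20 grayscale digit: a bright block standing in for strokes.
digit = [[255] * 20 for _ in range(20)]
padded = pad_to_canvas(digit)
print(len(padded), len(padded[0]))
```

With a 20 x 20 input, the sketch leaves a border of 4 background pixels on every side, giving the 28 x 28 resolution described above.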

In the original NIST dataset, all the training images came from Census Bureau employees, while the test images came from high school students. The modified version mixes the two groups across the training and test sets to provide a less biased population for training machine learning algorithms.

The figure below shows the digits from the NIST dataset (left) [NIST's original datasets: https://www.nist.gov/system/files/documents/srd/nistsd19.pdf] and the MNIST dataset (right) [LeCun, Yann; Léon Bottou; Yoshua Bengio; Patrick Haffner (1998). Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE. 86 (11): 2278–2324].
