
LeNet-5 — A complete guide

Saif Ali
Apr 29, 2024
11 min read

In 1998, Yann LeCun and his team introduced LeNet-5, a groundbreaking convolutional neural network (CNN) developed at AT&T Labs. Its mission was to revolutionize handwritten character recognition, particularly in banking, where processing checks was a labor-intensive task. Thanks to LeNet-5, handwriting recognition accuracy rose sharply and error rates dropped, lastingly changing how banks processed checks. LeNet-5’s versatility extended across industries, automating tasks from address recognition to document digitization. Let’s look at its architecture and explore how this compact neural network reshaped the landscape of AI.

Understanding the evolution of neural networks, particularly milestones like LeNet-5, is essential for machine learning engineers and data scientists as it provides insight into the foundational principles of modern deep learning architectures. This historical context not only enriches one’s understanding of neural networks but also equips professionals with the knowledge to innovate and adapt these principles in their own work, fostering career advancement in the rapidly evolving field of machine learning.

LeNet architecture#

LeNet is characterized by a simple yet innovative architecture featuring a series of layers that collectively enable effective feature extraction and hierarchical learning.

The LeNet-5 architecture

The key components of the basic LeNet structure include:

  1. Input layer: LeNet typically takes grayscale images of fixed dimensions as input. In the case of the original LeNet-5, the input size is 32 x 32 pixels.

  2. Convolutional layers (C1 and C3): LeNet employs convolutional layers with learnable filters that automatically extract features from the input image. The convolutional operation involves sliding these filters across the input, capturing spatial hierarchies and patterns.

  3. Subsampling layers (S2 and S4): After convolution, LeNet introduces subsampling layers to downsample the spatial dimensions of the feature maps. This reduction aids in building translation invariance, making the network robust to slight shifts in the input.

  4. Fully connected layers (F5 and F6): In these layers, the extracted features are flattened and fed into dense layers for classification.

  5. Output layer: Using the softmax activation function, which converts raw scores into probabilities that sum to 1, the output layer produces a probability for each class.
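To make these classic dimensions concrete, here is a minimal Keras sketch. This is an illustrative approximation rather than the exact 1998 network: the original used trainable subsampling and RBF output units, while this sketch substitutes average pooling and a softmax output. The implementation later in this post uses a modernized variant with ReLU and max pooling on 28 x 28 MNIST images.

from tensorflow.keras import layers, models

# Sketch of the classic LeNet-5 layer stack on a 32 x 32 grayscale input.
# Average pooling stands in for the original trainable subsampling layers.
classic_lenet5 = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, (5, 5), activation='tanh'),   # C1: 6 feature maps, 28 x 28
    layers.AveragePooling2D((2, 2)),               # S2: 6 feature maps, 14 x 14
    layers.Conv2D(16, (5, 5), activation='tanh'),  # C3: 16 feature maps, 10 x 10
    layers.AveragePooling2D((2, 2)),               # S4: 16 feature maps, 5 x 5
    layers.Flatten(),
    layers.Dense(120, activation='tanh'),          # F5
    layers.Dense(84, activation='tanh'),           # F6
    layers.Dense(10, activation='softmax'),        # Output: one node per class
])
classic_lenet5.summary()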

LeNet’s impact on the development of modern CNNs#

LeNet’s influence on modern CNNs cannot be overstated. Its success in handwritten digit recognition showcased the potential of convolutional neural networks in practical applications. LeNet introduced fundamental concepts, such as convolutional and subsampling layers (pooling layers), that proved crucial for learning hierarchical representations of features.

The principles established by LeNet served as the foundation for subsequent advancements in deep learning and CNN architectures. Many modern convolutional neural networks, including popular ones like AlexNet, VGGNet, Inception (GoogLeNet), ResNet, EfficientNet, MobileNet, and YOLO, have drawn inspiration from LeNet’s principles, evolving and optimizing them for various image recognition tasks.

LeNet-5 paved the way for a new era in computer vision, influencing the design and development of neural networks that power state-of-the-art solutions in image processing, object detection, and beyond. Its legacy continues to resonate in the ongoing exploration and refinement of deep learning models.

Layers in LeNet architecture#

The architecture of LeNet comprises multiple layers, each playing a crucial role in the hierarchical feature extraction process. Let’s dive into the details of these layers:

1. Convolutional layer (C1)#

The convolutional layer (C1) serves as the initial building block in the LeNet architecture, playing a crucial role in extracting features from the input image.

In the convolutional operation, C1 utilizes learnable filters or kernels to scan the input image. These filters slide across the input, performing element-wise multiplications and aggregating the results. The output of this operation, known as a feature map, captures local patterns and features from the input. The convolutional operation is pivotal for learning hierarchical representations of visual information, allowing the network to recognize complex patterns.
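To illustrate the operation itself, here is a toy NumPy sketch (not part of LeNet’s actual implementation) of a single 3 x 3 filter sliding over a 5 x 5 input with stride 1 and no padding, performing element-wise multiplication and summation at each position:

import numpy as np

# Toy example: one convolution (valid padding, stride 1) over a 5 x 5 input.
image = np.arange(25, dtype=np.float32).reshape(5, 5)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=np.float32)  # a simple vertical-edge filter

out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((out_h, out_w), dtype=np.float32)

for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + 3, j:j + 3]
        feature_map[i, j] = np.sum(patch * kernel)  # element-wise multiply, then sum

print(feature_map)  # a 3 x 3 feature map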

Activation function and pooling layer#

Following the convolutional operation, an activation function is applied to introduce non-linearity into the network. In LeNet, the activation functions used in the early layers are typically sigmoid (which maps any input to a value between 0 and 1) or tanh (which maps inputs to the range between -1 and 1). These functions squash the output of the convolutional operation into a bounded range, enabling the network to model complex relationships and capture non-linear patterns in the data.

After the convolutional operation, LeNet incorporates subsampling layers, often referred to as pooling layers. The purpose of subsampling is to downsample the spatial dimensions of the feature maps, reducing computational complexity and enhancing the network’s translational invariance. The original LeNet-5 used a form of average-based subsampling with trainable coefficients, while modern re-implementations (including the one later in this post) typically use max pooling, in which the maximum value in each local region is retained, effectively highlighting the most significant features while discarding less relevant information. The pooling layers contribute to the network’s ability to recognize patterns regardless of their precise spatial location, making LeNet more robust to variations in the input data.
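A small NumPy sketch (purely illustrative) of 2 x 2 max pooling with stride 2 shows how each local region is reduced to its maximum value:

import numpy as np

# Toy example: 2 x 2 max pooling with stride 2 on a 4 x 4 feature map.
feature_map = np.array([[1, 3, 2, 1],
                        [4, 6, 5, 2],
                        [7, 2, 9, 1],
                        [3, 1, 4, 8]], dtype=np.float32)

# Split into 2 x 2 blocks and keep the maximum of each block.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6. 5.]
#  [7. 9.]]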


2. Convolutional layer (C3)#

The convolutional layer (C3) in LeNet represents the second stage of feature extraction, building upon the foundation laid by the initial convolutional layer (C1).

The second convolutional layer (C3) serves as the successor to C1, extending the hierarchical learning process by capturing more complex and abstract features from the previously extracted information. The introduction of a second convolutional layer allows the network to discern higher-level patterns and relationships within the input data.

Activation function and pooling layer#

Similar to the first convolutional layer, C3 utilizes an activation function to introduce non-linearity into the network. Additionally, pooling is applied in C3 to downsample (pooling layer) the spatial dimensions of the feature maps.

Explanation of feature maps and filter sizes#

Feature maps in the context of C3 represent the output of the convolutional operation applied to the feature maps from the previous layer (the subsampled output of C1, i.e., S2). These feature maps capture complex combinations of features learned by the network and serve as input for subsequent layers.

The filter sizes in C3 determine the receptive field of the convolutional operation. Larger filter sizes allow the network to capture more global features, while smaller filter sizes focus on local patterns. The choice of filter sizes in C3 influences the network’s ability to recognize and generalize patterns in the input data.
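As a quick sanity check on these shapes (assuming the classic 32 x 32 input, 5 x 5 filters with stride 1 and no padding, and 2 x 2 subsampling), the spatial sizes can be derived from the standard valid-convolution formula, output = (input - filter) / stride + 1:

# Illustrative shape arithmetic for the classic LeNet-5 (valid convolution, stride 1).
def conv_output_size(input_size, filter_size, stride=1):
    return (input_size - filter_size) // stride + 1

c1 = conv_output_size(32, 5)   # C1: 32 -> 28
s2 = c1 // 2                   # S2: 2 x 2 subsampling -> 14
c3 = conv_output_size(s2, 5)   # C3: 14 -> 10
s4 = c3 // 2                   # S4: -> 5
print(c1, s2, c3, s4)          # 28 14 10 5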

3. Fully connected layer (F5)#

The fully connected layer (F5) marks a crucial transition in the LeNet architecture, representing the point where the hierarchical features extracted by the convolutional layers are flattened and fed into densely connected nodes.

After the convolutional layers (C1 and C3) have captured hierarchical features from the input, the fully connected layers introduce a transition to a more traditional neural network architecture. This transition enables the network to leverage the learned features for classification tasks.

Flattening of feature maps from previous layers#

Before entering the fully connected layers, the feature maps obtained from the convolutional layers are flattened. This process involves reshaping the multi-dimensional arrays into a one-dimensional vector. The flattened representation retains the learned feature values while giving up their explicit spatial arrangement, transforming them into a format compatible with traditional neural network layers.

The first fully connected layer of LeNet architecture
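For example (an illustrative sketch), the 16 feature maps of size 5 x 5 produced by S4 in the classic LeNet-5 flatten into a single vector of 16 * 5 * 5 = 400 values before entering F5:

import numpy as np

# Toy example: flattening S4's 16 feature maps of size 5 x 5 into one vector.
feature_maps = np.random.rand(5, 5, 16).astype('float32')  # height x width x channels
flattened = feature_maps.reshape(-1)                        # one-dimensional vector
print(flattened.shape)  # (400,)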

4. Fully connected layer (F6)#

The fully connected layer (F6) in LeNet represents another key component in the network’s architecture, following the initial transition from convolutional layers to fully connected layers.

Another fully connected layer in LeNet#

F6 continues the process of leveraging the hierarchical features extracted by the earlier layers for the final stages of classification. As a fully connected layer, F6 is densely connected to the nodes from the preceding layer, allowing it to capture complex relationships and dependencies in the learned features.

Activation function and output size#

Similar to the previous fully connected layer (F5), F6 applies an activation function to introduce non-linearity into the network. The choice of activation function, commonly tanh or sigmoid, is critical for regulating the information flow through the network and facilitating the learning process.

The output size of F6 is determined by the number of nodes in the layer. In the classic LeNet-5, F6 has 84 nodes, and its activations feed the final output layer, which has one node per class. For handwritten digit recognition on the MNIST dataset, that final layer has 10 nodes, one for each digit (0 through 9), and their activation values represent the network’s confidence for each class.

The tanh (or sigmoid) activation in F6 squashes its outputs to a bounded range, and the subsequent output layer converts the final scores into a meaningful probability distribution that is used to make predictions and classify the input data.

The second fully connected layer of LeNet architecture

5. Output layer#

The output layer in LeNet is the final component of the network, responsible for producing the classification results based on the features processed through the preceding layers.

As the terminal stage of the LeNet architecture, the output layer takes the features extracted and processed by the preceding layers and translates them into a meaningful prediction. This layer is specifically designed for the classification task, providing probabilities or scores for each possible class.

Activation function#

The activation function employed in the output layer is typically the softmax function. Softmax is well-suited for multiclass classification problems, as it transforms the raw output scores into a probability distribution. This distribution ensures that the sum of probabilities for all classes is equal to one, allowing for a clear and interpretable output.

The softmax activation function takes the raw output values from the previous layer and normalizes them, converting them into probabilities. This normalization enables the network to express its confidence in each class, making it easier to interpret the final output.
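Here is a small NumPy sketch (for illustration only) of how raw output scores become a probability distribution that sums to one, and how the predicted class is read off with argmax:

import numpy as np

# Toy example: softmax over raw scores for a 10-class problem.
logits = np.array([1.2, 0.3, -0.8, 2.5, 0.0, -1.1, 0.7, 1.9, -0.2, 0.4])
exp_scores = np.exp(logits - logits.max())    # subtract the max for numerical stability
probabilities = exp_scores / exp_scores.sum()

print(probabilities.sum())       # 1.0
print(np.argmax(probabilities))  # 3 -> the predicted class (the digit 3 for MNIST)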

Output interpretation and use of class labels#

The output from the output layer represents the network’s prediction for the input data. Each node in this layer corresponds to a specific class, and the associated probability indicates the network’s confidence in assigning the input to that class.

In the context of classification tasks, the class label with the highest probability is chosen as the predicted class. The use of class labels facilitates the interpretation of the model's predictions and enables users to understand which specific category the input data belongs to.

For instance, in the case of LeNet applied to the MNIST dataset for handwritten digit recognition, the output layer would have 10 nodes, each representing a digit from 0 to 9. The class label associated with the node having the highest probability would be the predicted digit.

The output layer of LeNet architecture

Step-by-step implementation of LeNet-5#

Let’s dive into the detailed implementation of LeNet-5 for MNIST digit classification. By following each step carefully, we’ll gain a comprehensive understanding of how to build and train this powerful convolutional neural network. This walkthrough will equip us with practical insights into leveraging LeNet-5 for accurate digit recognition tasks, paving the way for broader applications in image processing and pattern recognition.

Importing libraries#

We start by importing necessary libraries such as NumPy, Matplotlib for visualization, scikit-learn for metrics, and TensorFlow for building and training the neural network model.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report
from tensorflow.keras import datasets, layers, models
from tensorflow.keras.utils import to_categorical

Loading and preprocessing the MNIST dataset#

We load the MNIST dataset using TensorFlow’s built-in datasets module. The dataset is split into training, validation, and testing sets. We preprocess the data by reshaping it into the required format and normalizing pixel values to fall within the range [0, 1].

# Load and preprocess the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
# Split the data into train, validation, and test sets
validation_split = 0.1
validation_size = int(len(train_images) * validation_split)
validation_images = train_images[:validation_size]
validation_labels = train_labels[:validation_size]
train_images = train_images[validation_size:]
train_labels = train_labels[validation_size:]
# Further preprocessing
train_images = train_images.reshape((len(train_images), 28, 28, 1)).astype('float32') / 255
validation_images = validation_images.reshape((len(validation_images), 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((len(test_images), 28, 28, 1)).astype('float32') / 255
train_labels = to_categorical(train_labels)
validation_labels = to_categorical(validation_labels)
test_labels = to_categorical(test_labels)

Building the LeNet-5 model#

The LeNet-5 model architecture is constructed using TensorFlow’s Keras API. It consists of convolutional layers with ReLU activation (the rectified linear unit outputs its input when it is positive and zero otherwise), max-pooling layers, and fully connected layers. The model is compiled with an appropriate loss function, optimizer, and evaluation metric.

# Build LeNet-5 model
model = models.Sequential()
model.add(layers.Conv2D(6, (5, 5), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(16, (5, 5), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(120, activation='relu'))
model.add(layers.Dense(84, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Display model summary
model.summary()

Calling model.summary() prints the LeNet-5 architecture built by the above code.

Training the model#

The constructed model is trained using the training data with a portion reserved for validation. The training process is visualized by plotting training and validation accuracy and loss over epochs.

# Train the model with validation data
history = model.fit(train_images, train_labels, epochs=10, batch_size=64, validation_data=(validation_images, validation_labels))
# Visualize training plot
plt.figure(figsize=(12, 8))
# Plot training accuracy values
plt.subplot(2, 2, 1)
plt.plot(history.history['accuracy'])
plt.title('Training Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
# Plot training loss values
plt.subplot(2, 2, 2)
plt.plot(history.history['loss'])
plt.title('Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
# Plot validation accuracy values
plt.subplot(2, 2, 3)
plt.plot(history.history['val_accuracy'])
plt.title('Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
# Plot validation loss values
plt.subplot(2, 2, 4)
plt.plot(history.history['val_loss'])
plt.title('Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.tight_layout()
plt.show()

Running this code plots the training and validation accuracy and loss curves over the training epochs.

Testing the model#

The trained model is evaluated on the test dataset to assess its performance on unseen data. Test accuracy and loss are calculated, and a classification report is generated to analyze the model’s performance in detail.

# Testing the model on test data
test_loss, test_accuracy = model.evaluate(test_images, test_labels, verbose=2)
print(f"Test Accuracy: {test_accuracy}")
print(f"Test Loss: {test_loss}")
# Predicting labels for test images
predicted_labels = np.argmax(model.predict(test_images), axis=-1)
# Display classification report
print("Classification Report:\n", classification_report(np.argmax(test_labels, axis=-1), predicted_labels))

Running this code prints the test accuracy, test loss, and a detailed per-class classification report.
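Because confusion_matrix is already imported at the top of the script but not used in the walkthrough above, one optional extension (an addition of ours, not part of the original code) is to print the confusion matrix for a per-class view of the errors:

# Optional: per-class error analysis with the confusion matrix
cm = confusion_matrix(np.argmax(test_labels, axis=-1), predicted_labels)
print("Confusion Matrix:\n", cm)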

Conclusion#

LeNet-5 revolutionized convolutional neural networks, particularly in handwritten digit recognition. Its hierarchical feature learning and efficient architecture paved the way for modern CNNs. Through step-by-step implementation, we showcased its practicality in MNIST digit classification, achieving high accuracy. LeNet-5’s legacy persists, shaping the landscape of deep learning and driving innovation in artificial intelligence.

Next steps#

If you want to expand your knowledge and learn more about CNNs, the following courses are an excellent starting point for you:

A Beginner's Guide to Deep Learning

This beginner-level and highly comprehensive course is intended for learners who are familiar with Python programming. You will become familiar with the fundamental concepts and terminologies used in deep learning, and you will come to understand the importance of deep learning techniques. You will examine simple models like the perceptron before learning more complex yet powerful deep learning models. The course provides hands-on practical knowledge of how to code simple and complex deep learning models in NumPy, a powerful Python library, and Keras, a cutting-edge library for deep learning in Python. You can test your knowledge with the quizzes provided at the end of every lesson and with coding challenges that will help you gain a deeper understanding. By the end of the course, you should have a general understanding of the basics of deep learning, and you will be equipped with the right tools to learn more advanced concepts.

20hrs
Beginner
13 Challenges
18 Quizzes

Introduction to Deep Learning & Neural Networks

This course is an accumulation of well-grounded knowledge and experience in deep learning. It provides you with the basic concepts you need in order to start working with and training various machine learning models. You will cover both basic and intermediate concepts including but not limited to: convolutional neural networks, recurrent neural networks, generative adversarial networks as well as transformers. After completing this course, you will have a comprehensive understanding of the fundamental architectural components of deep learning. Whether you’re a data and computer scientist, computer and big data engineer, solution architect, or software engineer, you will benefit from this course.

4hrs 30mins
Intermediate
11 Challenges
8 Quizzes

Natural Language Processing with TensorFlow

Deep learning has revolutionized natural language processing (NLP): problems that once required extensive manual feature design and model tuning can now be solved efficiently. In this course, you will learn the fundamentals of TensorFlow and Keras, a Python-based interface for TensorFlow. Next, you will build embeddings and other vector representations, including the skip-gram model, continuous bag-of-words, and Global Vector representations. You will then learn about convolutional neural networks, recurrent neural networks, and long short-term memory networks, and you’ll learn to solve NLP tasks like named entity recognition, text generation, and machine translation using them. Lastly, you will learn transformer-based architectures and perform question answering (using BERT) and caption generation. By the end of this course, you will have a solid foundation in NLP and the skills to build TensorFlow-based solutions for a wide range of NLP problems.

15hrs
Intermediate
33 Playgrounds
10 Quizzes


  
