A neural network is a machine learning method that enables computers to process data in a way inspired by the human brain. It consists of interconnected nodes, or neurons, arranged in layers that process and transmit information, much as biological neurons do. Neural networks are designed to recognize complex patterns and relationships in data. Through this layered structure, they form an adaptive system that lets computers learn from mistakes and gradually improve their performance.
Think of a neuron as a tiny decision-maker: it takes input, processes it, and produces an output. In a neural network, these decision-makers are artificial neurons.
A neural network is organized into layers. The input layer receives data, the hidden layers process it, and the output layer provides the final result. Each layer has many neurons.
Neurons have parameters called weights and biases. Weights scale the importance of each input, and the bias shifts the input to the activation function. These parameters are crucial for the network to learn effectively.
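To make this concrete, here is a minimal sketch of what one artificial neuron computes: a weighted sum of its inputs plus a bias, passed through an activation function. All values below are made up for illustration.

import torch

x = torch.tensor([0.5, -1.0, 2.0])   # inputs to the neuron
w = torch.tensor([0.8, 0.2, -0.5])   # weights: the importance of each input
b = torch.tensor(0.1)                # bias: shifts the weighted sum

z = torch.dot(w, x) + b              # weighted sum: 0.4 - 0.2 - 1.0 + 0.1 = -0.7
output = torch.relu(z)               # ReLU activation zeroes out the negative value
print(output)                        # tensor(0.)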
The first step of building a neural network using PyTorch is to import the torch library, as shown below:

import torch
import torch.nn as nn
Then, we define a class that represents our neural network, SimpleNN in our case. This class acts as a blueprint for the network we want to create. Next, we set up the initial configuration of our neural network by specifying the number of input features (input_size), hidden neurons (hidden_size), and output neurons (output_size). This is done in the __init__ method.
Inside the SimpleNN class, we create the building blocks of our neural network: the input layer, the activation function (ReLU), and the output layer.
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)
Imagine our neural network as a series of connected parts.
self.fc1 represents the first part, a linear layer that takes the input data and transforms it using weights and biases to produce intermediate values.

self.relu represents the activation function. It acts as a filter that zeroes out the negative values and passes the positive ones through unchanged.

self.fc2 is the second linear layer that takes the filtered data from the activation function and transforms it to produce the final output.
Finally, we need to specify how data flows through the network. This is done by defining the forward pass in the neural network class, as shown below:
def forward(self, x):
    out = self.fc1(x)
    out = self.relu(out)
    out = self.fc2(out)
    return out
We connect the different parts: we first pass the input x through self.fc1, then through the activation function, self.relu, and finally through self.fc2. Each step transforms the data. We return the final result, which is the output of our network after the input has passed through all these layers.
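As a quick usage sketch (with made-up sizes), creating the model and calling it on a batch of random data runs this forward pass for us; PyTorch routes model(x) to our forward method:

model = SimpleNN(input_size=4, hidden_size=8, output_size=2)  # illustrative sizes
x = torch.rand(3, 4)   # a batch of 3 samples, 4 features each
y = model(x)           # invokes forward(x) under the hood
print(y.shape)         # torch.Size([3, 2]): one score per output neuron, per sample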
Training a neural network involves setting it up with the right structure, defining how to measure its mistakes (loss), adjusting its parameters to minimize those mistakes (optimization), and iteratively improving its predictions through backpropagation. PyTorch provides a powerful and accessible framework to accomplish these steps and build intelligent systems.
We need to know how wrong our predictions are. A loss function measures the difference between the predicted output and the actual label. Common loss functions include mean squared error (MSE) and cross-entropy loss; we select an appropriate one based on our task.
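As a rough sketch with made-up numbers, here is how these two loss functions can be evaluated in PyTorch:

import torch
import torch.nn as nn

# Cross-entropy: raw class scores (logits) for 2 samples vs. their true class labels
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3]])
labels = torch.tensor([0, 1])
print(nn.CrossEntropyLoss()(logits, labels))  # small, since the correct classes score highest

# Mean squared error: predicted vs. actual continuous values
pred = torch.tensor([2.5, 0.0, 2.0])
actual = torch.tensor([3.0, -0.5, 2.0])
print(nn.MSELoss()(pred, actual))  # mean of (0.25, 0.25, 0.0) = 0.1667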
To minimize the loss, we use an optimization algorithm such as gradient descent, which adjusts the weights and biases step by step in the direction that reduces the loss.
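Here is a minimal sketch of a single gradient descent step on one parameter with a toy loss, so the arithmetic is easy to follow (the starting value and learning rate are made up):

import torch

w = torch.tensor(2.0, requires_grad=True)  # one trainable parameter
lr = 0.1                                   # learning rate: the step size

loss = (w - 5.0) ** 2    # toy loss, minimized at w = 5
loss.backward()          # d(loss)/dw = 2 * (w - 5) = -6
with torch.no_grad():
    w -= lr * w.grad     # step against the gradient: 2.0 - 0.1 * (-6.0) = 2.6
print(w)                 # tensor(2.6000, requires_grad=True)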
Backpropagation is the crucial step that makes this adjustment possible. After we calculate the loss, we propagate this information backward through the network to work out how much each weight and bias contributed to the error, so they can be adjusted accordingly. It is like learning from mistakes and getting better at predicting.
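To see backpropagation in action, here is a sketch that reuses the SimpleNN class from above (with made-up sizes): one backward() call fills in a gradient for every weight and bias in the network.

model = SimpleNN(input_size=4, hidden_size=8, output_size=2)  # illustrative sizes
x = torch.rand(3, 4)                 # a random batch of 3 samples
target = torch.tensor([0, 1, 0])     # made-up class labels

loss = nn.CrossEntropyLoss()(model(x), target)
loss.backward()                      # propagate the error back through fc2, relu, fc1

for name, param in model.named_parameters():
    print(name, param.grad.shape)    # every parameter now carries a gradient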
Let's see a complete implementation of training a neural network using PyTorch.
import torch
import torch.nn as nn

# Define the neural network architecture
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)  # Fully connected layer 1
        self.relu = nn.ReLU()  # ReLU activation function
        self.fc2 = nn.Linear(hidden_size, output_size)  # Fully connected layer 2

    def forward(self, x):
        out = self.fc1(x)  # Apply the first fully connected layer
        out = self.relu(out)  # Apply the ReLU activation function
        out = self.fc2(out)  # Apply the second fully connected layer
        return out

# Define network hyperparameters
input_size = 64  # Number of input features
hidden_size = 128  # Number of neurons in the hidden layer
output_size = 10  # Number of output classes

# Input data
input_data = torch.rand(32, input_size)  # 32 is the batch size
target = torch.empty(32, dtype=torch.long).random_(output_size)

# Create an instance of the SimpleNN model
model = SimpleNN(input_size, hidden_size, output_size)

# Define the loss function (Cross Entropy Loss) and optimizer (Adam)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Example training loop
num_epochs = 10  # Define the number of training epochs
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(input_data)
    loss = criterion(outputs, target)  # Compute the loss

    # Backward pass and optimization
    optimizer.zero_grad()  # Clear gradients
    loss.backward()  # Backpropagate to compute gradients
    optimizer.step()  # Update the model parameters

    # Print the loss for each epoch
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item()}')
Lines 28–32: We create an instance of the SimpleNN model with the specified input, hidden, and output sizes. Then, we define the loss function, CrossEntropyLoss(), to calculate the loss between the predicted and actual labels, and the Adam optimizer to minimize that loss.
Lines 35–39: We set up a training loop to train the neural network for a specified number of epochs, num_epochs. Inside the loop, we pass the input data through the model to obtain predictions, and the loss is calculated by comparing the model's predictions with the target values.
Line 42: We clear the gradients of the model's parameters. The optimizer keeps a reference to each parameter, and calling its zero_grad() function resets every parameter's gradient to zero for the current iteration.
Note: If we don't clear these gradients before the next backward pass, the new gradients are added to the existing ones. This accumulation can produce incorrect gradient information and make the optimization process ineffective or unstable.
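A tiny sketch of this accumulation behavior: calling backward() twice without clearing adds the second gradient on top of the first.

import torch

w = torch.tensor(3.0, requires_grad=True)

(w * 2.0).backward()
print(w.grad)   # tensor(2.)

(w * 2.0).backward()   # gradients were not cleared, so the new one is added
print(w.grad)   # tensor(4.)

w.grad.zero_()  # what optimizer.zero_grad() does for each parameter it manages
print(w.grad)   # tensor(0.)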
Line 43: We compute the gradients of the loss with respect to the model's parameters using backpropagation. These gradients are stored on the parameters and used in the next step to update them.
Line 44: We update the model’s weights in the direction that minimizes the loss.
Line 47: We print the loss for each epoch to monitor the training progress.
What is the primary function of the torch.nn.Module class in PyTorch?
Data preprocessing
Visualization of neural networks
Defining and managing neural network layers
Loading pretrained models