Training a neural network using PyTorch

A neural network is a machine learning model loosely inspired by the human brain. It consists of interconnected nodes, or neurons, arranged in layers that process and pass on information. Neural networks are designed to recognize complex patterns and relationships in data. During training, the network adjusts its internal parameters to reduce its errors, so its performance improves gradually.

Neurons

Think of a neuron as a tiny decision-maker. It takes input, processes it, and produces an output. A neural network is built from artificial neurons of this kind.

Layers

A neural network is organized into layers. The input layer receives data, the hidden layers process it, and the output layer provides the final result. Each layer contains one or more neurons.

Weights and bias

Neurons have parameters called weights and bias. Weights adjust the importance of each input. The bias is added to the weighted sum of the inputs, which shifts the value passed to the activation function. The network learns by adjusting these parameters.
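To make the role of weights and bias concrete, here is a minimal sketch of a single artificial neuron. The input, weight, and bias values are made up for illustration, and ReLU is assumed as the activation function.

```python
import torch

# Hypothetical values, chosen only for illustration
inputs = torch.tensor([1.0, 2.0, 3.0])       # three input features
weights = torch.tensor([0.5, 0.25, -0.25])   # one weight per input
bias = torch.tensor(0.5)

# Weighted sum of the inputs, shifted by the bias
weighted_sum = torch.dot(inputs, weights)    # 0.5 + 0.5 - 0.75 = 0.25
z = weighted_sum + bias                      # 0.25 + 0.5 = 0.75

# ReLU activation: negative values become zero, positive values pass through
output = torch.relu(z)

print(z.item())       # 0.75
print(output.item())  # 0.75
```

Changing a weight changes how strongly the corresponding input influences z, while changing the bias moves z up or down regardless of the inputs.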

Building a neural network

The first step in building a neural network with PyTorch is to import torch and its nn module, which provides the layer classes, as shown below:

import torch
import torch.nn as nn

Importing the libraries

Then, we define a class that represents our neural network, SimpleNN in our case. This class acts as a blueprint for the network we want to create. Next, we set up the initial configuration of our neural network by specifying the number of input features (input_size), hidden neurons (hidden_size), and output neurons (output_size). This is done in the __init__ method.

Inside the SimpleNN class, we create the building blocks of our neural network: a linear layer that maps the inputs to the hidden neurons, the activation function (ReLU), and a second linear layer that maps the hidden neurons to the outputs.

class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

Defining a neural network

Imagine our neural network as a series of connected parts.

  • self.fc1 represents the first part, a linear layer that takes the input data and transforms it using weights and biases to produce some intermediate values.

  • self.relu represents the activation function. It acts as a filter that replaces negative values with zero and passes positive values through unchanged.

  • self.fc2 is the second linear layer that takes the filtered data from the activation function and transforms it to produce the final output.
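The three parts described above can be inspected on their own. The sketch below uses small, arbitrary layer sizes (4, 3, and 2 are assumptions for illustration) to show which parameters a linear layer holds and what ReLU does to a few sample values.

```python
import torch
import torch.nn as nn

# Layer sizes are arbitrary, chosen only for illustration
fc1 = nn.Linear(4, 3)   # 4 input features -> 3 hidden neurons
relu = nn.ReLU()
fc2 = nn.Linear(3, 2)   # 3 hidden neurons -> 2 outputs

# nn.Linear stores its weights as (out_features, in_features)
# and keeps one bias value per output neuron
print(fc1.weight.shape)  # torch.Size([3, 4])
print(fc1.bias.shape)    # torch.Size([3])
print(fc2.weight.shape)  # torch.Size([2, 3])

# ReLU has no parameters; it replaces negative values with zero
sample = torch.tensor([-1.5, 0.0, 2.0])
filtered = relu(sample)
print(filtered)          # tensor([0., 0., 2.])
```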

Finally, we need to specify how data flows through the network. This is done by defining the forward pass in the neural network class, as shown below:

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

Forward pass

We connect the different parts. We first pass the x input through self.fc1, then through the activation function, self.relu, and finally through self.fc2. Each step transforms the data. We return the final result, which is the output of our network, after passing the input through all these layers.
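As a quick check of the forward pass, the sketch below builds the network with small, assumed sizes (4 inputs, 8 hidden neurons, 3 outputs) and runs a random batch through it. Calling the model like a function invokes forward() for us.

```python
import torch
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

# Sizes below are assumptions for illustration only
model = SimpleNN(input_size=4, hidden_size=8, output_size=3)
batch = torch.rand(5, 4)   # 5 samples, 4 features each

outputs = model(batch)     # calling the model runs forward()
print(outputs.shape)       # torch.Size([5, 3])

# The same result, applying each step by hand
manual = model.fc2(torch.relu(model.fc1(batch)))
print(torch.allclose(outputs, manual))  # True
```

The output has one row per sample and one column per output neuron, so the batch dimension passes through the network unchanged.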

Training a neural network

Training a neural network involves setting it up with the right structure, defining how to measure its mistakes (loss), computing how each parameter contributed to those mistakes (backpropagation), and adjusting the parameters to reduce them (optimization). PyTorch provides building blocks for each of these steps.

Loss function

We need to know how wrong our predictions are. A loss function measures the difference between the predicted output and the actual label. Common loss functions include the mean squared error (MSE), typically used for regression, and cross-entropy loss, typically used for classification. We select an appropriate loss function based on our task.
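The two loss functions mentioned above can be compared against hand calculations. This is a small sketch with made-up predictions and labels, not part of the training example.

```python
import torch
import torch.nn as nn

# Mean squared error, with hypothetical regression values
predictions = torch.tensor([2.0, 4.0])
targets = torch.tensor([1.0, 2.0])
mse = nn.MSELoss()(predictions, targets)
manual_mse = ((predictions - targets) ** 2).mean()  # (1 + 4) / 2 = 2.5
print(mse.item())  # 2.5

# Cross-entropy loss takes raw scores (logits) and the index of the true class
logits = torch.tensor([[2.0, 0.5, 0.1]])  # one sample, three classes
label = torch.tensor([0])                 # the true class is class 0
ce = nn.CrossEntropyLoss()(logits, label)

# Equivalent by hand: negative log of the softmax probability of the true class
manual_ce = -torch.log_softmax(logits, dim=1)[0, 0]
print(torch.allclose(ce, manual_ce))  # True
```

Note that nn.CrossEntropyLoss applies the softmax internally, so the network's output layer should produce raw scores rather than probabilities.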

Optimization algorithm

To minimize the loss, we use an optimization algorithm such as gradient descent or one of its variants (the implementation below uses Adam). It adjusts the weights and biases in a way that reduces the loss, step by step.
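To illustrate the idea, here is a minimal sketch of plain gradient descent written by hand on a toy problem: finding the value of w that minimizes (w - 3)². The toy loss, learning rate, and step count are assumptions for illustration. In practice, a torch.optim optimizer performs the update step.

```python
import torch

# A single parameter, starting far from the minimum at w = 3
w = torch.tensor(0.0, requires_grad=True)
lr = 0.1  # learning rate: how large each step is

for step in range(50):
    loss = (w - 3.0) ** 2   # toy loss with its minimum at w = 3
    loss.backward()         # compute d(loss)/dw and store it in w.grad

    with torch.no_grad():   # the update itself should not be tracked by autograd
        w -= lr * w.grad    # move against the gradient

    w.grad.zero_()          # reset the gradient before the next step

print(w.item())  # very close to 3.0
```

Each step moves w a little closer to 3, which is the "step by step" reduction of the loss described above.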

Backpropagation

This is a crucial step. After we calculate the loss, we propagate it backward through the network to compute the gradient of the loss with respect to every weight and bias. The optimizer then uses these gradients to adjust the parameters. It is like learning from mistakes and getting better at predicting.
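PyTorch performs backpropagation through its autograd engine. The sketch below uses a one-weight, one-bias model with made-up numbers so that the computed gradients can be checked by hand.

```python
import torch

# Hypothetical data point and parameters
x = torch.tensor(2.0)
y_true = torch.tensor(10.0)
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

y_pred = w * x + b               # 3 * 2 + 1 = 7
loss = (y_pred - y_true) ** 2    # (7 - 10)^2 = 9

# Backpropagation: compute the gradient of the loss for each parameter
loss.backward()

# By the chain rule:
#   d(loss)/dw = 2 * (y_pred - y_true) * x = 2 * (-3) * 2 = -12
#   d(loss)/db = 2 * (y_pred - y_true)     = 2 * (-3)     = -6
print(w.grad.item())  # -12.0
print(b.grad.item())  # -6.0
```

Both gradients are negative, which tells the optimizer that increasing w and b would reduce the loss.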

Implementation using PyTorch

Let’s put these pieces together and train a neural network using PyTorch.

import torch
import torch.nn as nn

# Define the neural network architecture
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)  # Fully connected layer 1
        self.relu = nn.ReLU()  # ReLU activation function
        self.fc2 = nn.Linear(hidden_size, output_size)  # Fully connected layer 2

    def forward(self, x):
        out = self.fc1(x)  # Apply the first fully connected layer
        out = self.relu(out)  # Apply the ReLU activation function
        out = self.fc2(out)  # Apply the second fully connected layer
        return out

# Define network hyperparameters
input_size = 64  # Number of input features
hidden_size = 128  # Number of neurons in the hidden layer
output_size = 10  # Number of output classes

# Input data
input_data = torch.rand(32, input_size)  # 32 is the batch size
target = torch.empty(32, dtype=torch.long).random_(output_size)

# Create an instance of the SimpleNN model
model = SimpleNN(input_size, hidden_size, output_size)

# Define the loss function (Cross Entropy Loss) and optimizer (Adam)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Example training loop
num_epochs = 10  # Define the number of training epochs
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(input_data)
    loss = criterion(outputs, target)  # Compute the loss

    # Backward pass and optimization
    optimizer.zero_grad()  # Clear gradients
    loss.backward()  # Backpropagate to compute gradients
    optimizer.step()  # Update the model parameters

    # Print the loss for each epoch
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item()}')

Code explanation

  • Lines 28–32: We create an instance of the SimpleNN model with the specified input, hidden, and output sizes. Then, we define the loss function, CrossEntropyLoss(), which calculates the loss between the actual and the predicted labels, and the Adam optimizer, which minimizes that loss.

  • Lines 35–39: We set up a training loop to train the neural network for a specified number of epochs, num_epochs. Inside the loop, we pass the input data through the model to obtain predictions. The loss is calculated by comparing the model's predictions with the target values.

  • Line 42: We clear the gradients left over from the previous iteration. The gradients are stored with the model’s parameters (in each parameter’s .grad attribute), and calling optimizer.zero_grad() resets them for every parameter the optimizer manages.

Note: If we don’t clear the gradients before the next backward pass, the new gradients are added to the existing ones. This leads to incorrect gradient information and can make the optimization process ineffective or unstable.

  • Line 43: We compute the gradients of the loss with respect to the model’s parameters using backpropagation. The gradients are stored with the model’s parameters and are used in the next step.

  • Line 44: We update the model’s parameters using the computed gradients, moving them in a direction that reduces the loss.

  • Line 47: We print the loss for each epoch to monitor the training progress.
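The note about clearing gradients can be verified with a small experiment. In this sketch, a single hypothetical parameter w stands in for the model's parameters, and w.grad.zero_() plays the role that optimizer.zero_grad() plays in the training loop.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# First backward pass: d(2w)/dw = 2
(w * 2.0).backward()
first = w.grad.item()
print(first)        # 2.0

# Second backward pass without clearing: the new gradient is added to the old one
(w * 2.0).backward()
accumulated = w.grad.item()
print(accumulated)  # 4.0

# Clearing the gradient first gives the correct value again
w.grad.zero_()
(w * 2.0).backward()
after_reset = w.grad.item()
print(after_reset)  # 2.0
```

The true gradient is 2 in all three cases. The value of 4 after the second pass is the stale gradient carried over from the first pass, which is why the training loop resets the gradients in every iteration.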

What is the primary function of the torch.nn.Module class in PyTorch?

A) Data preprocessing
B) Visualization of neural networks
C) Defining and managing neural network layers
D) Loading pretrained models


Copyright ©2025 Educative, Inc. All rights reserved