Cross-entropy loss is a commonly used loss function for classification problems in machine learning. It measures the difference between the predicted probability distribution and the true probability distribution of the target classes.
Intuitively, cross-entropy loss penalizes the model more heavily the more confident it is in the wrong class. For instance, if the model assigns a low probability to the correct class and a high probability to an incorrect class, the cross-entropy loss will be large.
For example, we can define cross-entropy loss like this:
loss(x, y) = - sum(y * log(x))
In this simple example, x is the predicted probability distribution, y is the true probability distribution (represented as a one-hot encoded vector), log is the natural logarithm, and the sum is taken over all classes.
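As a quick illustration of this formula (a minimal sketch with made-up probability values, not taken from any dataset), note that with a one-hot y the loss reduces to the negative log of the probability assigned to the true class, so a confident wrong prediction is penalized far more than a confident correct one:
import math

def cross_entropy(x, y):
    # x: predicted probabilities, y: one-hot encoded true distribution
    return -sum(yi * math.log(xi) for xi, yi in zip(x, y))

y = [0.0, 1.0, 0.0]                       # true class is index 1

confident_right = [0.05, 0.90, 0.05]      # high probability on the true class
confident_wrong = [0.90, 0.05, 0.05]      # high probability on a wrong class

print(cross_entropy(confident_right, y))  # ~0.105 (small loss)
print(cross_entropy(confident_wrong, y))  # ~3.0   (large loss)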
Cross-entropy loss, also known as log loss or softmax loss, is a commonly used loss function in PyTorch for training classification models. It measures the difference between the predicted class probabilities and the true class labels.
We first import the required libraries and create the input tensors:
import torch
import torch.nn.functional as TF

# Define some sample input data and labels
input_data = torch.randn(4, 10)  # 4 samples, 10 classes
labels = torch.LongTensor([2, 5, 1, 9])  # target class indices
In PyTorch, the cross-entropy loss is implemented as the nn.CrossEntropyLoss class. This class combines nn.LogSoftmax and nn.NLLLoss to compute the loss in a numerically stable way; these two modules are the building blocks used to implement the cross-entropy loss function in PyTorch.
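To make that relationship concrete, here is a small sketch (using hypothetical random logits and the same target indices as above) showing that composing nn.LogSoftmax and nn.NLLLoss gives the same result as nn.CrossEntropyLoss:
import torch
import torch.nn as nn

logits = torch.randn(4, 10)               # raw, unnormalized scores
targets = torch.LongTensor([2, 5, 1, 9])  # target class indices

# One-step version
ce_loss = nn.CrossEntropyLoss()(logits, targets)

# Two-step version: log-softmax followed by negative log-likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
nll_loss = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(ce_loss, nll_loss))  # True (up to floating-point error)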
The PyTorch cross-entropy loss can be defined as:
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(outputs, labels)
Here, outputs is a tensor of raw class scores (logits) of shape (batch_size, num_classes), and labels is a tensor of true class indices of shape (batch_size).
The nn.CrossEntropyLoss class applies a softmax function to the outputs tensor to obtain the predicted class probabilities and then computes the negative log-likelihood loss between the predicted probabilities and the true labels.
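Since outputs and labels are not defined above, here is a minimal, self-contained sketch of the class-based API, assuming a batch of 4 samples and 10 classes (the same shapes used in the full example below):
import torch
import torch.nn as nn

batch_size, num_classes = 4, 10
outputs = torch.randn(batch_size, num_classes)  # raw scores from the model's final layer
labels = torch.LongTensor([2, 5, 1, 9])         # true class indices, shape (batch_size,)

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(outputs, labels)
print(loss.item())  # a single scalar: the mean loss over the batch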
Let's implement all that we have learned:
import torch
import torch.nn.functional as TF

# Define some sample input data and labels
input_data = torch.randn(4, 10)  # 4 samples, 10 classes
labels = torch.LongTensor([2, 5, 1, 9])  # target class indices

# Compute the cross entropy loss
loss = TF.cross_entropy(input_data, labels)

# Print the computed loss
print(f"Cross entropy loss: {loss.item()}")

# Compute the softmax probabilities manually
softmax_probs = TF.softmax(input_data, dim=1)

# Print the computed softmax probabilities
print(f"Softmax probabilities:\n{softmax_probs}")

# Compute the cross entropy loss manually
manual_loss = torch.mean(-torch.log(softmax_probs.gather(1, labels.view(-1, 1))))

# Print the manually computed loss
print(f"Manually computed loss: {manual_loss.item()}")
Line 1: First, we import the torch library.
Line 2: We also import torch.nn.functional with the alias TF.
Line 5: We define the sample input data as a tensor of random values with 4 samples and 10 classes.
Line 6: We create a tensor called labels using the PyTorch library. The tensor is of type LongTensor, which means that it contains integer values of 64-bit precision.
Line 9: The TF.cross_entropy() function takes two arguments: input_data and labels. The input_data argument is the predicted output of the model, which could be the output of the final layer before applying a softmax activation function. The labels argument holds the true class index for each corresponding input sample.
Line 12: We print the computed loss.
Line 15: We compute the softmax probabilities manually by passing input_data and dim=1, which means that the softmax function is applied along the second dimension (the class dimension) of the input_data tensor.
Line 18: We also print the computed softmax probabilities.
Line 21: We compute the cross-entropy loss manually by using gather() to pick out the softmax probability of the target class for each sample, taking the negative log of those probabilities, and averaging over all samples (see the short gather() sketch after this list).
Line 24: Finally, we print the manually computed loss.
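If the gather() call in line 21 is unclear, here is a tiny sketch (with hypothetical probability values) showing how it selects the probability of the target class from each row:
import torch

probs = torch.tensor([[0.1, 0.7, 0.2],
                      [0.3, 0.3, 0.4]])
targets = torch.LongTensor([1, 2])  # true class index per sample

# gather(1, ...) picks one entry per row, indexed by the target class
picked = probs.gather(1, targets.view(-1, 1))
print(picked)  # tensor([[0.7000], [0.4000]])

print(torch.mean(-torch.log(picked)))  # the manually computed cross-entropy loss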
To summarize, cross-entropy loss is a popular loss function in deep learning and is very effective for classification tasks. While it is a strong and useful tool for training deep learning models, it is worth remembering that it is only one of many possible loss functions and might not be the ideal option for every task or dataset. It is therefore always a good idea to experiment with alternative loss functions and hyperparameters to identify the best settings for our particular use case.