What is cross-entropy loss in PyTorch?

In machine learning classification problems, cross-entropy loss is a frequently used loss function. It measures the difference between the predicted probability distribution and the true probability distribution of the target classes.

The cross-entropy loss penalizes the model more when it is more confident in the incorrect class, which makes intuitive sense: if the model assigns a low probability to the correct class and a high probability to an incorrect class, the cross-entropy loss will be large.

For example, we can define cross-entropy loss like this:

loss(x, y) = - sum(y * log(x))

In this simple formula, x is the predicted probability distribution, y is the true probability distribution (represented as a one-hot encoded vector), log is the natural logarithm, and the sum is taken over all classes.
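
To make this concrete, here is a minimal sketch in plain Python, using made-up probabilities for a three-class problem, that evaluates the formula for a confidently correct and a confidently wrong prediction:

import math

def cross_entropy(x, y):
    # x: predicted probabilities, y: one-hot true distribution
    return -sum(y_i * math.log(x_i) for x_i, y_i in zip(x, y))

y = [1.0, 0.0, 0.0]                         # the true class is class 0
print(cross_entropy([0.9, 0.05, 0.05], y))  # confident and correct: ~0.105
print(cross_entropy([0.1, 0.80, 0.10], y))  # confident and wrong:   ~2.303

As expected, the confidently wrong prediction is penalized far more heavily than the confidently correct one.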

Cross-entropy loss in PyTorch

Cross-entropy loss, also known as log loss or softmax loss, is a commonly used loss function in PyTorch for training classification models. It measures the difference between the predicted class probabilities and the true class labels.

  1. We first import the required libraries and create the input tensors:

import torch
import torch.nn.functional as TF
# Define some sample input data and labels
input_data = torch.randn(4, 10) # 4 samples, 10 classes
labels = torch.LongTensor([2, 5, 1, 9]) # target class indices
  2. In PyTorch, the cross-entropy loss is implemented as the nn.CrossEntropyLoss class. This class combines nn.LogSoftmax and nn.NLLLoss, the two building blocks of cross-entropy in PyTorch, to compute the loss in a numerically stable way (a sketch after this list shows that the two formulations agree).

The PyTorch cross-entropy loss can be used as follows:

import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(outputs, labels)  # outputs: raw logits, labels: class indices

Here, outputs is a tensor of raw, unnormalized class scores (logits) with shape (batch_size, num_classes), and labels is a tensor of true class indices with shape (batch_size,).

  3. The nn.CrossEntropyLoss class applies a log-softmax to the outputs tensor to obtain the log class probabilities. After that, it computes the negative log-likelihood loss between those log probabilities and the true labels.
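
To make the relationship between these pieces concrete, here is a minimal sketch using randomly generated logits (the values are assumptions, not taken from the example below), showing that nn.CrossEntropyLoss produces the same value as applying nn.LogSoftmax followed by nn.NLLLoss:

import torch
import torch.nn as nn

torch.manual_seed(0)                        # fixed seed so the check is reproducible
outputs = torch.randn(4, 10)                # raw logits: 4 samples, 10 classes
labels = torch.LongTensor([2, 5, 1, 9])     # target class indices

# One call: log-softmax and negative log-likelihood handled internally
combined = nn.CrossEntropyLoss()(outputs, labels)

# Two explicit steps: log-softmax followed by NLL loss
log_probs = nn.LogSoftmax(dim=1)(outputs)
two_step = nn.NLLLoss()(log_probs, labels)

print(torch.allclose(combined, two_step))   # True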

Example

Let's implement all that we have learned:

import torch
import torch.nn.functional as TF

# Define some sample input data and labels
input_data = torch.randn(4, 10) # 4 samples, 10 classes
labels = torch.LongTensor([2, 5, 1, 9]) # target class indices

# Compute the cross entropy loss
loss = TF.cross_entropy(input_data, labels)

# Print the computed loss
print(f"Cross entropy loss: {loss.item()}")

# Compute the softmax probabilities manually
softmax_probs = TF.softmax(input_data, dim=1)

# Print the computed softmax probabilities
print(f"Softmax probabilities:\n{softmax_probs}")

# Compute the cross entropy loss manually
manual_loss = torch.mean(-torch.log(softmax_probs.gather(1, labels.view(-1, 1))))

# Print the manually computed loss
print(f"Manually computed loss: {manual_loss.item()}")

Explanation

  • Line 1: First, we import the torch library.

  • Line 2: We also import torch.nn.functional under the alias TF.

  • Line 5: We define the sample input data, a random tensor with 4 samples and 10 classes.

  • Line 6: We create a tensor called labels. It is of type LongTensor, meaning it contains 64-bit integer values, one target class index per sample.

  • Line 9: The TF.cross_entropy() function takes two arguments: input_data and labels. The input_data argument is the model's predicted output, i.e., the raw scores (logits) of the final layer before applying a softmax activation function. The labels argument holds the true class index for each corresponding input sample.

  • Line 12: We print the computed loss.

  • Line 15: We compute the softmax probabilities manually by passing input_data and dim=1, which applies the softmax function along the second dimension (the class dimension) of the input_data tensor.

  • Line 18: We also print the computed softmax probabilities.

  • Line 21: We compute the cross-entropy loss manually by taking the negative log of the softmax probability assigned to each sample's target class index and averaging over all samples (a numerically stabler variant is sketched after this list).

  • Line 24: Finally, we print the manually computed loss.
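
One caveat about the manual computation above: taking the log of softmax probabilities can underflow when a probability is extremely small. Below is a small sketch, using freshly generated random data (so the printed result is independent of the example above), that computes the same quantity with TF.log_softmax, which performs both operations in one numerically stable step:

import torch
import torch.nn.functional as TF

input_data = torch.randn(4, 10)             # 4 samples, 10 classes
labels = torch.LongTensor([2, 5, 1, 9])     # target class indices

# log_softmax computes log(softmax(x)) in one numerically stable step
log_probs = TF.log_softmax(input_data, dim=1)
stable_loss = -log_probs.gather(1, labels.view(-1, 1)).mean()

print(torch.allclose(stable_loss, TF.cross_entropy(input_data, labels)))  # True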

Conclusion

To summarize, cross-entropy loss is a popular loss function in deep learning and is very effective for classification tasks. While it is a strong and useful tool for training deep learning models, it is only one of many possible loss functions and might not be the ideal option for every task or dataset. Therefore, it is always a good idea to experiment with alternative loss functions and hyperparameters to identify the best settings for our particular use case.