What is the exploding gradient problem?

Neural networks are used in many applications, such as computer vision, natural language processing, and speech processing, and their performance has improved considerably over the past decade. However, a network performs well on the test dataset only if it is trained properly, and many beginners run into problems during training.

In gradient-based learning algorithms, we use gradients of the loss to learn the weights of a neural network. During backpropagation, the chain rule multiplies the local gradients of the layers closer to the output with those of the layers closer to the input, like a chain reaction. The resulting gradients are then used to update the weights.

If the individual gradients are large, their product grows enormously as it is propagated backward through the layers. The weight updates then become huge, the model is unable to learn, and its behavior becomes unstable. This problem is called the exploding gradient problem.
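To make the chain-rule picture concrete, here is a rough sketch for an $n$-layer network, writing $L$ for the loss and $a_i$ for the output of layer $i$ (these symbols are used only for illustration). The gradient with respect to an early weight $w_1$ is a product of many per-layer factors:

$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial a_n} \cdot \frac{\partial a_n}{\partial a_{n-1}} \cdots \frac{\partial a_2}{\partial a_1} \cdot \frac{\partial a_1}{\partial w_1}$

If many of these factors are greater than 1, the product grows roughly exponentially with the number of layers.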

Example

Let's consider an overly simplified example. Suppose we have a 20-layer neural network with only one neuron in each layer.

20-Layer Neural Network

Here, $x$ is our 1-dimensional input to the neural network. It generates $\hat{y}$ as the predicted output. The weights are represented by $w_i$. Each layer uses ReLU as its activation function, and $a_i$ denotes the output of layer $i$.

The output of the first neuron is calculated as follows:

$a_1 = \text{ReLU}(w_1 \cdot x)$

This output will be fed as an input to the neuron in the second layer, so that $a_2 = \text{ReLU}(w_2 \cdot a_1)$, and in general $a_i = \text{ReLU}(w_i \cdot a_{i-1})$, and so on.

As we can observe, the outputs of the neurons are chained together. If we tried to express the final output, or its gradient with respect to the very first weight, the expression would be a long nested chain that is tedious to expand and hard to keep track of.

When the gradient with respect to a weight is large, the corresponding change in that weight is also large. For the sake of understanding, let's say that all the weights have the same value ($2.34$) and the input value is $1$. Since the input and weights are positive, ReLU passes every value through unchanged, and the resulting output is:

$\hat{y} = (2.34)^{20} \times 1 \approx 2.4 \times 10^7$

This is a huge value. Imagine how this would scale for a 50-layer network.
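A minimal sketch of this calculation in Python, assuming the same toy setup (20 layers, one neuron per layer, every weight equal to 2.34, input 1), shows how quickly the values blow up:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Toy setup from the example above (illustrative values, not a real model).
num_layers = 20
w = 2.34   # every weight set to the same value
a = 1.0    # the 1-dimensional input x

for i in range(1, num_layers + 1):
    a = relu(w * a)   # a_i = ReLU(w_i * a_{i-1})
    print(f"layer {i:2d}: activation = {a:.3e}")

# The final activation is roughly 2.4e7; with weights greater than 1, both
# activations and gradients grow exponentially with the number of layers.
```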

How to detect an exploding gradient

There are some key signs that can help identify whether or not the gradients are exploding. These are as follows:

  • The model is not performing well on the training data.

  • There are large changes in the learning loss (unstable learning).

  • The loss becomes NaN during training (see the monitoring sketch after this list).
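As an illustration of the last two signs, a common way to monitor for exploding gradients is to log the global gradient norm and check for NaN loss values after each training step. Below is a minimal PyTorch-style sketch; the `model` and `loss` objects and the `gradient_diagnostics` helper are hypothetical names used for illustration, and the threshold is arbitrary:

```python
import math
import torch

def gradient_diagnostics(model, loss):
    """Return the global gradient norm and whether the loss is NaN.

    Assumes loss.backward() has already been called so .grad is populated.
    """
    total_sq = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total_sq += p.grad.detach().norm(2).item() ** 2
    grad_norm = math.sqrt(total_sq)
    loss_is_nan = bool(torch.isnan(loss).item())
    return grad_norm, loss_is_nan

# Inside a training loop (model, loss, optimizer are placeholders):
#   loss.backward()
#   grad_norm, loss_is_nan = gradient_diagnostics(model, loss)
#   if loss_is_nan or grad_norm > 1e3:   # threshold chosen for illustration
#       print(f"Warning: possible exploding gradients (norm={grad_norm:.2e})")
```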

Solutions

Some of the suggested solutions to tackle the exploding gradient problem are given below:

  • Use batch normalization: Batch normalization allows us to normalize the output of a layer into a specific range, usually between -1 and 1.

  • Use fewer layers

  • Carefully initialize weights

  • Use gradient clipping (see the sketch after this list)
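As a sketch of the last point, gradient clipping rescales the gradients whenever their combined norm exceeds a chosen threshold, so weight updates stay bounded. The example below uses PyTorch's `torch.nn.utils.clip_grad_norm_`; the `model`, `optimizer`, `loss_fn`, and `max_norm` values are placeholders for illustration:

```python
import torch

def training_step(model, optimizer, loss_fn, x, y, max_norm=1.0):
    """One training step with gradient clipping by global norm."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # Rescale all gradients so their combined norm does not exceed max_norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)

    optimizer.step()
    return loss.item()
```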

