Rectified Linear Unit (ReLU) is a nonlinear activation function used in deep learning.
ReLU is defined as f(x) = max(0, x), which means that negative values are mapped to zero and positive values are returned unchanged. It is cheap to compute, and because negative inputs produce an output of zero, not all neurons are activated at the same time; this sparsity is what makes ReLU faster than other common activation functions. ReLU also mitigates the vanishing gradient problem because its derivative is either 0 or 1, so the gradients of active neurons are passed backward without shrinking.
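As a quick illustration, here is a minimal NumPy sketch of ReLU and its derivative (the function names and sample values are just for illustration):

```python
import numpy as np

def relu(x):
    # Negative values are mapped to 0; positive values are returned unchanged.
    return np.maximum(0.0, x)

def relu_grad(x):
    # The derivative is 1 for positive inputs and 0 elsewhere.
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))       # [0.  0.  0.  1.5 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```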
A dying ReLU outputs the same value, 0, for every input. This condition is known as the dead state of a ReLU neuron, and it is hard to recover from because the gradient at an output of 0 is also 0. It becomes a problem when most of the pre-activation inputs are negative, i.e., they fall in the region where the derivative of ReLU is 0.
Because the output is 0, no gradient flows back during backpropagation, and the weights of the dead neuron are never updated. In the worst case, the network collapses to a constant function and the entire neural network dies; a network that is already dead before training is said to be born dead. As long as the inputs keep the ReLU pre-activations in the positive region, the dying ReLU problem doesn't occur.
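The following NumPy sketch (with made-up weights and inputs) shows a dead neuron: every pre-activation is negative, so the output is always 0 and the gradient flowing back to the weights is 0, which means the weights can never be updated.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(100, 3))   # non-negative inputs, e.g. outputs of a previous ReLU layer
w = np.array([-2.0, -1.5, -3.0])           # weights that have been pushed negative
b = -1.0

z = X @ w + b                              # pre-activations: all negative
a = np.maximum(0.0, z)                     # ReLU outputs: all zero -> a dead neuron

upstream = np.ones_like(a)                 # pretend the upstream gradient is 1
grad_z = upstream * (z > 0)                # ReLU gradient: 0 wherever z <= 0
grad_w = X.T @ grad_z                      # gradient w.r.t. the weights
grad_b = grad_z.sum()                      # gradient w.r.t. the bias

print(a.max())                             # 0.0 -> the neuron only ever outputs 0
print(grad_w, grad_b)                      # all zeros -> no weight update is possible
```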
There are two major causes of the dying ReLU problem: a high learning rate and a large negative bias. Let's discuss them in detail.
In neural networks, the weights are updated using the following equation:

W_new = W_old − α · (∂L/∂W)

Here, α is the learning rate and ∂L/∂W is the gradient of the loss with respect to the weights.
If the learning rate α is set too high, the update term α · (∂L/∂W) can become larger than the current weights, and subtracting the larger value from the smaller one pushes the new weights into the negative range. These negative weights make the pre-activations fed to the ReLU negative for typical inputs, which causes the dying ReLU problem.
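A toy NumPy example of the update rule above (the numbers are made up): with a very large learning rate, a single step drives the weights and bias strongly negative, after which the pre-activation for a typical input is negative and the ReLU output drops to 0.

```python
import numpy as np

w, b = np.array([0.5, 0.3]), 0.1             # current parameters
grad_w, grad_b = np.array([2.0, 1.5]), 1.0   # gradients from backpropagation

for lr in (0.01, 10.0):                      # a modest vs. a very large learning rate
    new_w = w - lr * grad_w                  # W_new = W_old - alpha * dL/dW
    new_b = b - lr * grad_b
    x = np.array([1.0, 1.0])                 # a typical non-negative input
    z = new_w @ x + new_b                    # pre-activation fed to ReLU
    print(lr, new_w, new_b, max(0.0, z))     # large lr -> negative weights, ReLU output 0
```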
In neural networks, a bias term is also added before the activation function is applied. A large negative bias makes the ReLU pre-activations negative, so the output is 0, resulting in a dying ReLU problem.
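A minimal sketch of the same effect with made-up numbers: the large negative bias keeps the pre-activation negative over the whole input range, so the ReLU output is always 0.

```python
import numpy as np

w, b = 0.8, -6.0                 # a large negative bias
x = np.linspace(-2.0, 2.0, 9)    # a typical range of inputs
z = w * x + b                    # pre-activations: all negative
print(np.maximum(0.0, z))        # all zeros -> the neuron is dead
```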
Different techniques are used to solve the dying ReLU problem.
Lowering the learning rate and initializing the bias with a small positive value reduce the chance of dying ReLU. Both push the ReLU pre-activations toward the positive side, which keeps the neurons active so that gradients continue to flow.
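A minimal sketch of what this looks like in practice, assuming PyTorch (the bias value 0.1 and the learning rate are illustrative choices, not prescriptions):

```python
import torch
import torch.nn as nn

layer = nn.Linear(64, 32)
nn.init.constant_(layer.bias, 0.1)   # small positive bias keeps pre-activations positive at the start
optimizer = torch.optim.SGD(layer.parameters(), lr=1e-3)  # modest learning rate avoids huge negative updates
```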
Another popular technique is Leaky ReLU, which keeps gradients from vanishing on the negative side and converges quickly. Leaky ReLU has a non-zero gradient over its entire domain: the slope on the negative side is a small non-zero constant, unlike standard ReLU. As a result, a negative input produces a small negative output instead of 0, so gradients keep flowing and the neuron can recover from the dying ReLU state.
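A minimal NumPy sketch of Leaky ReLU (the slope 0.01 is a commonly used default, but it is a tunable hyperparameter):

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    # Positive inputs pass through; negative inputs are scaled by a small slope
    # instead of being zeroed out, so the gradient never becomes exactly 0.
    return np.where(x > 0, x, slope * x)

x = np.array([-3.0, -1.0, 0.0, 2.0])
print(leaky_relu(x))   # [-0.03 -0.01  0.    2.  ]
```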
Some other techniques include the Parametric ReLU (PReLU) and the Exponential Linear Unit (ELU).
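For completeness, sketches of these two alternatives (in PReLU the negative-side slope is a learnable parameter, while in ELU alpha is a fixed hyperparameter, commonly 1.0):

```python
import numpy as np

def prelu(x, a):
    # Like Leaky ReLU, but the negative-side slope `a` is learned during training.
    return np.where(x > 0, x, a * x)

def elu(x, alpha=1.0):
    # Smoothly saturates toward -alpha for very negative inputs.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```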
The ReLU activation function is widely used across neural network architectures, so it is important to understand the issues that can arise with it. Because ReLU is the most commonly used activation function, steps should be taken to avoid the dying ReLU problem.