What is the vanishing gradient problem?


In gradient-based learning algorithms, we use gradients to learn the weights of a neural network. Backpropagation works like a chain reaction: the gradients of the layers closer to the output are multiplied together on their way back to the layers closer to the input. The resulting gradients are used to update the weights of the neural network.

If these per-layer gradients are small, their product quickly becomes so small that it is close to zero. The weights of the early layers then barely change, and the model is unable to learn effectively. This problem is called the vanishing gradient problem.
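To see this effect numerically, here is a tiny Python sketch (not from the original article) that multiplies a per-layer gradient factor across increasing depths. The factor $0.25$ is used because it is the maximum slope of the sigmoid discussed later:

```python
# Toy illustration: backpropagation multiplies per-layer gradient factors.
# If each factor is small (here 0.25, the maximum slope of the sigmoid),
# the product shrinks exponentially with depth.
factor = 0.25
for depth in (5, 10, 20, 50):
    print(depth, factor ** depth)
# 5  -> ~9.8e-04
# 10 -> ~9.5e-07
# 20 -> ~9.1e-13
# 50 -> ~7.9e-31
```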

Example

Here, we'll take an oversimplified example to understand this vanishing effect. Suppose we have a 20-layer neural network with only one neuron in each layer.

20-Layer Neural Network

Here, $x$ is our 1-dimensional input to the neural network. It generates $\hat{y}$ as the predicted output. The weights are represented by $w_i$, and the $i^{th}$ layer uses some activation function $a_i$.

In the feed-forward step, we calculate the output of the neurons one by one. The output of the first neuron is calculated as:

$o_1 = a_1(w_1 x)$

This output is fed as the input to the neuron in the second layer, so $o_2 = a_2(w_2 o_1)$, and so on.

As we can observe, the outputs of the neurons are chained together. If we compute the gradient of the output with respect to the very first weight, the chain rule multiplies in a term for every layer, which quickly becomes tedious and hard to keep track of.

When the gradient with respect to a weight is large, that weight sees a larger update. For the sake of understanding, let's say that all the weights have the same value ($0.17$) and the input value is $1$. If we ignore the activation functions for a moment, the gradient of $\hat{y}$ with respect to $w_1$ is then roughly:

$\frac{\partial \hat{y}}{\partial w_1} = x \cdot w_2 \cdot w_3 \cdots w_{20} = 0.17^{19} \approx 2.4 \times 10^{-15}$

This is a very small value, and if there were $50$ layers in the neural network, it would be even smaller.
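The following Python sketch reproduces this toy example. The identity activation is an assumption made purely to keep the arithmetic visible; the layer outputs and the chain-rule product are computed by hand:

```python
# Minimal sketch of the 20-layer, one-neuron-per-layer example above.
# Assumption (not from the article): identity activations, so each layer
# simply multiplies its input by its weight.

num_layers = 20
w = [0.17] * num_layers   # every weight set to 0.17, as in the example
x = 1.0                   # 1-dimensional input

# Feed-forward: output of layer i is o_i = w_i * o_{i-1}
output = x
for weight in w:
    output *= weight

# Chain rule: d y_hat / d w_1 = x * w_2 * w_3 * ... * w_20
grad_w1 = x
for weight in w[1:]:
    grad_w1 *= weight

print(f"predicted output : {output:.3e}")   # ~4.06e-16
print(f"gradient wrt w_1 : {grad_w1:.3e}")  # ~2.39e-15
```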

Causes of vanishing gradients

The effect of such small gradient values is smaller and smaller changes in the weights, until eventually the early layers of the network stop learning altogether. But why do the gradients get so small in the first place?

A very common cause of vanishing gradients is the sigmoid activation function. Even though the sigmoid is widely used, it is prone to this problem, especially as the depth of the neural network increases. The mathematical formula of the sigmoid function is given below:

$\sigma(x) = \frac{1}{1 + e^{-x}}$

Here, $x$ is the input. The range of the sigmoid function $\sigma$ is $[0, 1]$, which is narrow in comparison with some other activation functions. More importantly, its derivative, $\sigma'(x) = \sigma(x)(1 - \sigma(x))$, is at most $0.25$ and approaches zero for large positive or negative inputs, so multiplying many such derivatives during backpropagation shrinks the gradient rapidly.
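A short Python check of this claim (the function names here are ours, not from the article):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # peaks at 0.25 when x = 0

for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:5.1f}  sigma'(x) = {sigmoid_derivative(x):.6f}")
# x =   0.0  sigma'(x) = 0.250000
# x =   2.0  sigma'(x) = 0.104994
# x =   5.0  sigma'(x) = 0.006648
# x =  10.0  sigma'(x) = 0.000045
```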

Comparison of sigmoid with tanh and ReLU

Solutions

Some common ways to counter the vanishing gradient are as follows:

  • Use residual blocks: ResNets implement residual blocks, whose skip connections are very effective in countering the vanishing gradient problem because they give gradients a direct path around the weight layers (see the sketch after this list).

  • Use a different activation function: As shown in the figure above, some activation functions, such as ReLU, are less prone to saturation than the sigmoid.

  • Use careful weight initialization: Normally, we randomly initialize the weights of a neural network. Weight initialization techniques like He initialization and Xavier initialization keep the variance of the activations and gradients roughly constant across layers, which makes the gradients less likely to vanish.
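As a concrete illustration of the first and third points, here is a minimal residual block sketch in PyTorch (a framework choice of ours, not prescribed by the article) with Xavier-initialized weights:

```python
# A minimal sketch of two of the remedies above: a residual block, whose skip
# connection gives gradients a direct path around the weight layers, and
# Xavier (Glorot) initialization of the weights.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)
        self.relu = nn.ReLU()
        # Xavier initialization keeps activation/gradient variance roughly stable.
        nn.init.xavier_uniform_(self.fc1.weight)
        nn.init.xavier_uniform_(self.fc2.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The identity shortcut ("x +") lets gradients bypass fc1/fc2.
        return self.relu(x + self.fc2(self.relu(self.fc1(x))))

block = ResidualBlock(dim=8)
out = block(torch.randn(4, 8))   # batch of 4, feature size 8
print(out.shape)                 # torch.Size([4, 8])
```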
