When we train models, we iterate over the training samples, make predictions for them, and estimate the error between the predicted label and the real label. Next, we update the weights using the gradient of the error with respect to the weights. In deep models, calculating this gradient usually involves multiplying a long chain of terms, and two problems arise from this.
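For intuition, here is a minimal sketch of such a training loop for a single-weight model fit by plain gradient descent. The data, learning rate, and squared-error loss are illustrative assumptions, not taken from the original.

# Minimal sketch of the training loop described above (illustrative values).
# Model: prediction = w * x, loss = (prediction - y)^2, updated by gradient descent.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, real label) pairs, assumed for this example
w = 0.0
learning_rate = 0.1

for epoch in range(20):
    for x, y in samples:
        prediction = w * x
        error = prediction - y              # difference between predicted and real label
        gradient = 2 * error * x            # d(loss)/dw for the squared error
        w = w - learning_rate * gradient    # update the weight using the gradient

print("Learned weight:", w)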
Suppose we have two vectors whose values are greater than one. Once we start multiplying such vectors repeatedly, the values of the resultant vector keep growing larger and larger. This problem is called the exploding gradient problem.
vector1 = [1.3, 2.1, 1.73, 0.42, 1.25]
vector2 = [1.26, 1.35, 2.58, 2.81, 1.32]

resultant = list(vector1)  # copy so vector1 is not modified in place
for _ in range(5):
    for i in range(len(resultant)):
        resultant[i] = resultant[i] * vector2[i]

print("Final product of vector1*(vector2)^5")
print(resultant)
Now consider the opposite case, where both vectors have all of their values less than one. Once we start multiplying such vectors, the values of the resultant vector keep shrinking toward zero. This problem is called the vanishing gradient problem.
vector1 = [0.3, 0.1, 0.73, 0.42, 0.25]
vector2 = [0.26, 0.35, 0.58, 0.81, 0.32]

resultant = list(vector1)  # copy so vector1 is not modified in place
for _ in range(5):
    for i in range(len(resultant)):
        resultant[i] = resultant[i] * vector2[i]

print("Final product of vector1*(vector2)^5")
print(resultant)
Every time we compute the gradient, we check whether its norm exceeds a threshold parameter; if it does, we rescale the gradient so that its norm equals the threshold. This prevents the gradient from exploding in the next update and keeps training stable. This technique, gradient clipping by norm, mostly solves the problem of exploding gradients.
Logically:
if ||gradient|| > threshold:
    gradient = gradient * threshold / ||gradient||

where ||gradient|| denotes the norm of the gradient vector, which can be the L1 norm, the L2 norm, or any other norm.
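To make the rule concrete, here is the same clipping logic written out in plain Python for a gradient stored as a list, using the L2 norm. The function name, threshold, and example values are assumptions made for this sketch.

import math

def clip_by_norm(gradient, threshold):
    # L2 norm of the gradient vector
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm > threshold:
        # Rescale so the clipped gradient has norm equal to the threshold
        gradient = [g * threshold / norm for g in gradient]
    return gradient

gradient = [4.1, 33.7, 18.2, 0.4, 27.5]   # an exploding gradient (illustrative values)
clipped = clip_by_norm(gradient, threshold=5.0)
print("Clipped gradient:", clipped)
print("Clipped norm:", math.sqrt(sum(g * g for g in clipped)))  # equals the threshold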
### Tensorflow syntax ###
tf.clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None)

### Pytorch syntax ###
torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=2.0, error_if_nonfinite=False)
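For context, a clipping call of this kind typically sits between the backward pass and the optimizer step. Below is a sketch of that placement in PyTorch; the model, data, and max_norm value are assumptions chosen for illustration.

import torch
import torch.nn as nn

# Illustrative model, data, and hyperparameters (assumed for this sketch)
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
inputs, targets = torch.randn(32, 10), torch.randn(32, 1)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()                                                    # compute gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip before the update
    optimizer.step()                                                   # apply the (clipped) gradients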