Nesterov Accelerated Gradient (NAG)
Understand the Nesterov Accelerated Gradient (NAG) technique used in advanced gradient descent methods. Learn how NAG anticipates gradients ahead of the current position to improve convergence speed and avoid local optima, using practical implementation examples with the Rosenbrock function.
What is NAG?
Consider a scenario in which a company wants to determine the optimal production rate and selling price for one of its products to maximize profit, where the profit is a non-convex objective with several local optima.
NAG is a variant of gradient descent with momentum that improves both the convergence rate and the stability of the optimization. The main idea is to use a look-ahead term: the gradient is evaluated at an estimated future point rather than at the current point. This way, the algorithm can anticipate the direction of the optimal solution and avoid overshooting or oscillating. The figure below illustrates the idea:
The NAG update at a time step $t$ is given by:

$$v_t = \gamma v_{t-1} + \eta \nabla f(\theta_{t-1} - \gamma v_{t-1})$$

$$\theta_t = \theta_{t-1} - v_t$$

Here, $\theta_t$ denotes the parameters at step $t$, $v_t$ is the velocity (momentum) term, $\gamma$ is the momentum coefficient, $\eta$ is the learning rate, and $\nabla f(\theta_{t-1} - \gamma v_{t-1})$ is the gradient evaluated at the look-ahead point $\theta_{t-1} - \gamma v_{t-1}$ rather than at the current point $\theta_{t-1}$.
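As a concrete illustration, here is a minimal NumPy sketch of these update rules applied to the two-dimensional Rosenbrock function. The function names (`rosenbrock`, `rosenbrock_grad`, `nag`), the starting point, and the hyperparameter values are illustrative assumptions rather than the lesson's official code, and may need tuning for other problems.

```python
import numpy as np

def rosenbrock(x, a=1.0, b=100.0):
    """Rosenbrock function: f(x, y) = (a - x)^2 + b * (y - x^2)^2."""
    return (a - x[0]) ** 2 + b * (x[1] - x[0] ** 2) ** 2

def rosenbrock_grad(x, a=1.0, b=100.0):
    """Analytical gradient of the Rosenbrock function."""
    dx = -2.0 * (a - x[0]) - 4.0 * b * x[0] * (x[1] - x[0] ** 2)
    dy = 2.0 * b * (x[1] - x[0] ** 2)
    return np.array([dx, dy])

def nag(grad_fn, theta0, lr=2e-4, momentum=0.9, n_iters=20000):
    """Minimize a function with the NAG updates:
    v_t     = momentum * v_{t-1} + lr * grad(theta_{t-1} - momentum * v_{t-1})
    theta_t = theta_{t-1} - v_t
    """
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(n_iters):
        lookahead = theta - momentum * v             # peek ahead along the momentum direction
        v = momentum * v + lr * grad_fn(lookahead)   # gradient at the look-ahead point
        theta = theta - v
    return theta

theta_star = nag(rosenbrock_grad, theta0=[-1.0, 1.0])
print(theta_star)  # should approach the global minimum at (1, 1)
```

Setting the momentum coefficient to zero reduces the same loop to plain gradient descent, which makes it easy to compare the two methods on the same objective.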