...


Nesterov Accelerated Gradient (NAG)

Learn how Nesterov Accelerated Gradient (NAG) can be used to escape local optima in non-convex optimization.

What is NAG?

Consider the scenario where a company wants to determine the optimal production rate and the optimal selling price for one of its products to maximize profit, where the profit is a non-convex objective with several local optima.

NAG is a variant of gradient descent with momentum that improves the convergence rate and the stability of gradient descent. The main idea is to use a look-ahead term to calculate the gradient at a future point rather than the current point. This way, the algorithm can anticipate the direction of the optimal solution and avoid overshooting or oscillating. The figure below illustrates the idea:

[Figure: Nesterov momentum (NAG)]
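As a rough illustration, the sketch below contrasts a classical momentum step with a NAG step; the objective function, learning rate, and momentum values here are illustrative choices and not part of the lesson. The only difference between the two update rules is the point at which the gradient is evaluated.

```python
import numpy as np

# Illustrative non-convex objective with several local minima (assumed for this sketch).
def f(x):
    return np.sin(3 * x) + 0.3 * x ** 2

def grad_f(x):
    return 3 * np.cos(3 * x) + 0.6 * x

def momentum_step(x, v, lr=0.05, beta=0.9):
    # Classical momentum: gradient evaluated at the current point x.
    v = beta * v + lr * grad_f(x)
    return x - v, v

def nag_step(x, v, lr=0.05, beta=0.9):
    # NAG: gradient evaluated at the look-ahead point x - beta * v.
    v = beta * v + lr * grad_f(x - beta * v)
    return x - v, v

x, v = 2.0, 0.0  # arbitrary starting point and zero initial velocity
for _ in range(200):
    x, v = nag_step(x, v)

print(f"x = {x:.4f}, f(x) = {f(x):.4f}")
```

Because the velocity is applied before the gradient is computed, the look-ahead gradient pulls the update back when the step has already passed a minimum, which is what reduces overshooting and oscillation.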

The NAG update at time $t$ is given as follows:

$$v_t = \beta v_{t-1} + \eta \nabla f(\theta_{t-1} - \beta v_{t-1})$$
$$\theta_t = \theta_{t-1} - v_t$$

Here, $v_t$ is the velocity vector that accumulates the gradient over time, $\beta$ is the momentum coefficient that controls how much of the previous velocity is retained, ...