Nesterov Accelerated Gradient (NAG)
Understand the Nesterov Accelerated Gradient (NAG) technique used in advanced gradient descent methods. Learn how NAG anticipates gradients ahead of the current position to improve convergence speed and avoid local optima, using practical implementation examples with the Rosenbrock function.
What is NAG?
Consider a scenario in which a company wants to determine the optimal production rate and selling price for one of its products to maximize profit, where the profit is a non-convex objective with several local optima.
NAG is a variant of gradient descent with momentum that improves both the convergence rate and the stability of the optimization. The main idea is to use a look-ahead term: the gradient is evaluated at an estimated future point rather than at the current point. This way, the algorithm can anticipate the direction of the optimal solution and avoid overshooting or oscillating. The figure below illustrates the idea:
The NAG update at a time step $t$ is given by:

$$v_t = \gamma v_{t-1} + \eta \nabla f(\theta_{t-1} - \gamma v_{t-1})$$

$$\theta_t = \theta_{t-1} - v_t$$

Here, $\theta_t$ denotes the parameters at step $t$, $v_t$ is the velocity (momentum) term, $\gamma$ is the momentum coefficient, $\eta$ is the learning rate, and $\nabla f(\theta_{t-1} - \gamma v_{t-1})$ is the gradient evaluated at the look-ahead point $\theta_{t-1} - \gamma v_{t-1}$ rather than at the current point $\theta_{t-1}$.
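As a concrete illustration, here is a minimal NumPy sketch of these update rules applied to the two-dimensional Rosenbrock function. The function names (`rosenbrock`, `rosenbrock_grad`, `nag`), the starting point, and the hyperparameter values are illustrative assumptions rather than the lesson's official code, and may need tuning for other problems.

```python
import numpy as np

def rosenbrock(x, a=1.0, b=100.0):
    """Rosenbrock function: f(x, y) = (a - x)^2 + b * (y - x^2)^2."""
    return (a - x[0]) ** 2 + b * (x[1] - x[0] ** 2) ** 2

def rosenbrock_grad(x, a=1.0, b=100.0):
    """Analytical gradient of the Rosenbrock function."""
    dx = -2.0 * (a - x[0]) - 4.0 * b * x[0] * (x[1] - x[0] ** 2)
    dy = 2.0 * b * (x[1] - x[0] ** 2)
    return np.array([dx, dy])

def nag(grad_fn, theta0, lr=2e-4, momentum=0.9, n_iters=20000):
    """Minimize a function with the NAG updates:
    v_t     = momentum * v_{t-1} + lr * grad(theta_{t-1} - momentum * v_{t-1})
    theta_t = theta_{t-1} - v_t
    """
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(n_iters):
        lookahead = theta - momentum * v             # peek ahead along the momentum direction
        v = momentum * v + lr * grad_fn(lookahead)   # gradient at the look-ahead point
        theta = theta - v
    return theta

theta_star = nag(rosenbrock_grad, theta0=[-1.0, 1.0])
print(theta_star)  # should approach the global minimum at (1, 1)
```

Setting the momentum coefficient to zero reduces the same loop to plain gradient descent, which makes it easy to compare the two methods on the same objective.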