Newton’s Method

Learn how to use second-order information, such as the Hessian, to improve on gradient descent.

Second-order optimization algorithms

Newton’s methods are a class of optimization algorithms that leverage second-order information, such as the Hessian, to achieve faster and more efficient convergence. In contrast, gradient descent algorithms, such as Nesterov momentum, rely solely on first-order gradient information.
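To make this contrast concrete, here is a minimal sketch (with a hypothetical one-dimensional objective $f(x) = (x - 3)^2 + 1$ and an assumed learning rate of $0.1$) comparing a single gradient-descent step with a single Newton step; the Newton step divides the gradient by the curvature, as derived from the Taylor expansion below.

```python
# Minimal sketch (hypothetical 1-D objective):
# minimize f(x) = (x - 3)^2 + 1, whose gradient is 2(x - 3) and whose
# second derivative (the 1-D "Hessian") is the constant 2.

def f(x):
    return (x - 3.0) ** 2 + 1.0

def grad(x):
    return 2.0 * (x - 3.0)

def hess(x):
    return 2.0

x0 = 10.0   # shared starting point
lr = 0.1    # assumed learning rate for the first-order method

# First-order update: step against the gradient, scaled by a learning rate.
x_gd = x0 - lr * grad(x0)            # 10 - 0.1 * 14 = 8.6

# Second-order (Newton) update: scale the step by the inverse curvature.
x_newton = x0 - grad(x0) / hess(x0)  # 10 - 14 / 2 = 3.0 (the exact minimizer)

print(f"gradient descent step: {x_gd:.4f}")
print(f"Newton step:           {x_newton:.4f}")
```

Because the objective here is quadratic, a single Newton step lands exactly on the minimizer, while gradient descent only moves part of the way and needs many more iterations.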

The idea of Newton’s method is to use the curvature information captured by the Hessian to build a more accurate local approximation of the function near the optimum.

Recall the second-order Taylor series expansion of our objective $f(x)$ around a point $x_t$ (at step $t$).
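In standard form, with $\nabla f(x_t)$ denoting the gradient and $H(x_t) = \nabla^2 f(x_t)$ the Hessian of $f$ at $x_t$, it reads:

$$
f(x) \;\approx\; f(x_t) \;+\; \nabla f(x_t)^\top (x - x_t) \;+\; \frac{1}{2}\,(x - x_t)^\top H(x_t)\,(x - x_t).
$$

Minimizing this quadratic approximation with respect to $x$, by setting its gradient $\nabla f(x_t) + H(x_t)(x - x_t)$ to zero, yields the Newton update $x_{t+1} = x_t - H(x_t)^{-1}\,\nabla f(x_t)$, the curvature-scaled step used in the sketch above.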
