Newton's Method
Learn about one of the most famous second-order optimization algorithms.
Second-order optimization methods use the second derivatives of a function in each iteration. That's the only difference from first-order iterative methods. These methods require the target function to be not just differentiable but twice differentiable. That is, the target function should have both its first and second derivatives well defined.
In exchange for this stronger requirement, when they're applicable, second-order methods converge faster than first-order methods. We'll learn about the classic version of Newton's method. It's probably the most famous second-order optimization algorithm and the seed of many variants designed to overcome some of its limitations.
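To make "using second derivatives in every iteration" concrete, here's a minimal sketch of a single classic Newton step. It's purely illustrative (the method itself is developed later in the lesson), and it assumes we can already evaluate the gradient and the Hessian at the current point:

```python
import numpy as np

def newton_step(x, grad, hess):
    # One classic Newton update: solve H d = -g for the step d, then move to x + d.
    # grad and hess are assumed to be the gradient vector and Hessian matrix at x.
    step = np.linalg.solve(hess, -grad)
    return x + step

# Example on f(x, y) = x**2 + 2 * y**2, starting from (3, 2):
x0 = np.array([3.0, 2.0])
grad0 = np.array([2 * x0[0], 4 * x0[1]])       # gradient of f at x0
hess0 = np.array([[2.0, 0.0], [0.0, 4.0]])     # constant Hessian of f
print(newton_step(x0, grad0, hess0))           # lands on the minimum (0, 0)
```

Because the example function is quadratic, a single Newton step jumps straight to its minimum; that's the kind of fast convergence second-order information buys.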
But first, let’s talk about second-order derivatives and their meaning.
Note: Newton’s method is also known as the Newton-Raphson method.
Interpretation of second-order derivatives
It's time to give these derivatives more attention. We've talked a little about the Hessian, but we've mostly ignored it. That's a little unfair, because second-order derivatives measure an important property of functions.
In the same way that the first derivative and the gradient tell us about where a function is increasing or decreasing, the second derivative and the Hessian tell us about how the function is curved.
A univariate function is convex wherever its second derivative is positive, and concave wherever its second derivative is negative.
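For intuition, here's a small numerical sketch (the helper name and the choice of function are just illustrative) that estimates the second derivative with a central difference and reads off the local curvature:

```python
import numpy as np

def second_derivative(f, x, h=1e-4):
    # Central-difference estimate of f''(x).
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

f = np.sin  # concave on (0, pi), convex on (pi, 2*pi): the curvature changes sign

for x in (1.0, 2.0, 4.0, 5.0):
    d2 = second_derivative(f, x)
    shape = "convex" if d2 > 0 else "concave"
    print(f"x = {x:.1f}, f''(x) = {d2:+.3f} -> locally {shape}")
```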
There's a saddle point where the curvature changes, and Newton's method can get stuck at that point. The first derivative is zero at a saddle point, and yet it's neither a minimum nor a maximum.
We already know about saddle points: the first derivative is zero there, yet they're neither minima nor maxima. The problem with these points is that the curvature of the function changes at them. The function goes from convex to concave, or vice versa. The second derivative therefore changes sign and is zero right at the saddle point.
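As a concrete univariate example (the cubic is just an illustrative choice), f(x) = x**3 has a saddle point at x = 0, where the first derivative vanishes and the second derivative changes sign:

```python
# f(x) = x**3: f'(x) = 3x**2 and f''(x) = 6x, so both vanish at x = 0
# while the curvature flips from concave (x < 0) to convex (x > 0).
fprime = lambda x: 3 * x**2
fsecond = lambda x: 6 * x

for x in (-0.5, 0.0, 0.5):
    print(f"x = {x:+.1f}, f'(x) = {fprime(x):+.2f}, f''(x) = {fsecond(x):+.2f}")
```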
Analogously, for multidimensional functions, if the Hessian is positive definite (the test we did to check for minima in the previous section), then the function is convex at that point; if the Hessian is negative definite (the test we did to check for maxima in the previous section), then the function is concave at that point.
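Here's a sketch of that multidimensional test (the helper name and the two example Hessians are assumptions for illustration): we check the signs of the Hessian's eigenvalues at a point.

```python
import numpy as np

def classify_curvature(hessian):
    # Eigenvalue test for a symmetric Hessian evaluated at a single point.
    eigenvalues = np.linalg.eigvalsh(hessian)
    if np.all(eigenvalues > 0):
        return "positive definite -> locally convex"
    if np.all(eigenvalues < 0):
        return "negative definite -> locally concave"
    return "mixed signs -> saddle-like curvature"

# Hessian of f(x, y) = x**2 + y**2 (convex everywhere):
print(classify_curvature(np.array([[2.0, 0.0], [0.0, 2.0]])))
# Hessian of f(x, y) = x**2 - y**2 (saddle at the origin):
print(classify_curvature(np.array([[2.0, 0.0], [0.0, -2.0]])))
```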