Adaptive Moment Estimation (Adam)

Learn how the Adaptive Moment Estimation (Adam) algorithm dynamically adapts the learning rate while also helping the optimizer escape local optima.

Adaptive Moment Estimation (Adam) is an optimization algorithm designed to address some of the shortcomings of gradient descent in training deep neural networks. It dynamically adapts the learning rate while also helping the optimizer escape local optima.

While Nesterov momentum only helps in escaping local optima and RMSProp only helps in dynamically adapting the learning rate, Adam combines the advantages of both algorithms for better performance.

How does Adam work?

The basic idea of Adam is to estimate the first and second moments of the gradients: the first moment is the mean, also called the momentum, and the second moment is the uncentered variance. Adam uses these moments to update the parameters in a way that balances the step size, as done by RMSProp, and the direction of the gradient, as done by Nesterov momentum.
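To make this concrete, below is a minimal NumPy sketch of a single Adam step applied to a toy quadratic objective. The function and variable names, the learning rate used in the loop, and the quadratic example are illustrative assumptions; the hyperparameter defaults follow the values suggested in the original Adam paper.

```python
import numpy as np

def adam_update(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: update the moment estimates, correct their bias,
    and move the parameters against the gradient."""
    m = beta1 * m + (1 - beta1) * grad          # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment: running uncentered variance
    m_hat = m / (1 - beta1 ** t)                # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)                # bias correction for the second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Illustrative usage: minimize f(x) = x^2 starting from x = 5 (lr chosen for the toy problem)
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 201):
    grad = 2 * theta                            # gradient of x^2
    theta, m, v = adam_update(theta, grad, m, v, t, lr=0.1)
print(theta)                                    # close to the minimum at x = 0
```

Note how the first moment smooths the update direction (the momentum effect), while dividing by the square root of the second moment scales the step size per parameter (the RMSProp effect).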

The update rule of Adam at a time $t$ is given as follows:
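$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

This is the standard formulation from the original Adam paper. Here, $g_t$ is the gradient of the loss with respect to the parameters $\theta$ at step $t$, $m_t$ and $v_t$ are the first and second moment estimates, $\hat{m}_t$ and $\hat{v}_t$ are their bias-corrected versions, $\alpha$ is the learning rate, $\beta_1$ and $\beta_2$ are the exponential decay rates of the moment estimates, and $\epsilon$ is a small constant added for numerical stability.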
