Adaptive Moment Estimation (Adam)

Learn how the Adaptive Moment Estimation (Adam) algorithm dynamically adapts the learning rate while also helping the optimizer escape local optima.

Adaptive Moment Estimation (Adam) is an optimization algorithm designed to address some of the shortcomings of gradient descent in training deep neural networks. It dynamically adapts the learning rate while also helping the optimizer escape local optima.

While Nesterov momentum only helps in escaping local optima and RMSProp only helps in dynamically adapting the learning rate, Adam combines the advantages of both algorithms for better performance.

How does Adam work?

The basic idea of Adam is to estimate the first and second moments of the gradients: the first moment is the mean, also called the momentum, and the second moment is the uncentered variance. Adam uses these moments to update the parameters in a way that balances the step size, as done by RMSProp, and the direction of the gradient, as done by Nesterov momentum.
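To make this concrete, below is a minimal NumPy sketch of a single Adam step applied to a toy quadratic objective. The function and variable names, the learning rate used in the loop, and the quadratic example are illustrative assumptions; the hyperparameter defaults follow the values suggested in the original Adam paper.

```python
import numpy as np

def adam_update(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: update the moment estimates, correct their bias,
    and move the parameters against the gradient."""
    m = beta1 * m + (1 - beta1) * grad          # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment: running uncentered variance
    m_hat = m / (1 - beta1 ** t)                # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)                # bias correction for the second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Illustrative usage: minimize f(x) = x^2 starting from x = 5 (lr chosen for the toy problem)
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 201):
    grad = 2 * theta                            # gradient of x^2
    theta, m, v = adam_update(theta, grad, m, v, t, lr=0.1)
print(theta)                                    # close to the minimum at x = 0
```

Note how the first moment smooths the update direction (the momentum effect), while dividing by the square root of the second moment scales the step size per parameter (the RMSProp effect).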

The update rule of Adam at a time $t$ is given as follows:
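$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

This is the standard formulation from the original Adam paper. Here, $g_t$ is the gradient of the loss with respect to the parameters $\theta$ at step $t$, $m_t$ and $v_t$ are the first and second moment estimates, $\hat{m}_t$ and $\hat{v}_t$ are their bias-corrected versions, $\alpha$ is the learning rate, $\beta_1$ and $\beta_2$ are the exponential decay rates of the moment estimates, and $\epsilon$ is a small constant added for numerical stability.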
