Stochastic Gradient Descent

Learn about SGD-based optimizers in JAX and Flax.


SGD implements stochastic gradient descent with support for momentum and Nesterov acceleration. Nesterov acceleration is a technique for speeding up the convergence of iterative optimization algorithms and is commonly used in machine learning. Momentum makes reaching optimal model weights faster by accelerating gradient descent along directions of consistent improvement.
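Below is a minimal sketch of how an SGD optimizer with momentum and Nesterov acceleration might be used in JAX, assuming the optax library (the optimizer package commonly paired with Flax). The learning rate, momentum value, parameters, and loss function are illustrative placeholders, not values from the lesson.

```python
import jax
import jax.numpy as jnp
import optax

# SGD with momentum; nesterov=True switches to Nesterov-accelerated updates.
optimizer = optax.sgd(learning_rate=0.01, momentum=0.9, nesterov=True)

# Toy parameters and a simple quadratic loss, used only for demonstration.
params = {"w": jnp.array([1.0, -2.0, 3.0])}
loss_fn = lambda p: jnp.sum(p["w"] ** 2)

# Initialize the optimizer state (this holds the momentum buffer).
opt_state = optimizer.init(params)

# One optimization step: compute gradients, transform them with SGD,
# and apply the resulting updates to the parameters.
grads = jax.grad(loss_fn)(params)
updates, opt_state = optimizer.update(grads, opt_state)
params = optax.apply_updates(params, updates)
```

In a Flax training loop, the same `init`/`update`/`apply_updates` pattern is typically applied to the model's parameter tree on every batch.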
