Minibatch Gradient Descent

Explore how minibatch gradient descent improves optimization of non-convex problems by balancing update stability and speed. Understand how the technique handles large datasets, reduces gradient variance compared to stochastic gradient descent, and is implemented in practice with Python libraries for machine learning tasks.

Stochastic gradient descent (SGD)

Recall that to compute the gradient $\nabla_\theta J(\theta)$ of an objective $J(\theta)$, we need to aggregate the per-example gradients $\nabla_\theta \mathcal{L}(f_\theta(x_i), y_i)$ over the entire training set:

$$\nabla_\theta J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \mathcal{L}\bigl(f_\theta(x_i), y_i\bigr).$$

For large datasets, evaluating this full sum for every parameter update is expensive, which motivates stochastic and minibatch variants that estimate the gradient from a small sample of examples.
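To make this concrete, here is a minimal NumPy sketch of minibatch gradient descent on a least-squares objective. The function name `minibatch_sgd`, the linear model, and the hyperparameter values are illustrative assumptions, not code from this lesson.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.1, batch_size=32, epochs=20, seed=0):
    """Minibatch gradient descent on a linear model with squared-error loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        # Shuffle once per epoch so each minibatch is a fresh random sample.
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Average over the minibatch of the gradient of the per-example
            # loss (1/2) * (x_i @ theta - y_i)^2, i.e. an estimate of the
            # full-dataset gradient from only len(idx) examples.
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)
            theta -= lr * grad
    return theta

# Synthetic data: y = X @ true_theta + noise (illustrative example only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta + 0.01 * rng.normal(size=1000)

print(minibatch_sgd(X, y))  # should approach [2.0, -1.0, 0.5]
```

In this sketch, `batch_size=1` would recover plain SGD and `batch_size=n` full-batch gradient descent; intermediate sizes trade lower gradient variance per update against more computation per step.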