Minibatch Gradient Descent

Learn how minibatch gradient descent keeps optimization tractable when computing the full gradient is too expensive.

Stochastic gradient descent (SGD)

Recall that to compute the gradient $\nabla_\theta J(\theta)$ of an objective $J(\theta)$, we need to aggregate the gradients $\nabla_\theta \mathcal{L}(f_\theta(x_i), y_i)$ ...
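As a rough illustration of this aggregation, the sketch below compares the gradient computed over every example with an estimate computed from a random minibatch. It makes several assumptions that are not in the text above: the model is a hypothetical one-parameter linear model $f_\theta(x) = \theta x$, the loss is the squared error, the aggregation is a mean, and the function names, learning rate, and batch size are chosen only for the example.

```python
import random

def per_example_grad(theta, x, y):
    """Gradient of the assumed loss 0.5 * (theta * x - y)**2 with respect to theta."""
    return (theta * x - y) * x

def full_gradient(theta, xs, ys):
    """Average the per-example gradients over the whole dataset (one pass over all data)."""
    return sum(per_example_grad(theta, x, y) for x, y in zip(xs, ys)) / len(xs)

def minibatch_gradient(theta, xs, ys, batch_size, rng):
    """Estimate the full gradient by averaging over a random subset of examples."""
    idx = rng.sample(range(len(xs)), batch_size)
    return sum(per_example_grad(theta, xs[i], ys[i]) for i in idx) / batch_size

# Synthetic, noise-free data for illustration only; the true parameter is 3.0.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
ys = [3.0 * x for x in xs]

# Each update touches 32 examples instead of all 1000.
theta = 0.0
for step in range(500):
    theta -= 0.5 * minibatch_gradient(theta, xs, ys, batch_size=32, rng=rng)

print(theta)
```

Each call to `full_gradient` costs one pass over the whole dataset, while `minibatch_gradient` costs only `batch_size` gradient evaluations. Its result is noisier but, with examples sampled uniformly, it matches the full gradient on average.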