...


Limitations of Gradient Descent


Learn about the limitations of the gradient descent algorithm in non-convex optimization.

We have seen how well gradient descent works in convex optimization, where a single globally optimal solution is guaranteed to exist. In this chapter, we will look at some of the limitations of gradient descent and how to address them.

Intractability

Consider a machine learning problem where we want to minimize the discrepancy between the model prediction $f_\theta(x_i)$ and the ground-truth label $y_i$, as follows:

$$J(\theta) = \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(f_\theta(x_i),\, y_i\big)$$

Here, $\mathcal{L}$ is an arbitrary loss function, such as cross-entropy, that measures the discrepancy between the predicted and the ground-truth values. The gradient descent update for the objective above at any time step $t$ can be written as follows:

$$\theta_{t+1} = \theta_t - \eta\, \nabla_\theta J(\theta_t)$$

where $\eta$ is the learning rate.
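To make the update rule concrete, here is a minimal sketch of full-batch gradient descent on a linear model with a squared-error loss standing in for the arbitrary loss $\mathcal{L}$. The model choice, learning rate, and synthetic data are illustrative assumptions, not part of the lesson's setup.

```python
import numpy as np

# Illustrative assumption: a linear model f_theta(x) = x @ theta with a
# squared-error loss standing in for the arbitrary loss L in the text.
def predict(theta, X):
    return X @ theta

def objective(theta, X, y):
    # J(theta) = (1/N) * sum_i L(f_theta(x_i), y_i), with L = squared error
    return np.mean((predict(theta, X) - y) ** 2)

def gradient(theta, X, y):
    # Analytic gradient of the mean squared error with respect to theta
    N = X.shape[0]
    return (2.0 / N) * X.T @ (predict(theta, X) - y)

def gradient_descent(X, y, eta=0.1, num_steps=100):
    # Repeatedly apply theta_{t+1} = theta_t - eta * grad J(theta_t)
    theta = np.zeros(X.shape[1])
    for t in range(num_steps):
        theta = theta - eta * gradient(theta, X, y)
    return theta

# Synthetic data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + 0.01 * rng.normal(size=256)

theta_hat = gradient_descent(X, y)
print("estimated theta:", theta_hat)
print("final objective:", objective(theta_hat, X, y))
```

Note that every update in this sketch computes the gradient over all $N$ training examples, which is exactly the cost that the discussion of intractability below is concerned with.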

To compute the gradient $\nabla_\theta J(\theta)$ ...