Limitations of Gradient Descent

We have seen that gradient descent works well for convex optimization, where a single global optimum exists. In this chapter, we will look at some of the limitations of gradient descent and how to address them.
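To make the convex case concrete, here is a minimal sketch of gradient descent minimizing a convex quadratic; the function, learning rate, and step count are illustrative choices, not taken from the text.

```python
# Minimal gradient descent sketch on the convex quadratic f(x) = (x - 3)^2,
# whose unique global minimum is at x = 3. On convex functions like this,
# the iterates converge to the global optimum for a suitable learning rate.
def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # step opposite the gradient direction
    return x

# Gradient of f(x) = (x - 3)^2 is f'(x) = 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Starting from `x0 = 0.0`, the iterates approach the global minimizer `x = 3`, illustrating why convexity makes gradient descent reliable.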

Intractability

Consider a machine learning problem where we want to minimize the discrepancy between the model prediction $f_\theta(x_i)$ and the ground-truth label $y_i$, as follows:
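The objective appears to have been lost in extraction; a standard empirical-risk form consistent with the symbols above, with $\ell$ denoting a per-example loss over $n$ training examples, would be:

```latex
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f_\theta(x_i),\, y_i\bigr)
```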
