Gradient Descent
This lesson will focus on the intuition behind the gradient descent algorithm.
In the last lesson, we minimized a loss function to find the best model for predicting the tip paid by customers. But that approach had a drawback: we manually entered values of the model parameter and compared the resulting losses. Manually choosing the values of the model parameters does not scale because:

- It works only on the predetermined values of the model parameter that we happen to try.
- Most models have many parameters and complex prediction functions, so choosing parameters manually would take far too long.
- We may not try the best set of model parameters, in which case we never arrive at the best model.
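To make the drawback concrete, here is a minimal sketch of the manual approach. The data and the one-parameter model (tip = w · bill with a squared-error loss) are assumptions for illustration, not the lesson's actual dataset:

```python
# Hypothetical (bill, tip) data -- illustrative only.
bills = [10.0, 20.0, 30.0, 40.0]
tips  = [1.5,  3.1,  4.4,  6.2]

def loss(w):
    """Mean squared error of the simple model tip = w * bill."""
    return sum((w * b - t) ** 2 for b, t in zip(bills, tips)) / len(bills)

# Manually chosen candidate values for the parameter w.
candidates = [0.10, 0.15, 0.20]
losses = {w: loss(w) for w in candidates}

# Keep the candidate with the smallest loss.
best_w = min(losses, key=losses.get)
print(f"best of the tried values: w = {best_w}, loss = {losses[best_w]:.4f}")
```

The "best" value found this way is only the best among the few values we bothered to try; nothing tells us whether a value between or outside the candidates would do better.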
We need some approach that chooses the model parameters automatically and then arrives at the best model.
Intuition
Since we need a method that does not rely on predetermined values of the model parameter, let's start by picking a random value for it and seeing what our loss is. After this, we will decide whether to increase or decrease the current value, and by how much. Let's look at the direction and the amount separately.
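The starting step described above can be sketched as follows. The toy data and the model tip = w · bill are assumptions carried over for illustration:

```python
import random

# Hypothetical (bill, tip) data -- illustrative only.
bills = [10.0, 20.0, 30.0, 40.0]
tips  = [1.5,  3.1,  4.4,  6.2]

def loss(w):
    """Mean squared error of the model tip = w * bill."""
    return sum((w * b - t) ** 2 for b, t in zip(bills, tips)) / len(bills)

random.seed(0)                  # fixed seed so the run is reproducible
w = random.uniform(0.0, 1.0)    # random starting value for the parameter
print(f"start: w = {w:.3f}, loss = {loss(w):.3f}")
```

From this random starting point, the open questions are exactly the two the lesson tackles next: in which direction should w move, and by how much.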
Direction of change in the parameter
Look at the error surface for the example from the previous lesson, shown below.
We have highlighted two points on this curve. The red point A is the value of the loss at our chosen starting value of the parameter. If we choose this as our starting point, then from here we need to choose a new value of the parameter that is closer to the minimum of the curve. Let's look at the slope of the curve at this point.
The slope at point A is negative, which means that if we increase the parameter from this point, the loss will decrease. Therefore, we need to increase the value of the parameter from here to move toward the minimum of the curve.
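A small numerical check of this idea, under the same hypothetical setup; the finite-difference estimate of the slope is an illustration device, whereas the lesson reads the slope off the plotted curve:

```python
# Hypothetical (bill, tip) data -- illustrative only.
bills = [10.0, 20.0, 30.0, 40.0]
tips  = [1.5,  3.1,  4.4,  6.2]

def loss(w):
    """Mean squared error of the model tip = w * bill."""
    return sum((w * b - t) ** 2 for b, t in zip(bills, tips)) / len(bills)

def slope(w, eps=1e-6):
    # Central finite-difference estimate of the slope of the loss curve at w.
    return (loss(w + eps) - loss(w - eps)) / (2 * eps)

w_a = 0.05                 # a starting value to the left of the minimum, like point A
s = slope(w_a)
print(f"slope at w = {w_a}: {s:.2f}")
# A negative slope means increasing w from here lowers the loss.
print(f"loss({w_a}) = {loss(w_a):.4f}  >  loss({w_a + 0.01}) = {loss(w_a + 0.01):.4f}")
```

The sign of the slope is all the direction rule needs: negative slope, increase the parameter; positive slope, decrease it.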
Now let’s consider another situation where we start at ...