Step 4 - Update the Parameters

Learn how to use the gradients and the learning rate to update the parameters.

Updating parameters

In the final step, we use the gradients to update the parameters. Since we are trying to minimize the loss, we reverse the sign of the gradient for the update.

There is still another hyperparameter to consider: the learning rate, denoted by the Greek letter eta, η (which looks like the letter n). It is the multiplicative factor that we apply to the gradient for the parameter update. Our equations now become the following:

$$
b = b - \eta \frac{\partial MSE}{\partial b}
$$

$$
w = w - \eta \frac{\partial MSE}{\partial w}
$$
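As a rough illustration, the update rule can be sketched in plain Python. The function name `update_parameters` and all the numeric values below are hypothetical, chosen only to make the arithmetic easy to follow; they do not come from the lesson's dataset.

```python
def update_parameters(b, w, b_grad, w_grad, lr):
    """Take one gradient descent step for the parameters b and w.

    b_grad and w_grad are the gradients of the MSE with respect to
    b and w, and lr is the learning rate (eta).
    """
    # Subtracting reverses the sign of the gradient, so each
    # parameter moves in the direction that lowers the loss
    new_b = b - lr * b_grad
    new_w = w - lr * w_grad
    return new_b, new_w


# Hypothetical starting values and gradients, for illustration only
b, w = 0.5, -0.1
b_grad, w_grad = -3.0, -2.0
lr = 0.1

b, w = update_parameters(b, w, b_grad, w_grad, lr)
print(b, w)
```

With these made-up numbers, both gradients are negative, so both parameters increase: `b` moves from 0.5 to about 0.8, and `w` moves from -0.1 to about 0.1.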

We can also interpret this a bit differently: each parameter is going to have its value updated by a constant value, eta (the learning rate), but this constant is going to be weighted by how much that parameter contributes to minimizing the loss (its gradient).

Honestly, I believe that this way of thinking about the parameter update makes more sense. First, you decide on a learning rate that specifies your step size, while the gradients tell you the relative impact (on the loss) of taking a step for each parameter. Then you take a given number of steps that is proportional to that relative impact; more impact, more steps.
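To make this interpretation concrete, here is a small sketch that uses a single learning rate for two parameters with different gradients. The gradient values and the names `grads` and `steps` are assumptions made up for this example.

```python
lr = 0.1  # the learning rate: one step size shared by all parameters

# Hypothetical gradients: the loss is twice as sensitive to b as to w
grads = {"b": -3.0, "w": -1.5}

# The actual change applied to each parameter is the learning rate
# weighted by that parameter's gradient (with the sign reversed)
steps = {name: -lr * grad for name, grad in grads.items()}

for name, step in steps.items():
    print(f"{name} changes by {step:.2f}")

# Ratio of the two changes, which mirrors the ratio of the gradients
ratio = steps["b"] / steps["w"]
print(f"b moves {ratio:.1f} times as far as w")
```

The learning rate is the same for both parameters, yet `b` moves twice as far as `w` because its gradient is twice as large: more impact, more steps.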
