Step 3 - Compute the Gradients

Learn about gradients and how they can be computed and visualized.

Introduction to gradients

A gradient is a partial derivative. Why partial? Because it is computed with respect to (w.r.t.) a single parameter. Since we have two parameters, b and w, we must compute two partial derivatives.

A derivative tells you how much a given quantity changes when you slightly vary some other quantity. In our case, how much does our MSE loss change when we vary each one of our two parameters separately?

Gradient = how much the loss changes if ONE parameter changes a little bit!
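To make this idea concrete before any calculus, here is a minimal sketch that measures it directly: nudge one parameter by a tiny amount, recompute the MSE, and see how much the loss changed. The data and parameter values are illustrative assumptions, not taken from the original text.

```python
import numpy as np

# Illustrative synthetic data: points around the line y = 1 + 2x, plus noise
rng = np.random.default_rng(42)
x = rng.random(100)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(100)

def mse(b, w):
    """Mean squared error of the line y_hat = b + w * x."""
    y_hat = b + w * x
    return ((y_hat - y) ** 2).mean()

# Current (arbitrary) guesses for the two parameters
b, w = 0.5, -0.3
eps = 1e-6  # the "little bit" by which we vary ONE parameter

# Vary one parameter at a time and check how much the loss changes
grad_b_approx = (mse(b + eps, w) - mse(b, w)) / eps
grad_w_approx = (mse(b, w + eps) - mse(b, w)) / eps
print(grad_b_approx, grad_w_approx)
```

This finite-difference trick only approximates the gradients; the chain rule below gives us their exact expressions.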

The right-most part of the equations below is what you usually see in implementations of gradient descent for simple linear regression. The intermediate step shows all the elements that pop up from the application of the chain rule, so you know how the final expression came to be:

$$
\dfrac{\partial \text{MSE}}{\partial b} = \dfrac{\partial \text{MSE}}{\partial \hat{y}_i} \cdot \dfrac{\partial \hat{y}_i}{\partial b} = \dfrac{1}{n} \sum_{i=1}^{n} 2\,(b + w x_i - y_i) \cdot 1 = \dfrac{1}{n} \sum_{i=1}^{n} 2\,(\hat{y}_i - y_i)
$$

$$
\dfrac{\partial \text{MSE}}{\partial w} = \dfrac{\partial \text{MSE}}{\partial \hat{y}_i} \cdot \dfrac{\partial \hat{y}_i}{\partial w} = \dfrac{1}{n} \sum_{i=1}^{n} 2\,(b + w x_i - y_i) \cdot x_i = \dfrac{1}{n} \sum_{i=1}^{n} 2\,x_i\,(\hat{y}_i - y_i)
$$
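In code, those right-most expressions translate directly into a couple of lines. Below is a sketch that reuses the same illustrative data and parameter guesses as the snippet above (again, assumed values for the sake of the example, not prescribed by the text):

```python
import numpy as np

# Same illustrative setup as the finite-difference snippet above
rng = np.random.default_rng(42)
x = rng.random(100)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(100)
b, w = 0.5, -0.3

# Analytic gradients, matching the right-most expressions above
y_hat = b + w * x                  # predictions of the current line
error = y_hat - y                  # (y_hat_i - y_i) for every point
b_grad = 2 * error.mean()          # dMSE/db = (1/n) * sum 2 * (y_hat_i - y_i)
w_grad = 2 * (x * error).mean()    # dMSE/dw = (1/n) * sum 2 * x_i * (y_hat_i - y_i)
print(b_grad, w_grad)
```

These analytic values should agree with the finite-difference approximations from the earlier snippet up to numerical precision, which makes for a handy sanity check on the chain-rule derivation.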