Step 3 - Compute the Gradients

Learn about gradients and how they can be computed and visualized.

Introduction to gradients

A gradient is a partial derivative. Why partial? Because it is computed with respect to (w.r.t.) a single parameter. Since we have two parameters, b and w, we must compute two partial derivatives.

A derivative tells you how much a given quantity changes when you slightly vary some other quantity. In our case, how much does our MSE loss change when we vary each one of our two parameters separately?

Gradient = how much the loss changes if ONE parameter changes a little bit!
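To make this idea concrete before any calculus, here is a minimal sketch that measures it directly: nudge one parameter by a tiny amount, recompute the MSE, and see how much the loss changed. The data and parameter values are illustrative assumptions, not taken from the original text.

```python
import numpy as np

# Illustrative synthetic data: points around the line y = 1 + 2x, plus noise
rng = np.random.default_rng(42)
x = rng.random(100)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(100)

def mse(b, w):
    """Mean squared error of the line y_hat = b + w * x."""
    y_hat = b + w * x
    return ((y_hat - y) ** 2).mean()

# Current (arbitrary) guesses for the two parameters
b, w = 0.5, -0.3
eps = 1e-6  # the "little bit" by which we vary ONE parameter

# Vary one parameter at a time and check how much the loss changes
grad_b_approx = (mse(b + eps, w) - mse(b, w)) / eps
grad_w_approx = (mse(b, w + eps) - mse(b, w)) / eps
print(grad_b_approx, grad_w_approx)
```

This finite-difference trick only approximates the gradients; the chain rule below gives us their exact expressions.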

The right-most part of the equations below is what you usually see in implementations of gradient descent for simple linear regression. The intermediate step shows all the elements that pop up from the application of the chain rule, so you know how the final expression came to be:

$$
\dfrac{\partial \text{MSE}}{\partial b} = \dfrac{\partial \text{MSE}}{\partial \hat{y}_i} \cdot \dfrac{\partial \hat{y}_i}{\partial b} = \dfrac{1}{n} \sum_{i=1}^{n} 2\,(b + w x_i - y_i) \cdot 1 = \dfrac{1}{n} \sum_{i=1}^{n} 2\,(\hat{y}_i - y_i)
$$

$$
\dfrac{\partial \text{MSE}}{\partial w} = \dfrac{\partial \text{MSE}}{\partial \hat{y}_i} \cdot \dfrac{\partial \hat{y}_i}{\partial w} = \dfrac{1}{n} \sum_{i=1}^{n} 2\,(b + w x_i - y_i) \cdot x_i = \dfrac{1}{n} \sum_{i=1}^{n} 2\,x_i\,(\hat{y}_i - y_i)
$$
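In code, those right-most expressions translate directly into a couple of lines. Below is a sketch that reuses the same illustrative data and parameter guesses as the snippet above (again, assumed values for the sake of the example, not prescribed by the text):

```python
import numpy as np

# Same illustrative setup as the finite-difference snippet above
rng = np.random.default_rng(42)
x = rng.random(100)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(100)
b, w = 0.5, -0.3

# Analytic gradients, matching the right-most expressions above
y_hat = b + w * x                  # predictions of the current line
error = y_hat - y                  # (y_hat_i - y_i) for every point
b_grad = 2 * error.mean()          # dMSE/db = (1/n) * sum 2 * (y_hat_i - y_i)
w_grad = 2 * (x * error).mean()    # dMSE/dw = (1/n) * sum 2 * x_i * (y_hat_i - y_i)
print(b_grad, w_grad)
```

These analytic values should agree with the finite-difference approximations from the earlier snippet up to numerical precision, which makes for a handy sanity check on the chain-rule derivation.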