Learning Rate

Learn how the choice of learning rate affects model training.

Introduction to the learning rate

The learning rate is the most important hyperparameter. There is a huge amount of material on how to choose a learning rate, how to modify it during training, and how the wrong learning rate can completely ruin model training.

You might have seen the famous graph (from Stanford’s CS231n class) that shows how a learning rate that is too big or too small affects the loss during training. This is pretty much common knowledge, but it needs to be thoroughly explained and visually demonstrated to be truly understood. So, let us start!
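Before the story, here is a minimal sketch of the effect that graph depicts. The loss function, learning rates, and step count below are all made up purely for illustration; gradient descent on the toy loss f(w) = w² barely moves with a tiny learning rate and blows up with a huge one:

```python
# Toy demonstration of too-small / reasonable / too-big learning rates.
# The loss f(w) = w**2 has gradient 2*w; all values here are illustrative.
def final_loss(lr, w=1.0, n_steps=20):
    for _ in range(n_steps):
        w = w - lr * (2 * w)   # gradient descent update
    return w ** 2              # loss at the final weight

slow = final_loss(lr=0.001)    # too small: the loss barely decreases
good = final_loss(lr=0.3)      # reasonable: converges quickly
boom = final_loss(lr=1.1)      # too big: the loss explodes
```

With these illustrative values, `boom` grows far above the starting loss of 1.0, `slow` stays close to it, and `good` drops to nearly zero, mirroring the three curves in the graph.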

To start things off, I will tell you a little story (trying to build an analogy here; please bear with me).

Imagine you are coming back from hiking in the mountains, and you want to get back home as quickly as possible. At some point in your path, you can either choose to go ahead or to make a right turn.

The path ahead is almost flat, while the path to your right is somewhat steep. The steepness is the gradient. If you take a single step one way or another, it will lead to different outcomes (you will descend more if you take one step to the right instead of going ahead).

But you know that the path to your right gets you home faster, so you do not take just one step, but multiple steps in that direction. The steeper the path, the more steps you take! You just cannot resist the urge to take that many steps; your behavior seems to be completely determined by the landscape.
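The steps so far map directly onto the gradient descent update: the size of each move is proportional to the local gradient (the steepness of the path). A minimal sketch, using an illustrative loss f(w) = w² and made-up values:

```python
# Minimal gradient descent sketch: each step is proportional to the
# local gradient, so steeper terrain produces bigger moves.
def gradient_descent(w, lr, n_steps):
    history = [w]
    for _ in range(n_steps):
        grad = 2 * w          # gradient of the illustrative loss f(w) = w**2
        w = w - lr * grad     # the move scales with both lr and the gradient
        history.append(w)
    return history

path = gradient_descent(w=1.0, lr=0.1, n_steps=5)
```

Printing `path` shows each successive weight landing closer to the minimum at zero, with the moves shrinking as the terrain flattens out, just like the hike home.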

But, you still have one choice. You can adjust the size of your step. You can choose to take ...
