Learning Rate

Learn how the choice of learning rate affects model training.

Introduction to the learning rate

The learning rate is the most important hyperparameter. There is a huge amount of material on how to choose a learning rate, how to modify it during training, and how the wrong learning rate can completely ruin model training.

You might have seen the famous graph (from Stanford’s CS231n class) that shows how a learning rate that is too big or too small affects the loss during training. This is pretty much common knowledge, but it needs to be thoroughly explained and visually demonstrated to be truly understood. So, let us start!
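Before the story, here is a minimal sketch of the effect that graph depicts. The loss function, learning rates, and step count below are all made up purely for illustration; gradient descent on the toy loss f(w) = w² barely moves with a tiny learning rate and blows up with a huge one:

```python
# Toy demonstration of too-small / reasonable / too-big learning rates.
# The loss f(w) = w**2 has gradient 2*w; all values here are illustrative.
def final_loss(lr, w=1.0, n_steps=20):
    for _ in range(n_steps):
        w = w - lr * (2 * w)   # gradient descent update
    return w ** 2              # loss at the final weight

slow = final_loss(lr=0.001)    # too small: the loss barely decreases
good = final_loss(lr=0.3)      # reasonable: converges quickly
boom = final_loss(lr=1.1)      # too big: the loss explodes
```

With these illustrative values, `boom` grows far above the starting loss of 1.0, `slow` stays close to it, and `good` drops to nearly zero, mirroring the three curves in the graph.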

To start things off, I will tell you a little story (trying to build an analogy here; please bear with me).

Imagine you are coming back from hiking in the mountains, and you want to get back home as quickly as possible. At some point in your path, you can either choose to go ahead or to make a right turn.

The path ahead is almost flat, while the path to your right is somewhat steep. The steepness is the gradient. If you take a single step one way or another, it will lead to different outcomes (you will descend more if you take one step to the right instead of going ahead).

But you know that the path to your right gets you home faster, so you do not take just one step, but multiple steps in that direction. The steeper the path, the more steps you take! You just cannot resist the urge to take that many steps; your behavior seems to be completely determined by the landscape.
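The steps so far map directly onto the gradient descent update: the size of each move is proportional to the local gradient (the steepness of the path). A minimal sketch, using an illustrative loss f(w) = w² and made-up values:

```python
# Minimal gradient descent sketch: each step is proportional to the
# local gradient, so steeper terrain produces bigger moves.
def gradient_descent(w, lr, n_steps):
    history = [w]
    for _ in range(n_steps):
        grad = 2 * w          # gradient of the illustrative loss f(w) = w**2
        w = w - lr * grad     # the move scales with both lr and the gradient
        history.append(w)
    return history

path = gradient_descent(w=1.0, lr=0.1, n_steps=5)
```

Printing `path` shows each successive weight landing closer to the minimum at zero, with the moves shrinking as the terrain flattens out, just like the hike home.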

But, you still have one choice. You can adjust the size of your step. You can choose to take ...
