
Tune Learning Rate and Batch Size

Learn what happens when we tweak the learning rate and batch size while training a neural network.

Tune the learning rate

We’ll start with our old hyperparameter lr, which has been with us since almost the beginning of this course. Chances are, we’ve already tuned it, maybe by trying a few random values. It’s time to be more precise about tuning lr.
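Before getting more precise, here is a minimal, self-contained sketch of that “try a few values” approach. The toy one-parameter model, the candidate values, and the train() helper are all illustrative stand-ins, not this course’s actual code:

```python
import numpy as np

# A toy one-parameter linear model trained with plain gradient descent,
# standing in for the course's real network and training loop.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
Y = 3 * X + rng.normal(0, 0.1, size=100)      # true slope is 3, plus noise

def train(lr, epochs=50):
    """Run gradient descent on the MSE and return the final loss."""
    w = 0.0
    for _ in range(epochs):
        grad = np.mean(2 * (w * X - Y) * X)   # d(MSE)/dw
        w -= lr * grad
    return np.mean((w * X - Y) ** 2)

# Ad-hoc tuning: try a few candidate values of lr and keep the best one.
best_lr, best_loss = None, float("inf")
for lr in (0.0001, 0.001, 0.01, 0.1, 1.0):
    loss = train(lr)
    print(f"lr={lr}: final loss {loss:.4f}")
    if loss < best_loss:
        best_lr, best_loss = lr, loss

print(f"Best of the candidates: lr={best_lr}")
```

Running this shows the smallest candidates barely reduce the loss in 50 epochs, while the larger ones get close to the optimum, which is exactly the kind of coarse signal this trial-and-error approach gives us.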

To understand the trade-offs of different learning rates, let’s go back to basics and visualize gradient descent. The following diagrams show a few steps of GD along a one-dimensional loss curve, with three different values of lr. The red cross marks the starting point, and the green cross marks the minimum:

Let’s remember what lr does: the bigger it is, the larger each step of GD. The first diagram uses a small lr, so the algorithm takes tiny steps towards the minimum. The second diagram uses a larger lr, which results in bolder steps and a faster descent.

However, we cannot just set a very large lr and blaze towards the minimum at ludicrous speed, as the third diagram shows. In this case, lr is so large that each step of gradient descent lands farther from the goal than where it started. Not only does this training process fail to find the minimum, it actively diverges, drifting farther away with every step.
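The three regimes are easy to reproduce numerically. Below is a minimal sketch (not the course’s code) that runs a few steps of gradient descent on the one-dimensional loss L(x) = x², whose gradient is 2x. The starting point x = 2 plays the role of the red cross, and the minimum at x = 0 is the green cross; the three lr values are illustrative choices:

```python
def gradient_descent(lr, steps=10, x=2.0):
    """Take a few GD steps on L(x) = x^2 and return the visited points."""
    path = [x]
    for _ in range(steps):
        x = x - lr * 2 * x      # one GD step: x <- x - lr * dL/dx
        path.append(x)
    return path

for lr in (0.01, 0.3, 1.05):     # small, larger, too large
    path = gradient_descent(lr)
    print(f"lr={lr}: " + " ".join(f"{x:7.3f}" for x in path))

# lr=0.01 inches toward 0, lr=0.3 reaches it in a handful of steps,
# and lr=1.05 overshoots farther on every step and diverges.
```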