Tune Learning Rate and Batch Size
Learn what happens when we tweak the learning rate and batch size while training a neural network.
Tune the learning rate
We’ll use our old hyperparameter called lr. This hyperparameter has been with us since almost the beginning of this course. Chances are, we already tuned it, maybe by trying a few random values. It’s time to be more precise about lr tuning.
To understand the trade-off of different learning rates, let’s go back to the basics and visualize gradient descent. The following diagrams show a few steps of GD along a one-dimensional loss curve, with three different values of lr. The red cross marks the starting point, and the green cross marks the minimum:
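The diagrams themselves don’t carry over to text, but we can reproduce the idea in a few lines of Python. Here is a minimal sketch, assuming a toy quadratic loss and illustrative lr values rather than the course’s exact setup:

```python
# Gradient descent on a one-dimensional quadratic loss
# L(w) = (w - 3) ** 2, whose minimum (the "green cross") sits at
# w = 3. We start from w = -4 (the "red cross") and compare a
# small and a larger learning rate. Both the loss and the lr
# values are illustrative assumptions, not the course's own.

def gradient(w):
    return 2 * (w - 3)            # dL/dw for L(w) = (w - 3) ** 2

for lr in (0.02, 0.2):            # a small and a larger learning rate
    w = -4.0
    path = [w]
    for _ in range(8):
        w -= lr * gradient(w)     # one step of GD
        path.append(w)
    print(f"lr={lr}:", " -> ".join(f"{p:.2f}" for p in path))
```

With lr=0.02 the weight crawls from -4 towards 3, while lr=0.2 covers most of the distance in the same number of steps.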
Let’s remember what lr does: the bigger it is, the larger each step of GD is. The first diagram uses a small lr, so the algorithm takes tiny steps towards the minimum. The second example uses a larger lr, which results in bolder steps and a faster descent.
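For reference, here is the update rule behind each of those steps, written in standard gradient descent notation (a generic formulation, not copied from this course):

```latex
% One step of gradient descent on a loss L(w):
% the step size is the gradient scaled by the learning rate.
w \leftarrow w - \mathrm{lr} \cdot \frac{\partial L}{\partial w}
```

Multiplying the gradient by lr is exactly why a bigger lr means a bigger step.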
However, we cannot just set a very large lr and blaze towards the minimum at full speed: when lr is too large, each step of GD can overshoot the minimum, bouncing past it so that the loss grows instead of shrinking.
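To see what that failure mode looks like, here is a sketch using the same toy quadratic loss as above; the value lr = 1.1 is an arbitrary “too large” choice for this particular curve:

```python
# With lr = 1.1 on L(w) = (w - 3) ** 2, every step overshoots the
# minimum and lands farther from it than where the step began.

def loss(w):
    return (w - 3) ** 2

def gradient(w):
    return 2 * (w - 3)

w, lr = -4.0, 1.1
for step in range(1, 7):
    w -= lr * gradient(w)         # one step of GD
    print(f"step {step}: w = {w:8.2f}, loss = {loss(w):10.2f}")
```

The loss grows on every step: instead of descending, GD diverges.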