
Tune Learning Rate and Batch Size

Learn what happens when we tweak the learning rate and batch size while training a neural network.

Tune the learning rate

We’ll use our old hyperparameter called lr. This hyperparameter has been with us since almost the beginning of this course. Chances are, we already tuned it, maybe by trying a few random values. It’s time to be more precise about lr tuning.

To understand the trade-off of different learning rates, let’s go back to the basics and visualize gradient descent. The following diagrams show a few steps of GD along a one-dimensional loss curve, with three different values of lr. The red cross marks the starting point, and the green cross marks the minimum:

Let’s remember what lr does: each GD step moves the weights by lr times the gradient, so the bigger lr is, the larger each step is. The first diagram uses a small lr, so the algorithm takes tiny steps towards the minimum. The second diagram uses a larger lr, which results in bolder steps and a faster descent.

However, we cannot just set a very large lr and blaze towards the minimum at full speed. The third diagram shows the problem: when lr is too large, each step overshoots the minimum, so GD bounces from one side of the valley to the other and can even diverge, with the loss growing instead of shrinking.
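
To make the trade-off concrete, here is a minimal sketch, not taken from the lesson, that runs plain gradient descent on an assumed one-dimensional loss, L(w) = (w − 3)², starting from w = 0. The three lr values (0.01, 0.3, and 1.1) are illustrative stand-ins for the small, larger, and too-large cases described above.

```python
# A minimal sketch (not from the lesson): plain gradient descent on an
# assumed one-dimensional loss L(w) = (w - 3)^2, whose minimum is at w = 3.

def loss(w):
    return (w - 3.0) ** 2

def gradient(w):
    return 2.0 * (w - 3.0)

def gradient_descent(lr, steps=20, w=0.0):
    """Run `steps` GD updates; each step moves w by lr times the gradient."""
    for _ in range(steps):
        w -= lr * gradient(w)
    return w

# Illustrative values for a small, a larger, and a too-large learning rate.
for lr in (0.01, 0.3, 1.1):
    w_final = gradient_descent(lr)
    print(f"lr={lr:<5} final w={w_final:12.4f} loss={loss(w_final):14.4f}")
```

Running this shows the three regimes: with lr = 0.01 the final w is still far from the minimum after 20 steps, with lr = 0.3 it lands essentially on w = 3, and with lr = 1.1 every step overshoots, so the loss grows instead of shrinking.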
