Tune Learning Rate and Batch Size
Learn what happens when we tweak the learning rate and batch size while training a neural network.
Tune the learning rate
We’ll use our old hyperparameter called lr. This hyperparameter has been with us since almost the beginning of this course. Chances are, we already tuned it, maybe by trying a few random values. It’s time to be more precise about tuning lr.
To understand the trade-off of different learning rates, let’s go back to the basics and visualize gradient descent. The following diagrams show a few steps of GD along a one-dimensional loss curve, with three different values of lr. The red cross marks the starting point, and the green cross marks the minimum:
Let’s remember what lr does: the bigger it is, the larger each step of GD is. The first diagram uses a small lr, so the algorithm takes tiny steps towards the minimum. The second example uses a larger lr, which results in bolder steps and a faster descent.
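To make the trade-off concrete, here is a minimal sketch of the three scenarios in the diagrams, assuming a simple quadratic loss L(w) = w² with a starting point at w = 5. The function names and the specific lr values are illustrative, not taken from the course code:

```python
# A minimal sketch of GD on a one-dimensional loss curve, assuming
# the quadratic loss L(w) = w**2 (so its gradient is 2 * w).
# The lr values below are illustrative, not from the course code.

def loss(w):
    return w ** 2          # one-dimensional loss curve


def gradient(w):
    return 2 * w           # derivative of w**2


def gradient_descent(lr, w=5.0, steps=10):
    """Take a few GD steps from the starting point w and return the path."""
    path = [w]
    for _ in range(steps):
        w = w - lr * gradient(w)
        path.append(w)
    return path


for lr in (0.01, 0.3, 1.1):   # tiny, reasonable, and too-large learning rates
    final_w = gradient_descent(lr)[-1]
    print(f"lr={lr:<5} final w={final_w:10.3f}  final loss={loss(final_w):12.3f}")
```

Running this shows the same behavior as the diagrams: with lr = 0.01 the parameter barely moves toward the minimum at w = 0, with lr = 0.3 it gets close to the minimum in a few steps, and with lr = 1.1 it overshoots farther on every step, so the loss grows instead of shrinking.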
However, we cannot just set a very large lr and blaze towards the minimum at ludicrous speed, as the third diagram proves. In this case, lr is so large that each step of gradient descent lands farther away from the goal than it started. Not only does this training process fail to find the minimum, but it ...