Mini-batch gradient descent

The style of gradient descent that we have used so far is also called batch gradient descent because it gathers all the training examples into one big batch and calculates the gradient of the loss over that entire batch. A common alternative is mini-batch gradient descent, which splits the training set into smaller batches and takes one step of gradient descent for each batch.
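To make the difference concrete, here is a toy sketch that contrasts one pass over the data in each style. It is not the network from this course: the linear model, the mean squared error loss, and the names `gradient`, `batch_epoch`, and `mini_batch_epoch` are all assumptions chosen to keep the example short.

```python
import numpy as np

def gradient(X, Y, w):
    # Gradient of the mean squared error for a linear model (illustrative loss)
    return 2 * X.T @ (X @ w - Y) / X.shape[0]

def batch_epoch(X, Y, w, lr):
    # Batch GD: a single step per pass, computed over the whole training set
    return w - lr * gradient(X, Y, w)

def mini_batch_epoch(X, Y, w, lr, batch_size):
    # Mini-batch GD: one step per batch, so several steps per pass
    for start in range(0, X.shape[0], batch_size):
        end = start + batch_size
        w = w - lr * gradient(X[start:end], Y[start:end], w)
    return w

# Synthetic data (assumed for illustration): 1000 examples, 3 input variables
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([[1.0], [-2.0], [0.5]])
Y = X @ true_w
w0 = np.zeros((3, 1))

w_batch = batch_epoch(X, Y, w0, lr=0.1)
w_mini = mini_batch_epoch(X, Y, w0, lr=0.1, batch_size=50)
```

With 1,000 examples and a batch size of 50, one pass of `mini_batch_epoch` updates the weights 20 times, while `batch_epoch` updates them once. Each mini-batch step sees less data, so its gradient is only an estimate of the gradient over the full training set.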

We might wonder how smaller batches help speed up training. Let’s implement mini-batch GD and give it a test drive.

Implementing batches

In most cases, we should shuffle a dataset before splitting it into batches. That way, each batch contains a good mix of examples instead of having all the examples of a certain type clustered in the same batch. However, MNIST already comes pre-shuffled, so we can just take the training set and split it into batches straight away. A short function that slices the training set into consecutive batches does the job.
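A function along these lines could be sketched as follows. This is a minimal sketch, assuming the examples and labels are NumPy arrays with one row per example. The names `prepare_batches` and `shuffle_together` and their parameters are illustrative assumptions, not necessarily the course's own listing. The shuffling helper is only needed for datasets that are not already shuffled.

```python
import numpy as np

def shuffle_together(X, Y, seed=0):
    # Apply the same random permutation to examples and labels,
    # so that each example stays paired with its label
    permutation = np.random.default_rng(seed).permutation(X.shape[0])
    return X[permutation], Y[permutation]

def prepare_batches(X_train, Y_train, batch_size):
    # Slice the training set into consecutive batches of batch_size rows;
    # the last batch is smaller if the rows do not divide evenly
    x_batches = []
    y_batches = []
    n_examples = X_train.shape[0]
    for batch_start in range(0, n_examples, batch_size):
        batch_end = batch_start + batch_size
        x_batches.append(X_train[batch_start:batch_end])
        y_batches.append(Y_train[batch_start:batch_end])
    return x_batches, y_batches

# Tiny stand-in dataset (assumed for illustration): 10 examples, 2 variables
X = np.arange(20).reshape(10, 2)
Y = np.arange(10).reshape(10, 1)
x_batches, y_batches = prepare_batches(X, Y, batch_size=4)
X_shuffled, Y_shuffled = shuffle_together(X, Y)
```

With 10 examples and a batch size of 4, the function returns three batches of 4, 4, and 2 examples. NumPy slicing past the end of an array simply stops at the last row, so the final, shorter batch needs no special handling.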
