Introduction to Batch

Learn what a batch is and how we can implement it.

Mini-batch gradient descent

The style of gradient descent that we have used so far is also called batch gradient descent, because it lumps all the training examples into one big batch and calculates the gradient of the loss over that entire batch. A common alternative is called mini-batch gradient descent: it splits the training set into smaller batches and takes a step of gradient descent for each batch.
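To make the difference concrete, here is a minimal, self-contained sketch of the two update loops on a toy linear-regression problem. The synthetic data and the mean-squared-error gradient are assumptions for illustration only, not the classifier used in this course:

import numpy as np

# Toy data: 1,000 examples with 3 features and a noisy linear target.
np.random.seed(0)
X = np.random.rand(1000, 3)
Y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * np.random.randn(1000)

def gradient(X, Y, w):
    # Gradient of the mean squared error with respect to the weights.
    return 2 * X.T @ (X @ w - Y) / X.shape[0]

lr, epochs, batch_size = 0.1, 10, 100

# Batch GD: one gradient step per epoch, computed over all 1,000 examples.
w = np.zeros(3)
for epoch in range(epochs):
    w -= lr * gradient(X, Y, w)

# Mini-batch GD: one gradient step per batch, so 10 steps per epoch here.
w = np.zeros(3)
for epoch in range(epochs):
    for start in range(0, X.shape[0], batch_size):
        w -= lr * gradient(X[start:start + batch_size],
                           Y[start:start + batch_size], w)

With the same number of epochs, mini-batch GD takes ten times as many (cheaper) parameter updates, which is where its speed advantage comes from.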

We might wonder how smaller batches can speed up training. Let's implement mini-batch GD and give it a test drive.

Implementing batches

In most cases, we should shuffle a dataset before splitting it into batches. That way, we’re sure each batch contains a nice mix of examples instead of having all the examples of a certain type clustered in the same batch. However, MNIST already comes pre-shuffled, so we can just take the training set and split it into batches straight away. This function does the job:

def prepare_batches(X_train, Y_train, batch_size):
    x_batches = []
    y_batches = []
    n_examples = X_train.shape[0]
    for batch in range(0, n_examples, batch_size):
        batch_end = batch + batch_size
        x_batches.append(X_train[batch:batch_end])
        y_batches.append(Y_train[batch:batch_end])
    return x_batches, y_batches

This code loops from 0 to the number of training examples, with a step equal to the batch size. Then it slices X_train to get a batch of examples and appends the batch to a list. It does the same with Y_train. The results are two lists that contain the training set split into batches, one list for the examples and one for the labels.
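As a quick sanity check, here is a hedged usage sketch of prepare_batches. The array shapes are MNIST-like assumptions (60,000 examples of 784 pixels, one-hot labels over 10 classes), and the shuffling step shows what we would add for a dataset that isn't already shuffled:

import numpy as np

# Stand-in arrays with MNIST-like shapes; the real X_train and Y_train
# come from the dataset loader used elsewhere in the course.
X_train = np.random.rand(60000, 784)
Y_train = np.eye(10)[np.random.randint(0, 10, 60000)]

# If the data were not pre-shuffled, we would permute it first so that
# each batch gets a mix of examples:
permutation = np.random.permutation(X_train.shape[0])
X_train, Y_train = X_train[permutation], Y_train[permutation]

x_batches, y_batches = prepare_batches(X_train, Y_train, batch_size=128)

print(len(x_batches))       # 469 (468 batches of 128 examples, plus one of 96)
print(x_batches[0].shape)   # (128, 784)
print(x_batches[-1].shape)  # (96, 784) -- the last batch can be smaller

# A training loop would then take one gradient descent step per batch:
for x_batch, y_batch in zip(x_batches, y_batches):
    pass  # compute the gradient on (x_batch, y_batch) and update the weights

Note that the last batch is smaller whenever the number of examples isn't an exact multiple of the batch size; slicing past the end of the array simply truncates that final batch.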