Introduction to Batch
Learn what a batch is and how we can implement it.
Mini-batch gradient descent
The style of gradient descent that we have used so far is also called batch gradient descent, because it gathers all the training examples into one big batch and calculates the gradient of the loss over that entire batch. A common alternative is mini-batch gradient descent. It splits the training set into smaller batches and takes one step of gradient descent for each batch, so a single pass over the training set produces many weight updates instead of one.
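To make the difference concrete, here is a minimal sketch that contrasts the two styles on a toy one-weight model `y = w * x` with a mean squared error loss. The model, the function names (`gradient`, `batch_step`, `mini_batch_epoch`), and the use of plain Python lists are assumptions chosen to keep the example dependency-free; they are not the code used elsewhere in this course.

```python
def gradient(xs, ys, w):
    # Gradient of the mean squared error for the toy model y = w * x
    # (hypothetical model, used only for illustration)
    n = len(xs)
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n


def batch_step(xs, ys, w, lr):
    # Batch GD: a single update computed from every training example
    return w - lr * gradient(xs, ys, w)


def mini_batch_epoch(xs, ys, w, lr, batch_size):
    # Mini-batch GD: one update per batch, so one pass over the data
    # yields several updates. Returns the new weight and the step count.
    steps = 0
    for start in range(0, len(xs), batch_size):
        end = start + batch_size
        w = w - lr * gradient(xs[start:end], ys[start:end], w)
        steps += 1
    return w, steps


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with w = 2

w_batch = batch_step(xs, ys, w=0.0, lr=0.01)
w_mini, n_steps = mini_batch_epoch(xs, ys, w=0.0, lr=0.01, batch_size=2)
print(w_batch, w_mini, n_steps)
```

In this toy run, both approaches look at the same four examples once, but the mini-batch version updates the weight twice and ends up closer to the true value of 2. That is only an illustration on a tiny dataset, not a general guarantee.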
We might wonder how smaller batches help speed up training. Let’s implement mini-batch gradient descent and give it a test drive.
Implementing batches
In most cases, we should shuffle a dataset before splitting it into batches. That way, we’re sure each batch contains a nice mix of examples instead of having all the examples of a certain type clustered in the same batch. However, MNIST already comes pre-shuffled, so we can just take the training set and split it into batches straight away. This function does the job:
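A minimal sketch of such a splitting function might look like the following. The name `prepare_batches` and its parameters are assumptions for illustration. It relies only on `len()` and slicing, so it should work both with NumPy arrays that hold one example per row and with plain Python lists.

```python
def prepare_batches(X_train, Y_train, batch_size):
    # Split examples and labels into consecutive batches of batch_size items.
    # If batch_size does not divide the number of examples evenly,
    # the last batch is smaller than the others.
    x_batches = []
    y_batches = []
    n_examples = len(X_train)
    for start in range(0, n_examples, batch_size):
        end = start + batch_size
        x_batches.append(X_train[start:end])
        y_batches.append(Y_train[start:end])
    return x_batches, y_batches


# Toy stand-in data: 10 "examples" with matching "labels"
X = list(range(10))
Y = [v * 2 for v in X]
x_batches, y_batches = prepare_batches(X, Y, batch_size=4)
print([len(b) for b in x_batches])
```

Because the examples and labels are sliced with the same indices, each example stays paired with its label. For a dataset that is not pre-shuffled, one option with NumPy arrays is to index both arrays with the same `np.random.permutation(n_examples)` before calling the function.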