Step 5 - Rinse and Repeat!

Learn what epochs are and how the path of gradient descent can change depending on the type of gradient descent being used.

Introduction to epochs

Before we continue our process, let's explore what exactly an epoch is and when it is complete, since we will be using this concept later on.

Definition

The number of epochs is a hyperparameter that refers to the number of complete passes the algorithm makes through the training set.

An epoch is complete once every one of the N points in the training set has been used in all four steps: forward pass, computing the loss, computing the gradients, and updating the parameters.

Updates and gradient descent

During one epoch, we perform at least one update, but no more than N updates. The number of updates, N/n, depends on the mini-batch size n, that is, on the type of gradient descent being used:

  • For batch (n = N) gradient descent, this is trivial: since it uses all points to compute the loss, one epoch is the same as one update.

  • For stochastic (n = 1) gradient descent, one epoch means N updates since every individual data point is used to perform an update.

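  • For mini-batch gradient descent with mini-batch size n, one epoch has N/n updates, since each mini-batch of n data points is used to perform one update (rounded up when N is not a multiple of n and the last, smaller mini-batch is kept).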
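To make the counting concrete, here is a minimal Python sketch, assuming N = 100 data points and a mini-batch size of 16 (both values are arbitrary). The helpers `make_mini_batches` and `updates_per_epoch` are hypothetical names used only for illustration. The sketch shuffles the point indices, splits them into mini-batches, checks that every point is used exactly once per epoch, and counts one update per mini-batch.

```python
import math
import random


def make_mini_batches(n_points, batch_size, seed=0):
    """Hypothetical helper: shuffle indices 0..n_points-1 and split them
    into mini-batches of batch_size (the last one may be smaller)."""
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)  # a new order for this epoch
    return [indices[i:i + batch_size] for i in range(0, n_points, batch_size)]


def updates_per_epoch(n_points, batch_size):
    """One update per mini-batch; a smaller last mini-batch still counts."""
    return math.ceil(n_points / batch_size)


N = 100
for name, n in [("batch", N), ("stochastic", 1), ("mini-batch", 16)]:
    batches = make_mini_batches(N, n)
    # The epoch is complete once every index has been used exactly once.
    used = sorted(i for batch in batches for i in batch)
    assert used == list(range(N))
    # Each mini-batch triggers exactly one parameter update.
    assert len(batches) == updates_per_epoch(N, n)
    print(f"{name} (n={n}): {len(batches)} update(s) per epoch")
```

With these assumed values, batch gradient descent performs 1 update per epoch, stochastic gradient descent performs 100, and mini-batch gradient descent with n = 16 performs 7, the last mini-batch holding only 4 points.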

Restarting the process

Returning to what we were doing before, we now take the updated parameters back to Step 1 and restart the process.

In a nutshell, training a model means repeating this process over and over for many epochs.
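As an illustration, here is a minimal pure-Python sketch of that loop for a simple linear regression, ŷ = b + w·x, trained with mean squared error (MSE) for 1,000 epochs. The synthetic data (true b = 1, true w = 2, Gaussian noise with standard deviation 0.1), the learning rate of 0.1, and the `train` helper are all assumptions made for this example, not the course's own code. The `batch_size` argument picks the type of gradient descent: `len(xs)` for batch, `1` for stochastic, anything in between for mini-batch.

```python
import random


def train(xs, ys, batch_size, n_epochs=1000, lr=0.1, seed=42):
    """Hypothetical helper: mini-batch gradient descent for y_hat = b + w * x.

    batch_size == len(xs) gives batch GD; batch_size == 1 gives stochastic GD.
    Returns the final b, w, and the loss of the last mini-batch.
    """
    rng = random.Random(seed)
    b, w = rng.random(), rng.random()  # random initialization
    indices = list(range(len(xs)))
    loss = None
    for _ in range(n_epochs):
        rng.shuffle(indices)  # new order of points every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            m = len(batch)
            # Step 1: forward pass (prediction errors for this mini-batch)
            errors = [(b + w * xs[i]) - ys[i] for i in batch]
            # Step 2: compute the loss (MSE)
            loss = sum(e * e for e in errors) / m
            # Step 3: compute the gradients of the MSE w.r.t. b and w
            b_grad = 2 * sum(errors) / m
            w_grad = 2 * sum(e * xs[i] for e, i in zip(errors, batch)) / m
            # Step 4: update the parameters
            b -= lr * b_grad
            w -= lr * w_grad
    return b, w, loss


# Assumed synthetic data: y = 1 + 2x + noise
rng = random.Random(0)
xs = [rng.random() for _ in range(100)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.1) for x in xs]

for n in (len(xs), 1, 16):  # batch, stochastic, mini-batch
    b, w, _ = train(xs, ys, batch_size=n)
    print(f"n={n}: b={b:.3f}, w={w:.3f}")
```

All three variants should end up with b and w close to 1 and 2; what differs is the path taken to get there, since smaller mini-batches produce more, and noisier, updates per epoch.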

What happens if we run it for 1,000 epochs? We can check the results in the figure below:
