A significant challenge when training a machine learning model is deciding how many epochs to run. Too few epochs may keep the model from converging, while too many can lead to overfitting.
Early stopping is a regularization technique used to reduce overfitting without compromising model accuracy. The main idea behind early stopping is to halt training before the model starts to overfit.
There are three main ways early stopping can be achieved. Let’s look at each of them:
1. Training the model for a preset number of epochs
This method is a simple but naive way to early stop. By running only a fixed number of epochs, we risk stopping before the model has reached a satisfactory training point. A higher learning rate might let the model converge in fewer epochs, but finding a good epoch budget still requires a lot of trial and error. Given the advancements in machine learning, this method is largely obsolete.
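As a minimal sketch, assuming a simple linear regression fitted with batch gradient descent on synthetic NumPy data, the fixed-epoch approach looks like this (the epoch budget and learning rate are illustrative choices, not recommendations):

```python
import numpy as np

# Synthetic regression data (purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

w = np.zeros(3)
lr = 0.1
n_epochs = 50  # preset epoch budget, picked by trial and error

for epoch in range(n_epochs):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
    w -= lr * grad  # one full-batch gradient descent step per "epoch"

print("final weights:", w)
```

Whether 50 epochs is too few or too many depends entirely on the data and learning rate, which is exactly the weakness of this approach.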
2. Stop when the loss function update becomes small
This approach is more sophisticated than the first: it relies on the fact that the updates in gradient descent become significantly smaller as the model approaches a minimum. Training is usually stopped when the improvement in the loss falls below a small threshold (for example, 0.001). Stopping at this point keeps the loss low and saves computing power by avoiding unnecessary epochs. However, overfitting might still occur.
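A minimal sketch of this criterion, reusing the same synthetic linear-regression setup, is shown below; the 0.001 threshold and the upper bound of 1,000 epochs are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

w = np.zeros(3)
lr = 0.1
tol = 1e-3          # stop when the loss improves by less than this
prev_loss = np.inf

for epoch in range(1000):  # generous upper bound on epochs
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= lr * grad
    loss = np.mean((X @ w - y) ** 2)
    if prev_loss - loss < tol:  # the loss update has become small
        print(f"stopped at epoch {epoch}, loss {loss:.4f}")
        break
    prev_loss = loss
```

Note that this only watches the training loss, so it cannot tell whether the model has started to memorize the training data.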
3. Validation set strategy
This clever technique is the most popular early stopping approach. To understand how it works, consider how the training and validation errors change with the number of epochs (as in the figure above). The training error decreases rapidly at first, until additional epochs no longer reduce it by much. The validation error also decreases initially, but after a certain point it starts to increase. That point is where training should be stopped, because beyond it the model begins to overfit.
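A minimal sketch of this strategy, assuming the same synthetic setup plus a held-out validation split and a "patience" counter (both the split size and the patience value are illustrative), could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(250, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=250)
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]

w = np.zeros(3)
lr = 0.1
patience = 5               # epochs to wait for a validation improvement
best_val, best_w, wait = np.inf, w.copy(), 0

for epoch in range(1000):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= lr * grad
    val_loss = np.mean((X_val @ w - y_val) ** 2)
    if val_loss < best_val:          # validation error still decreasing
        best_val, best_w, wait = val_loss, w.copy(), 0
    else:                            # validation error stopped improving
        wait += 1
        if wait >= patience:
            print(f"early stopped at epoch {epoch}, best val loss {best_val:.4f}")
            break

w = best_w  # restore the weights with the lowest validation error
```

Keeping a copy of the best weights means that even if training runs a few epochs past the optimum, the returned model is the one that generalized best on the validation set.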
Although the validation set strategy is the best at preventing overfitting, it can take a large number of epochs before a model begins to overfit, which costs a lot of computing power. A smart way to get the best of both worlds is a hybrid of the validation set strategy and stopping when the loss update becomes small: training stops as soon as either criterion is met.
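One possible sketch of such a hybrid criterion, combining the two previous snippets (the threshold and patience values remain illustrative assumptions), is:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(250, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=250)
X_train, y_train, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

w, lr = np.zeros(3), 0.1
tol, patience = 1e-3, 5
prev_train, best_val, wait = np.inf, np.inf, 0

for epoch in range(1000):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= lr * grad
    train_loss = np.mean((X_train @ w - y_train) ** 2)
    val_loss = np.mean((X_val @ w - y_val) ** 2)
    wait = 0 if val_loss < best_val else wait + 1  # patience on validation loss
    best_val = min(best_val, val_loss)
    # Stop when either the training-loss update is tiny
    # or the validation loss has stopped improving.
    if prev_train - train_loss < tol or wait >= patience:
        print(f"stopped at epoch {epoch}")
        break
    prev_train = train_loss
```

In practice, whichever condition triggers first ends training, so the loop never spends many epochs past the point where further training stops paying off.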