Evaluation

Learn how to evaluate a pre-trained model stored in a checkpoint.

A. Training vs. evaluation

To measure how well our model has been trained, we evaluate it on datasets other than the training set. The datasets used for evaluation are known as the validation and test sets. Note that we don’t shuffle or repeat the evaluation datasets, since those are techniques used specifically to improve training.
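
As a concrete illustration, here is a minimal sketch of the two pipelines, assuming TensorFlow's tf.data API and toy NumPy arrays standing in for a real dataset:

```python
import numpy as np
import tensorflow as tf

# Toy data standing in for a real dataset.
features = np.random.rand(1000, 10).astype(np.float32)
labels = np.random.randint(0, 2, size=1000)

# Training pipeline: shuffle and repeat to improve training.
train_set = (tf.data.Dataset.from_tensor_slices((features, labels))
             .shuffle(buffer_size=1000)
             .repeat()
             .batch(32))

# Evaluation pipeline: a single, deterministic pass over the data,
# with no shuffling and no repetition.
eval_set = (tf.data.Dataset.from_tensor_slices((features, labels))
            .batch(32))
```

Batching applies to both pipelines; it is only the shuffling and repetition that we reserve for training.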

The validation set is used to evaluate a model in between training runs. We use the validation set to tweak certain hyperparameters for a model (such as learning rate or batch size) in order to make sure training continues smoothly. We also use the validation set to detect model overfitting, so we can stop training if overfitting is detected.
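
One common way to act on the overfitting signal is an early-stopping check: stop training once the validation loss stops improving. Below is a minimal sketch of that idea; train_one_epoch and compute_validation_loss are hypothetical helpers standing in for a real training step and validation pass:

```python
best_val_loss = float("inf")
epochs_without_improvement = 0
patience = 3  # stop after 3 consecutive epochs with no improvement

for epoch in range(100):
    train_one_epoch(model, train_set)                     # hypothetical helper
    val_loss = compute_validation_loss(model, eval_set)   # hypothetical helper

    if val_loss < best_val_loss:
        # Validation loss improved: reset the counter.
        best_val_loss = val_loss
        epochs_without_improvement = 0
    else:
        # Validation loss did not improve: possible overfitting.
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping early at epoch {epoch}: validation loss "
                  f"has not improved for {patience} epochs.")
            break
```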

Figure: Example of overfitting a model on the training set. Overfitting is a common occurrence when training a complex model for too long.

Overfitting occurs when we train a model (usually a relatively complex model) for too long on the training set, to the point where it fits the idiosyncrasies of the training data rather than patterns that generalize to unseen data.