Configuration
Configure a dataset with shuffling, repetitions, and batch size.
Chapter Goals:
- Learn how to configure a dataset
- Implement a function that shuffles, repeats, and batches an input dataset
A. Shuffling
Before we use a dataset to train a machine learning model, we need to configure it properly. When we first create a dataset from NumPy arrays or files, the observations may be ordered in a particular way. For example, many data files sort their observations by a particular feature, such as a person’s name or year.
While systematically ordered data is easier for humans to look over, the ordering actually hinders the training of a machine learning model. The model can learn to make predictions based on the ordering of the observations rather than the observations themselves, which is not what we want our model to ...
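To make the idea of shuffling concrete, here is a minimal, library-agnostic sketch in plain Python. The function name `shuffle_observations`, its parameters, and the sample data are all assumptions made for illustration; they are not the course's own API. The sketch shows one way to break up a sorted ordering while keeping each observation paired with its label.

```python
import random


def shuffle_observations(features, labels, seed=None):
    """Return copies of features and labels reordered by one shared random permutation.

    Hypothetical helper for illustration only; not part of any library.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    # Shuffle the indices rather than the lists themselves, so that each
    # feature stays paired with its own label after reordering.
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    return [features[i] for i in indices], [labels[i] for i in indices]


# Made-up observations sorted by year, as a data file might store them.
years = [2001, 2002, 2003, 2004, 2005, 2006]
labels = [0, 0, 0, 1, 1, 1]

shuffled_years, shuffled_labels = shuffle_observations(years, labels, seed=0)
print(list(zip(shuffled_years, shuffled_labels)))
```

In the sorted data, every early observation has label 0 and every late one has label 1, so position alone predicts the label. After shuffling, that positional pattern is gone, but each year still carries its original label. The fixed `seed` is only there to make the example reproducible.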