Configuration

Configure a dataset with shuffling, repetitions, and batch size.

Chapter Goals:

  • Learn how to configure a dataset
  • Implement a function that shuffles, repeats, and batches an input dataset
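
As a rough preview of the second goal, the sketch below shows one way the three steps could fit together in plain Python. It is an illustration under stated assumptions, not the function this chapter builds: the name `configure` and its parameters (`batch_size`, `num_epochs`, `shuffle`, `seed`) are hypothetical, and it uses only the standard library rather than a dedicated dataset API.

```python
import random
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def configure(
    observations: Sequence[T],
    batch_size: int,
    num_epochs: int = 1,
    shuffle: bool = True,
    seed: Optional[int] = None,
) -> Iterator[List[T]]:
    """Yield batches of observations, reshuffling at the start of every epoch.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # private generator, global random state is untouched
    for _ in range(num_epochs):  # repetition: one full pass over the data per epoch
        epoch = list(observations)  # copy, so the caller's data keeps its order
        if shuffle:
            rng.shuffle(epoch)
        # Batching: consecutive slices; the last batch may be smaller.
        for start in range(0, len(epoch), batch_size):
            yield epoch[start:start + batch_size]


# Usage: 10 observations, batches of 4, two passes over the data.
batches = list(configure(list(range(10)), batch_size=4, num_epochs=2, seed=0))
print([len(batch) for batch in batches])
```

With 10 observations and a batch size of 4, each pass yields batches of sizes 4, 4, and 2. Shuffling before batching means every pass groups the observations differently.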

A. Shuffling

Before a dataset can be used to train a machine learning model, it needs to be configured properly. When we first create a dataset from NumPy arrays or files, the observations may be ordered in a particular way. For example, many data files sort their observations by some feature, such as a person’s name or a year.

While systematic ordering makes data files easier for humans to read, it actually hinders the training of a machine learning model. The model may learn to make predictions based on the order of the observations rather than the observations themselves, which is not what we want our model to ...
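
To make the ordering problem concrete, here is a small sketch in plain Python. The toy data is an assumption made up for illustration: 20 hypothetical `(year, label)` observations stored sorted by year, with a label that happens to correlate with the sort key. The helper `labels_in_first_batch` is likewise hypothetical.

```python
import random

# Hypothetical toy data: (year, label) pairs stored sorted by year,
# where the label happens to correlate with the sort key.
observations = [(year, int(year >= 2010)) for year in range(2000, 2020)]


def labels_in_first_batch(data, batch_size=5):
    """Return the set of labels that appear in the first batch."""
    return {label for _, label in data[:batch_size]}


# In file order, the first batch contains only one class.
print(labels_in_first_batch(observations))

# Shuffle whole (year, label) pairs so each feature stays attached to its label.
shuffled = observations[:]
random.Random(0).shuffle(shuffled)
print(labels_in_first_batch(shuffled))
```

In the sorted data, the first ten observations all carry label 0, so the earliest batches show the model a single class. Shuffling complete pairs mixes the classes across batches while leaving the observations themselves unchanged.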
