Oversampling with Text Augmentation

Learn how to improve the diversity of the training data by creating artificial matches using text augmentation.

We discuss the following two common issues with training data in this lesson:

  • Our training datasets usually contain many examples of no-matches and only a few matches. In machine learning jargon, this is a severe class imbalance between the majority class (no-matches) and the minority class (matches).

  • The few examples from the minority class do not cover all the class-invariant transformations that we know from similar tasks (prior knowledge). As a result, our model will not generalize well to unseen examples.
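To make both issues concrete, here is a minimal sketch of oversampling with text augmentation. It is an illustration under assumptions, not the lesson's own implementation: the record pairs are invented, and the helper names (`abbreviate`, `swap_adjacent`, `augment`, `oversample`) and the choice of transformations are hypothetical. The idea is to create artificial matches by applying transformations we assume do not change the label (abbreviations, casing, small typos) until the two classes are balanced.

```python
import random

# Invented example pairs: (record_a, record_b, label), label 1 = match, 0 = no-match.
pairs = [
    ("Blue Door Cafe, 12 Main Street", "Blue Door Café, 12 Main St.", 1),
    ("Blue Door Cafe, 12 Main Street", "Red Lantern, 48 Oak Avenue", 0),
    ("Red Lantern, 48 Oak Avenue", "Green Fork, 7 Pine Boulevard", 0),
    ("Green Fork, 7 Pine Boulevard", "Harbor Grill, 301 Bay Street", 0),
]

# Assumed prior knowledge: these abbreviations do not change which restaurant is meant.
ABBREVIATIONS = {"Street": "St.", "Avenue": "Ave.", "Boulevard": "Blvd."}


def abbreviate(text):
    """Replace long address words with their common abbreviations."""
    for long_form, short_form in ABBREVIATIONS.items():
        text = text.replace(long_form, short_form)
    return text


def swap_adjacent(text, rng):
    """Simulate a typo by swapping two neighboring characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def augment(text, rng):
    """Apply one randomly chosen, assumed label-preserving transformation."""
    transform = rng.choice(["abbreviate", "lowercase", "typo"])
    if transform == "abbreviate":
        return abbreviate(text)
    if transform == "lowercase":
        return text.lower()
    return swap_adjacent(text, rng)


def oversample(pairs, rng):
    """Add artificial matches until matches and no-matches are equally frequent."""
    matches = [p for p in pairs if p[2] == 1]
    no_matches = [p for p in pairs if p[2] == 0]
    if not matches:
        raise ValueError("need at least one match to augment")
    synthetic = []
    while len(matches) + len(synthetic) < len(no_matches):
        record_a, record_b, _ = rng.choice(matches)
        # Keep one side fixed and perturb the other; the pair is still a match.
        synthetic.append((record_a, augment(record_b, rng), 1))
    return pairs + synthetic


balanced = oversample(pairs, random.Random(0))
for record_a, record_b, label in balanced:
    print(label, "|", record_a, "|", record_b)
```

The original pairs are kept unchanged and only the minority class grows. Whether a given transformation is truly class-invariant depends on the data, so each one should be checked against real matches before it is used.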

Let’s see how text augmentation can help reveal such problems using the following dataset of restaurant records:
