Data Augmentation

Learn how to increase variety in data by applying data augmentation techniques.

Augmentation

Modifying word vectors to enhance fairness is a good strategy. However, it only applies to static vectors. What if we use contextual embeddings from a model like BERT? Or refrain from using vectors altogether? In this lesson, we look at a method that applies regardless of how the model represents text.

Data augmentation involves generating additional training examples. These examples aim to diversify the training data, helping the model learn more robust relationships. We might wonder: by deriving augmented data from the training set, do we genuinely enrich the information available to the model? The answer is yes. The goal is not to create new information out of nowhere but to present the existing data in more varied forms.

This technique has gained traction in computer vision. It’s impractical to capture images of an object in every conceivable orientation, scale, and lighting condition. Instead of photographing the same object under each of these conditions, we can simulate them through automated resizing, cropping, rotating, and color-balance adjustment. Thus, even if the original dataset contains no rotated objects, the augmented samples help the model learn to identify them. The example below shows augmentations in the image domain: the original image (top left) is transformed into several additional samples through rotation, cropping, color-balance changes, or hiding a part of the image. In natural language processing, we will instead rephrase sentences to add variety.
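
The image transformations described above can be sketched in a few lines of code. This is a minimal illustration, not a production pipeline: it assumes a grayscale image stored as a nested list of pixel values from 0 to 255, and the function names (rotate90, crop, adjust_brightness, hide_patch, augment) are hypothetical names chosen for this example. In practice, an image library would usually handle these operations.

```python
import random


def rotate90(img):
    """Rotate a 2D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def crop(img, top, left, height, width):
    """Keep only the height x width window starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def adjust_brightness(img, factor):
    """Scale every pixel by factor, clamping the result to 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def hide_patch(img, top, left, size, fill=0):
    """Overwrite a size x size square with a constant value (a copy is returned)."""
    out = [list(row) for row in img]
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[r]))):
            out[r][c] = fill
    return out


def augment(img, rng):
    """Create four variants of one image using randomly chosen parameters."""
    h, w = len(img), len(img[0])
    top = rng.randrange(h // 2 + 1)
    left = rng.randrange(w // 2 + 1)
    return [
        rotate90(img),
        crop(img, top, left, h // 2, w // 2),
        adjust_brightness(img, rng.uniform(0.7, 1.3)),
        hide_patch(img, top, left, size=max(1, h // 4)),
    ]


# A tiny 4x4 "image" stands in for a real photo (assumed data, for illustration).
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]

variants = augment(image, random.Random(0))
print(len(variants), "augmented samples created from 1 original")
```

Each call to augment turns one training image into four additional samples while leaving the original untouched, so the training set grows without any new photographs being taken.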
