The Data Augmentation Methods
Learn different methods to perform task-agnostic data augmentation.
We use the following methods for performing task-agnostic data augmentation:
Masking
POS-guided word replacement
n-gram sampling
Let's take a look at each one of them.
Understanding the masking method
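In the masking method, with probability $p_{mask}$, we randomly replace a word in the sentence with the [MASK] token, creating a new sentence. For example, given the sentence 'I was listening to music', replacing the word 'music' gives the new sentence 'I was listening to [MASK]'.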
But how is this useful? Since [MASK] is an unknown token, our model cannot produce confident logits for a sentence that contains it. It produces less confident logits for 'I was listening to [MASK]' than for the unmasked sentence 'I was listening to music'. This helps our model understand the contribution of each word to the label.
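The masking step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `mask_words`, the parameter name `p_mask`, and whitespace tokenization are all assumptions made for this sketch.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_words(words, p_mask, rng):
    """Replace each word with [MASK] independently with probability p_mask.

    `words` is a list of tokens; `rng` is a random.Random instance so that
    runs can be made reproducible (an assumption of this sketch).
    """
    return [MASK_TOKEN if rng.random() < p_mask else w for w in words]

# Usage: whitespace tokenization is a simplification for illustration.
sentence = "I was listening to music".split()
rng = random.Random(0)
print(" ".join(mask_words(sentence, p_mask=0.3, rng=rng)))
```

Setting `p_mask` to 0 leaves the sentence untouched and setting it to 1 masks every word, which is a quick way to sanity-check the function.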
Understanding the POS-guided word replacement method
In the POS-guided (part-of-speech-guided) word replacement method, with probability $p_{pos}$, we replace a word in the sentence with another word that has the same part of speech.
For example, consider the sentence 'Where did you go?' In this sentence, the word 'did' is a verb, so we can replace it with another verb, such as 'do'. Our sentence then becomes 'Where do you go?', a new sentence obtained by changing a single word.
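As a rough sketch, POS-guided replacement can be written as below. A real pipeline would use a POS tagger and a full vocabulary; here a tiny hand-written lexicon (`TOY_POS`, hypothetical) stands in for both, and the names `pos_guided_replace` and `p_pos` are chosen only for this illustration.

```python
import random

# Hypothetical toy lexicon mapping lowercase words to POS tags.
# It stands in for a real tagger and vocabulary.
TOY_POS = {
    "where": "ADV", "when": "ADV", "how": "ADV",
    "did": "VERB", "do": "VERB", "does": "VERB",
    "go": "VERB", "walk": "VERB", "run": "VERB",
    "you": "PRON", "we": "PRON", "they": "PRON",
}

def pos_guided_replace(words, p_pos, lexicon, rng):
    """With probability p_pos, swap a word for a different word with the same tag."""
    # Group the lexicon by tag so candidates can be looked up quickly.
    by_tag = {}
    for word, tag in lexicon.items():
        by_tag.setdefault(tag, []).append(word)

    out = []
    for w in words:
        tag = lexicon.get(w.lower())
        candidates = [c for c in by_tag.get(tag, []) if c != w.lower()]
        # Words missing from the lexicon have no candidates and are kept as-is.
        if candidates and rng.random() < p_pos:
            out.append(rng.choice(candidates))
        else:
            out.append(w)
    return out

rng = random.Random(0)
print(" ".join(pos_guided_replace("Where did you go".split(), 0.5, TOY_POS, rng)))
```

Because the replacement is chosen only by tag, the result is not guaranteed to be grammatical; replacements in this sketch are also emitted in lowercase.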
Understanding the n-gram sampling method
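In the n-gram sampling method, with probability $p_{ng}$, we randomly sample an n-gram from the sentence, where the value of n is chosen randomly from 1 to 5.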
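A minimal sketch of n-gram sampling follows, assuming n is drawn uniformly from 1 to 5 (capped at the sentence length) and the start position is drawn uniformly as well. The function name `ngram_sample` and the `max_n` parameter are illustrative choices for this sketch.

```python
import random

def ngram_sample(words, rng, max_n=5):
    """Return a random contiguous n-gram of `words`, with n in 1..max_n."""
    if not words:
        return []
    # n cannot exceed the number of words in the sentence.
    n = rng.randint(1, min(max_n, len(words)))
    # Pick a start index so that the n-gram fits inside the sentence.
    start = rng.randint(0, len(words) - n)
    return words[start:start + n]

rng = random.Random(0)
print(" ".join(ngram_sample("Paris is a beautiful city".split(), rng)))
```

The sampled n-gram replaces the full sentence as a shorter training example, so it is always a contiguous slice of the original tokens.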
We've learned three different methods for data augmentation. Now let's explore exactly how we apply them.
The data augmentation procedure
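Say we have a sentence: 'Paris is a beautiful city'. Let $w_1, w_2, \ldots, w_n$ be the words in our sentence. For each word $w_i$, we draw a value $X_i$ from the uniform distribution, $X_i \sim U(0, 1)$. Based on the value of $X_i$, where $p_{mask}$ and $p_{pos}$ are the masking and POS-guided replacement probabilities, we do the following:
If $X_i < p_{mask}$, then we mask the word $w_i$.
If $p_{mask} \le X_i < p_{mask} + p_{pos}$, then we apply POS-guided word replacement.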
Note that masking and POS-guided word replacement are mutually exclusive; if we apply one, then we can't apply the other.
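After the preceding step, we will obtain a modified sentence (a synthetic sentence). Now, with probability $p_{ng}$, we apply n-gram sampling to the synthetic sentence and obtain the final synthetic sentence. Then we append the final synthetic sentence to the data_aug list.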
For every sentence, we perform the preceding steps $n_{iter}$ times and obtain $n_{iter}$ new synthetic sentences.
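The whole procedure can be put together roughly as follows. This is a sketch under stated assumptions: the toy lexicon `TOY_POS` is hypothetical and replaces a real POS tagger, tokenization is a plain whitespace split, and the probability values in the usage lines are illustrative rather than recommended settings.

```python
import random

MASK_TOKEN = "[MASK]"

# Hypothetical toy lexicon standing in for a real POS tagger and vocabulary.
TOY_POS = {
    "paris": "PROPN", "london": "PROPN", "rome": "PROPN",
    "is": "VERB", "was": "VERB",
    "beautiful": "ADJ", "large": "ADJ", "quiet": "ADJ",
    "city": "NOUN", "town": "NOUN", "place": "NOUN",
}

def augment_sentence(words, p_mask, p_pos, p_ng, lexicon, rng, max_n=5):
    """Create one synthetic sentence from a list of words."""
    by_tag = {}
    for word, tag in lexicon.items():
        by_tag.setdefault(tag, []).append(word)

    synthetic = []
    for w in words:
        x = rng.random()  # X_i drawn from U(0, 1)
        if x < p_mask:
            # Masking branch.
            synthetic.append(MASK_TOKEN)
        elif x < p_mask + p_pos:
            # POS-guided branch; mutually exclusive with masking.
            tag = lexicon.get(w.lower())
            candidates = [c for c in by_tag.get(tag, []) if c != w.lower()]
            synthetic.append(rng.choice(candidates) if candidates else w)
        else:
            synthetic.append(w)

    # With probability p_ng, keep only a random n-gram of the result.
    if synthetic and rng.random() < p_ng:
        n = rng.randint(1, min(max_n, len(synthetic)))
        start = rng.randint(0, len(synthetic) - n)
        synthetic = synthetic[start:start + n]
    return synthetic

def augment_dataset(sentences, n_iter, p_mask, p_pos, p_ng, lexicon, rng):
    """Run the procedure n_iter times per sentence and collect the results."""
    data_aug = []
    for sentence in sentences:
        words = sentence.split()
        for _ in range(n_iter):
            new_words = augment_sentence(words, p_mask, p_pos, p_ng, lexicon, rng)
            data_aug.append(" ".join(new_words))
    return data_aug

rng = random.Random(0)
for s in augment_dataset(["Paris is a beautiful city"], n_iter=3,
                         p_mask=0.1, p_pos=0.1, p_ng=0.25,
                         lexicon=TOY_POS, rng=rng):
    print(s)
```

Drawing a single value per word and comparing it against two thresholds is what makes masking and POS-guided replacement mutually exclusive: a word can fall into only one of the three intervals.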
Data augmentation for sentence pairs
For sentence pairs, we can create synthetic sentence pairs in a number of ways. Some of these are as follows:
We can create a synthetic sentence only from the first sentence and keep the second sentence unchanged.
We can keep the first sentence unchanged and create a synthetic sentence only from the second sentence.
We can create synthetic sentences from both the first and second sentences.
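The three options listed above can be sketched with a small helper that takes any single-sentence augmentation function. The name `augment_pair` and the stand-in `toy_augment` function are hypothetical and exist only for this illustration.

```python
def augment_pair(first, second, augment):
    """Return three synthetic pairs built from one original sentence pair.

    `augment` is any function that maps a sentence to a synthetic sentence.
    """
    return [
        (augment(first), second),           # only the first sentence changes
        (first, augment(second)),           # only the second sentence changes
        (augment(first), augment(second)),  # both sentences change
    ]

def toy_augment(sentence):
    """Stand-in augmentation for illustration: mask the last word."""
    words = sentence.split()
    return " ".join(words[:-1] + ["[MASK]"]) if words else sentence

for pair in augment_pair("Paris is a beautiful city", "I love Paris", toy_augment):
    print(pair)
```

In practice, `augment` would be a randomized procedure, so calling it twice on the same sentence can give two different synthetic sentences.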
In this way, we can apply the data augmentation method and obtain more data points. Then, we train our student network with augmented data points.