Noising Techniques
Learn about the different noising techniques for corrupting text and compare them to find the best one.
We've learned that we corrupt the text and feed it to the encoder of BART. But how exactly do we corrupt the text? Does corruption only mean masking a few tokens? Not necessarily.
The researchers proposed several noising techniques for corrupting the text:
Token masking
Token deletion
Token infilling
Sentence shuffling
Document rotation
Let's take a closer look at each of these methods.
Token masking
In token masking, as the name suggests, we randomly mask a few tokens. That is, we randomly replace a few tokens with [MASK], just as we did in the BERT model. A simple example is shown in the following table:
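To make the idea concrete, here is a minimal Python sketch of token masking. The function name `mask_tokens`, the `mask_prob` parameter, the whitespace tokenization, and the example sentence are all assumptions made for illustration. They are not taken from the actual BART implementation, which operates on subword tokens.

```python
import random


def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Return a copy of `tokens` where each token is independently
    replaced by `mask_token` with probability `mask_prob`.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    return [mask_token if rng.random() < mask_prob else tok for tok in tokens]


# Illustrative sentence, split on whitespace (real models use subword tokenizers).
tokens = "Chicago is always cold during the winter".split()
masked = mask_tokens(tokens, mask_prob=0.3, seed=0)
print(" ".join(masked))
```

Note that the corrupted sequence has the same length as the original, because each masked token is replaced one-for-one with [MASK]. The model is then trained to recover the original tokens at those positions.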