Understanding the masking method

In the masking method, with probability $p_{\text{mask}}$, we randomly replace a word in the sentence with the [MASK] token, creating a new sentence. For instance, suppose we are performing a sentiment analysis task and our dataset contains the sentence 'I was listening to music'. With probability $p_{\text{mask}}$, we randomly mask a word. Say we mask the word 'music'; we then have the new sentence 'I was listening to [MASK]'.

But how is this useful? Since the [MASK] token stands in for an unknown word, our model cannot produce confident logits for the masked sentence. That is, it produces less confident logits for 'I was listening to [MASK]' than for the unmasked sentence 'I was listening to music'. This helps our model understand the contribution of each word to the label.
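The masking step can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name mask_words, the whitespace tokenization, and the choice to test each word independently against $p_{\text{mask}}$ are all assumptions made for the example.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_words(sentence, p_mask=0.1, rng=None):
    """Return a copy of `sentence` with words randomly replaced by [MASK].

    Assumption: each whitespace-separated word is masked independently
    with probability `p_mask`. `mask_words` is a hypothetical helper name.
    """
    rng = rng or random.Random()
    words = sentence.split()
    # rng.random() is uniform on [0, 1), so p_mask=0 never masks
    # and p_mask=1 always masks.
    masked = [MASK_TOKEN if rng.random() < p_mask else w for w in words]
    return " ".join(masked)

if __name__ == "__main__":
    # Output varies from run to run because the masking is random.
    print(mask_words("I was listening to music", p_mask=0.2))
```

Passing a seeded random.Random instance as rng makes the augmentation reproducible, which is convenient when comparing runs.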

Understanding the POS-guided word replacement method

In the POS-guided (part-of-speech-guided) word replacement method, with probability $p_{\text{pos}}$, we replace a word in a sentence with another word that has the same part of speech.

For example, consider the sentence 'Where did you go?' In this sentence, the word 'did' is a verb, so we can replace it with another verb. Our sentence then becomes 'Where do you go?' As you can see, we replaced the word 'did' with 'do' and obtained a new sentence. ...
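The POS-guided replacement can be sketched in the same spirit. The example below is only an illustration under stated assumptions: the tiny hand-written lexicon TOY_POS and the helper name pos_guided_replace are hypothetical, tokenization is plain whitespace splitting (so a token such as 'go?' is left untouched), and a real pipeline would use a proper POS tagger and a much larger vocabulary.

```python
import random

# Hypothetical toy lexicon mapping lowercase words to POS tags.
# A real setup would obtain tags from a POS tagger instead.
TOY_POS = {
    "did": "VERB",
    "do": "VERB",
    "you": "PRON",
    "we": "PRON",
}

def pos_guided_replace(sentence, p_pos, pos_of=TOY_POS, rng=None):
    """Replace words with other words that share the same POS tag.

    Assumption: each word with at least one same-tag alternative is
    replaced independently with probability `p_pos`.
    """
    rng = rng or random.Random()
    # Group the vocabulary by tag so candidates can be looked up quickly.
    by_tag = {}
    for word, tag in pos_of.items():
        by_tag.setdefault(tag, []).append(word)

    out = []
    for w in sentence.split():
        tag = pos_of.get(w.lower())
        # Candidates: same tag, different word. Empty if the word is unknown.
        candidates = [c for c in by_tag.get(tag, []) if c != w.lower()]
        if candidates and rng.random() < p_pos:
            out.append(rng.choice(candidates))
        else:
            out.append(w)
    return " ".join(out)

if __name__ == "__main__":
    # Output varies from run to run because the replacement is random.
    print(pos_guided_replace("Where did you go?", p_pos=0.5))
```

Words that are missing from the lexicon are kept unchanged, so the augmented sentence always has the same number of tokens as the original.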
