ELECTRA

Learn about the replaced token detection task and how ELECTRA is pre-trained using it.

Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA) is another interesting variant of BERT. We pre-train BERT using the MLM and NSP tasks. We know that in the MLM task, we randomly mask 15% of the tokens and train BERT to predict the masked tokens. Instead of using the MLM task as a pre-training objective, ELECTRA is pre-trained using a task called replaced token detection.
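To make the MLM objective concrete, here is a minimal sketch of the masking step in plain Python, using a toy whitespace-tokenized sentence. It is a simplification: the 80/10/10 mask/replace/keep split that BERT actually applies to the selected tokens is omitted for brevity.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Randomly mask ~15% of tokens, keeping the originals as labels."""
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)    # the model must predict this token
        else:
            masked.append(tok)
            labels.append(None)   # no prediction needed for this position
    return masked, labels

tokens = ["the", "chef", "cooked", "the", "meal"]
print(mask_tokens(tokens))
```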

The replaced token detection task

The replaced token detection task is very similar to MLM, but instead of masking a token with the [MASK] token, we replace some of the tokens with different tokens and train the model to classify each token in the input as either an original or a replaced token.
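As a toy illustration, consider the sentence "the chef cooked the meal" with the token "cooked" replaced by "ate". A minimal sketch of how the per-token labels could be constructed (the sentence and the hand-picked replacement are invented for illustration):

```python
# Original sentence and a version with one token replaced
original = ["the", "chef", "cooked", "the", "meal"]
replaced = ["the", "chef", "ate", "the", "meal"]

# Per-token labels: 1 = replaced, 0 = original
labels = [int(o != r) for o, r in zip(original, replaced)]

print(list(zip(replaced, labels)))
# [('the', 0), ('chef', 0), ('ate', 1), ('the', 0), ('meal', 0)]
```

In ELECTRA itself, the replacement tokens are not chosen by hand but are sampled from a small generator network trained with MLM; the labels, however, are built exactly this way.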

Why ELECTRA does not use the MLM task

One of the problems with the MLM task is that it uses the [MASK] token during pre-training, but the [MASK] token will not be present during fine-tuning on downstream tasks. This causes a mismatch between the pre-training and fine-tuning phases. The replaced token detection task avoids this mismatch: the input never contains a [MASK] token, only real tokens, and the model simply learns to decide which of them have been replaced.
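To see a pre-trained discriminator perform replaced token detection, the sketch below runs the toy replaced sentence through the google/electra-small-discriminator checkpoint from Hugging Face Transformers, where ElectraForPreTraining returns one logit per token and a positive logit indicates a predicted replacement. Treat it as a sketch: the exact predictions depend on the checkpoint.

```python
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-small-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

# "ate" replaces the original token "cooked" from the toy example above
inputs = tokenizer("the chef ate the meal", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length)

# A positive logit means the discriminator predicts "replaced"
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predictions = (logits[0] > 0).long().tolist()
print(list(zip(tokens, predictions)))
```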