ELECTRA
Learn about the replaced token detection task and how ELECTRA is pre-trained using it.
Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA) is another interesting variant of BERT. Recall that BERT is pre-trained using the MLM and NSP tasks, and that in the MLM task we randomly mask 15% of the tokens and train BERT to predict them. Instead of using the MLM task as a pre-training objective, ELECTRA is pre-trained using a task called replaced token detection.
The replaced token detection task
The replaced token detection task is very similar to MLM, but instead of masking a token with the [MASK] token, we replace it with a different token and train the model to classify, for each token in the input, whether it is an actual or a replaced token.
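The labeling scheme above can be sketched in a few lines of Python. This is a toy illustration only: the sentence, the tiny vocabulary, and the random replacement step are made-up examples, whereas in ELECTRA the replacements actually come from a small generator model trained with MLM.

```python
import random

random.seed(0)

# A made-up example sentence, already tokenized.
original = ["the", "chef", "cooked", "the", "meal"]

# A tiny hypothetical vocabulary to draw replacements from.
# In ELECTRA, a small generator model proposes these replacements.
vocabulary = ["the", "chef", "ate", "cooked", "meal", "a"]

# Corrupt a random position by swapping in a different token.
corrupted = list(original)
positions = random.sample(range(len(original)), k=1)
for i in positions:
    corrupted[i] = random.choice([t for t in vocabulary if t != original[i]])

# The discriminator's per-token target: 1 if replaced, 0 if actual.
labels = [int(c != o) for o, c in zip(original, corrupted)]
print(corrupted, labels)
```

Note that every position gets a label, so the model receives a training signal from all tokens, not just the ~15% that were corrupted.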
Why ELECTRA does not use the MLM task