
Generator and Discriminator of the ELECTRA Model

Understand how the ELECTRA model uses two components: a generator that predicts masked tokens from the small percentage of the input that is masked, and a discriminator that classifies each token as original or replaced. This lesson shows how ELECTRA trains more efficiently than BERT: because the discriminator receives a learning signal from every token in the input rather than only the masked ones, it makes better use of each training example.

The generator model

First, let's have a look at the generator. The generator performs the MLM task: we randomly mask 15% of the input tokens and train the generator to predict them. Let's represent our input tokens as $X = [x_1, x_2, \ldots, x_n]$. We randomly mask some of these tokens and feed them as input to the generator, which returns a representation of each token. Let $h_G(X) = [h_1, h_2, \ldots, h_n]$ denote these representations, where $h_i$ is the representation of the token $x_i$.
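
To make this concrete, here is a minimal sketch of the generator's MLM step using the Hugging Face transformers library. The checkpoint name `google/electra-small-generator` and the choice of which position to mask are illustrative assumptions, not part of the original lesson.

```python
import torch
from transformers import ElectraForMaskedLM, ElectraTokenizerFast

# Assumed checkpoint: the small ELECTRA generator released by Google
tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-small-generator")
generator = ElectraForMaskedLM.from_pretrained("google/electra-small-generator")

text = "The chef cooked the meal"
inputs = tokenizer(text, return_tensors="pt")

# During pre-training, ~15% of tokens are masked at random.
# Here we mask a single, hand-picked position for illustration.
masked_ids = inputs["input_ids"].clone()
position = 3                                   # arbitrary example position
masked_ids[0, position] = tokenizer.mask_token_id

with torch.no_grad():
    logits = generator(input_ids=masked_ids,
                       attention_mask=inputs["attention_mask"]).logits

# The generator predicts a token for the masked position
predicted_id = logits[0, position].argmax(dim=-1)
print(tokenizer.decode([predicted_id.item()]))
```

In ELECTRA pre-training, tokens sampled from these generator predictions replace the masked positions, producing the corrupted input that the discriminator then classifies token by token.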