Generator and Discriminator of the ELECTRA Model

Learn how the generator and discriminator models work in detail, and find out why we might prefer ELECTRA over BERT.

The generator model

First, let's have a look at the generator. The generator performs the MLM task: we randomly mask tokens at a 15% mask rate and train the generator to predict the masked tokens. Let's represent our input tokens as $X = [x_1, x_2, \dots, x_n]$. We randomly mask some tokens and feed them as input to the generator, which returns a representation of each token. Let $h_G(X) = [h_1, h_2, \dots, h_n]$ denote the representations returned by the generator; that is, $h_1$ denotes the representation of the first token $x_1$, $h_2$ denotes the representation of the second token $x_2$, and so on.
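To make this concrete, here is a minimal sketch of the generator's MLM step. It assumes the Hugging Face Transformers library and the pretrained `google/electra-small-generator` checkpoint; the library, checkpoint name, and example sentence are illustrative assumptions, not part of the original text.

```python
# A minimal sketch of the generator's MLM step, assuming Hugging Face
# Transformers and the "google/electra-small-generator" checkpoint.
import torch
from transformers import ElectraTokenizerFast, ElectraForMaskedLM

tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-small-generator")
generator = ElectraForMaskedLM.from_pretrained("google/electra-small-generator")

text = "The chef cooked the meal"  # illustrative input sentence
inputs = tokenizer(text, return_tensors="pt")
input_ids = inputs["input_ids"].clone()

# Randomly mask roughly 15% of the tokens, skipping special tokens
# such as [CLS] and [SEP].
special = torch.tensor(
    tokenizer.get_special_tokens_mask(
        input_ids[0].tolist(), already_has_special_tokens=True
    )
).bool()
mask = (torch.rand(input_ids.shape) < 0.15) & ~special
input_ids[mask] = tokenizer.mask_token_id

# The generator produces a representation (and a vocabulary distribution)
# for every position; we read off its predictions at the masked positions.
with torch.no_grad():
    logits = generator(
        input_ids=input_ids, attention_mask=inputs["attention_mask"]
    ).logits
predicted_ids = logits.argmax(dim=-1)
print("Generator's guesses for masked tokens:",
      tokenizer.decode(predicted_ids[mask]))
```

During pre-training, the generator is trained with a cross-entropy loss over exactly these masked positions, so the logits above correspond to the per-token representations $h_G(X)$ projected onto the vocabulary.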
