Understanding BERT
Learn about BERT and its input processing.
Bidirectional Encoder Representations from Transformers (BERT) is one of the many transformer models that have emerged over the past few years.
BERT was introduced in the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (Devlin et al., 2018). Broadly, transformer models fall into two categories:
- Encoder-based models
- Decoder-based (autoregressive) models
In other words, these models build on either the encoder or the decoder part of the transformer rather than on both. The main difference between the two lies in how attention is used: encoder-based models use bidirectional attention, whereas decoder-based models use autoregressive (that is, left-to-right) attention. The sketch below illustrates this contrast.
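To make the distinction concrete, here is a minimal sketch of the two attention masks (assuming NumPy; the toy sequence length of 4 is illustrative and not from the lesson). A `1` at position (i, j) means token i is allowed to attend to token j:

```python
import numpy as np

seq_len = 4  # toy sequence length

# Encoder-based (bidirectional): every token may attend to every other token.
bidirectional_mask = np.ones((seq_len, seq_len), dtype=int)

# Decoder-based (autoregressive): token i may attend only to tokens 0..i,
# so information flows strictly left to right.
autoregressive_mask = np.tril(np.ones((seq_len, seq_len), dtype=int))

print(bidirectional_mask)
# [[1 1 1 1]
#  [1 1 1 1]
#  [1 1 1 1]
#  [1 1 1 1]]

print(autoregressive_mask)
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]
```

Because BERT's mask is fully populated, each token's representation can draw on context from both its left and its right, which is what "bidirectional" means here.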
BERT is an encoder-based transformer model. It takes an input sequence (a collection of tokens) and produces an encoded output sequence. The figure below depicts the high-level architecture of BERT:
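As a quick sanity check of this input-to-output behavior, the following sketch runs a sentence through a pretrained BERT encoder. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint, neither of which is prescribed by this lesson:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Tokenize the input sequence; the tokenizer adds BERT's special
# [CLS] and [SEP] tokens around the text.
inputs = tokenizer("BERT encodes every token in context.", return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))

# The encoder produces one contextual vector per input token.
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, num_tokens, hidden_size)
```

Note that the output has the same number of positions as the input: BERT does not generate new tokens; it re-encodes each input token into a context-aware vector.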