BART Model

Learn about the use cases and detailed architecture of the BART model.


Bidirectional and Auto-Regressive Transformers (BART) is another interesting model introduced by Facebook AI. It is based on the transformer architecture and is essentially a denoising autoencoder: it is pre-trained by learning to reconstruct corrupted text.

Just like the BERT model, we can take the pre-trained BART model and fine-tune it for several downstream tasks. BART is best suited for text generation, and it is also used for other tasks, such as language translation and comprehension. The researchers have also shown that BART's performance on comprehension tasks is comparable to that of the RoBERTa model. But how exactly does BART work? What's special about BART? How does it differ from BERT? Let's find out the answers to all these questions in the next section.
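As a quick illustration of downstream use, the sketch below loads a publicly available BART checkpoint that has already been fine-tuned for summarization. It assumes the Hugging Face transformers library and the "facebook/bart-large-cnn" checkpoint; neither is part of the lesson itself, so treat this as one possible setup rather than the way to use BART.

```python
# A minimal sketch, assuming the Hugging Face `transformers` library is
# installed and the "facebook/bart-large-cnn" checkpoint (a BART model
# fine-tuned for summarization) can be downloaded.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "BART is a denoising autoencoder built on the transformer encoder-decoder "
    "architecture. It is pre-trained by corrupting text with a noising "
    "function and learning to reconstruct the original text, and it can then "
    "be fine-tuned for downstream tasks such as summarization."
)

# Generate a short summary of the input text.
summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```

The same pre-trained model can be fine-tuned for other generation tasks by swapping in a different checkpoint or training head.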

Architecture of BART

BART is essentially a transformer model with an encoder and a decoder. We feed corrupted text to the encoder, which learns a representation of the given text and sends it to the decoder. The decoder takes this representation and reconstructs the original, uncorrupted text.

The encoder of the BART model is bidirectional, meaning that it reads a sentence in both directions (left to right and right to left), while the decoder is unidirectional and reads a sentence only from left to right. Thus, BART pairs a bidirectional encoder with an autoregressive decoder.

The following figure shows the BART model. As we can see, we corrupt the original text (by masking a few tokens) and feed it to the encoder. The encoder learns the representation of the given text and sends the representation to the decoder, which then reconstructs the original uncorrupted text:
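To see this denoising behavior concretely, the sketch below feeds a sentence with a masked token to a pre-trained BART model and lets the decoder regenerate the full text. It assumes the Hugging Face transformers library and the "facebook/bart-base" checkpoint, which are not prescribed by the lesson; the input sentence is an arbitrary example.

```python
# A minimal sketch, assuming the Hugging Face `transformers` library and the
# publicly available "facebook/bart-base" checkpoint.
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Corrupt the input by masking a token; the bidirectional encoder reads the
# corrupted text, and the autoregressive decoder reconstructs the sentence.
corrupted = "The chef cooked a <mask> meal for the guests."
inputs = tokenizer(corrupted, return_tensors="pt")

output_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Printing the decoded output shows the model filling in the masked span with a plausible token, which is exactly the corrupt-then-reconstruct objective described above.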
