Introduction to the Transformer
Get introduced to the transformer, one of the most popular state-of-the-art deep learning architectures.
The transformer is one of the most popular state-of-the-art deep learning architectures, used primarily for natural language processing (NLP) tasks. Since its advent, the transformer has replaced recurrent neural networks (RNNs) and long short-term memory (LSTM) networks for many tasks. Several newer NLP models, such as BERT, GPT, and T5, are based on the transformer architecture.
Why the transformer?
RNN and LSTM networks are widely used in sequential tasks such as next-word prediction, machine translation, text generation, and more. However, one of the major challenges with these recurrent models is capturing long-term dependencies.
To overcome this limitation of RNNs, a new architecture called the transformer was introduced in the paper "Attention Is All You Need."
The transformer model is based entirely on the attention mechanism and dispenses with recurrence altogether. It uses a special type of attention mechanism called self-attention.
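To make the idea concrete before we cover it in detail, here is a minimal NumPy sketch of self-attention (the function name, weight matrices, and dimensions are illustrative, not from any library): each token is projected into query, key, and value vectors, and each token's output is a weighted mix of every token's value vector.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # X: (seq_len, d_model) token embeddings for one sentence.
    # W_q, W_k, W_v: (d_model, d_k) learned projection matrices.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Score every token against every other token, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Each row becomes a probability distribution over all tokens.
    weights = softmax(scores, axis=-1)
    # Each output vector is a weighted mix of all tokens' value vectors.
    return weights @ V

# Toy usage: a 4-token sentence with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```

Because every token attends directly to every other token, the distance between two words no longer matters, which is exactly what lets the transformer sidestep the long-term dependency problem of recurrent models.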
How does the transformer work?
Let's understand how the transformer works with a language translation task. The transformer consists of an encoder-decoder architecture. We feed the input sentence (the source sentence) to the encoder. The encoder learns a representation of the input sentence and sends that representation to the decoder. The decoder receives the encoder's representation as input and generates the output sentence (the target sentence).
Suppose we need to translate a sentence from English to French. We feed the English sentence as input to the encoder. The encoder learns a representation of the given English sentence and feeds that representation to the decoder. The decoder takes the encoder's representation as input and generates the French sentence as output, as sketched below.
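The following PyTorch snippet is a rough sketch of this encoder-decoder flow using `nn.Transformer` (the vocabulary sizes, variable names, and token ids are made up, and positional encodings, masking, and training are omitted for brevity): the encoder consumes the embedded English tokens, and the decoder combines the encoder's representation with the French tokens produced so far.

```python
import torch
import torch.nn as nn

# Illustrative sizes; a real model also needs positional encodings,
# masking, and training, all omitted here.
SRC_VOCAB, TGT_VOCAB, D_MODEL = 1000, 1200, 512

src_embed = nn.Embedding(SRC_VOCAB, D_MODEL)  # embeds English token ids
tgt_embed = nn.Embedding(TGT_VOCAB, D_MODEL)  # embeds French token ids
transformer = nn.Transformer(d_model=D_MODEL, nhead=8,
                             num_encoder_layers=6, num_decoder_layers=6)
to_vocab = nn.Linear(D_MODEL, TGT_VOCAB)      # decoder output -> French vocab logits

# One English sentence of 5 tokens and one French sentence of 6 tokens,
# shaped (sequence length, batch size) as nn.Transformer expects by default.
src = torch.randint(0, SRC_VOCAB, (5, 1))
tgt = torch.randint(0, TGT_VOCAB, (6, 1))

# The encoder reads the source sentence; the decoder combines the encoder's
# representation with the target tokens produced so far.
out = transformer(src_embed(src), tgt_embed(tgt))
logits = to_vocab(out)
print(logits.shape)  # torch.Size([6, 1, 1200])
```

At inference time, the decoder runs autoregressively: it starts from a start-of-sentence token and feeds each newly generated French token back in as input until it emits an end-of-sentence token.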
Okay, but what exactly is going on here? How do the encoder and decoder in the transformer convert the English sentence (the source sentence) to the French sentence (the target sentence)? What's going on inside the encoder and decoder?