Self-Attention Mechanism
Learn about the self-attention mechanism and how word embeddings are computed.
To understand how multi-head attention works, we first need to understand the self-attention mechanism.
Self-attention mechanism
Let's understand the self-attention mechanism with an example. Consider the following sentence:

'A dog ate the food because it was hungry.'

In the preceding sentence, the pronoun 'it' could refer to either 'dog' or 'food'. By reading the sentence, we can easily understand that 'it' refers to 'dog' and not 'food'. But how does our model understand this? This is where the self-attention mechanism helps us.
Representation of the words
In the given sentence, 'A dog ate the food because it was hungry', our model first computes the representation of the word 'A', then the representation of the word 'dog', then the representation of the word 'ate', and so on. While computing the representation of each word, it relates that word to all the other words in the sentence to understand more about it.
Computing the representation of the words
For instance, while computing the representation of the word 'it', our model relates the word 'it' to all the words in the sentence to understand more about the word 'it'.
As shown in the following figure, to compute the representation of the word 'it', our model relates 'it' to every word in the sentence. By doing so, the model can understand that 'it' is related to 'dog' and not 'food'. Notice that the line connecting 'it' to 'dog' is thicker than the other lines, indicating that, in the given sentence, 'it' is related to 'dog' and not 'food':
Okay, but how exactly does this work? Now that we have a basic idea of what the self-attention mechanism is, let's understand it in more detail.
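To make this intuition concrete, here is a toy sketch in Python. The similarity scores below are hand-picked for illustration only (they are not produced by any model); the following sections explain how such scores are actually computed from learned vectors.

```python
import numpy as np

# Hypothetical similarity scores between the word 'it' and every word
# in the sentence; these numbers are made up for illustration.
words = ['A', 'dog', 'ate', 'the', 'food', 'because', 'it', 'was', 'hungry']
scores = np.array([0.2, 4.0, 0.5, 0.1, 1.5, 0.3, 1.0, 0.2, 2.0])

# Softmax turns the raw scores into weights that sum to 1. The weight
# on 'dog' dominates, matching the thick line in the figure above.
weights = np.exp(scores) / np.exp(scores).sum()
for word, weight in zip(words, weights):
    print(f'{word:>8}: {weight:.3f}')
```

Running this prints a weight of about 0.72 for 'dog', far higher than the weight of about 0.06 for 'food', which is exactly the behavior the figure illustrates.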
What is embedding?
Suppose our input sentence (source sentence) is 'I am good'. First, we get the embeddings for each word in the sentence. Note that the embeddings are just vector representations of the words, and their values are learned during training.
Let x1 be the embedding of the word 'I', x2 be the embedding of the word 'am', and x3 be the embedding of the word 'good'.
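As a minimal sketch, here is how we might stack these embeddings into an input matrix. The embedding dimension of 4 and the random values are placeholders chosen for illustration; in a real model, the embedding values are learned parameters.

```python
import numpy as np

# Placeholder embeddings: in practice these values are learned during
# training, and the dimension is a model hyperparameter (4 is arbitrary).
np.random.seed(0)
embed_dim = 4

x1 = np.random.randn(embed_dim)  # embedding of the word 'I'
x2 = np.random.randn(embed_dim)  # embedding of the word 'am'
x3 = np.random.randn(embed_dim)  # embedding of the word 'good'

# Stacking the word embeddings row by row gives the input matrix X with
# shape [sentence length, embedding dimension] = [3, 4].
X = np.stack([x1, x2, x3])
print(X.shape)  # (3, 4)
```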