Summary: Transformers
Review what we've learned in this chapter.
We'll cover the following
How transformer models work
In this chapter, we discussed transformer models. First, we examined the transformer at a microscopic level to understand its inner workings. We saw that transformers use self-attention, a powerful technique that lets the model attend to all other inputs in a text sequence while processing a given input. We also saw that, in addition to token embeddings, transformers use positional embeddings to inform the model about the positions of tokens in the sequence. Finally, we discussed how transformers leverage residual connections (that is, shortcut connections) and layer normalization to improve model training.
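To tie these ideas together, below is a minimal NumPy sketch (not the chapter's implementation) of one encoder step: token embeddings combined with sinusoidal positional embeddings, scaled dot-product self-attention, a residual (shortcut) connection, and layer normalization. The weight matrices and dimensions are illustrative assumptions.

```python
import numpy as np

def positional_embeddings(seq_len, d_model):
    """Sinusoidal positional embeddings (one vector per position)."""
    pos = np.arange(seq_len)[:, None]                # (seq_len, 1)
    i = np.arange(d_model)[None, :]                  # (1, d_model)
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sequence."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # each token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ v

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

# Illustrative sizes: 4 tokens, 8-dimensional model.
seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
token_emb = rng.normal(size=(seq_len, d_model))      # token embeddings
x = token_emb + positional_embeddings(seq_len, d_model)

Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
attn_out = self_attention(x, Wq, Wk, Wv)

# Residual (shortcut) connection followed by layer normalization.
out = layer_norm(x + attn_out)
print(out.shape)                                     # (4, 8)
```

In a full transformer, this block is followed by a position-wise feed-forward sublayer with its own residual connection and layer normalization, and the whole block is stacked several times.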