Introduction: Architecture of the Transformer Model

Get to know the concepts related to the transformer model as described by Vaswani et al.

Language is the essence of human communication. Civilizations would never have been born without the word sequences that form language. We now mostly live in a world of digital representations of language. Our daily lives rely on digitized NLP language functions: web search engines, email, social networks, posts, tweets, smartphone texting, translation, web pages, speech-to-text for streaming-site transcripts, text-to-speech on hotline services, and many more everyday functions.

Previously, we learned about the limits of RNNs and the birth of cloud AI transformers, which have taken over a fair share of NLP design and development. The role of the Industry 4.0 developer is to understand the architecture of the original transformer and the many transformer ecosystems that followed.

In December 2017, Google Brain and Google Research published the seminal Vaswani et al. paper, "Attention Is All You Need" (https://arxiv.org/abs/1706.03762). The transformer was born. It outperformed the existing state-of-the-art NLP models, trained faster than previous architectures, and obtained higher evaluation results. As a result, transformers have become a key component of NLP.

Chapter overview

The idea behind the transformer's attention heads is to do away with recurrent neural network features. In this chapter, we will open the hood of the transformer model described by Vaswani et al. (2017) and examine the main components of its architecture. We will explore the fascinating world of attention and illustrate the key components of the transformer, starting with the sketch below.
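To preview the mechanism we will examine, here is a minimal NumPy sketch of scaled dot-product attention, the operation at the heart of each attention head. The function name, toy shapes, and random inputs are illustrative assumptions, not the chapter's code; only the formula softmax(QK^T / sqrt(d_k))V comes from the original paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V for one attention head.

    Q, K: arrays of shape (sequence_length, d_k); V: (sequence_length, d_v).
    """
    d_k = K.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mix of the value vectors
    return weights @ V

# Toy example: a 3-token sequence with 4-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(x, x, x).shape)  # (3, 4)
```

Notice that no state is carried from one position to the next: every token attends to every other token in a single matrix operation, which is exactly what lets the transformer dispense with recurrence.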

This chapter covers the following topics:
