Terminology of Transformer Models

Learn about the different terms used to describe various transformer models.

The past decades have produced Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and other types of Artificial Neural Networks (ANNs). They all share a common core of vocabulary.

Transformer models introduced some new terms and use existing ones slightly differently. This lesson briefly describes transformer models to clarify how deep learning vocabulary applies to them.

Motivation

The motivation behind the transformer architecture rests on an industrial approach to deep learning. Transformer computations are dominated by matrix operations, which parallelize well. In addition, the layered architecture of transformers lends itself to hardware optimization. Google, for example, took advantage of the stack structure of transformers to design domain-specific hardware, such as its Tensor Processing Units (TPUs), that requires less floating-point precision.

Designing transformer models implies taking hardware into account. Therefore, the architecture of a transformer combines software and hardware optimization from the start.

This lesson defines some of the new usages of neural network language.

Stack

A stack is made up of layers that all have the same size, which differs from classical deep learning models. Data flows through a stack from the bottom layer to the top. A stack can be an encoder or a decoder, as illustrated by the sketch below.
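To make this concrete, here is a minimal sketch of an encoder stack, assuming PyTorch (the lesson does not prescribe a framework, and the layer count and dimensions here are illustrative). Because every layer has the same input and output size, any number of identical layers can be piled up, and data flows from the bottom layer to the top:

```python
import torch
import torch.nn as nn

# Illustrative values: every layer in the stack shares the same
# width (d_model), as in the original Transformer architecture.
d_model = 512   # identical size for every layer in the stack
n_layers = 6    # number of identical layers stacked bottom to top

# One encoder layer, then a stack of n_layers identical copies of it.
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8)
encoder_stack = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

# A batch of 2 sequences of 10 tokens, already embedded into d_model dimensions.
x = torch.rand(10, 2, d_model)   # (sequence length, batch, d_model)
y = encoder_stack(x)             # data runs from the bottom layer to the top
print(y.shape)                   # torch.Size([10, 2, 512]); the size is preserved
```

Because each layer preserves the shape of its input, the output of one layer can feed directly into the next, which is what makes the bottom-to-top stack structure possible.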
