...


Is Attention All We Need?

Discover what we can do with attention and try it yourself in this lesson.

In 2017, Google researchers took the concept of attention mechanisms to a whole new level with the paper "Attention Is All You Need." They moved away from traditional approaches that relied on recurrent connections and instead built an architecture on pure attention.

The evolution of attention mechanisms

The sequence-to-sequence models we were familiar with used RNNs for both encoding and decoding. As previously mentioned, these models faced fundamental issues. First, the encoder's final hidden state had to carry all the information about the input, and it often lacked the capacity to do so. Moreover, they were slow: the input sequence had to be processed step by step to reach that final state before the first output token could be generated.
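To make that bottleneck concrete, here is a minimal sketch of a toy RNN encoder in NumPy. The token IDs, embedding table, weight matrices, and dimensions are hypothetical placeholders, not the exact setup used in this course; the point is that each step depends on the previous one, and the whole input ends up squeezed into a single fixed-size vector.

```python
import numpy as np

def rnn_encoder(tokens, embed, W_xh, W_hh, b_h):
    """Toy RNN encoder: compresses the whole input into one final hidden state."""
    h = np.zeros(W_hh.shape[0])
    for t in tokens:                          # strictly sequential loop over tokens
        x = embed[t]                          # look up the token embedding
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h                                  # single fixed-size vector = bottleneck

# Hypothetical dimensions and random weights, for illustration only.
vocab, d_emb, d_hid = 10, 8, 16
rng = np.random.default_rng(0)
embed = rng.normal(size=(vocab, d_emb))
W_xh = rng.normal(size=(d_hid, d_emb))
W_hh = rng.normal(size=(d_hid, d_hid))
b_h = np.zeros(d_hid)

context = rnn_encoder([1, 4, 2, 7], embed, W_xh, W_hh, b_h)
print(context.shape)  # (16,) -- every input token is squeezed into this one vector
```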

The challenge of sequential processing

Let's look at the encoder-decoder architecture to better understand the challenge of sequential processing.

Encoder-decoder architecture

For instance, every preceding token had to be processed, one step at a time, before a single output token could be produced. This approach didn't fully exploit parallel hardware such as GPUs. In contrast, ...
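To illustrate the kind of contrast this is building toward, here is a minimal sketch of scaled dot-product attention in NumPy. The sequence length, dimensions, and random inputs are assumptions for illustration only; what matters is that the attention weights for every position come out of a few matrix multiplications, with no step-by-step loop, so the computation maps naturally onto parallel hardware.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attend over a whole sequence at once; Q, K, V have shape (seq_len, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over each row
    return weights @ V                                   # weighted mix of value vectors

# Hypothetical shapes and random inputs, for illustration only.
rng = np.random.default_rng(0)
seq_len, d_k = 5, 16
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 16): all positions at once
```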
