In 2017, Google took attention mechanisms to a whole new level. Rather than relying on traditional approaches such as recurrent connections, they built a model on pure attention.

The evolution of attention mechanisms

The sequence-to-sequence models we were familiar with used RNNs for both encoding and decoding. As previously mentioned, these models faced fundamental issues. First, the encoder's final hidden state had to summarize the entire input in a single fixed-size vector, so it often lacked sufficient information. Second, they were slow: the input had to be processed sequentially, token by token, to reach that final state before the first output token could be generated.
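To make the bottleneck concrete, here is a minimal sketch (assuming PyTorch; the dimensions and variable names are illustrative, not from the original) of a recurrent encoder. Note how each step depends on the previous one, and how the decoder would receive only the single final hidden state as a summary of the whole input.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumed for this sketch).
vocab_size, embedding_dim, hidden_dim = 1000, 32, 64

embed = nn.Embedding(vocab_size, embedding_dim)
encoder_cell = nn.GRUCell(embedding_dim, hidden_dim)

tokens = torch.randint(0, vocab_size, (10,))  # a toy input sequence of 10 token ids
hidden = torch.zeros(1, hidden_dim)           # initial hidden state

# Sequential loop: step t cannot start before step t-1 has finished,
# so the encoder cannot be parallelized over the sequence.
for token in tokens:
    hidden = encoder_cell(embed(token).unsqueeze(0), hidden)

# `hidden` is the only summary of the entire sequence that a classic
# seq2seq decoder receives -- a fixed-size information bottleneck.
print(hidden.shape)  # torch.Size([1, 64])
```

No matter how long the input sequence is, everything the decoder sees is squeezed into that one fixed-size vector, which is exactly the limitation attention was introduced to address.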

The challenge of sequential processing

Let's look at the encoder-decoder architecture to better understand the challenge of sequential processing.
