Encoder-Decoder Attention

Uncover encoder-decoder attention and autoregressive decoding in transformers for neural machine translation, emphasizing self-attention's pivotal role.

So far, we've discussed the self-attention mechanism. To fully understand the solution presented in the Attention Is All You Need paper (Vaswani, Ashish, et al. "Attention Is All You Need." In Advances in Neural Information Processing Systems, 30, 2017) for neural machine translation (NMT), we need to explore how the decoder is constructed, just as we've seen how the encoder is built. This leads us to the encoder-decoder attention operation.

Understanding decoder components

Similar to the encoder, each decoder layer incorporates a self-attention component, but it adds a new encoder-decoder attention component on top of it. This new component combines the encoder stack's outputs with the decoder's current output.
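To see how these pieces fit together, the following is a minimal sketch of one decoder layer in PyTorch. The class name DecoderLayer, the argument names tgt and memory, and the layer sizes are illustrative assumptions for this sketch, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """One decoder layer: masked self-attention, encoder-decoder attention, feed-forward."""
    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt, memory, tgt_mask=None):
        # Masked self-attention over the decoder's own outputs so far.
        x = self.norm1(tgt + self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)[0])
        # Encoder-decoder attention: queries from the decoder, keys and values
        # from the encoder stack's output ("memory").
        x = self.norm2(x + self.cross_attn(x, memory, memory)[0])
        # Position-wise feed-forward network with a residual connection.
        return self.norm3(x + self.ff(x))
```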

Masked decoder self-attention

In the decoder, the self-attention sublayer is masked: each position can attend only to earlier positions in the output sequence, which preserves the autoregressive property during decoding. The encoder-decoder attention layer that follows it functions much like multi-headed self-attention. However, it generates its queries matrix from the layer directly below it while taking the keys and values matrices from the encoder stack's output.
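To make this query/key/value flow concrete, here is a minimal single-head sketch of the encoder-decoder attention computation in PyTorch, assuming learned projection matrices. The function name encoder_decoder_attention and the matrices W_q, W_k, and W_v are illustrative assumptions, not names from the lesson.

```python
import torch
import torch.nn.functional as F

def encoder_decoder_attention(decoder_states, encoder_output, W_q, W_k, W_v):
    # Queries come from the decoder layer below; keys and values come
    # from the output of the encoder stack.
    Q = decoder_states @ W_q           # (tgt_len, d_k)
    K = encoder_output @ W_k           # (src_len, d_k)
    V = encoder_output @ W_v           # (src_len, d_v)
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # (tgt_len, src_len)
    weights = F.softmax(scores, dim=-1)             # attend over source positions
    return weights @ V                              # (tgt_len, d_v)

# Usage: every target position attends over every source position.
d_model, d_k = 512, 64
decoder_states = torch.randn(5, d_model)   # 5 target tokens decoded so far
encoder_output = torch.randn(9, d_model)   # 9 source tokens from the encoder
W_q, W_k, W_v = (torch.randn(d_model, d_k) for _ in range(3))
context = encoder_decoder_attention(decoder_states, encoder_output, W_q, W_k, W_v)
print(context.shape)  # torch.Size([5, 64])
```

Because the keys and values come from the encoder, each decoding step can look back at the entire source sentence, which is what lets the decoder ground every generated token in the input.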
