Encoder-Decoder Attention
Uncover encoder-decoder attention and autoregressive decoding in transformers for neural machine translation, emphasizing self-attention's pivotal role.
So far, we've discussed the self-attention mechanism. To fully understand the solution presented in the
Understanding decoder components
Like an encoder layer, each decoder layer contains a self-attention component. Unlike an encoder layer, it also contains an encoder-decoder attention component, which combines the encoder stack's output with the decoder's current output.
Masked decoder self-attention
The encoder-decoder attention layer works like multi-head self-attention, with one difference: it builds its query matrix from the output of the decoder sublayer directly below it, while taking its key and value matrices from the output of the encoder stack.
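To make the source of each matrix concrete, here is a minimal single-head sketch in NumPy. It is an illustration under stated assumptions, not a full multi-head implementation: the function name `cross_attention`, the weight matrices `w_q`, `w_k`, `w_v`, and all dimensions are hypothetical, and the weights are random rather than learned.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_output, w_q, w_k, w_v):
    # Queries come from the decoder sublayer below.
    queries = decoder_states @ w_q          # (tgt_len, d_k)
    # Keys and values both come from the encoder stack's output.
    keys = encoder_output @ w_k             # (src_len, d_k)
    values = encoder_output @ w_v           # (src_len, d_k)
    # Scaled dot-product scores: one row per target position,
    # one column per source position.
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = softmax(scores, axis=-1)      # (tgt_len, src_len)
    return weights @ values, weights

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
d_model, d_k = 8, 4
src_len, tgt_len = 5, 3

encoder_output = rng.normal(size=(src_len, d_model))
decoder_states = rng.normal(size=(tgt_len, d_model))
w_q = rng.normal(size=(d_model, d_k))
w_k = rng.normal(size=(d_model, d_k))
w_v = rng.normal(size=(d_model, d_k))

context, weights = cross_attention(decoder_states, encoder_output, w_q, w_k, w_v)
print(context.shape)   # (3, 4)
print(weights.shape)   # (3, 5)
```

Each row of `weights` sums to 1 and describes how strongly one target position attends to every source position. That attention matrix has shape target length by source length, which is what distinguishes this layer from self-attention, where both dimensions come from the same sequence.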