Putting All the Decoder Components Together
Let's put all the decoder components together and see how the transformer's decoder works end to end.
The following figure shows the stack of two decoders; only decoder 1 is expanded to reduce the clutter:
Figure: How the decoder works
From the preceding figure, we can understand the following:
We convert the decoder's input into an embedding matrix, add the positional encoding to it, and feed the result as input to the bottom-most decoder (decoder 1).
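This first step can be sketched as follows. This is a minimal illustration, not the full implementation: the embedding matrix is randomly initialized here (in a real model it is learned), the sizes are hypothetical, and the positional encoding is the sinusoidal scheme from the original transformer paper.

```python
import numpy as np

# Hypothetical sizes for illustration only.
vocab_size, d_model, seq_len = 100, 8, 5

rng = np.random.default_rng(0)
# Learned in a real model; random here just to show the shapes.
embedding = rng.normal(size=(vocab_size, d_model))

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: sin on even dims, cos on odd dims."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model)[None, :]            # (1, d_model)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

tokens = np.array([3, 14, 15, 92, 6])          # hypothetical token ids
# Look up embeddings, then add the positional encoding element-wise.
x = embedding[tokens] + positional_encoding(seq_len, d_model)
print(x.shape)  # (5, 8): one d_model-sized row per input position
```

The resulting matrix `x` is what gets fed into decoder 1.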
The decoder takes the input and sends it to the masked multi-head attention layer, which returns the attention matrix,
...
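The masked multi-head attention step mentioned above can be sketched for a single head. The key idea is the causal mask: position i may only attend to positions up to i, which is enforced by setting future-position scores to negative infinity before the softmax. The score values below are random placeholders; in the real layer they come from the query-key dot products.

```python
import numpy as np

def masked_attention_weights(scores):
    """Apply a causal mask to raw attention scores, then softmax row-wise.

    Each row i of the result is the attention distribution for position i,
    with zero weight on all future positions j > i.
    """
    seq_len = scores.shape[0]
    # Upper-triangular mask (above the diagonal) marks the future positions.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over each row.
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
A = masked_attention_weights(rng.normal(size=(4, 4)))  # toy scores
print(np.round(A, 2))
```

Each row of the returned attention matrix sums to 1, and every entry above the diagonal is exactly 0, so no position can peek at tokens that come after it.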