Putting All the Encoder Components Together
Let's put all the encoder components together.
The following figure shows a stack of two encoders; only encoder 1 is expanded, to reduce clutter:
Working of the encoder
From the preceding figure, we can understand the following:
First, we convert the input to an input embedding (embedding matrix), add the positional encoding to it, and feed the result as input to the bottom-most encoder (encoder 1).
Encoder 1 takes the input and sends it to the multi-head attention sublayer, which returns the attention matrix as output.
We take the attention matrix and feed it as input to the next ...
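
The first two steps above can be sketched in NumPy. This is a minimal illustration, not the lesson's own implementation: the sinusoidal encoding scheme, the random weight matrices, the toy sizes, and the names `positional_encoding` and `multi_head_attention` are all assumptions made for the example.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding (assumed scheme; d_model must be even)."""
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(d_model)[None, :]                # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])           # even dimensions use sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])           # odd dimensions use cosine
    return pe

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)        # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Multi-head self-attention over x with shape (seq_len, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(m):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                    # (heads, seq, d_head)
    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

# Toy sizes and random values, chosen only for illustration.
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 3, 8, 2

# Step 1: input embedding plus positional encoding is the input to encoder 1.
input_embedding = rng.normal(size=(seq_len, d_model))
encoder_input = input_embedding + positional_encoding(seq_len, d_model)

# Step 2: the multi-head attention sublayer returns the attention matrix.
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
attention_matrix = multi_head_attention(encoder_input, w_q, w_k, w_v, w_o, num_heads)
print(attention_matrix.shape)  # (3, 8)
```

The attention matrix has the same shape as the encoder input, which is what allows it to be passed on to the next sublayer without reshaping.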