Sublayer 1: Multi-Head Attention

Learn about the first sublayer, known as multi-head attention, in this lesson.

The output of the positional encoding is the input to the multi-head attention sublayer.

The multi-head attention sublayer contains eight heads and is followed by post-layer normalization, which adds a residual connection (the sublayer's input) to the sublayer's output and then normalizes the sum.
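To make the data flow concrete, here is a minimal NumPy sketch of an eight-head attention sublayer followed by post-layer normalization. It is an illustration under stated assumptions, not a reference implementation: the model dimension of 512, the random weights, and the function names (`multi_head_attention`, `attention_sublayer`, `layer_norm`) are all hypothetical choices made for this example, and the layer normalization omits the learnable scale and shift parameters for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    # Normalize each position's vector to zero mean and unit variance.
    # Simplification: no learnable gain or bias.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads=8):
    seq_len, d_model = x.shape
    d_k = d_model // num_heads  # dimension handled by each head

    def split(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_k)
        return t.reshape(seq_len, num_heads, d_k).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    # Scaled dot-product attention, computed for all heads at once.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                               # (heads, seq, d_k)
    # Concatenate the heads back into (seq_len, d_model), then project.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

def attention_sublayer(x, params, num_heads=8):
    # Post-layer normalization: LayerNorm(x + Sublayer(x)).
    attended = multi_head_attention(x, *params, num_heads=num_heads)
    return layer_norm(x + attended)

# Assumed sizes for illustration: 5 tokens, model dimension 512.
rng = np.random.default_rng(0)
seq_len, d_model = 5, 512
x = rng.normal(size=(seq_len, d_model))  # stands in for the positional encoding output
params = [rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4)]

out = attention_sublayer(x, params)
print(out.shape)
```

The output has the same shape as the input, which is what allows the residual connection to add the two together and lets the result feed into the next sublayer unchanged in size.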
