Transformer Architecture: Residuals and Normalization
Learn about residual connections and normalization in the transformer architecture.
Another important characteristic of transformer models is the presence of residual connections and normalization layers between the individual layers of the model.
Residual connections
Residual connections are formed by adding a given layer's input to the output of one or more layers ahead. This forms shortcut connections through the model and provides a stronger gradient flow by reducing the chance of the phenomenon known as vanishing gradients. The vanishing gradients problem causes the gradients in the layers closest to the inputs to become very small, so training in those layers is hindered. Residual connections for deep learning models were popularized by the paper Deep Residual Learning for Image Recognition.
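To make this concrete, here is a minimal sketch of a transformer sublayer wrapped in a residual connection followed by layer normalization (the post-layer-normalization arrangement used in the original transformer). It assumes PyTorch; the `ResidualBlock` name and the dimensions in the usage example are illustrative, not part of any particular library.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Wraps a sublayer with a shortcut (residual) connection and
    layer normalization, as in each transformer sublayer."""

    def __init__(self, sublayer: nn.Module, d_model: int):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The input x is added back to the sublayer's output, forming the
        # shortcut path; the sum is then normalized (post-LN variant).
        return self.norm(x + self.sublayer(x))


# Illustrative usage: wrap a feed-forward sublayer for 512-dim embeddings.
d_model = 512
ffn = nn.Sequential(
    nn.Linear(d_model, 2048),
    nn.ReLU(),
    nn.Linear(2048, d_model),
)
block = ResidualBlock(ffn, d_model)
out = block(torch.randn(8, 16, d_model))  # shape: (batch, sequence, d_model)
```

Because the shortcut path carries the input through unchanged, gradients can flow directly back to earlier layers even when the sublayer's own gradients are small, which is what mitigates vanishing gradients in deep stacks.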