Build a Transformer Encoder
Implement your own Transformer in PyTorch.
It’s finally time to apply everything we have learned about Transformers. The best way to do that is to build a Transformer Encoder from scratch. We will start by developing all the subcomponents, and in the end we will combine them to form the encoder. Let’s start with something simple.
Disclaimer: PyTorch has its own built-in Transformer and attention modules. However, we believe that you can only get a solid understanding if you develop them yourself.
Linear layers
A good first step is to build the linear subcomponent. A two-layer feedforward network with dropout between the layers is good enough. Here is what the forward pass should look like:
- Linear layer
- ReLU as an activation function
- Dropout
- Second linear layer
You can implement this yourself: jump to the code below and finish the FeedForward module. Note that this is not an exercise; it is only intended to solidify your understanding by revisiting how to build simple PyTorch modules.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1):
        """
        Args:
            `d_model`: model dimension
            `d_ff`: hidden dimension of feed forward layer
            `dropout`: dropout rate, default 0.1
        """
        super(FeedForward, self).__init__()
        ## 1. DEFINE 2 LINEAR LAYERS AND DROPOUT HERE

    def forward(self, x: torch.FloatTensor) -> torch.FloatTensor:
        """
        Args:
            `x`: shape (batch_size, max_len, d_model)
        Returns:
            same shape as input x
        """
        ## 2. RETURN THE FORWARD PASS


FeedForward(10, 100, 0.1)
```
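For reference, here is one possible way to fill in the two TODOs. This is a minimal sketch, not the only valid solution; the attribute names `linear1`, `linear2`, and `dropout` are our own choices:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1):
        super(FeedForward, self).__init__()
        # 1. Two linear layers and dropout (names are illustrative)
        self.linear1 = nn.Linear(d_model, d_ff)
        self.linear2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.FloatTensor) -> torch.FloatTensor:
        # 2. Linear -> ReLU -> Dropout -> Linear
        # Shape is preserved: (batch_size, max_len, d_model) in and out
        return self.linear2(self.dropout(F.relu(self.linear1(x))))


ff = FeedForward(10, 100, 0.1)
out = ff(torch.randn(2, 5, 10))
print(out.shape)  # torch.Size([2, 5, 10]) — same shape as the input
```

The first layer expands the representation from `d_model` to the hidden size `d_ff`, and the second projects it back, so the module can be applied at every position without changing the tensor shape.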
Layer normalization
As a next step, let’s define a layer normalization module. To keep you from being overwhelmed, we will provide the code for this one. Remember that layer normalization is defined as:
LayerNorm(x) = γ ⊙ (x − μ) / (σ + ε) + β

where μ and σ are the mean and standard deviation computed over the last (feature) dimension, ε is a small constant for numerical stability, and γ and β are trainable gain and bias vectors.
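As a preview of the provided code, a minimal module consistent with this formula might look like the sketch below. The parameter names `gamma` and `beta` and the default `eps=1e-6` are our own assumptions:

```python
import torch
import torch.nn as nn


class LayerNorm(nn.Module):
    def __init__(self, d_model: int, eps: float = 1e-6):
        super(LayerNorm, self).__init__()
        # Trainable gain and bias, one value per feature (names are illustrative)
        self.gamma = nn.Parameter(torch.ones(d_model))
        self.beta = nn.Parameter(torch.zeros(d_model))
        self.eps = eps

    def forward(self, x: torch.FloatTensor) -> torch.FloatTensor:
        # Normalize each position over the last (feature) dimension
        mean = x.mean(dim=-1, keepdim=True)
        std = x.std(dim=-1, keepdim=True)
        return self.gamma * (x - mean) / (std + self.eps) + self.beta
```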