Build a Transformer Encoder

Implement your own Transformer encoder in PyTorch.

It’s finally time to apply all we have learned about Transformers. The best way to do that is to build a Transformer Encoder from scratch. We will start by developing all the subcomponents and, in the end, combine them to form the encoder. Let’s start with something simple.

Disclaimer: PyTorch has its own built-in Transformer and attention modules. However, we believe you can only get a solid understanding if you develop them yourself.

Linear layers

A good first step is to build the linear subcomponent. A two-layer feedforward network with dropout is good enough. Here is what the forward pass should look like:

  1. Linear layer
  2. ReLU activation
  3. Dropout
  4. Second linear layer

You can implement this yourself: jump to the code below and finish the FeedForward module. Note that this is not meant as an exercise; it is only intended to solidify your understanding by revisiting how to build simple PyTorch modules.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1):
        """
        Args:
            `d_model`: model dimension
            `d_ff`: hidden dimension of the feedforward layer
            `dropout`: dropout rate, default 0.1
        """
        super(FeedForward, self).__init__()
        ## 1. DEFINE 2 LINEAR LAYERS AND DROPOUT HERE

    def forward(self, x: torch.FloatTensor) -> torch.FloatTensor:
        """
        Args:
            `x`: shape (batch_size, max_len, d_model)
        Returns:
            same shape as the input x
        """
        ## 2. RETURN THE FORWARD PASS


FeedForward(10, 100, 0.1)
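
If you want to check your work, here is one possible completion of the module above. It is a minimal sketch that follows the layer ordering from the list (linear, ReLU, dropout, linear); the attribute names `linear_1`, `linear_2`, and `dropout` are illustrative choices, not required ones.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1):
        super(FeedForward, self).__init__()
        # Two linear projections, d_model -> d_ff -> d_model, with dropout in between
        self.linear_1 = nn.Linear(d_model, d_ff)
        self.linear_2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.FloatTensor) -> torch.FloatTensor:
        # (batch_size, max_len, d_model) -> (batch_size, max_len, d_ff)
        x = F.relu(self.linear_1(x))
        # Dropout is applied to the hidden activations
        x = self.dropout(x)
        # Project back to (batch_size, max_len, d_model)
        return self.linear_2(x)


# Quick shape check: the output shape matches the input shape
ff = FeedForward(10, 100, 0.1)
print(ff(torch.randn(2, 5, 10)).shape)  # torch.Size([2, 5, 10])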

Layer normalization

As a next step, let’s define a layer normalization module. To keep you from being overwhelmed, the code for this one will be provided. Recall that layer normalization is defined as:

$$\mathrm{LN}(x) = \alpha \odot \frac{x - \mu(x)}{\sigma(x) + \epsilon} + \beta$$

where $\mu(x)$ and $\sigma(x)$ are the mean and standard deviation of $x$ computed over the feature dimension, $\epsilon$ is a small constant for numerical stability, and $\alpha$ and $\beta$ are learnable scale and shift parameters.
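
As a reference, here is a minimal sketch of how a layer normalization module could implement this equation in PyTorch. The parameter names `alpha` and `beta` mirror the symbols in the formula and are illustrative, not required.

import torch
import torch.nn as nn


class LayerNorm(nn.Module):
    def __init__(self, features: int, eps: float = 1e-6):
        super(LayerNorm, self).__init__()
        # Learnable scale (alpha) and shift (beta), one value per feature
        self.alpha = nn.Parameter(torch.ones(features))
        self.beta = nn.Parameter(torch.zeros(features))
        self.eps = eps

    def forward(self, x: torch.FloatTensor) -> torch.FloatTensor:
        # Mean and standard deviation over the last (feature) dimension
        mean = x.mean(dim=-1, keepdim=True)
        std = x.std(dim=-1, keepdim=True)
        # Normalize, then apply the learnable scale and shift
        return self.alpha * (x - mean) / (std + self.eps) + self.beta


# Quick shape check: the output shape matches the input shape
ln = LayerNorm(10)
print(ln(torch.randn(2, 5, 10)).shape)  # torch.Size([2, 5, 10])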