Learning Position with Positional Encoding

Learn why word order is important while feeding input to the transformer and how to compute word order using positional encoding.

Consider the input sentence 'I am good'. In RNNs, we feed the sentence to the network word by word. That is, first the word 'I' is passed as input, next the word 'am' is passed, and so on. We feed the sentence word by word so that our network understands the sentence completely. But with the transformer network, we don't follow the recurrence mechanism. So, instead of feeding the sentence word by word, we feed all the words in the sentence parallel to the network. Feeding the words in parallel helps in decreasing the training time and also helps in learning the long-term dependency.

Why is word order important?

However, the problem is since we feed the words parallel to the transformer, how will it understand the meaning of the sentence if the word order is not retained? To understand the sentence, the word order (position of the words in the sentence) is important, right? Yes, the word order is very important as it helps to understand the position of each word in a sentence, which in turn helps to understand the meaning of the sentence.

So, we should give some information about the word order to the transformer so that it can understand the sentence. How can we do that? Let's explore this in more detail now.

Sending word order to the transformer

For our given sentence, 'I am good', first, we get the embeddings for each word in our sentence. Let's represent the embedding dimension as dmodeld_{model}. Say the embedding dimension, dmodeld_{model} is 4. Then, our input matrix dimension will be:

Get hands-on with 1300+ tech skills courses.