Key Concepts of Transformers
Represent text with positional encodings and embeddings so it can be passed into a transformer.
Difficulty understanding transformers usually comes from confusion about their secondary concepts. To avoid this, we will work through the fundamental concepts one at a time and then assemble them into a holistic view of transformers.
With Recurrent Neural Networks (RNNs), we processed sequences one step at a time to preserve the order of the words in the sentence. Under that design, each RNN step needs the hidden state produced by the previous step. As a result, even stacked LSTM computations had to be performed sequentially.
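To make the sequential dependency concrete, here is a minimal sketch of a toy RNN with a single scalar hidden state. The weights `w_x`, `w_h` and bias `b` are hypothetical values chosen for illustration, not taken from any trained model. Each hidden state is computed from the previous one, so the loop cannot be spread across time steps.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a toy scalar RNN over a sequence, one step at a time.

    w_x, w_h and b are hypothetical weights used only for illustration.
    """
    h = 0.0  # initial hidden state h_0
    hidden_states = []
    for x in inputs:
        # h_t depends on h_{t-1}, so step t cannot start until step t-1 is done
        h = math.tanh(w_x * x + w_h * h + b)
        hidden_states.append(h)
    return hidden_states


states = rnn_forward([1.0, 2.0, 3.0])
print(states)
```

Real RNN and LSTM cells use weight matrices and vector hidden states, but the step-by-step dependency is the same.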
Then, transformers came out.
The fundamental building block of a transformer is self-attention. To get there, we first need to get rid of sequential processing, recurrence, and LSTMs. We can do that by simply changing the input representation.
Representing the input sentence
Sets and tokenization
The transformer revolution started with a simple question:
Why don’t we feed the entire input sequence at once, so there are no dependencies between hidden states? That might be cool!
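As a rough sketch of what removing those dependencies buys, assume a hypothetical per-token function `embed` (a stand-in for real per-position computations, not an actual embedding). Because no hidden state is passed between positions, every token can be processed independently, here literally in parallel with a thread pool. The same sketch also shows a side effect: reversing the input merely reverses the outputs, so nothing in this computation records word order.

```python
from concurrent.futures import ThreadPoolExecutor


def embed(token):
    """Hypothetical per-token function: maps a token to a toy feature vector."""
    return [len(token), sum(ord(c) for c in token) % 7]


def process_all_at_once(tokens):
    # No hidden state is carried between positions, so each output depends
    # only on its own token and all positions can be computed concurrently.
    # ThreadPoolExecutor.map returns results in the same order as the inputs.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(embed, tokens))


tokens = ["Hello", "I", "love", "you"]
outputs = process_all_at_once(tokens)
reordered = process_all_at_once(list(reversed(tokens)))

# Reversing the input only reverses the outputs: word order is not encoded.
print(reordered == list(reversed(outputs)))  # True
```

This is why, once recurrence is gone, order information has to be added to the input explicitly, which is the role of positional encodings.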
As an example, consider the sentence “Hello I love you”:
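A minimal sketch of turning this sentence into tokens, assuming plain whitespace splitting and a hypothetical vocabulary built on the fly from the sentence itself (many production tokenizers work on subword units instead). The helper names `tokenize` and `build_vocab` are illustrative, not from any library.

```python
def tokenize(sentence):
    # Simplest possible scheme: split on whitespace
    return sentence.split()


def build_vocab(tokens):
    # Hypothetical vocabulary: give each distinct token the next free integer id
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


tokens = tokenize("Hello I love you")
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
print(tokens)  # ['Hello', 'I', 'love', 'you']
print(ids)     # [0, 1, 2, 3]
```

The whole list of ids is available at once, so it can be handed to the model in a single pass rather than one token per step.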