Introduction: Transformers
Get an overview of transformer models.
Transformer models changed the playing field for most machine learning problems that involve sequential data, advancing the state of the art by a significant margin over the previous leaders, RNN-based models. One of the primary reasons the transformer model is so performant is that it has access to the whole sequence of items (e.g., a sequence of tokens) at once, whereas RNN-based models process one item at a time. The term “transformer” has come up several times in our conversations as a method that has outperformed other sequential models, such as LSTMs and GRUs. Now, we’ll learn more about transformer models.
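To make the whole-sequence-access idea concrete, here is a minimal NumPy sketch (not the course's code) of scaled dot-product self-attention, the core operation of the transformer. Note how every position's output is a weighted mix of all positions in the input, in contrast to an RNN, which would consume the items one step at a time. The weight matrix names (`wq`, `wk`, `wv`) and dimensions are illustrative assumptions:

```python
import numpy as np

def scaled_dot_product_attention(x, wq, wk, wv):
    """Self-attention: each position attends to every position at once.

    x: (seq_len, d) input embeddings; wq, wk, wv: (d, d) projection matrices.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    # Similarity of every position with every other position: (seq_len, seq_len)
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Softmax over the full sequence axis (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average over ALL input positions
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d = 5, 8  # hypothetical toy sizes
x = rng.normal(size=(seq_len, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
out = scaled_dot_product_attention(x, wq, wk, wv)
print(out.shape)  # (5, 8): one output vector per position
```

Because the attention weights span the entire `seq_len × seq_len` grid, information from any position can reach any other in a single step, which is exactly the advantage over RNNs described above.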
Chapter overview
We’ll first learn about the transformer model in detail. Then, we’ll discuss the details of a specific model from the transformer family known as Bidirectional Encoder Representations from Transformers (BERT). We’ll see how we can use this model to complete a question-answering task.
Specifically, we’ll cover the following main topics:
Transformer architecture
Understanding BERT
Using BERT to answer questions