Introduction: Transformers

Get an overview of the transformer model.

Transformer models changed the playing field for most machine learning problems involving sequential data. They advanced the state of the art by a significant margin over the previous leaders, RNN-based models. One of the primary reasons the transformer is so performant is that it has access to the whole sequence of items (e.g., a sequence of tokens) at once, whereas RNN-based models process one item at a time. The term “transformer” has come up several times in our conversations as a method that has outperformed other sequential models, such as LSTMs and GRUs. Now, we’ll learn more about transformer models.
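This difference can be illustrated with a toy sketch (plain NumPy, not the course’s code; all names and dimensions here are illustrative assumptions): self-attention scores every position against every other position in a single step, while an RNN must fold the sequence into a hidden state one item at a time.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Every position attends to every other position in one step."""
    scores = X @ X.T / np.sqrt(X.shape[1])   # (seq_len, seq_len) pairwise scores
    return softmax(scores) @ X               # each output mixes the whole sequence

def rnn_encode(X, W_h, W_x):
    """An RNN sees one item at a time, carrying a hidden state forward."""
    h = np.zeros(W_h.shape[0])
    for x in X:                              # sequential: step t depends on step t-1
        h = np.tanh(h @ W_h + x @ W_x)
    return h
```

Note that `self_attention` produces its output in one matrix product over the full sequence, which is also why transformers parallelize so well on modern hardware, while `rnn_encode` is inherently a loop.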

Chapter overview

We’ll first learn about the transformer model in detail. Then, we’ll discuss the details of a specific model from the transformer family known as Bidirectional Encoder Representations from Transformers (BERT). We’ll see how we can use this model to complete a question-answering task.

Specifically, we’ll cover the following main topics:

  • Transformer architecture

  • Understanding BERT

  • Using BERT to answer questions
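As a preview of the last topic, the mechanism behind BERT-style extractive question answering can be sketched in a few lines: score every token as a possible answer start and as a possible answer end, then take the best span. This toy uses random weights in place of BERT’s hidden states, so the “answer” it selects is meaningless; the token list and dimensions are illustrative assumptions, not the course’s code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for BERT's final hidden states: one vector per token.
tokens = ["what", "is", "bert", "[SEP]", "bert", "is", "a", "transformer"]
H = rng.normal(size=(len(tokens), 4))

# Extractive QA heads learn two vectors that score each token as a
# potential start or end of the answer span.
w_start = rng.normal(size=4)
w_end = rng.normal(size=4)

start = int(np.argmax(H @ w_start))
end = start + int(np.argmax((H @ w_end)[start:]))  # end must not precede start
answer = " ".join(tokens[start:end + 1])
```

In the real model, `H` comes from BERT’s encoder and `w_start`/`w_end` are trained on labeled question–answer spans; the span-selection logic is essentially the same.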
