GPT Models
Learn about the GPT model and its successors.
OpenAI is an AI research group that has been in the spotlight for quite some time because of newsworthy releases such as GPT, GPT-2, and the recently released GPT-3.
Generative pretraining
In this section, we will briefly discuss these architectures and their novel contributions. Toward the end, we'll use a pretrained version of GPT-2 for our text generation task.
GPT
The first model in this series is called GPT, or Generative Pre-trained Transformer. It was released in 2018, around the same time as the BERT model. GPT is essentially a language model based on the transformer-decoder architecture we presented in the previous chapter (see the lesson on Transformers). Since a language model can be trained in an unsupervised fashion, the authors used this unsupervised approach to pretrain on a very large corpus and then fine-tuned the model for specific downstream tasks. The authors used the BookCorpus dataset, a collection of roughly 7,000 unpublished books, for the unsupervised pretraining phase.
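To make the "pretrain once, reuse downstream" idea concrete, here is a minimal sketch of generating text with a pretrained decoder-only model using the Hugging Face transformers library. The "gpt2" checkpoint, the prompt, and the decoding settings (do_sample, top_k) are illustrative choices for this sketch, not the exact configuration we'll use later in the lesson.

```python
# A minimal sketch: load a pretrained GPT-2 and generate text from a prompt.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The transformer architecture has"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Autoregressive decoding: the model predicts one token at a time,
# each conditioned on the prompt and all previously generated tokens.
output_ids = model.generate(
    input_ids,
    max_length=50,
    do_sample=True,   # sample from the distribution instead of greedy decoding
    top_k=50,         # restrict sampling to the 50 most likely next tokens
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Note that no task-specific fine-tuning is involved here: the model is used exactly as pretrained, which is what the unsupervised language-modeling objective makes possible.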