GPT Models
Learn about the GPT model and its successors.
OpenAI is an AI research group that has been in the spotlight for quite some time because of newsworthy models such as GPT, GPT-2, and the recently released GPT-3.
Generative pretraining
In this section, we will discuss these architectures and their novel contributions briefly. Toward the end, we'll use a pretrained version of GPT-2 for our text generation task.
GPT
The first model in this series is called GPT, or Generative Pre-trained Transformer. It was released in 2018, around the same time as the BERT model.
The GPT model is essentially a language model based on the transformer-decoder architecture presented in the previous chapter (see the lesson on Transformers). Since a language model can be trained in an unsupervised fashion, the authors used this unsupervised approach to pretrain the model on a very large corpus and then fine-tuned it for specific downstream tasks. The corpus used for this unsupervised pretraining was the BookCorpus dataset.
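For reference, the unsupervised pretraining step maximizes the standard left-to-right language-modeling likelihood (following the notation of the original GPT paper): the probability of each token given the tokens that precede it,

$$
L_1(\mathcal{U}) = \sum_i \log P\left(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta\right)
$$

where $\mathcal{U} = \{u_1, \ldots, u_n\}$ is the token corpus, $k$ is the size of the context window, and $\Theta$ denotes the parameters of the transformer-decoder.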
As shown in part (a) of the figure above, the GPT model closely follows the original transformer decoder. The authors use 12 decoder blocks (as opposed to 6 in the original transformer), each with 768-dimensional states and 12 self-attention heads. Since the model uses masked self-attention, it preserves the causal nature of a language model and can therefore also be used for text generation. For the remaining tasks shown in part (b) of the figure, essentially the same pretrained language model is reused, with minimal task-specific preprocessing of the inputs and task-specific output layers/objectives.
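To make the masked self-attention point concrete, here is a minimal, single-head NumPy sketch (an illustration only, not the actual GPT implementation; the function name and shapes are ours) showing how a causal mask blocks attention to future positions:

```python
import numpy as np

def causal_self_attention(q, k, v):
    """Single-head masked (causal) self-attention sketch.

    q, k, v: arrays of shape (seq_len, d_model). Each position may only
    attend to itself and to earlier positions.
    """
    seq_len, d_model = q.shape
    scores = q @ k.T / np.sqrt(d_model)              # (seq_len, seq_len)

    # Causal mask: positions above the diagonal (the "future") are blocked.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)

    # Row-wise softmax over the keys.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy check: with 4 positions, the attention weights are zero above the diagonal.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = causal_self_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Because row *i* of the attention weights is zero beyond position *i*, the prediction for the next token depends only on the tokens seen so far, which is exactly the property a left-to-right language model needs for text generation.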
GPT-2
GPT was superseded by an even more powerful model called GPT-2. Radford et al. presented the GPT-2 model as part of their work titled “Language Models are Unsupervised Multitask Learners.”
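As a preview of the text generation exercise mentioned earlier, the following is a hedged sketch of loading a pretrained GPT-2 checkpoint with the Hugging Face `transformers` library (an assumption on our part; the lesson's own code may use a different setup) and sampling a couple of continuations from a prompt:

```python
# Sketch only: assumes the `transformers` package is installed along with a
# backend (PyTorch or TensorFlow); model weights are downloaded on first use.
from transformers import pipeline, set_seed

set_seed(42)  # make the sampled continuations reproducible

# "gpt2" is the publicly released 124M-parameter GPT-2 checkpoint.
generator = pipeline("text-generation", model="gpt2")

prompt = "Deep learning is"
outputs = generator(
    prompt,
    max_length=40,           # total length in tokens, prompt included
    do_sample=True,          # sample instead of greedy decoding
    num_return_sequences=2,  # generate two alternative continuations
)

for i, out in enumerate(outputs):
    print(f"--- sample {i + 1} ---")
    print(out["generated_text"])
```

Because decoding is sampled rather than greedy, each run (with a different seed) produces different continuations of the same prompt.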