GPT Models

Learn about the GPT model and its successors.

OpenAI is an AI research group that has been in the spotlight for quite some time because of its high-profile releases, such as GPT, GPT-2, and the recently released GPT-3.

Generative pretraining

In this section, we will briefly discuss these architectures and their novel contributions. Toward the end, we'll use a pretrained version of GPT-2 for our text generation task.

GPT

The first model in this series is called GPT, or Generative Pre-trained Transformer. It was released in 2018, around the same time as the BERT model. The paper (Radford, Alec. 2018. "Improving Language Understanding with Unsupervised Learning." OpenAI, June 11, 2018. https://openai.com/blog/language-unsupervised/) presents a task-agnostic architecture based on the ideas of transformers and unsupervised learning. The GPT model was shown to beat several benchmarks, such as GLUE and SST-2, though its performance was soon overtaken by BERT, which was released shortly afterward.

GPT is essentially a language model based on the transformer decoder we presented in the previous chapter (see the lesson on Transformers). Since a language model can be trained in an unsupervised fashion, the authors of this model used this unsupervised approach to train on a very large corpus and then fine-tuned it for specific tasks. They used the BookCorpus dataset (Zhu, Yukun, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books." arXiv:1506.06724. https://arxiv.org/abs/1506.06724), which contains over 7,000 unique, unpublished books across different genres. The authors claim that this dataset allows the model to learn long-range dependencies because it contains long stretches of contiguous text. This makes it preferable to the 1B Word Benchmark dataset used by earlier works, whose shuffled sentences lose that long-range information. The overall GPT setup is depicted in the following figure:

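Since we will use a pretrained GPT-2 for text generation later in this lesson, here is a minimal sketch of what loading and sampling from such a pretrained language model can look like. It assumes the Hugging Face transformers library; the prompt and decoding parameters are illustrative choices, not part of the lesson itself.

```python
# A minimal sketch of text generation with a pretrained GPT-2,
# assuming the Hugging Face transformers library is installed.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The transformer architecture has"  # illustrative prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Autoregressive decoding: the model predicts one token at a time,
# conditioning on the prompt and everything generated so far.
output_ids = model.generate(
    input_ids,
    max_length=50,                        # total length including the prompt
    do_sample=True,                       # sample instead of greedy decoding
    top_k=50,                             # restrict sampling to the 50 most likely tokens
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token, so reuse EOS
)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because GPT is a decoder-only language model, generation is simply repeated next-token prediction; no task-specific head is needed for free-form text generation, only for the fine-tuned downstream tasks described above.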