Introduction: The Rise of Transformers with GPT-3 Engines

Get an overview of what we will cover this chapter.

In 2020, Brown et al. (2020) described the training of an OpenAI GPT-3 model containing 175 billion parameters in the paper "Language Models are Few-Shot Learners"The paper can be accessed at: https://arxiv.org/abs/2005.14165 that learned using huge datasets, such as the 400 billion byte-pair-encoded tokens extracted from Common Crawl data. OpenAI ran the training on a Microsoft Azure supercomputer with 285,00 CPUs and 10,000 GPUs.

Overview of the rise of transformers with GPT-3

The machine intelligence of OpenAI’s GPT-3 engines and their supercomputer led Brown et al. (2020) to zero-shot experiments. The idea was to use a trained model for downstream tasks without further training the parameters. The goal would be for a trained model to go directly into multi-task production with an API that could even perform tasks it wasn’t trained for.

Get hands-on with 1400+ tech skills courses.