Road to ChatGPT: The Math of the Model Behind It

Explore the transition from RNNs to transformers, leading up to GPT-3—a colossal language model with 175 billion parameters and a major advance over its predecessors, GPT-1 and GPT-2.

Since its founding in 2015, OpenAI has invested in the research and development of a class of models called generative pre-trained transformers (GPT), which have captured everyone’s attention as the engine behind ChatGPT.

History

GPT models belong to the architectural framework of transformers introduced in 2017 by Google researchers in the “Attention Is All You Need” paper.

The transformer architecture was introduced to overcome the limitations of traditional recurrent neural networks (RNNs). RNNs were first introduced in the 1980s by researchers at the Los Alamos National Laboratory, but they did not gain much attention until the 1990s. The original idea behind RNNs was to process sequential or time-series data while carrying information across time steps.

Indeed, until then, the classic artificial neural network (ANN) architecture was the feedforward ANN, where the output of each hidden layer is the input to the next one, with no memory of previous inputs.

What we’ll learn

To understand the idea behind the transformer, we need to start from its origins. We will discuss the following topics:

  • The structure of RNNs

  • The main limitations of RNNs

  • How those limitations have been overcome with the introduction of new architectural elements, including positional encoding, self-attention, and the feedforward layer

  • How we got to state-of-the-art models like GPT and ChatGPT

Let’s start with the architecture of transformers’ predecessors.

The structure of RNNs

Let’s imagine we want to predict a house price. If we only had today’s price as input, we could use a feedforward architecture: apply a non-linear transformation to the input via a hidden layer (with an activation function) and obtain a forecast of tomorrow’s price.
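The paragraph above can be sketched in code. This is a minimal, illustrative example, not the course’s actual model: the network is untrained, and all weights, dimensions, and the tanh activation are assumptions made for demonstration.

```python
import numpy as np

def feedforward_forecast(x, W1, b1, W2, b2):
    """One hidden layer with a tanh activation, then a linear output."""
    h = np.tanh(W1 @ x + b1)  # non-linear transformation of the input
    return W2 @ h + b2        # forecast of tomorrow's price

# Illustrative, randomly initialized weights (an untrained network).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 1))  # 1 input (today's price) -> 4 hidden units
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))  # 4 hidden units -> 1 output
b2 = np.zeros(1)

today = np.array([250_000.0])  # today's price is the only input
tomorrow = feedforward_forecast(today, W1, b1, W2, b2)
print(tomorrow.shape)  # a single predicted value
```

Note that nothing in this computation remembers yesterday’s price: each forward pass sees only the current input, which is exactly the limitation RNNs were designed to address.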
