As we did earlier in this course, we will download the 345M-parameter GPT-2 transformer model and interact with it. We will enter context sentences and analyze the text that the transformer generates. The goal is to see how it creates new content.

Run the notebook cell by cell. The process is tedious, but the output produced by the cloned OpenAI GPT-2 repository is gratifying. We saw earlier that we could run a GPT-3 engine in a few lines of code. This appendix, however, gives you the opportunity to see how GPT-2 models work under the hood, even though the code is no longer optimized.

Hugging Face has a wrapper that encapsulates GPT-2 models. It’s useful as an alternative to the OpenAI API. However, the goal in this appendix is not to avoid the complexity of the underlying components of a GPT-2 model but to explore it!

Finally, it is important to stress that we are running a low-level GPT-2 model and not making a one-line call to obtain a result. That is why we are avoiding prepackaged versions (the OpenAI GPT-3 API, Hugging Face wrappers, and others). We are getting our hands dirty to understand the architecture of GPT-2 from scratch. Because this code is older, you might see some deprecation messages. However, the effort is worthwhile if you want to become an Industry 4.0 AI expert.
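To connect the model's name to its architecture, here is a minimal sketch that counts GPT-2's parameters from its hyperparameters. It assumes the standard GPT-2 layout for this checkpoint (24 transformer blocks of width 1,024, a 50,257-token vocabulary, a 1,024-token context, and an output layer tied to the token embedding); the helper names `block_params` and `gpt2_params` are hypothetical, and the comments borrow the layer names (`c_attn`, `c_fc`, `wte`, ...) used in OpenAI's `model.py`. The total lands near 355 million rather than 345 million, which is consistent with OpenAI's repository now listing this checkpoint as 355M.

```python
# Hyperparameters of the GPT-2 "345M" (medium) checkpoint, assumed to match
# the hparams.json that ships with OpenAI's model download.
HPARAMS = {"n_vocab": 50257, "n_ctx": 1024, "n_embd": 1024, "n_layer": 24}


def block_params(d: int) -> int:
    """Weights and biases in one transformer block of width d."""
    attn_qkv = d * 3 * d + 3 * d   # c_attn: fused query/key/value projection
    attn_proj = d * d + d          # attn c_proj: attention output projection
    mlp_fc = d * 4 * d + 4 * d     # c_fc: expand the hidden size to 4*d
    mlp_proj = 4 * d * d + d       # mlp c_proj: project back down to d
    layer_norms = 2 * (2 * d)      # ln_1 and ln_2: a gain and a bias each
    return attn_qkv + attn_proj + mlp_fc + mlp_proj + layer_norms


def gpt2_params(n_vocab: int, n_ctx: int, n_embd: int, n_layer: int) -> int:
    """Total parameters; the output layer reuses the token embedding (tied)."""
    token_embedding = n_vocab * n_embd   # wte
    position_embedding = n_ctx * n_embd  # wpe
    final_norm = 2 * n_embd              # ln_f
    return (token_embedding + position_embedding
            + n_layer * block_params(n_embd) + final_norm)


total = gpt2_params(**HPARAMS)
print(f"{total:,} parameters")  # → 354,823,168 parameters
```

The number of attention heads (16 for this checkpoint) does not appear in the count: the heads split the same fused projection among themselves rather than adding weights.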

Let’s start off by importing TensorFlow and checking its version.

Step 1: Checking the version of TensorFlow

The GPT-2 345M transformer model provided by OpenAI was written for TensorFlow 1.x, so running it with TensorFlow 2.x will lead to several warnings. However, we will ignore them and press on, even if training GPT models ourselves on our modest machines puts us on thin ice.

In the 2020s, GPT models reached 175 billion parameters with GPT-3, making it impossible for us to train them efficiently ourselves without access to a supercomputer. The number of parameters is likely to keep increasing.
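A quick back-of-the-envelope calculation shows why. The sketch below estimates memory for the weights alone and for training, assuming 4 bytes per parameter in fp32 and the commonly cited rule of thumb of about 16 bytes per parameter for mixed-precision training with the Adam optimizer (fp16 weights and gradients plus fp32 master weights, momentum, and variance). The helper name `gigabytes` is hypothetical, and activations are ignored, so real requirements are higher still.

```python
# Bytes per parameter for fp32 weights.
BYTES_FP32 = 4
# Rule of thumb for mixed-precision training with Adam: fp16 weights (2) and
# gradients (2) plus fp32 master weights, momentum and variance (4 + 4 + 4).
BYTES_ADAM_MIXED = 16


def gigabytes(n_params: float, bytes_per_param: int) -> float:
    """Memory in decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


for name, n in [("GPT-2 345M", 345e6), ("GPT-3 175B", 175e9)]:
    print(f"{name}: fp32 weights {gigabytes(n, BYTES_FP32):,.1f} GB, "
          f"Adam training state {gigabytes(n, BYTES_ADAM_MIXED):,.1f} GB")
```

Under these assumptions, the GPT-2 model needs a few gigabytes, while GPT-3 needs about 700 GB just to hold its fp32 weights and roughly 2.8 TB of training state, far beyond a single consumer machine.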

The corporate giants’ research labs, such as Facebook AI, OpenAI, and Google Research/Brain, are speeding toward super-transformers and are leaving what they can for us to learn and understand. Unfortunately, they do not have time to go back and update every model they share. However, we still have this notebook!

TensorFlow 2.x is the current major version of TensorFlow. However, programs written for older versions can still be instructive.

We will be using TensorFlow 2.x in this notebook:
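A minimal version check could look like the sketch below. It assumes only that an installed TensorFlow exposes `tf.__version__`; the helpers `major_version` and `check_tensorflow` are hypothetical names introduced for illustration. Because TF1-style calls such as `tf.Session` and `tf.placeholder` are only reachable through `tf.compat.v1` in TensorFlow 2.x, the sketch also reports which case applies.

```python
import importlib.util


def major_version(version: str) -> int:
    """Return the major component of a version string such as '2.15.0'."""
    return int(version.split(".")[0])


def check_tensorflow() -> str:
    """Report the installed TensorFlow version, if any."""
    if importlib.util.find_spec("tensorflow") is None:
        return "TensorFlow is not installed"
    import tensorflow as tf  # imported lazily so the helper works without TF

    if major_version(tf.__version__) >= 2:
        # In TF 2.x, TF1-style calls (tf.Session, tf.placeholder) live under
        # tf.compat.v1 and emit deprecation warnings when used.
        return f"TensorFlow {tf.__version__}: use tf.compat.v1 for TF1-style code"
    return f"TensorFlow {tf.__version__}: TF1 APIs available directly"


print(check_tensorflow())
```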
