We will run three models in the three sections of this lesson:

  • Embedding2ML: To use GPT-3 to provide embeddings for ML algorithms.

  • Instruct series: To ask GPT-3 to provide instructions for any task.

  • Content filter: To filter bias or any form of unacceptable input and output.

We will begin with Embedding2ML (embeddings as an input to ML).

Embedding2ML

OpenAI has trained several embedding models, each with a different number of dimensions and different capabilities:

  • Ada (1,024 dimensions)

  • Babbage (2,048 dimensions)

  • Curie (4,096 dimensions)

  • Davinci (12,288 dimensions)

OpenAI’s website provides more information on each engine.

The Davinci model offers embeddings with 12,288 dimensions. In this section, we will use Davinci to generate the embeddings of a supply chain dataset. However, we will not send these embeddings to the embedding sublayer of a transformer!

We will send the embeddings to a clustering machine learning program from the scikit-learn library in six steps:

  1. Importing OpenAI, and entering the API key.

  2. Loading the dataset.

  3. Combining the columns.

  4. Running the GPT-3 embedding.

  5. Clustering (k-means) with the embeddings.

  6. Visualizing the clusters (t-SNE).
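
The six steps above can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the lesson's actual notebook: the records and their column names are made up, and `fake_embedding` is a hypothetical stand-in that returns a deterministic random vector, so the sketch runs without an API key. In the real pipeline, step 4 would request each vector from the GPT-3 embedding engine instead.

```python
import hashlib

import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Steps 1-2 (stand-in): no API key is needed here, and the "dataset" is a
# tiny, made-up supply chain table. All column names are hypothetical.
records = [
    {"product": "steel bolts", "origin": "Hamburg", "status": "delayed"},
    {"product": "steel nuts", "origin": "Hamburg", "status": "on time"},
    {"product": "copper wire", "origin": "Lyon", "status": "on time"},
    {"product": "copper pipe", "origin": "Lyon", "status": "delayed"},
    {"product": "cotton fabric", "origin": "Izmir", "status": "on time"},
    {"product": "wool fabric", "origin": "Izmir", "status": "lost"},
    {"product": "glass panels", "origin": "Porto", "status": "on time"},
    {"product": "glass bottles", "origin": "Porto", "status": "delayed"},
]


def combine_columns(row):
    # Step 3: merge all columns of one row into a single text string.
    return "; ".join(f"{key}: {value}" for key, value in row.items())


def fake_embedding(text, dim=64):
    # Step 4 (stand-in): a deterministic pseudo-embedding seeded from the text.
    # This is NOT a GPT-3 embedding; it carries no semantic information.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:4], "big")
    return np.random.default_rng(seed).standard_normal(dim)


texts = [combine_columns(row) for row in records]
matrix = np.vstack([fake_embedding(text) for text in texts])

# Step 5: cluster the embedding vectors with k-means.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(matrix)

# Step 6: project the vectors to 2D with t-SNE for plotting.
# Perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=3, init="random", random_state=42)
points_2d = tsne.fit_transform(matrix)

for text, label, point in zip(texts, labels, points_2d):
    print(label, np.round(point, 2), text)
```

Because the stand-in vectors are random, the resulting clusters are meaningless. The sketch only shows how the data flows: `points_2d` could be drawn as a scatter plot colored by `labels`, and with real embeddings, rows describing similar supply chain events would be expected to land near each other.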

The process is summed up in the figure below:
