Transformers and Industry 4.0
Learn about the paradigm change and the role of prompt engineering in Industry 4.0.
First, let's explore the ecosystem of transformers.
The ecosystem of transformers
Transformer models represent such a paradigm change that they require a new name to describe them: foundation models. Accordingly, Stanford University created the Center for Research on Foundation Models (CRFM). In August 2021, the CRFM published a two-hundred-page paper, written by over one hundred scientists and professionals, called "On the Opportunities and Risks of Foundation Models."
Foundation models were not created by academia but by the big tech industry. For example, Google invented the transformer model, which led to Google BERT. Microsoft entered a partnership with OpenAI to produce GPT-3.
Big tech had to find a better model to handle the petabytes of data flowing into their data centers as data volumes grew exponentially. Transformers were thus born out of necessity.
Let’s first take Industry 4.0 into consideration to understand the need to have industrialized artificial intelligence models.
Industry 4.0
The Agricultural Revolution led to the First Industrial Revolution, which introduced machinery. The Second Industrial Revolution introduced electricity, the telephone, and airplanes. The Third Industrial Revolution was digital.
The Fourth Industrial Revolution, or Industry 4.0, has given rise to a vast number of machine-to-machine connections: bots, robots, connected devices, autonomous cars, smartphones, bots that collect data from social media platforms, and more.
In turn, these millions of machines and bots generate billions of data records every day—images, sound, words, and events, as shown in the figure below:
To handle this volume of data, unprecedented in human history, Industry 4.0 created the need for intelligent algorithms that process data and make decisions at scale without human intervention.
Big tech needed to find a single AI model that could perform a variety of tasks that previously required several separate algorithms.
Foundation Models
Transformers have two distinct features: a high level of homogenization and remarkable emergent properties. Homogenization makes it possible to use one model to perform a wide variety of tasks. These abilities emerge through training billion-parameter models on supercomputers.
The paradigm change makes foundation models a post-deep learning ecosystem, as shown in the figure below:
Foundation models, although designed with an innovative architecture, are built on top of the history of AI. As a result, an artificial intelligence specialist’s range of skills is stretching!
The present ecosystem of transformer models is unlike any other evolution in artificial intelligence and can be summed up with four properties:
Model architecture: The model is industrial. The layers of the model are identical, and they are specifically designed for parallel processing.
Data: Big tech possesses the largest data source in the history of humanity, first generated during the Third Industrial Revolution (digital) and boosted to the unfathomable data volumes of Industry 4.0.
Computing power: Big tech possesses computing power never seen before at that scale. For example, GPT-3 was trained at about 50 petaFLOPS, and Google now has domain-specific supercomputers that exceed 80 petaFLOPS. Here, FLOPS stands for floating-point operations per second, and one petaFLOPS is 10^15 floating-point operations per second.
Prompt engineering: Highly trained transformers can be triggered to perform a task with a prompt entered in natural language. However, the wording requires some structure, which makes prompts a metalanguage.
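The model architecture property can be illustrated with a toy NumPy sketch. It is not the architecture of GPT-3 or BERT: it assumes single-head self-attention, random weights, no positional encoding, and no masking, and the helper names (make_layer, layer_forward, encoder) are hypothetical. It only shows two ideas: every layer in the stack has the same structure and parameter shapes, and self-attention handles all token positions at once through matrix products instead of one position at a time, which is what makes the computation easy to parallelize.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each token vector (row) to zero mean and unit variance
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x):
    # Numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def make_layer(d_model, d_ff, rng):
    # Hypothetical helper: every layer gets exactly the same parameter shapes;
    # only the (random) values differ from one layer to the next
    scale = 1.0 / np.sqrt(d_model)
    return {
        "Wq": rng.normal(0.0, scale, (d_model, d_model)),
        "Wk": rng.normal(0.0, scale, (d_model, d_model)),
        "Wv": rng.normal(0.0, scale, (d_model, d_model)),
        "W1": rng.normal(0.0, scale, (d_model, d_ff)),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(d_ff), (d_ff, d_model)),
    }


def layer_forward(x, p):
    # x has shape (seq_len, d_model): all positions are processed together
    q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
    scores = q @ k.T / np.sqrt(x.shape[-1])    # (seq_len, seq_len) in one matrix product
    x = layer_norm(x + softmax(scores) @ v)    # single-head self-attention + residual
    ff = np.maximum(0.0, x @ p["W1"]) @ p["W2"]  # position-wise feed-forward with ReLU
    return layer_norm(x + ff)


def encoder(x, layers):
    # A stack of identical layers applied one after another
    for p in layers:
        x = layer_forward(x, p)
    return x


rng = np.random.default_rng(0)
seq_len, d_model, d_ff, n_layers = 8, 16, 64, 6
layers = [make_layer(d_model, d_ff, rng) for _ in range(n_layers)]
tokens = rng.normal(size=(seq_len, d_model))  # stand-in for embedded input tokens
out = encoder(tokens, layers)
print(out.shape)  # (8, 16)
```

In this sketch, making the model deeper only means increasing n_layers, since every layer is built by the same make_layer call and runs through the same layer_forward code.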
A foundation model is therefore a transformer model that has been trained on supercomputers with billions of records of data and billions of parameters. The model can then perform a wide range of tasks with no further fine-tuning. The scale of foundation models is unique. These fully trained models are often called engines. Only GPT-3, Google BERT, and a handful of transformer engines can qualify as foundation models.
Note: We will only refer to foundation models in this course when mentioning OpenAI’s GPT-3 or Google’s BERT model. This is because GPT-3 and Google BERT were fully trained on supercomputers. Though interesting and effective for limited use, other models do not reach the homogenization level of foundation models because they were not trained with comparable resources.
Now let's explore an example of how foundation models work and have changed the way we develop programs.
Is programming becoming a subdomain of NLP?
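Chen et al. (2021) published a paper named "Evaluating Large Language Models Trained on Code," which introduced Codex, a GPT language model fine-tuned on publicly available code from GitHub that can translate natural language into source code.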
Points to ponder:
Is programming now a translation task from natural language to source code languages?
Is programming becoming an NLP task for GPT-3 engines?
Let’s look into an example before answering those questions.
Bear in mind that Codex is a stochastic algorithm, so the metalanguage is tricky: if we do not engineer the prompt carefully, we might not get the code we expect.
The following prompts were written while experimenting with Codex. This example only gives an idea of how Codex works and is purely for educational purposes. The prompts were:
“generate a random distribution of 200 integers between 1 and 100”
“plot the data using matplotlib”
“create a k-means clustering model with 3 centroids and fit the model”
“print the cluster labels”
“plot the clusters”
“plot the clusters with centroids”
Codex translated our natural metalanguage prompts into Python automatically!
Note: Codex is a stochastic model, so it might not reproduce exactly the same code if you try again. You will have to learn the metalanguage through experimentation to be able to produce specific results.
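The note above says the metalanguage has to be learned through experimentation. The exact text sent to Codex is not shown here, so the following is only a sketch under an assumption: that the instructions are written as source-code comments, a common way of prompting code-generation models. The function names build_comment_prompt and next_step_prompt and the header line are hypothetical, and the sketch only assembles prompt strings; it does not call any model or API.

```python
# The six natural-language instructions used in the Codex experiment
prompts = [
    "generate a random distribution of 200 integers between 1 and 100",
    "plot the data using matplotlib",
    "create a k-means clustering model with 3 centroids and fit the model",
    "print the cluster labels",
    "plot the clusters",
    "plot the clusters with centroids",
]


def build_comment_prompt(instructions, header="Python 3"):
    """Turn plain-language instructions into a comment-style code prompt.

    Hypothetical format: one header comment naming the target language,
    then one comment line per instruction, ending with a newline so that
    a code model could continue the file with source code.
    """
    lines = [f"# {header}"]
    lines += [f"# {text.strip()}" for text in instructions]
    return "\n".join(lines) + "\n"


def next_step_prompt(code_so_far, instruction):
    """Hypothetical step-by-step variant: append one new instruction as a
    comment after the code generated for the previous instructions."""
    return code_so_far.rstrip("\n") + "\n\n# " + instruction.strip() + "\n"


print(build_comment_prompt(prompts))
print(next_step_prompt("import numpy as np\n", prompts[0]))
```

Feeding the instructions one at a time, as next_step_prompt does, mirrors how the experiment proceeded prompt by prompt, and it makes it easier to check each generated step before writing the next instruction.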
The Python program is generated automatically and can be copied and tested:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# make_blobs is imported from sklearn.datasets; the old
# sklearn.datasets.samples_generator module no longer exists in recent scikit-learn
from sklearn.datasets import make_blobs

# Generate random data
np.random.seed(0)
X, y = make_blobs(n_samples=200, centers=3, n_features=2, cluster_std=2, random_state=0)

# Plot the data
plt.scatter(X[:, 0], X[:, 1], s=50)
plt.show()
The code for the clustered data is shown below:
# Create the k-means model
kmeans = KMeans(n_clusters=3, random_state=0)

# Fit the model to the data
kmeans.fit(X)

# Print the cluster labels
print(kmeans.labels_)

# Plot the clusters
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='rainbow')
plt.show()
In addition, the code snippet for plotting the clusters with their centroids is given below:
# Plot the clusters with centroids
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='rainbow')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], c='black', s=100, alpha=0.5)
plt.show()
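Notice that the generated program does not follow the first prompt literally: make_blobs returns 200 two-dimensional points grouped around 3 centers, not 200 integers between 1 and 100. This is the kind of drift the note on stochastic output warns about. For comparison, here is our own minimal sketch (not Codex output) of a literal reading of that prompt, assuming NumPy's Generator API.

```python
import numpy as np

# Literal reading of "generate a random distribution of 200 integers between 1 and 100"
rng = np.random.default_rng(0)
data = rng.integers(1, 101, size=200)  # the upper bound of integers() is exclusive, so values are 1..100

print(data.shape, data.min(), data.max())

# scikit-learn estimators such as KMeans expect a 2-D array of shape
# (n_samples, n_features), so a 1-D sample must be reshaped into one feature column
X = data.reshape(-1, 1)
print(X.shape)  # (200, 1)
```

Comparing the two versions is a quick way to check whether a prompt was understood as intended before building further steps on the generated code.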
GitHub Copilot, which was originally powered by Codex, is now available with some Microsoft development tools. If you learn the prompt engineering metalanguage, you will reduce your development time in the years to come. End users can create prototypes or automate small tasks if they master the metalanguage. In the future, coding copilots will expand. We'll discuss Codex and how it fits into the future of artificial intelligence in more depth later in this course.
At this point, let’s take a glimpse into the bright future of artificial intelligence specialists.
The future of artificial intelligence specialists
The societal impact of foundation models should not be underestimated. Prompt engineering has become a skill required for artificial intelligence specialists. However, the future of AI specialists cannot be limited to transformers. AI and data science overlap in I4.0.
An AI specialist will be involved in machine-to-machine algorithms using classical AI, IoT, edge computing, and more. An AI specialist will also design and develop fascinating connections between bots, robots, servers, and all types of connected devices using classical algorithms.
This course is therefore not limited to prompt engineering but covers a wide range of design skills required to be an “Industry 4.0 artificial intelligence specialist” or “I4.0 AI specialist.”
Prompt engineering is only one subset of the design skills an AI specialist will have to develop.