Applications of AI
Learn about the different uses of generative AI.
In New York City in October 2018, the international auction house Christie’s sold the Portrait of Edmond Belamy, a canvas produced not by a human painter but by a machine learning algorithm developed by the Paris-based collective Obvious, for $432,500.
Portraiture (the process of painting a picture or taking a photograph of a person) is far from the only area in which machine learning has demonstrated astonishing results. Indeed, if you have paid attention to the news in the last few years, you have likely seen many stories about the ground-breaking results of modern AI systems applied to diverse problems, from the hard sciences to digital art. Deep neural network models, such as the one created by Obvious, can now classify X-ray images of human anatomy on the level of trained physicians, translate text between languages, and defeat human players at complex board games.
Discriminative and generative models
These other examples of AI differ in an important way from the model that generated the Portrait of Edmond Belamy. In all of those applications, the model is presented with a set of inputs (data such as English text, X-ray images, or the positions on a game board) paired with a target output, such as the next word in a translated sentence, the diagnostic classification of an X-ray, or the next move in a game. This is probably the kind of AI model you are most familiar with from prior experience of predictive modeling, a statistical technique for predicting future behavior. Such models are broadly known as discriminative models; their purpose is to create a mapping between a set of input variables and a target output. The target output could be a set of discrete classes (such as which word in the English language appears next in a translation) or a continuous outcome (such as the expected amount of money a customer will spend in an online store over the next 12 months).
This kind of model, in which data is labeled or scored, represents only half the capabilities of modern machine learning. Another class of algorithms, such as the one that generated the artificial portrait sold at Christie’s, doesn’t compute a score or label from input variables but rather generates new data. Unlike in discriminative models, the input variables are often vectors of numbers that aren’t related to real-world values at all and are frequently generated at random. This kind of model, known as a generative model, can produce complex outputs such as text, music, or images from random noise, and it is the topic of this course.
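To make the distinction concrete, here is a minimal, untrained sketch in TensorFlow 2 (the framework we’ll use throughout this course). The layer sizes and the flattened 784-pixel “image” are illustrative assumptions, not part of any specific model discussed above:

```python
import tensorflow as tf

# Discriminative model: maps an input (here a flattened 28x28 image,
# i.e., 784 pixels) to class probabilities; it learns p(label | data).
discriminative = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Generative model: maps a random noise vector to a new synthetic data
# sample; the input has no real-world meaning at all.
generative = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),
    tf.keras.layers.Dense(784, activation="sigmoid"),
])

noise = tf.random.normal([1, 64])  # randomly generated input vector
fake_sample = generative(noise)    # complex output produced from noise
```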
Even if you didn’t know it at the time, you have probably seen other instances of generative models in the news alongside the discriminative examples given earlier. A prominent example is deepfakes, which are videos in which one person’s face has been systematically replaced with another’s by using a neural network to remap the facial features of the source onto the target footage.
You may also have seen stories about AI models that generate fake news. Researchers at the firm OpenAI were initially reluctant to release these models to the public due to concerns that they could be used to create propaganda and misinformation.
In these and other applications, such as Google’s voice assistant Duplex, which can make a restaurant reservation by dynamically holding a conversation with a human in real time, generative models are producing convincingly human-like output.
Style transfer, in which the artistic style of one image is applied to the content of another, is a further prominent application of generative adversarial networks (GANs).
These models can handle complex information in a variety of domains: creating photorealistic images, applying stylistic filters to pictures, synthesizing sound, generating conversational text, and even learning rules for optimally playing video games.
Implementing generative models
While generative models could theoretically be implemented using a wide variety of machine learning algorithms, in practice, they are usually built with deep neural networks, which are well suited to capturing complex variations in data such as images or language.
We’ll focus on implementing these deep generative models for many different applications using TensorFlow 2. TensorFlow 2 is a framework, written in C++ with APIs in the Python programming language, for developing and deploying deep learning models. It was open-sourced by Google in 2015 and has become one of the most popular libraries for the research and deployment of neural network models.
With the 2.0 release, much of the boilerplate code (sections of code that are repeated in multiple places with little to no variation) that characterized development in earlier versions of the library was replaced with high-level abstractions, allowing us to focus on the model rather than the plumbing of the computations. The 2.0 release also made eager execution the default mode, allowing network computations to be run on demand, which will be an important benefit for implementing some of our models.
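As a small illustration of what eager execution means in practice, operations in TensorFlow 2 run immediately and return concrete values, with no session or graph compilation step required:

```python
import tensorflow as tf

# Eager execution is the default in TensorFlow 2: operations run
# immediately and return concrete values, with no session required.
x = tf.constant([[1.0, 2.0],
                 [3.0, 4.0]])
y = tf.matmul(x, x)
print(y.numpy())  # [[ 7. 10.]
                  #  [15. 22.]]
```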
Indeed, as we will describe in more detail later, the surge of research into deep learning using large neural network models since 2006 has produced a wide variety of generative modeling approaches. The first of these was the restricted Boltzmann machine, which can be stacked in multiple layers to create a deep belief network. Later innovations included variational autoencoders (VAEs), which can efficiently generate complex data samples from random numbers.
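As a small preview of the VAE machinery we’ll build later, the sketch below shows the reparameterization trick that lets a VAE turn random numbers into latent codes while remaining trainable. The zero-valued mean and log-variance are placeholder assumptions standing in for the outputs of a trained encoder:

```python
import tensorflow as tf

# Placeholder encoder outputs (assumptions for illustration): a trained
# VAE encoder would predict these from an input data sample.
mean = tf.zeros([1, 2])
log_var = tf.zeros([1, 2])

# Reparameterization trick: sample standard Gaussian noise, then shift
# and scale it by the predicted mean and variance. The sampling stays
# differentiable, so the whole model can be trained by backpropagation.
eps = tf.random.normal(tf.shape(mean))
z = mean + tf.exp(0.5 * log_var) * eps  # z ~ N(mean, exp(log_var))
```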
We’ll also cover in detail the algorithm used to create the Portrait of Edmond Belamy: the GAN. Conceptually, the GAN model creates a competition between two neural networks.
One, termed the generator, produces realistic (or, in the case of the experiments by Obvious, artistic) images by starting from a set of random numbers and applying a learned mathematical transformation.
The second network, known as the discriminator, attempts to classify whether a picture comes from a set of real-world images or whether it was created by the generator.
In a sense, the generator is like an art student, producing new paintings from brushstrokes and creative inspiration, while the discriminator acts like a teacher, grading whether the student has produced work comparable to the paintings they are attempting to mimic. As the generator becomes better at fooling the discriminator, its output becomes closer and closer to the historical examples it is designed to copy.
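The following sketch shows how this two-network competition looks in TensorFlow 2. It is a minimal illustration, not the model Obvious used: the layer sizes, the flattened 28x28 images, and the learning rates are all illustrative assumptions.

```python
import tensorflow as tf

latent_dim = 64  # size of the random input vector (an illustrative choice)

# Generator: transforms random noise into a synthetic 28x28 "image"
# (flattened to 784 pixels), playing the art student in the analogy.
generator = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(latent_dim,)),
    tf.keras.layers.Dense(28 * 28, activation="sigmoid"),
])

# Discriminator: scores how likely an image is to be real rather than
# generated, playing the role of the teacher.
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(28 * 28,)),
    tf.keras.layers.Dense(1),  # a single logit: real vs. fake
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(real_images):  # real_images: a [batch, 784] tensor
    noise = tf.random.normal([tf.shape(real_images)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        # Discriminator loss: label real images 1 and generated images 0.
        d_loss = (bce(tf.ones_like(real_logits), real_logits)
                  + bce(tf.zeros_like(fake_logits), fake_logits))
        # Generator loss: fool the discriminator into labeling fakes as real.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
```

Note how the two losses pull in opposite directions: the discriminator is rewarded for telling real from fake, while the generator is rewarded only when its fakes are scored as real.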
Another key innovation in generative models is in the domain of natural language data. By representing the complex interrelationships between words in a sentence in a computationally scalable way, the transformer network and the Bidirectional Encoder Representations from Transformers (BERT) model built on top of it provide powerful building blocks for generating textual data in applications such as chatbots.
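To give a flavor of how such building blocks are exposed in practice, the snippet below samples text from GPT-2, a decoder-style generative transformer, via the Hugging Face transformers library. Both the library and the model choice are assumptions for illustration (they are not part of this course’s TensorFlow setup, and BERT itself is an encoder, so free-form generation demos typically use a decoder-style model instead):

```python
# Assumes the Hugging Face transformers package is installed
# (pip install transformers); GPT-2 stands in as a generative transformer.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Generative models can", max_length=20)
print(result[0]["generated_text"])
```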
We’ll also learn how models such as GANs and VAEs can be used to generate not just images or text but also sets of rules that allow game-playing networks developed with reinforcement learning algorithms to process and navigate their environment more efficiently; in essence, they learn to learn. Generative modeling is a huge and constantly growing field of research, so we’ll cover these topics one at a time.