Domains of Generative AI
Learn about the various domains where generative AI is making an impact.
In recent years, generative AI has advanced significantly and expanded into a wide range of domains, such as art, music, fashion, and architecture. In some of them, it is transforming the way we create, design, and understand the world around us. In others, it is making existing processes and operations more efficient.
The fact that generative AI is used in many domains also implies that its models can deal with different kinds of data, from natural language to audio or images. Let's understand how generative AI models address different types of data and domains.
Text generation
One of the greatest applications of generative AI—and the one we are going to cover the most throughout this course—is its capability to produce new content in natural language. Generative AI algorithms can be used to generate new text, such as articles, poetry, and product descriptions.
For example, a language model such as GPT-3, developed by OpenAI, can be trained on large amounts of text data and then used to generate new, coherent, and grammatically correct text in different languages (both in terms of input and output). It can also extract relevant features from text, such as keywords, topics, or full summaries.
Here is an example of working with GPT-3:
Next, we will move on to image generation.
Image generation
One of the earliest and most well-known examples of generative AI in image synthesis is the generative adversarial network (GAN) architecture. This was introduced in the
Here is an example of faces of people who do not exist; they were generated entirely by StyleGAN2:
Then, in 2021, OpenAI introduced a new generative AI model in this field, DALL-E. Unlike GANs, which take a random noise vector as input, DALL-E is designed to generate images from descriptions in natural language. It can generate a wide range of images, which may not look realistic but still depict the desired concepts.
DALL-E has great potential in creative industries such as advertising, product design, and fashion, among others, to create unique and creative images.
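To make the difference in inputs concrete, here is a minimal sketch of the GAN side of that contrast: a generator is a function that maps a random noise vector to an image-shaped output. Everything here is hypothetical and for illustration only. The names `make_toy_generator` and `sample_noise`, the single untrained layer, and the 8x8 size are our assumptions, not the architecture of StyleGAN2 or any real model. A text-to-image model such as DALL-E would instead start from an encoded text prompt.

```python
import math
import random


def make_toy_generator(latent_dim, height, width, seed=0):
    """Build an untrained, single-layer stand-in for a GAN generator."""
    rng = random.Random(seed)
    # One row of random weights per output pixel (hypothetical, not learned).
    weights = [
        [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        for _ in range(height * width)
    ]

    def generate(z):
        if len(z) != latent_dim:
            raise ValueError("noise vector has the wrong length")
        # Linear map followed by tanh, so every pixel lies in [-1, 1].
        flat = [math.tanh(sum(w * x for w, x in zip(row, z))) for row in weights]
        # Reshape the flat pixel list into rows of the image.
        return [flat[r * width:(r + 1) * width] for r in range(height)]

    return generate


def sample_noise(latent_dim, rng):
    """Draw a latent vector from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]


noise_rng = random.Random(42)
generator = make_toy_generator(latent_dim=16, height=8, width=8)
z_a = sample_noise(16, noise_rng)
z_b = sample_noise(16, noise_rng)
image_a = generator(z_a)
image_b = generator(z_b)
print(len(image_a), len(image_a[0]))
```

Because the only input is the noise vector, the same vector always yields the same output, and a different vector yields a different one. In a real GAN, the weights would be learned adversarially against a discriminator rather than drawn at random.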
Following is an example of DALL-E generating four images starting from a request in natural language:
Note that text and image generation can be combined to produce brand-new materials. In recent years, many new AI tools have used this combination.
An example is Tome AI, a generative storytelling format that, among its capabilities, can create slide shows from scratch, leveraging models such as DALL-E and GPT-3.
As we can see, the tool was able to create a draft presentation based only on our short input in natural language.
Music generation
The first approaches to generative AI for music generation trace back to the 1950s, with research in the field of algorithmic composition, a technique that uses algorithms to generate musical compositions. In fact, in 1957, Lejaren Hiller and Leonard Isaacson created the Illiac Suite for String Quartet, the first piece of music entirely composed by AI. Since then, generative AI for music has been the subject of ongoing research.
In recent years, new architectures and frameworks have become widely known among the general public. Examples include the WaveNet architecture, introduced by Google's DeepMind in 2016, which can generate high-quality audio samples, and the Magenta Project, also developed by Google, which uses recurrent neural networks (RNNs) and other ML techniques to generate music and other forms of art. Then, in 2020, OpenAI announced Jukebox, a neural network that generates music and lets users customize the output in terms of musical and vocal style, genre, reference artist, and so on.
Those and other frameworks became the foundations of many AI composer assistants for music generation. An example is Flow Machines, developed by Sony CSL Research. This generative AI system was trained on a large database of musical pieces to create new music in a variety of styles. It was used by French composer Benoît Carré to compose an album called Hello World, which features collaborations with several human musicians.
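To give a feel for what algorithmic composition means in practice, here is a minimal sketch that learns note-to-note transitions from a short melody and then samples a new one. It uses a first-order Markov chain, which we picked only because it is easy to follow; we do not claim it is the method behind any of the systems named above. The function names, the example melody, and the seed are all hypothetical.

```python
import random
from collections import defaultdict


def build_transitions(melody):
    """Record, for each note, the notes that followed it in the source melody."""
    transitions = defaultdict(list)
    for current, following in zip(melody, melody[1:]):
        transitions[current].append(following)
    return dict(transitions)


def generate_melody(transitions, start, length, seed=None):
    """Sample a melody by repeatedly picking a note that followed the current one."""
    rng = random.Random(seed)
    note = start
    result = [note]
    while len(result) < length:
        choices = transitions.get(note)
        # If a note was never followed by anything, fall back to the start note.
        note = rng.choice(choices) if choices else start
        result.append(note)
    return result


# A made-up source melody, written as note names with octave numbers.
source_melody = ["C4", "D4", "E4", "C4", "E4", "G4", "E4", "D4", "C4"]
transitions = build_transitions(source_melody)
new_melody = generate_melody(transitions, start="C4", length=16, seed=7)
print(" ".join(new_melody))
```

Every step in the generated melody is a transition that occurred in the source, so the result sounds related to it without copying it. Modern systems replace this lookup table with neural networks trained on far larger collections of music.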
In the following image, we can see an example of a track generated entirely by Music Transformer, one of the models within the Magenta Project:
Another incredible application of generative AI within the music domain is speech synthesis. It is indeed possible to find many AI tools that can create audio based on text inputs in the voices of well-known singers.
For example, if you have always wondered how your songs would sound if Kanye West performed them, well, you can now fulfill your dreams with tools such as FakeYou.com or UberDuck.ai.
Next, we move on to explore generative AI for videos.
Video generation
Generative AI for video generation shares a similar timeline of development with image generation. In fact, one of the key developments in the field has been the introduction of GANs. Because GANs produce realistic images, researchers have applied the same techniques to video generation as well. One of the most notable examples of GAN-based video generation is DeepMind’s Motion to Video, which generates high-quality videos from a single image and a sequence of motions. Another great example is NVIDIA’s Video-to-Video Synthesis (Vid2Vid), a deep learning-based framework that uses GANs to synthesize high-quality videos from input videos.
The Vid2Vid system can generate temporally consistent videos, meaning that they maintain smooth and realistic motion over time. The technology can be used to perform a variety of video synthesis tasks, such as the following:
Converting videos from one domain into another (for example, converting a daytime video into a nighttime video or a sketch into a realistic image)
Modifying existing videos (for example, changing the style or appearance of objects in a video)
Creating new videos from static images (for example, animating a sequence of still images)
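To illustrate what temporal consistency means, the sketch below scores a clip by the mean absolute pixel change between consecutive frames: a smooth clip scores low, a flickering one scores high. This is a crude proxy we chose for illustration and is not the metric or the training objective used by Vid2Vid. The function name and the tiny 2x2 grayscale frames are hypothetical.

```python
def mean_abs_frame_difference(frames):
    """Average absolute pixel change between each pair of consecutive frames."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    total = 0.0
    count = 0
    for previous, current in zip(frames, frames[1:]):
        for row_prev, row_curr in zip(previous, current):
            for p, c in zip(row_prev, row_curr):
                total += abs(p - c)
                count += 1
    return total / count


def constant_frame(value, height=2, width=2):
    """Build a grayscale frame where every pixel has the same brightness."""
    return [[value] * width for _ in range(height)]


# Brightness rises gently from frame to frame.
smooth_clip = [constant_frame(v) for v in (0.0, 0.1, 0.2)]
# Brightness jumps between black and white on every frame.
flickering_clip = [constant_frame(v) for v in (0.0, 1.0, 0.0)]

smooth_score = mean_abs_frame_difference(smooth_clip)
flicker_score = mean_abs_frame_difference(flickering_clip)
print(smooth_score < flicker_score)
```

A score like this cannot tell intended motion from flicker, which is one reason real systems rely on richer signals, but it captures the basic idea that neighboring frames should not change abruptly.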
In September 2022, Meta’s researchers announced Make-A-Video, a new AI system that converts natural language prompts into video clips. Behind such technology, we can recognize the capabilities of the various models we have discussed: language understanding for the prompt, image generation for the frames and their motion, and background music made by AI composers.
Overall, generative AI has impacted many domains for years, and some AI tools already consistently support artists, organizations, and general users. The future seems very promising; however, before jumping to the latest models available on the market today, we first need a deeper understanding of the roots of generative AI, its research history, and the recent developments that eventually led to the current OpenAI models.