What Can GANs Do?

GANs can do a lot more than generate sine signals. By altering the input and output dimensions of the generator and combining GANs with other methods, we can apply them to many practical problems. For example, based on random input, we can generate text and audio (one-dimensional data), images (two-dimensional data), and video and 3D models (three-dimensional data).

If we keep the input and output dimensions the same, we can perform denoising and translation on these data types. We can also feed real data into the generator and let it output data with larger dimensions, as in image super-resolution. Finally, we can feed in one type of data and get back another type, for example, generating audio or images based on text.

Even though it has only been a few years since GANs first came out, people have kept working on improving them, and new GAN models are coming out almost weekly. If we look at this resource (https://github.com/hindupuravinash/the-gan-zoo), we can see that there have been at least 500 different GAN models. It is even common to find several models with the same name. It’s nearly impossible for us to learn and evaluate each of them, so we won’t try to introduce most GAN models in this course. Instead, we will get familiar with the most typical GAN models in different applications and learn how to use them to address practical problems.

We will also introduce some useful tricks and techniques to improve the performance of GANs. We hope that, by the time this course finishes, we will have a broad yet in-depth understanding of the mechanisms of various GAN models, so that we feel confident designing our own GANs to creatively solve the problems we may encounter in the future. Let’s look at what GANs are capable of and their advantages compared to traditional approaches in these fields: image processing, NLP, and 3D modeling.

Image processing

In the field of image processing, GANs are used for many tasks, including image synthesis, image translation, and image restoration. These topics are the most common in the study and application of GANs and make up most of the content in this course. Images are one of the easiest media forms to show and share on the internet; therefore, any new breakthrough in image-related applications of GANs tends to receive a great deal of attention in the deep learning community.

Image synthesis

Image synthesis is, in short, the creation of new images. In 2015, DCGANs (Deep Convolutional Generative Adversarial Networks) came out. It was one of the first well-performing and stable approaches to address the hard-to-train issues presented in earlier GAN models. It generates 64×64 images based on a random vector with a length of 100. Some images generated by DCGANs are shown in the following screenshot. We may notice that some of the images are far from being realistic because of the blocky appearance of the pixels. In the paper (Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015)), the authors present many interesting and inspiring visual experiments and reveal even more potential of GANs.
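To get a feel for how a length-100 random vector can end up as a 64×64 image, the sketch below traces the spatial size through a stack of transposed convolutions. The layer settings (kernel size 4, stride 2, padding 1, after a first layer that expands a 1×1 input to 4×4) are an assumption based on a commonly used DCGAN-style configuration, not necessarily the exact setup from the paper, and the names `conv_transpose_out`, `LAYERS`, and `generator_sizes` are hypothetical helpers for illustration only.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output length of a transposed convolution along one spatial axis.

    Assumes dilation 1 and no output padding.
    """
    return (size - 1) * stride - 2 * padding + kernel


# (kernel, stride, padding) for each layer. These values are assumed,
# not taken from the lesson: one layer expands 1x1 to 4x4, then four
# layers each double the spatial size.
LAYERS = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]


def generator_sizes(start=1, layers=LAYERS):
    """Return the spatial size after each layer, starting from `start`."""
    sizes = [start]
    for kernel, stride, padding in layers:
        sizes.append(conv_transpose_out(sizes[-1], kernel, stride, padding))
    return sizes


# The random vector is treated as a 100-channel "image" of size 1x1.
print(generator_sizes())  # [1, 4, 8, 16, 32, 64]
```

Under these assumptions, the latent vector contributes only channels (100 of them at a 1×1 spatial size), while the spatial resolution comes entirely from the upsampling layers. Changing the number of doubling layers is one simple way to target a different output resolution.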
