Progressive GAN

Understand the workings of the progressive GAN and how it can be implemented using TensorFlow 2.0.

GANs are powerful systems for generating high-quality samples, examples of which we have seen in the previous sections. Different works have utilized this adversarial setup to generate samples from different distributions, such as CIFAR-10, celeb_a, LSUN-bedrooms, and so on (we covered examples using MNIST for explanation purposes). Some works, like LapGANs, focused on generating higher-resolution output samples, but they lacked perceived output quality and introduced additional training challenges. Progressive GANs, also known as Pro-GANs or PG-GANs, were presented by Karras et al. at ICLR 2018 in their work titled “Progressive Growing of GANs for Improved Quality, Stability, and Variation” (Karras, Tero, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2017. arXiv. https://arxiv.org/abs/1710.10196) as a highly effective method for generating high-quality samples.

The method presented in this work not only mitigated many of the challenges present in earlier works but also offered a remarkably simple solution to the problem of generating high-quality output samples. The paper also presented a number of very impactful contributions, some of which we'll cover in detail in the following subsections.

The overall method

The software engineering way of solving tough technical problems is often to break them down into simpler, more granular tasks. Pro-GANs take the same approach to the complex problem of generating high-resolution samples, breaking it down into smaller, simpler problems. The major issue with high-resolution images is the huge number of modes, or details, such images contain, which makes it very easy to differentiate generated samples from real data (a perceived-quality issue). This, along with the memory requirements, makes building a generator with enough capacity to train well on such datasets a very tough task.

To tackle these issues, Karras et al. presented a method to gradually grow both the generator and discriminator models as the training progresses from lower to higher resolutions (Karras et al., 2017, https://arxiv.org/abs/1710.10196), as shown in the figure below. They noted that this progressive growth of the models has various advantages, such as the ability to generate high-quality samples, faster training, and lower memory requirements (compared to directly training a GAN to generate high-resolution output).

Progressively increasing the resolution for discriminator and generator models
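To make this schedule concrete, the following is a minimal sketch (assumed values, not the authors' code) that enumerates the training phases. The 4 x 4 starting point and 1024 x 1024 target match the paper's CelebA-HQ setup.

```python
# Enumerate the progressive-growing phases: each phase trains the GAN at
# one resolution before new layers are added and the resolution doubles.
resolution, target = 4, 1024  # start and target resolutions (per the paper)
phases = []
while resolution <= target:
    phases.append(resolution)
    resolution *= 2
print(phases)  # [4, 8, 16, 32, 64, 128, 256, 512, 1024]
```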

Generating higher-resolution images step by step is not an entirely new idea. Many prior works used similar techniques, but the method used in this proposed system is most similar to the layer-wise training of autoencoders (Bengio, Yoshua, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. 2006. “Greedy Layer-Wise Training of Deep Networks.” NeurIPS. https://proceedings.neurips.cc/paper_files/paper/2006/file/5da713a690c067105aeb2fae32403405-Paper.pdf).

The system learns by starting with lower-resolution samples and a generator and discriminator set up as mirror images of each other (architecture-wise). At a lower resolution (say, 4 x 4), training is much simpler and more stable as there are fewer modes to learn. We then increase the resolution step by step by introducing additional layers into both models, as sketched below. This step-by-step increase in resolution limits the complexity of the task at hand, rather than forcing the generator to learn all modes at once. This finally enables Pro-GANs to generate megapixel-size outputs, all starting from a very low-resolution initial point.
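The sketch below illustrates this growth mechanism in TensorFlow 2.x Keras. It is a simplified illustration, not the paper's architecture: the layer widths are assumed, and the paper's 1 x 1 “toRGB” output convolutions and the smooth fade-in of new blocks (covered below) are omitted.

```python
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 512  # latent size used in the paper


def initial_generator():
    # Map the latent vector to a 4x4 feature map: the starting resolution.
    return tf.keras.Sequential([
        layers.Input(shape=(LATENT_DIM,)),
        layers.Dense(4 * 4 * 128),
        layers.Reshape((4, 4, 128)),
        layers.Conv2D(128, 3, padding="same", activation=tf.nn.leaky_relu),
    ])


def grow(generator):
    # Append an upsample + conv block, doubling the output resolution.
    return tf.keras.Sequential([
        generator,
        layers.UpSampling2D(),  # nearest-neighbour 2x upsampling
        layers.Conv2D(128, 3, padding="same", activation=tf.nn.leaky_relu),
    ])


g = initial_generator()
for _ in range(3):  # 4x4 -> 8x8 -> 16x16 -> 32x32
    g = grow(g)

z = tf.random.normal([1, LATENT_DIM])
print(g(z).shape)  # (1, 32, 32, 128)
```

In the actual training procedure, the discriminator is grown in lockstep with the generator, and each newly added block is blended in gradually rather than switched on at once, which is the smooth fade-in mechanism discussed below.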

Despite these improvements, the training time and compute requirements for Pro-GANs are huge: generating the megapixel outputs mentioned above required up to a week of training on multiple GPUs.

In the following subsections, we’ll explore the important contributions and implementation-level details. To keep compute requirements in check, we’ll cover component-level details but use TensorFlow Hub to load the trained model (instead of training one from scratch). This will enable us to focus on the important details and leverage pre-built blocks as required.
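As a quick preview of that workflow, the sketch below loads a pre-trained 128 x 128 Pro-GAN from TensorFlow Hub and samples one image. The `progan-128` module handle reflects the model published on TensorFlow Hub at the time of writing; treat it as an assumption and verify it before use.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Module handle assumed from the TensorFlow Hub catalog; verify before use.
progan = hub.load("https://tfhub.dev/google/progan-128/1").signatures["default"]

latent_dim = 512  # size of Pro-GAN's latent space
z = tf.random.normal([1, latent_dim])  # a single random latent vector

# The default signature maps latent vectors to generated RGB images.
images = progan(z)["default"]
print(images.shape)  # (1, 128, 128, 3)
```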

Progressive growth and smooth fade-in

Pro-GANs were introduced as networks that grow the resolution step by step by adding layers to the generator and discriminator models. But how does that actually work? The following is a step-by-step explanation:

  • The generator and discriminator models start with a resolution of 4 x 4 ...