Challenges in Training GANs

Understand the difficulties associated with training GANs, such as mode collapse, instability, hyperparameter sensitivity, and vanishing gradients, and how to mitigate these issues.

The challenges discussed in this section relate to training GANs, which can suffer from the following major problems:

  • Mode collapse and mode drop

  • Training instability

  • Sensitivity to hyperparameters and initialization

  • Vanishing gradients

Let’s move on and address each of these problems individually.

Mode collapse and mode drop

Mode collapse and mode drop are common problems in GANs. They refer to reduced variety in the samples produced by a generator. The two terms are often used interchangeably and have the following two main interpretations:

  • The generator synthesizes samples with intra-mode variety, but some modes are missing.

  • The generator synthesizes samples with inter-mode variety, but each mode lacks variety.

Note that both interpretations assume that the model has high precision and low recall. The first interpretation refers to low inter-mode recall: there is variety within the modes present in the synthesized samples, but not all modes are present. The second interpretation refers to low intra-mode recall: all modes are present, but within each mode, the synthesized samples lack variety.

Research and practical hacks for circumventing mode collapse and mode drop in GANs have been developed in recent years. Common hacks include training one generator per mode of the distribution, or using a weighted sampling scheme on the real data so that modes that appear infrequently among the samples produced by the generator are sampled more often, as sketched below.
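To make the weighted-sampling hack concrete, here is a minimal sketch, assuming PyTorch and placeholder data. The `mode_labels` array (the mode of each real sample) and `generated_mode_counts` (how often the generator currently produces each mode) are hypothetical inputs that you would have to estimate in practice, for example with a pretrained classifier:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical inputs: placeholder real data, a mode label per real sample,
# and an estimate of how often the generator currently produces each mode.
real_data = torch.randn(1000, 2)                            # placeholder real samples
mode_labels = np.random.randint(0, 5, size=1000)            # mode id of each real sample
generated_mode_counts = np.array([400, 300, 200, 90, 10])   # generator output per mode

# Modes the generator produces rarely get sampled more often from the real data.
mode_weights = 1.0 / (generated_mode_counts + 1e-6)
sample_weights = mode_weights[mode_labels]

sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(real_data),
                                replacement=True)
loader = DataLoader(TensorDataset(real_data), batch_size=64, sampler=sampler)
```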

There are also solutions that replace the original GAN objective with one that compares the real and fake distributions using a different distance. The Least Squares GAN uses a least-squares distance, the Wasserstein GAN uses the Wasserstein distance, energy-based GANs use the total-variation distance, and the Relativistic GAN uses a log-odds distance. We will address some of these objective functions later in this chapter.
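To make the idea concrete, here is a minimal sketch, assuming PyTorch and discriminator outputs `d_real` and `d_fake` for a real and a generated batch, of two of the alternative objectives mentioned above:

```python
import torch

def lsgan_losses(d_real, d_fake):
    # Least Squares GAN: the discriminator regresses real samples toward 1
    # and fake samples toward 0; the generator pushes fakes toward 1.
    d_loss = 0.5 * ((d_real - 1) ** 2).mean() + 0.5 * (d_fake ** 2).mean()
    g_loss = 0.5 * ((d_fake - 1) ** 2).mean()
    return d_loss, g_loss

def wgan_losses(d_real, d_fake):
    # Wasserstein GAN: the critic maximizes the score gap between real and
    # fake samples (no sigmoid); the generator maximizes the fake score.
    d_loss = d_fake.mean() - d_real.mean()
    g_loss = -d_fake.mean()
    return d_loss, g_loss
```

Note that in the Wasserstein case, the critic also needs a Lipschitz constraint (weight clipping or a gradient penalty), which is omitted from this sketch.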

In addition, there is a solution that consists of updating the discriminator for more iterations than the generator, as sketched below. The rationale is that, by doing so, the discriminator sees a larger pool of samples, so the generator is less likely to overfit to the current sample. It is also believed that, through multiple discriminator updates, the generator gains indirect access to the trajectory of future discriminator updates.
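A minimal sketch of this update schedule follows, assuming PyTorch, toy stand-in networks and data, and Wasserstein-style losses as in the previous sketch (again without the Lipschitz constraint a real WGAN critic would need):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in networks and data; in practice these would be the real models.
latent_dim = 16
generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 2))
discriminator = nn.Sequential(nn.Linear(2, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))
g_optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
dataloader = DataLoader(TensorDataset(torch.randn(512, 2)), batch_size=64)

n_critic = 5  # discriminator updates per generator update

for (real_batch,) in dataloader:
    # Several discriminator steps before a single generator step.
    for _ in range(n_critic):
        z = torch.randn(real_batch.size(0), latent_dim)
        fake_batch = generator(z).detach()  # block gradients into the generator here
        d_loss = discriminator(fake_batch).mean() - discriminator(real_batch).mean()
        d_optimizer.zero_grad()
        d_loss.backward()
        d_optimizer.step()

    # One generator step against the freshly updated discriminator.
    z = torch.randn(real_batch.size(0), latent_dim)
    g_loss = -discriminator(generator(z)).mean()
    g_optimizer.zero_grad()
    g_loss.backward()
    g_optimizer.step()
```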

Note that these strategies are substantially empirical, and their success is most commonly the result of trial and error.

Training instability

Training instability refers to oscillating or diverging weight updates during the GAN optimization process. A few factors are believed to contribute to this instability, including:

  • Sparse gradients

  • Disjoint support between fake images and real images

Nonlinearities such as ReLU and operations such as max pooling produce sparse gradients, which can make training unstable. We will propose solutions for avoiding sparse gradients in GANs later in this chapter; the sketch below previews the usual remedies.
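As a brief, hedged preview, assuming PyTorch, a discriminator like the one below keeps gradients dense by using LeakyReLU instead of ReLU and strided convolutions or average pooling instead of max pooling:

```python
import torch
import torch.nn as nn

# A small discriminator illustrating dense-gradient choices: LeakyReLU instead
# of ReLU, and strided convolution or average pooling instead of max pooling.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),   # downsample without max pooling
    nn.LeakyReLU(0.2),                                       # non-zero gradient for negative inputs
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    nn.AvgPool2d(kernel_size=2),                             # average pooling keeps gradients dense
    nn.Flatten(),
    nn.LazyLinear(1),                                        # single real/fake score
)

score = discriminator(torch.randn(1, 3, 64, 64))             # example forward pass
```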

Now, let's investigate disjoint support. In the original GAN setup, we optimize the discriminator by learning a decision boundary that separates real data from fake data. If the support of the fake image distribution does not overlap with that of the real images, the discriminator can perfectly differentiate between what is real and what is fake. This property breaks the assumptions behind the GAN loss in multiple ways. To understand this, let's start by visualizing two nonoverlapping distributions, as shown in the paper "Amortised MAP Inference for Image Super-resolution" by Casper Kaae Sønderby et al., which proposes a solution to instability in GANs. Take a look at the following diagram (Source: Amortised MAP Inference for Image Super-resolution, https://arxiv.org/abs/1610.04490):
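Independent of that figure, a toy numeric illustration (assuming PyTorch and the original, saturating generator objective) shows why a near-perfect discriminator starves the generator of gradient:

```python
import torch

# Toy illustration: a near-perfect discriminator assigns fake samples a logit
# far below zero, so D(G(z)) ~ 0 and the generator loss log(1 - D(G(z)))
# produces an almost-zero gradient with respect to that logit.
fake_logit = torch.tensor([-10.0], requires_grad=True)  # discriminator is very confident
d_fake = torch.sigmoid(fake_logit)

g_loss = torch.log(1.0 - d_fake)   # original (saturating) generator objective
g_loss.backward()
print(fake_logit.grad)             # ~ -4.5e-05: the gradient has effectively vanished
```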
