The improved GANs we have covered so far have mostly focused on architectural enhancements to improve results. Two major issues remain with the GAN setup: the stability of the minimax game and the unintuitiveness of the generator loss. These issues arise because we train the discriminator and generator networks alternately, so at any given moment the generator loss reflects how well the discriminator has been trained so far rather than how realistic the generated samples are.
Wasserstein GAN vs. GAN
Wasserstein GAN (Arjovsky, Martin, Soumith Chintala, and Léon Bottou. 2017. “Wasserstein GAN.” arXiv. https://arxiv.org/abs/1701.07875), or WGAN, was an attempt by Arjovsky et al. to overcome some of these issues with the GAN setup. It is one of the few deep learning papers rooted in theoretical foundations that explain why the method works, rather than relying on empirical results alone. The main difference between a typical GAN and a WGAN is that the WGAN treats the discriminator as a critic (a term borrowed from reinforcement learning). Hence, instead of simply classifying input images as real or fake, the WGAN discriminator (or critic) outputs an unbounded score that tells the generator how real or fake the input image looks.
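To make the critic-versus-classifier distinction concrete, the following is a minimal sketch in PyTorch (the framework, the Critic class, and the loss helper names are illustrative assumptions, not taken from the text). The critic ends in a plain linear layer with no sigmoid, so it produces a score rather than a probability; note that the original paper additionally clips the critic’s weights to enforce a Lipschitz constraint, which is omitted here for brevity.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """A WGAN critic: structurally a discriminator, but the final layer
    has no sigmoid, so it outputs an unbounded realness score."""
    def __init__(self, img_dim: int = 784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),  # scalar score, no sigmoid
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def critic_loss(critic: Critic, real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    # The critic maximizes E[score(real)] - E[score(fake)];
    # equivalently, it minimizes the negation below.
    return critic(fake).mean() - critic(real).mean()

def generator_loss(critic: Critic, fake: torch.Tensor) -> torch.Tensor:
    # The generator tries to push the critic's score on its samples up.
    return -critic(fake).mean()
```

Unlike the binary cross-entropy loss of a standard GAN, these losses are plain differences of scores, which is what lets the critic’s output act as a feedback signal about how far the generated distribution is from the real one.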
The maximum likelihood approach frames the task as minimizing the KL divergence between the generator distribution pz and the real data distribution pdata. In addition to being asymmetric, KL divergence breaks down when the two distributions are far apart or have disjoint supports. To overcome these issues, WGANs use the Earth Mover’s (EM) distance, also known as the Wasserstein distance. Simply put, the Earth Mover’s distance is the minimum cost of moving or transporting mass from distribution p to distribution q. For the GAN setup, we can imagine this as the minimum cost of moving from the generator distribution (pz) to the real distribution (pdata). Mathematically, the EM distance W(source, destination) is the infimum (or greatest lower bound, denoted as inf) over all transport plans, that is:

$$W(p_z, p_{data}) = \inf_{\gamma \in \Pi(p_z, p_{data})} \mathbb{E}_{(x, y) \sim \gamma}\big[\lVert x - y \rVert\big]$$

where $\Pi(p_z, p_{data})$ is the set of all joint distributions $\gamma(x, y)$ whose marginals are pz and pdata; each such $\gamma$ describes one transport plan, and the expectation is the cost of that plan.
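For intuition, here is a small numeric sketch of the EM distance between one-dimensional discrete distributions, using scipy.stats.wasserstein_distance (SciPy is an assumption for illustration only; it plays no role in WGAN training itself):

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Two piles of "earth" on the real line: all mass at 0 vs. all mass at 3.
# Moving one unit of mass a distance of 3 costs 3, so W = 3.
print(wasserstein_distance([0.0], [3.0]))  # 3.0

# Distributions with disjoint supports still get a finite distance
# that varies smoothly with how far apart they are.
p_support, p_weights = np.array([0.0, 1.0]), np.array([0.5, 0.5])
q_support, q_weights = np.array([10.0, 11.0]), np.array([0.5, 0.5])
print(wasserstein_distance(p_support, q_support, p_weights, q_weights))  # 10.0
```

The second example illustrates exactly the property that motivates WGANs: where KL divergence is undefined or infinite for non-overlapping distributions, the EM distance remains finite and informative, so the generator still receives a useful gradient signal.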