WGAN—Understanding the Wasserstein Distance

Learn what the Wasserstein distance is and explore its benefits.

GANs are known to be hard to train, as anyone who has tried to build one from scratch will have noticed. In this lesson, we will talk about how a better distance measure can improve the training of GANs, leading to the Wasserstein GAN (WGAN).

The Wasserstein GAN was introduced by Martin Arjovsky, Soumith Chintala, and Léon Bottou in their paper "Wasserstein GAN." The groundwork for it was laid in an earlier paper by Arjovsky and Bottou, "Towards Principled Methods for Training Generative Adversarial Networks." To fully comprehend these papers, fundamental knowledge of probability theory, measure theory, and functional analysis is required. We will keep the mathematical formulae to a minimum and focus on understanding the concept of WGAN.

Analyzing the problems with the vanilla GAN loss

Let’s go over the commonly used loss functions for GANs:

  • $\underset{real}{\mathbb E} [\log D(x)] + \underset{fake}{\mathbb E} [\log(1 - D(x))]$, which is the vanilla form of the GAN loss (the value function the discriminator maximizes).

  • $\underset{fake}{\mathbb E} [\log(1 - D(x))]$, the original minimax generator loss.

  • $\underset{fake}{\mathbb E} [-\log D(x)]$, the non-saturating generator loss (all three are written out in the sketch below).
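To make these three expressions concrete, here is a minimal sketch of how they could be computed for a batch of discriminator outputs. It assumes PyTorch and sigmoid-activated (probability) outputs; the function name `vanilla_gan_losses` and the example values are illustrative, not part of the lesson.

```python
import torch

def vanilla_gan_losses(d_real, d_fake):
    """Write out the three expressions above for batches of discriminator outputs.

    d_real: D(x) on real samples, d_fake: D(x) on generated samples;
    both are assumed to be probabilities in (0, 1), i.e. after a sigmoid.
    """
    eps = 1e-8  # small constant to keep log() finite

    # 1) Vanilla GAN value function: E_real[log D(x)] + E_fake[log(1 - D(x))]
    #    (the discriminator is trained to maximize this)
    disc_value = torch.log(d_real + eps).mean() + torch.log(1 - d_fake + eps).mean()

    # 2) Minimax generator loss: E_fake[log(1 - D(x))]
    #    (the generator is trained to minimize this)
    gen_minimax = torch.log(1 - d_fake + eps).mean()

    # 3) Non-saturating generator loss: E_fake[-log D(x)]
    gen_non_saturating = -torch.log(d_fake + eps).mean()

    return disc_value, gen_minimax, gen_non_saturating

# Example usage with made-up discriminator outputs for a batch of four samples:
d_real = torch.tensor([0.9, 0.8, 0.95, 0.7])
d_fake = torch.tensor([0.1, 0.3, 0.2, 0.4])
print(vanilla_gan_losses(d_real, d_fake))
```

In practice, frameworks usually compute these losses with a binary cross-entropy on the discriminator's logits, which is numerically more stable than taking the log of a sigmoid output.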

Experimental results have shown that these loss functions work well in several applications. However, let’s dig deeper into them and see what can go wrong when they don’t work so well.

Step 1: Problems with the first loss function:
Assume that the generator network is fixed, and we need to find the optimal discriminator network $D$. We have the following:
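Maximizing the vanilla loss pointwise over $D(x)$ gives the well-known optimal discriminator (this is the standard result from the original GAN analysis, with $p_r$ and $p_g$ denoting the real and generated data densities):

$$D^*(x) = \frac{p_r(x)}{p_r(x) + p_g(x)}$$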
