GAN Algorithms and Loss Functions
Discover GAN algorithms and loss functions like LSGAN and WGAN.
As with the tricks used to train neural networks, a few sources provide best practices for training generative adversarial networks. These best practices were mainly developed to work around the difficulty of training GANs with the original objective function. Note that these tricks might not apply to, or even be necessary for, other GAN formulations such as the Least Squares GAN (LSGAN) or the Wasserstein GAN (WGAN).
Some of the problems associated with the original GAN objective function seem to have been addressed by the development of alternative loss functions, such as the least squares loss used in LSGAN and the Wasserstein loss used in WGAN.
We present these different algorithms and loss functions, recommending that you study them in tandem with Google’s study “Are GANs Created Equal?”. Comparing different GAN loss functions and algorithms, the authors report that they found no evidence that any of the tested algorithms consistently outperforms the original.
Naturally, the claims made in “Are GANs Created Equal?” only apply to the experimental setup of that paper. In other experimental setups, it is possible that a particular loss function or algorithm consistently outperforms the standard GAN.
Nonetheless, the paper shows evidence that GANs are very sensitive to hyperparameter settings. Therefore, we suggest that you do not prematurely switch between GAN frameworks and losses: instead, start from hyperparameters that are known to work and only then perform a hyperparameter search.
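As a minimal sketch of this advice, the snippet below starts from the widely used DCGAN-style Adam settings (learning rate 2e-4, beta1 = 0.5) and enumerates a small grid around them. The search ranges and the placeholder training/evaluation step are illustrative assumptions, not part of any specific library.

```python
import itertools

# Commonly used DCGAN-style starting point (Adam optimizer settings).
baseline = {"lr": 2e-4, "beta1": 0.5, "beta2": 0.999, "batch_size": 64}

# Search only a couple of axes around the known-good baseline.
grid = {"lr": [1e-4, 2e-4, 5e-4], "beta1": [0.0, 0.5]}

for lr, beta1 in itertools.product(grid["lr"], grid["beta1"]):
    config = {**baseline, "lr": lr, "beta1": beta1}
    # Placeholder: plug in your own training loop and evaluation metric
    # (e.g., FID) here, changing only the hyperparameters above.
    print(config)
```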
Lastly, all these suggestions rest on the assumption, which does not always hold, that the code we are running has no bugs. Therefore, before looking into other areas of improvement, first make sure that the implementation itself is bug-free.
Least squares GAN
The Least Squares GAN (LSGAN) uses a least squares objective function to train the discriminator and the generator. Unlike the sigmoid cross-entropy loss, the least squares loss penalizes samples according to how far they lie from the decision boundary, even when they are already on the correct side of it. In their paper, the authors state that LSGAN stabilizes the learning process, removes the need for batch normalization, and converges faster than the Wasserstein GAN.
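To make the decision-boundary claim concrete, here is a minimal numeric sketch; the raw discriminator scores and the target value of 1 are illustrative assumptions. For fake samples that the discriminator already places deep on the “real” side of its decision boundary, the sigmoid cross-entropy loss shrinks toward zero, so the generator receives almost no gradient, while the least squares loss keeps growing with the distance from the target value.

```python
import numpy as np

# Raw discriminator scores for fake samples, from the decision boundary (0)
# to far on the "real" side. These values are illustrative.
scores = np.array([0.0, 1.0, 3.0, 6.0, 10.0])

# Non-saturating generator loss used with the standard GAN: -log(sigmoid(score)).
sigmoid_ce = -np.log(1.0 / (1.0 + np.exp(-scores)))

# Least squares generator loss with target value 1 (LSGAN's label for "real").
least_squares = 0.5 * (scores - 1.0) ** 2

for s, ce, ls in zip(scores, sigmoid_ce, least_squares):
    print(f"score={s:5.1f}  sigmoid-CE={ce:9.5f}  least-squares={ls:8.2f}")
```

Running this shows the sigmoid cross-entropy loss collapsing toward zero as the score grows, while the least squares loss increases, which is exactly the behavior the LSGAN authors exploit to keep gradients flowing.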
The objective functions of LSGAN for the discriminator and the generator are given below.
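Following the formulation in the LSGAN paper, where $a$ and $b$ are the target labels for fake and real data and $c$ is the value the generator wants the discriminator to assign to fake data, the two objectives are:

$$
\min_D V_{\text{LSGAN}}(D) = \frac{1}{2}\,\mathbb{E}_{x \sim p_{\text{data}}(x)}\!\left[(D(x) - b)^2\right] + \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\!\left[(D(G(z)) - a)^2\right]
$$

$$
\min_G V_{\text{LSGAN}}(G) = \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\!\left[(D(G(z)) - c)^2\right]
$$

A common choice of labels is $a = 0$, $b = 1$, and $c = 1$: the discriminator pushes real samples toward 1 and fake samples toward 0, while the generator pushes fake samples toward 1.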