The Evaluation of GANs

Discover techniques for assessing GANs in terms of image quality, variety, and adherence to specifications.

The evaluation of GANs is important because it helps us understand the characteristics of the model we trained and what we can achieve with it. In this chapter, we will be asking these questions:

  • Do the fake samples have an image quality that is similar to the real samples?

  • Do the fake samples have a variety that is similar to the real samples?

  • Do the fake samples satisfy the specifications of the real samples?

Notice that by asking these questions, we can evaluate our model and determine what we can achieve with it. For example, a model with low sample variety but good image quality can still be useful where high-fidelity images matter, whereas a model with relatively poor image quality but good variety produces noisy data that can be used to regularize another model and help it generalize to lower-quality images.

Despite the relative youth of the GAN framework, several publications (Arjovsky and Bottou, 2017; Salimans et al., 2016; Zhao et al., 2016; Radford et al., 2015) have investigated its use for sample generation and unsupervised feature learning.

Unlike most other optimization problems, where the empirical risk is a strong indicator of progress, in GANs a decrease in loss does not always correlate with an increase in image quality (Arjovsky et al., 2017). Therefore, authors still rely on the visual inspection of generated images.

Following the procedure described by Breuleux et al. (2011) and used by Goodfellow et al. (2014), earlier GAN papers evaluated the quality of fake samples by fitting a Gaussian Parzen window to them and reporting the log-likelihood of the test set under the resulting distribution.
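As a concrete illustration, the sketch below fits a Gaussian Parzen window (a kernel density estimate) to generated samples and scores a held-out test set, roughly following that procedure. The arrays are random stand-ins for real data, the bandwidth grid is an arbitrary choice, and scikit-learn's KernelDensity is used here purely for convenience:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

# Random stand-ins: generated samples and a held-out real test set,
# each flattened to shape (n_samples, n_pixels).
fake_samples = np.random.rand(1000, 64)
test_samples = np.random.rand(200, 64)

# Select the Gaussian kernel bandwidth by cross-validation on the fake samples.
grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                    {"bandwidth": np.logspace(-1, 0, 10)}, cv=3)
grid.fit(fake_samples)
parzen = grid.best_estimator_

# Mean log-likelihood of the real test set under the fitted Parzen window.
print(parzen.score_samples(test_samples).mean())
```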

As mentioned in Goodfellow et al. (2014), this method has some drawbacks, including its high variance and poor performance in high-dimensional spaces. The inception score is another widely adopted evaluation metric, though it too fails to provide systematic guidance on the evaluation of GAN models (Barratt and Sharma, 2018).
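For reference, the inception score is defined as exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) are class probabilities from a classifier (typically Inception-v3) applied to generated images and p(y) is their marginal. Below is a minimal NumPy sketch of that definition; the probs array is a random stand-in for real classifier outputs:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    # probs: softmax outputs p(y|x), shape (n_samples, n_classes).
    p_y = probs.mean(axis=0, keepdims=True)  # marginal label distribution p(y)
    # Per-sample KL(p(y|x) || p(y)), then exponentiate the mean.
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Random stand-in: 5,000 fake images classified into 1,000 classes.
probs = np.random.dirichlet(np.ones(1000), size=5000)
print(inception_score(probs))
```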

Fake samples generated with the GAN framework (Goodfellow et al., 2014) have fooled humans and machines into believing that they are indistinguishable from real samples. Although this may be true for the naked eye and for a discriminator fooled by the generator, fake samples are unlikely to be numerically indistinguishable from real samples.

Broadly speaking, current research on generative models focuses on increasing the quality of generated samples or enforcing specifications on generated data, estimating how effective they are at fooling humans and machines, and understanding the properties of the samples they produce.

Image quality

One important aspect of GAN evaluation is the image quality of the samples produced by the generator relative to the image quality of real samples. A common concern in training GANs is that the images generated by the generator can be blurry. Another concern is that the images can have checkerboard artifacts.

There are both quantitative and qualitative measures for assessing image quality. Whereas traditional quantitative metrics focus on measures such as distortion and signal-to-noise ratio, the metrics currently used for GANs rely on embeddings obtained from neural networks trained on image classification tasks.
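As a sketch of this idea (assuming PyTorch and torchvision, which this chapter does not mandate), the snippet below extracts 2,048-dimensional embeddings from an Inception-v3 model pretrained on ImageNet, the network behind metrics such as the inception score and the Fréchet inception distance; the input batch is a random stand-in for preprocessed images:

```python
import torch
from torchvision import models

# Pretrained Inception-v3 classifier (this call downloads ImageNet weights).
model = models.inception_v3(weights="IMAGENET1K_V1")
model.fc = torch.nn.Identity()  # drop the classifier head; keep pooled features
model.eval()

# Random stand-in for a preprocessed batch of real or fake images,
# resized to the 299x299 resolution Inception-v3 expects.
batch = torch.randn(8, 3, 299, 299)

with torch.no_grad():
    embeddings = model(batch)

print(embeddings.shape)  # torch.Size([8, 2048])
```

Embeddings extracted this way can then be compared between batches of real and fake images, which is the basis of embedding-space metrics such as FID.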

Qualitative measures based on the visual inspection of fake samples can be a quick and dirty mechanism to evaluate image quality. By visualizing a grid of images, we can get the gist of the overall quality of images produced by the generator. This information can be used early on to detect problems with our network.
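One way to build such a grid is sketched below, again assuming PyTorch and Matplotlib (any plotting setup works); the fake_images tensor is a random stand-in for generator output in the [-1, 1] range:

```python
import matplotlib.pyplot as plt
import torch
from torchvision.utils import make_grid

# Random stand-in for a batch of 64 fake RGB images in [-1, 1].
fake_images = torch.tanh(torch.randn(64, 3, 32, 32))

# Arrange the batch into an 8x8 grid and rescale [-1, 1] to [0, 1].
grid = make_grid(fake_images, nrow=8, normalize=True, value_range=(-1, 1))

plt.figure(figsize=(8, 8))
plt.imshow(grid.permute(1, 2, 0).numpy())  # CHW -> HWC for matplotlib
plt.axis("off")
plt.show()
```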

In the following image, we show fake samples generated with the progressive growing of GANs framework, which we will implement later in the course, trained on rock album cover art.

These samples were generated at an intermediate stage of the training process, where the images start resembling cover art but still look blurry:
