Tricks of the Trade
Discover the tricks for training GANs, including failure tracking, label utilization, noise addition, input normalization, and objective function modifications.
In this section, we provide an extensive list of tricks compiled from the GAN research community, and from Soumith Chintala in particular, that can help with the notoriously hard task of training GANs. Unlike MLE-based approaches, which have been implemented and evaluated for decades and are also much simpler to train, the GAN framework is only a few years old, and many developments have since been made to improve its training and evaluation procedures.
Tracking failure
It is important to detect failure early on in order to speed up the training process.
Check whether the loss of the discriminator goes to 0.
When the loss of the discriminator goes to 0, it signals a failure mode: the discriminator has become too good at separating real from fake samples, and the generator stops receiving useful gradients.
Check the norm of the gradients.
A large gradient norm implies relatively large weight updates. Although this is expected in the first iterations, the norm should decrease as training proceeds.
Check the variance over time of the discriminator’s loss.
It is expected that the variance of the discriminator’s loss decreases over time and does not have sudden spikes.
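The checks above could be wired into a training loop along the following lines. This is a minimal PyTorch sketch; the helper names (`grad_norm`, `recent_variance`) and the variables in the commented usage (`d_loss`, `d_loss_history`, `discriminator`) are our own illustrative assumptions, not part of any library.

```python
import torch
import torch.nn as nn

def grad_norm(model: nn.Module) -> float:
    """Total L2 norm of the gradients currently stored in the model's parameters."""
    norms = [p.grad.detach().norm() for p in model.parameters() if p.grad is not None]
    return torch.norm(torch.stack(norms)).item() if norms else 0.0

def recent_variance(history: list, window: int = 100) -> float:
    """Variance of the most recent `window` values in a loss history."""
    recent = torch.tensor(history[-window:])
    return recent.var().item() if recent.numel() > 1 else 0.0

# Inside the training loop, after d_loss.backward() and before d_optimizer.step():
#     d_loss_history.append(d_loss.item())
#     print(f"D loss: {d_loss.item():.4f} "          # a loss collapsing to 0 signals failure
#           f"| D grad norm: {grad_norm(discriminator):.3f} "
#           f"| D loss variance: {recent_variance(d_loss_history):.4f}")
```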
Working with labels
Train the discriminator to classify samples using the labels, and train the generator to synthesize samples conditioned on the labels.
It is believed that training the discriminator to classify samples, and providing labels as input to the generator, can in turn improve GAN training. When successful, this setup also allows us to provide a label to the generator in order to sample from a specific mode of the distribution.
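As an illustration, a label-conditioned generator in PyTorch might look like the sketch below. The architecture, layer sizes, and class names are arbitrary choices made for the example rather than a prescribed design; the key idea is that the label is embedded and concatenated with the noise vector, so that at sampling time a specific label selects a specific mode.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Generator conditioned on class labels via a learned label embedding."""

    def __init__(self, latent_dim: int = 100, num_classes: int = 10, img_dim: int = 784):
        super().__init__()
        self.label_embedding = nn.Embedding(num_classes, num_classes)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + num_classes, 256),
            nn.ReLU(),
            nn.Linear(256, img_dim),
            nn.Tanh(),
        )

    def forward(self, z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Concatenate the noise vector with the embedded label.
        cond = torch.cat([z, self.label_embedding(labels)], dim=1)
        return self.net(cond)

# Sampling from a specific mode, e.g. class "3":
generator = ConditionalGenerator()
z = torch.randn(16, 100)
labels = torch.full((16,), 3, dtype=torch.long)
fake_images = generator(z, labels)  # shape: (16, 784)
```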
Working with discrete inputs
Use an embedding layer with discrete inputs.
An embedding layer learns a projection from discrete inputs into a dense vector of fixed size. This has several advantages, including the following:
The network learns the best embedding given the task at hand.
The dimensionality of the discrete input can be reduced.
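For reference, a minimal sketch of such an embedding layer in PyTorch is shown below; the vocabulary size and embedding dimension are arbitrary values chosen for the example.

```python
import torch
import torch.nn as nn

# A vocabulary of 1,000 discrete symbols mapped to dense 16-dimensional vectors.
embedding = nn.Embedding(num_embeddings=1000, embedding_dim=16)

tokens = torch.tensor([3, 41, 997])  # three discrete inputs
dense = embedding(tokens)            # shape: (3, 16), learned jointly with the task
```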
Learn discrete input upsampling to match image channel size.
It is believed that it is better to keep the embedding dimensionality low and upsample it to match the image channel size at the layer at hand.
Concatenate discrete inputs with image channels.
In theory, concatenating discrete inputs with images over the channel dimension and then applying a convolution or dense layer is a superset of adding or multiplying the embedded discrete inputs with the images directly. Therefore, concatenation is theoretically preferred because it has more capacity.
In practice, however, concatenation can be harder to optimize, so it is also worth trying to add or multiply the embedded discrete inputs with the images directly.
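Putting these ideas together, a sketch of the concatenation approach in PyTorch might look as follows. The class and parameter names are illustrative assumptions, not part of any library: the discrete input is embedded into a low-dimensional vector, broadcast across the spatial dimensions of the feature maps, and concatenated along the channel dimension before a convolution mixes the result.

```python
import torch
import torch.nn as nn

class LabelConcatBlock(nn.Module):
    """Embed a discrete input, broadcast it spatially, and concatenate it
    with the image feature maps along the channel dimension."""

    def __init__(self, num_classes: int = 10, embed_dim: int = 8, img_channels: int = 64):
        super().__init__()
        self.embedding = nn.Embedding(num_classes, embed_dim)  # low-dimensional embedding
        self.conv = nn.Conv2d(img_channels + embed_dim, img_channels, kernel_size=3, padding=1)

    def forward(self, feature_maps: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        b, _, h, w = feature_maps.shape
        # Broadcast the embedding over the spatial dimensions of the feature maps.
        label_maps = self.embedding(labels).view(b, -1, 1, 1).expand(-1, -1, h, w)
        # Concatenate along the channel dimension and mix with a convolution.
        return self.conv(torch.cat([feature_maps, label_maps], dim=1))

# Usage: a batch of 16 feature maps of shape (64, 32, 32) with one label per sample.
block = LabelConcatBlock()
features = torch.randn(16, 64, 32, 32)
labels = torch.randint(0, 10, (16,))
out = block(features, labels)  # shape: (16, 64, 32, 32)
```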
Adding noise
It is reported that adding noise to the training procedure can improve training because it makes the supports of the real and fake distributions less separable, thereby mitigating vanishing gradients and training instabilities. Take a look at the following list of some of these strategies, and also refer to the
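As a minimal sketch of one common variant of this idea, sometimes referred to as instance noise, Gaussian noise with a decaying standard deviation can be added to both real and generated samples before they reach the discriminator. The function name, decay schedule, and values below are illustrative assumptions rather than a prescribed recipe.

```python
import torch

def add_instance_noise(images: torch.Tensor, step: int,
                       initial_std: float = 0.1, decay_steps: int = 10_000) -> torch.Tensor:
    """Add Gaussian noise whose standard deviation decays linearly to zero."""
    std = initial_std * max(0.0, 1.0 - step / decay_steps)
    return images + std * torch.randn_like(images)

# Inside the training loop, both real and fake batches are perturbed
# before being scored by the discriminator:
#     real_scores = discriminator(add_instance_noise(real_images, step))
#     fake_scores = discriminator(add_instance_noise(fake_images.detach(), step))
```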