GAN Architecture and Training

Understand the GAN architecture of the text-to-image model and follow the step-by-step model training process.

The design of the GAN model in this section is based on the text-to-image model of Reed et al. (Reed, Scott, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. "Generative adversarial text to image synthesis." In International Conference on Machine Learning, pp. 1060-1069. PMLR, 2016). Here, we describe and define the architectures of the generator and discriminator networks, as well as the training process.

Generator architecture

The generator network has two inputs: a latent noise vector, z, and the embedding vector, t, of the description sentence. The embedding vector, t, has a length of 1,024 and is mapped by a fully-connected layer to a vector of length 128. This vector is concatenated with the noise vector, z, to form a tensor with a size of [B, 228, 1, 1], where B is the batch size ...
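
As a shape-level illustration of the input stage described above, here is a minimal NumPy sketch. It is not the lesson's implementation. The noise length of 100 is an assumption inferred from 228 - 128. The concatenation order (projected embedding first, then noise), the random weights, and the helper name build_generator_input are also hypothetical, and any activation that may follow the fully-connected layer is omitted.

```python
import numpy as np

EMBED_DIM = 1024  # length of the sentence embedding t (from the text)
PROJ_DIM = 128    # length after the fully-connected layer (from the text)
NOISE_DIM = 100   # ASSUMPTION: inferred as 228 - 128, not stated in the text


def build_generator_input(z, t, weight, bias):
    """Project t to 128 values, join it with z, and reshape to [B, 228, 1, 1].

    Hypothetical helper for illustration only.
    """
    projected = t @ weight + bias                    # [B, 128]
    # ASSUMPTION: projected embedding first, noise second.
    joined = np.concatenate([projected, z], axis=1)  # [B, 228]
    return joined[:, :, None, None]                  # [B, 228, 1, 1]


rng = np.random.default_rng(0)
B = 4  # batch size
z = rng.standard_normal((B, NOISE_DIM))
t = rng.standard_normal((B, EMBED_DIM))

# Random stand-ins for the learned parameters of the fully-connected layer.
weight = rng.standard_normal((EMBED_DIM, PROJ_DIM)) * 0.02
bias = np.zeros(PROJ_DIM)

x = build_generator_input(z, t, weight, bias)
print(x.shape)
```

The trailing two singleton dimensions give the joined vector a spatial size of 1 x 1, which is the layout a convolutional generator typically expects before it upsamples.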