GAN Architecture and Training

Explore the architecture and training of text-to-image GAN models in PyTorch. Understand how the generator and discriminator networks use latent noise and text embeddings to produce images. Learn training procedures, including loss calculations and dataset usage, to effectively build and evaluate these GANs for image generation from descriptions.

The design of the GAN model in this section is based on the text-to-image model of Reed et al. (Reed, Scott, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. "Generative adversarial text to image synthesis." In International Conference on Machine Learning, pp. 1060-1069. PMLR, 2016). Here, we describe the architectures of the generator and discriminator networks and the training process.

Generator architecture

The generator network has two inputs: a latent noise vector, $z$, and the embedding vector, $t$, of the description sentence. The embedding vector, $t$, has a length of 1,024 and is mapped by a fully connected layer to a vector of length 128. This vector is concatenated with the noise vector, $z$, to form a tensor with a size of $[B, 228, 1, 1]$ (in which ...
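The input stage described above can be sketched in PyTorch as follows. This is a minimal illustration, not the lesson's actual implementation: the class name `GeneratorInput` is hypothetical, the noise length of 100 is inferred from 228 − 128, the concatenation order (projected text first, then noise) is an assumption, and no activation or normalization is applied after the fully connected layer because the text does not specify one.

```python
import torch
import torch.nn as nn


class GeneratorInput(nn.Module):
    """Hypothetical helper that builds the [B, 228, 1, 1] generator input."""

    def __init__(self, embed_dim=1024, proj_dim=128):
        super().__init__()
        # Fully connected layer mapping the 1,024-d text embedding to 128-d.
        self.project = nn.Linear(embed_dim, proj_dim)

    def forward(self, z, t):
        t_proj = self.project(t)            # [B, 128]
        # Assumed order: projected text embedding first, then noise.
        x = torch.cat([t_proj, z], dim=1)   # [B, 128 + noise_dim]
        # Add two singleton spatial dimensions for the convolutional layers.
        return x.view(x.size(0), -1, 1, 1)  # [B, 228, 1, 1] when noise_dim = 100


B, noise_dim = 4, 100                # noise_dim inferred as 228 - 128
z = torch.randn(B, noise_dim)        # latent noise vector
t = torch.randn(B, 1024)             # stand-in for a sentence embedding
x = GeneratorInput()(z, t)
print(x.shape)                       # torch.Size([4, 228, 1, 1])
```

Reshaping to a 1×1 spatial map lets the concatenated vector be fed directly into a stack of transposed convolutions, which is a common way to upsample from a latent code to an image.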