The design of the GAN model in this section is based on the text-to-image model of Reed et al. (Reed, Scott, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. "Generative adversarial text to image synthesis." In International Conference on Machine Learning, pp. 1060-1069. PMLR, 2016). Here, we describe the architectures of the generator and discriminator networks, followed by the training process.
Generator architecture
The generator network takes two inputs: a latent noise vector, z, and the embedding vector, t, of the description sentence. The embedding vector, t, has a length of 1,024 and is mapped by a fully-connected layer to a vector of length 128. This vector is concatenated with the noise vector, z, to form a tensor with a size of [B ...
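To make the input stage concrete, the following is a minimal NumPy sketch of the projection and concatenation described above. It is an illustration under stated assumptions, not the model's actual implementation: the noise length of 100, the batch size of 4, the random weights, and the helper name build_generator_input are all hypothetical, and any activation applied after the fully-connected layer is omitted because the text does not specify one.

```python
import numpy as np

EMBED_DIM = 1024  # length of the sentence embedding t (from the text)
PROJ_DIM = 128    # length after the fully-connected layer (from the text)
NOISE_DIM = 100   # ASSUMPTION: length of z, not specified in the text


def build_generator_input(z, t, weight, bias):
    """Project t to PROJ_DIM and concatenate it with z along the feature axis."""
    projected = t @ weight + bias  # [B, EMBED_DIM] -> [B, PROJ_DIM]
    return np.concatenate([z, projected], axis=1)


rng = np.random.default_rng(0)
batch_size = 4  # ASSUMPTION: hypothetical batch size B

z = rng.standard_normal((batch_size, NOISE_DIM))  # latent noise vectors
t = rng.standard_normal((batch_size, EMBED_DIM))  # stand-in sentence embeddings

# Stand-in parameters for the fully-connected layer (randomly initialized).
weight = rng.standard_normal((EMBED_DIM, PROJ_DIM)) * 0.02
bias = np.zeros(PROJ_DIM)

gen_input = build_generator_input(z, t, weight, bias)
print(gen_input.shape)
```

The width of the concatenated tensor is the sum of the noise length and the projected embedding length, so it depends directly on the assumed value of NOISE_DIM.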