Paired Style Transfer Using pix2pix GAN
Learn about a variant of conditional GANs used in the context of style transfer.
Style transfer is an intriguing research area that pushes the boundaries of creativity and deep learning together. In their work, “Image-to-Image Translation with Conditional Adversarial Networks,” Isola et al. present a conditional GAN, popularly known as pix2pix, that learns a mapping from an input image to a corresponding output image.
It is called pair-wise style transfer because the training set needs matched samples from both the source and target domains. This generic approach has been shown to effectively synthesize high-quality images from label maps and edge maps, and even to colorize images. The authors highlight the importance of developing an architecture capable of understanding the dataset at hand and learning mapping functions without hand-engineering (as has typically been the case).
The U-Net generator
Since CNNs are optimized for computer vision tasks, using them for both the generator and the discriminator has a number of advantages. This work focuses on two related architectures for the generator: the vanilla encoder-decoder architecture and an encoder-decoder architecture with skip connections. The latter has more in common with the U-Net architecture, which is why it is referred to as the U-Net generator.
A typical encoder (in the encoder-decoder setup) takes an input and passes it through a series of downsampling layers to generate a condensed vector form. This condensed vector is termed the bottleneck feature. The decoder part then upsamples the bottleneck features to generate the final output. This setup is extremely useful in a number of scenarios, such as language translation and image reconstruction. The bottleneck features condense the overall input into a lower-dimensional space.
Theoretically, the bottleneck features capture all the required information, but in practice this becomes difficult when the input space is sufficiently large.
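To make the vanilla setup concrete, here is a minimal sketch of such an encoder-decoder network, assuming TensorFlow/Keras. The filter counts, kernel sizes, and input resolution are illustrative choices rather than the exact configuration from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def vanilla_encoder_decoder(input_shape=(256, 256, 3)):
    """Vanilla encoder-decoder: every piece of information must pass
    through the low-dimensional bottleneck."""
    inputs = tf.keras.Input(shape=input_shape)

    # Encoder: strided convolutions progressively condense the input.
    x = layers.Conv2D(64, 4, strides=2, padding="same", activation="relu")(inputs)  # 128x128
    x = layers.Conv2D(128, 4, strides=2, padding="same", activation="relu")(x)      # 64x64
    x = layers.Conv2D(256, 4, strides=2, padding="same", activation="relu")(x)      # 32x32

    # Bottleneck: the condensed, lower-dimensional representation.
    bottleneck = layers.Conv2D(512, 4, strides=2, padding="same",
                               activation="relu")(x)                                # 16x16

    # Decoder: transposed convolutions upsample back to the image size.
    x = layers.Conv2DTranspose(256, 4, strides=2, padding="same", activation="relu")(bottleneck)
    x = layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu")(x)
    outputs = layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh")(x)

    return tf.keras.Model(inputs, outputs, name="vanilla_encoder_decoder")
```

Note that the decoder here can only work with whatever survives the bottleneck; nothing else from the input reaches the output.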
Additionally, for our task of image-to-image translation, there are a number of important features that need to be consistent between the input and output images. For example, if we are training our GAN to generate aerial photos from outline maps, the information associated with roads, water bodies, and other low-level features needs to be preserved between inputs and outputs, as shown below.
The U-Net architecture uses skip connections to shuttle important features between the input and output (see the figures above). In the case of the pix2pix GAN, skip connections are added between every ith down-sampling layer and the (n - i)th up-sampling layer, where n is the total number of layers in the generator. Each skip connection simply concatenates all channels at layer i with those at layer n - i.
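The sketch below extends the encoder-decoder above with such skip connections, again assuming TensorFlow/Keras with illustrative layer sizes. Each encoder activation is concatenated with the mirrored decoder activation, giving low-level features a direct path around the bottleneck.

```python
import tensorflow as tf
from tensorflow.keras import layers

def unet_generator(input_shape=(256, 256, 3)):
    """Illustrative U-Net generator: encoder activations are concatenated
    with the mirrored decoder activations (layer i with layer n - i)."""
    inputs = tf.keras.Input(shape=input_shape)

    # Encoder: keep a handle on each downsampled activation.
    e1 = layers.Conv2D(64, 4, strides=2, padding="same", activation="relu")(inputs)      # 128x128
    e2 = layers.Conv2D(128, 4, strides=2, padding="same", activation="relu")(e1)         # 64x64
    e3 = layers.Conv2D(256, 4, strides=2, padding="same", activation="relu")(e2)         # 32x32
    bottleneck = layers.Conv2D(512, 4, strides=2, padding="same", activation="relu")(e3) # 16x16

    # Decoder: each upsampled activation is concatenated with its
    # mirror-image encoder activation before being upsampled further.
    d3 = layers.Conv2DTranspose(256, 4, strides=2, padding="same", activation="relu")(bottleneck)
    d3 = layers.Concatenate()([d3, e3])
    d2 = layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu")(d3)
    d2 = layers.Concatenate()([d2, e2])
    d1 = layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu")(d2)
    d1 = layers.Concatenate()([d1, e1])
    outputs = layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh")(d1)

    return tf.keras.Model(inputs, outputs, name="unet_generator")
```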
The generator presented in the paper follows a repeating block structure for both the encoder and the decoder. Each encoder block consists of a convolutional layer followed by batch normalization and leaky ReLU activation; the decoder blocks mirror this with transposed convolutions, batch normalization, dropout (in the first few blocks), and ReLU activation. Every such block downsamples (or upsamples) its input by a factor of 2, using convolutions with a stride of 2, as sketched below.
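A pair of such blocks might look like the following sketch, again in TensorFlow/Keras. The 4x4 kernel and stride of 2 follow the paper; the helper names and option flags are our own illustrative choices.

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(x, filters, apply_batchnorm=True):
    """One downsampling block: strided Conv -> BatchNorm -> LeakyReLU.
    The stride of 2 halves the spatial resolution."""
    x = layers.Conv2D(filters, kernel_size=4, strides=2, padding="same",
                      use_bias=not apply_batchnorm)(x)
    if apply_batchnorm:
        x = layers.BatchNormalization()(x)
    return layers.LeakyReLU(0.2)(x)

def decoder_block(x, skip, filters, apply_dropout=False):
    """One upsampling block: transposed Conv -> BatchNorm -> (Dropout)
    -> ReLU, followed by concatenation with the mirrored encoder
    activation, which forms the skip connection."""
    x = layers.Conv2DTranspose(filters, kernel_size=4, strides=2,
                               padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    if apply_dropout:
        x = layers.Dropout(0.5)(x)
    x = layers.ReLU()(x)
    return layers.Concatenate()([x, skip])
```

Stacking eight such encoder blocks reduces a 256x256 input to a 1x1 bottleneck, which a matching stack of decoder blocks then expands back to full resolution.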