Summary: Style Transfer with GANs

Get a quick recap of the major learning points in this chapter.

In this chapter, we explored the creative side of GAN research through the lens of image-to-image translation tasks. While the creative implications are obvious, such techniques also open up avenues to improve the research and development of computer vision models for domains where datasets are scarce or expensive to collect.

We started off the chapter with the paired image-to-image translation task, where the source and destination domains have paired training samples. We explored this task using the pix2pix GAN architecture and saw how an encoder-decoder generator is useful for producing high-fidelity outputs. pix2pix takes the encoder-decoder architecture one step further by adding skip connections, yielding a U-Net-style generator.
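The core idea behind a U-Net-style skip connection is simply to concatenate each decoder feature map with the mirrored encoder feature map, so fine spatial detail bypasses the bottleneck. The following is a minimal NumPy sketch of that concatenation step; the shapes are hypothetical stand-ins for one level of the generator, not the exact configuration used in the chapter.

```python
import numpy as np

def skip_connect(decoder_feat, encoder_feat):
    """Concatenate a decoder feature map with the mirrored encoder
    feature map along the channel axis (NHWC layout)."""
    return np.concatenate([decoder_feat, encoder_feat], axis=-1)

# Hypothetical shapes for one U-Net level: batch of 1, 32x32 maps, 64 channels.
dec = np.zeros((1, 32, 32, 64))   # upsampled decoder activations
enc = np.zeros((1, 32, 32, 64))   # matching encoder activations
merged = skip_connect(dec, enc)
print(merged.shape)  # (1, 32, 32, 128)
```

In a Keras implementation, the same step is typically expressed with a `Concatenate` layer between the matching encoder and decoder blocks.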

This setup also introduced another powerful concept, the PatchGAN discriminator, which elegantly supplies the overall GAN with patch-level feedback signals suited to different style transfer use cases. We used these concepts to build and train our own pix2pix GAN from scratch to translate satellite images into Google Maps-like outputs. Training produced good-quality outputs with very few training samples and iterations; this faster, more stable training is a direct consequence of the innovations contributed by the authors of this work. We also explored various other use cases that pix2pix-style architectures can enable.
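A PatchGAN discriminator outputs a grid of logits, each judging one receptive-field patch of the input as real or fake, and the loss averages over all patches. The sketch below, in plain NumPy, illustrates that patch-averaged binary cross-entropy; the 30×30 grid is a hypothetical shape (typical of a 70×70-receptive-field PatchGAN), not a value stated in the chapter.

```python
import numpy as np

def patchgan_loss(patch_logits, is_real):
    """Mean binary cross-entropy over a grid of per-patch logits.
    Each element of `patch_logits` judges one receptive-field patch,
    so the discriminator's feedback is local rather than global."""
    probs = 1.0 / (1.0 + np.exp(-patch_logits))      # sigmoid
    targets = np.ones_like(probs) if is_real else np.zeros_like(probs)
    eps = 1e-7                                        # numerical safety
    bce = -(targets * np.log(probs + eps)
            + (1 - targets) * np.log(1 - probs + eps))
    return bce.mean()

# Hypothetical 30x30 patch grid; zero logits mean the discriminator
# is maximally undecided (p = 0.5 for every patch).
logits = np.zeros((1, 30, 30, 1))
loss = patchgan_loss(logits, is_real=True)
print(round(loss, 4))  # 0.6931, i.e. -log(0.5)
```

In practice the same quantity is computed with a framework loss such as Keras's binary cross-entropy applied to the discriminator's patch map.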

In the second part of the chapter, we extended the task of image-to-image translation to the unpaired setting. The unpaired training setup is no doubt a more complex problem to solve, yet it opens up many more avenues. The paired setup works well when we have explicit pairs of samples in the source and target domains, but most real-life scenarios lack such datasets.

We explored the unpaired image-to-image translation setup through the CycleGAN architecture. The authors of CycleGAN presented a number of intuitive yet powerful contributions that make the unpaired setup work. We discussed cycle-consistency loss and identity loss as regularization terms added to the overall adversarial loss, and specifically how identity loss improves the reconstruction of samples and, thus, the overall output quality. Using these concepts, we built the CycleGAN setup from scratch using TensorFlow-Keras APIs. We experimented with two datasets, apples-to-oranges and photographs-to-Van-Gogh-style paintings, and the results were exceptionally good in both cases despite the unpaired samples.
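Both regularizers are L1 distances: cycle-consistency loss compares an image with its round-trip reconstruction F(G(x)), while identity loss compares a target-domain image with the generator's output on it. Below is a minimal NumPy sketch of the two terms; the toy arrays and the function names are illustrative, not the chapter's exact implementation.

```python
import numpy as np

def l1(a, b):
    """Mean absolute error between two arrays."""
    return np.abs(a - b).mean()

def cycle_consistency_loss(x, f_of_g_x):
    """L1 distance between x and its round-trip reconstruction F(G(x)):
    translating to the other domain and back should recover the input."""
    return l1(x, f_of_g_x)

def identity_loss(y, g_of_y):
    """L1 distance between a target-domain image y and G(y):
    G should act roughly as the identity on images already in its
    output domain, which helps preserve color and composition."""
    return l1(y, g_of_y)

# Toy arrays standing in for images (hypothetical values).
x = np.ones((2, 4, 4, 3))
recon = x * 0.9                        # imperfect round trip
print(cycle_consistency_loss(x, recon))  # ≈ 0.1
print(identity_loss(x, x))               # 0.0 for a perfect identity
```

These terms are weighted and added to the adversarial loss when training the generators; the weighting coefficients are tunable hyperparameters.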
