Unpaired Style Transfer Using CycleGAN

Paired style transfer is a powerful setup with a number of use cases: it enables cross-domain transfer given paired source and target domain datasets. The pix2pix setup also showcased the ability of GANs to learn the required loss functions on their own, without the need to specify them manually.

While paired style transfer is a huge improvement over hand-crafted loss functions and previous works, it is limited by the availability of paired datasets. Paired style transfer requires the input and output images to be structurally the same even though the domains are different (aerial to map, labels to scene, and so on). In this lesson, we’ll focus on an improved style transfer architecture called CycleGAN.

CycleGAN improves upon the paired style transfer architecture by relaxing the constraints on input and output images. It explores the unpaired style transfer paradigm, where the model learns the stylistic differences between source and target domains without explicitly pairing input and output images.
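To make the contrast concrete, here is a minimal, hypothetical sketch of what an unpaired data setup looks like in code: the two domains (for example, landscape photos and Monet paintings) are simply two independent image collections, and a training step samples from each of them separately. The folder names and helper functions are illustrative assumptions, not part of the lesson's codebase.

```python
import random
from pathlib import Path


def load_unpaired_domains(dir_x: str, dir_y: str):
    """Collect file paths for each domain independently; the two sets may differ in size."""
    domain_x = sorted(Path(dir_x).glob("*.jpg"))
    domain_y = sorted(Path(dir_y).glob("*.jpg"))
    return domain_x, domain_y


def sample_unpaired_batch(domain_x, domain_y, batch_size=4):
    """Draw images from each domain independently -- there is no (x_i, y_i) pairing."""
    xs = random.sample(domain_x, k=min(batch_size, len(domain_x)))
    ys = random.sample(domain_y, k=min(batch_size, len(domain_y)))
    return xs, ys


if __name__ == "__main__":
    # Hypothetical folder names for the two domains.
    photos, paintings = load_unpaired_domains("photos/", "monet/")
    batch_x, batch_y = sample_unpaired_batch(photos, paintings)
```

The key design point is that nothing links a particular photo to a particular painting; the model must learn the mapping between the two sets as a whole.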

Zhu et al. describe this unpaired style transfer as similar to our ability to imagine how Van Gogh or Monet would have painted a particular scene without having actually seen a side-by-side example. Quoting from the paper (Zhu, J.-Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. arXiv. https://arxiv.org/abs/1703.10593):

"Instead, we have knowledge of the set of Monet paintings and of the set of landscape photographs. We can reason about the stylistic differences between these two sets, and thereby imagine what a scene might look like if we were to translate it from one set into the other."

This is a significant advantage, and it opens up additional use cases where an exact pairing of source and target domains is either unavailable or does not provide enough training examples.

Overall setup for CycleGAN

In the case of paired style transfer, the training dataset consists of paired samples, denoted as $\{x_i, y_i\}$, where $x_i$ and $y_i$ correspond to each other. This is shown below for reference:

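In notation, the contrast between the two setups can be summarized as follows; the paired form restates the definition above, while the unpaired form follows the conventions of the CycleGAN paper (the exact symbols are an assumption of this sketch):

```latex
% Paired setup (pix2pix-style): N corresponding pairs, each x_i matched to y_i
\{(x_i, y_i)\}_{i=1}^{N}, \qquad x_i \in X,\; y_i \in Y

% Unpaired setup (CycleGAN): two independent sample sets with no correspondence
\{x_i\}_{i=1}^{N} \subset X, \qquad \{y_j\}_{j=1}^{M} \subset Y
```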