Using Pixel-Wise Labels to Translate Images with pix2pix
Explore how to use pixel-wise labels to translate images with the pix2pix model.
Labels can also be assigned to individual pixels; these are known as pixel-wise labels. Pixel-wise labels play an increasingly important role in deep learning. For example, one of the most famous online image classification contests, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), held its last event in 2017 and is no longer hosted, whereas object detection and segmentation challenges such as COCO are receiving more and more attention.
Semantic segmentation
An iconic application of pixel-wise labeling is semantic segmentation. Semantic segmentation (also called image or object segmentation) is a task in which every pixel of an image must be assigned to one object class. The most promising application of semantic segmentation is autonomous cars (self-driving cars). If every pixel captured by the camera mounted on a self-driving car is correctly classified, all of the objects in the image can be recognized easily. This makes it much easier for the vehicle to analyze its current environment and decide whether it should, for example, turn or slow down to avoid other vehicles and pedestrians.
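The idea of assigning every pixel to one class can be sketched in a few lines. In this hypothetical illustration (the class names and tensor sizes are assumptions, not from the original text), a segmentation network outputs one score per class for every pixel, and taking the argmax over the class axis produces the pixel-wise label map:

```python
import numpy as np

# Assumed setup: a segmentation model emits a (classes, H, W) score map.
num_classes = 4                               # e.g. road, vehicle, pedestrian, background
h, w = 8, 8
logits = np.random.randn(num_classes, h, w)   # one score per class per pixel

# Argmax over the class axis assigns each pixel to exactly one class.
segmentation_map = logits.argmax(axis=0)      # (H, W), one class id per pixel
print(segmentation_map.shape)                 # (8, 8)
```

Every entry of `segmentation_map` is an integer class id, which is exactly the pixel-wise label described above.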
Transforming the original color image into a segmentation map (as shown in the following diagram) can be considered an image-to-image translation problem. Image-to-image translation is a much larger field that includes style transfer, image colorization, and more. Image style transfer moves the iconic textures and colors from one image to another, such as combining a photo with a Vincent van Gogh painting to create a unique artistic portrait. Image colorization is a task in which we feed a 1-channel grayscale image to the model and let it predict the color information for each pixel, yielding a 3-channel color image.
GANs can be used in image-to-image translation as well. In this section, we will use a classic image-to-image translation model, pix2pix, to transform images from one domain to another.
The authors of the paper have kindly provided the
Generator architecture
The architecture of the generator network of pix2pix is as follows:
Here, we assume that both the input and output data are 3-channel images.
We can see that the first half of the layers of this network gradually transform the input image into feature maps with smaller and smaller spatial sizes, while the second half gradually transform them back to the spatial size of the original image.
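A minimal PyTorch sketch of such an encoder-decoder generator is shown below. This is an assumption-laden simplification of pix2pix's U-Net generator, not the exact published architecture: the layer counts, channel widths, and image size (256×256) are illustrative, but the key ingredients match the paper's design, i.e. stride-2 convolutions that downsample, transposed convolutions that upsample, and skip connections that concatenate encoder features into the decoder at matching resolutions:

```python
import torch
import torch.nn as nn

class UNetGeneratorSketch(nn.Module):
    """Simplified pix2pix-style U-Net generator (illustrative, 3 levels)."""

    def __init__(self, in_ch=3, out_ch=3, base=64):
        super().__init__()
        # Encoder: each stride-2 conv halves the spatial size (256 -> 128 -> 64 -> 32).
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2))
        self.down2 = nn.Sequential(nn.Conv2d(base, base * 2, 4, 2, 1),
                                   nn.BatchNorm2d(base * 2), nn.LeakyReLU(0.2))
        self.down3 = nn.Sequential(nn.Conv2d(base * 2, base * 4, 4, 2, 1),
                                   nn.BatchNorm2d(base * 4), nn.LeakyReLU(0.2))
        # Decoder: transposed convs double the spatial size back. up2/up3 take
        # doubled channel counts because of the skip concatenations below.
        self.up1 = nn.Sequential(nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1),
                                 nn.BatchNorm2d(base * 2), nn.ReLU())
        self.up2 = nn.Sequential(nn.ConvTranspose2d(base * 4, base, 4, 2, 1),
                                 nn.BatchNorm2d(base), nn.ReLU())
        self.up3 = nn.Sequential(nn.ConvTranspose2d(base * 2, out_ch, 4, 2, 1),
                                 nn.Tanh())  # outputs scaled to [-1, 1]

    def forward(self, x):
        d1 = self.down1(x)                           # (N, 64, 128, 128)
        d2 = self.down2(d1)                          # (N, 128, 64, 64)
        d3 = self.down3(d2)                          # (N, 256, 32, 32)
        u1 = self.up1(d3)                            # (N, 128, 64, 64)
        u2 = self.up2(torch.cat([u1, d2], dim=1))    # skip connection from d2
        return self.up3(torch.cat([u2, d1], dim=1))  # skip connection from d1

gen = UNetGeneratorSketch()
out = gen(torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 3, 256, 256])
```

Note how the output has the same shape as the input: the generator translates a 3-channel image into another 3-channel image of the same size, which is what image-to-image translation requires.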