Using Pixel-Wise Labels to Translate Images with pix2pix
Explore how to use pixel-wise labels to translate images with the pix2pix model.
Labels can be assigned to individual pixels; these are known as pixel-wise labels. Pixel-wise labels are playing an increasingly important role in deep learning. For example, one of the most famous online image classification contests, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), was held for the last time in 2017 and is no longer hosted, whereas object detection and segmentation challenges such as COCO are receiving more and more attention.
Semantic segmentation
An iconic application of pixel-wise labeling is semantic segmentation. Semantic segmentation (also called image/object segmentation) is a task in which every pixel in an image is assigned to an object class. The most promising application of semantic segmentation is autonomous cars (self-driving cars). If every pixel captured by the camera mounted on a self-driving car is correctly classified, all of the objects in the image can be easily recognized, which makes it much easier for the vehicle to properly analyze its current environment and make the right decision about whether it should, for example, turn or slow down to avoid other vehicles and pedestrians.
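As a minimal sketch of what pixel-wise labels look like in practice (the 4×4 size and the class-to-index mapping below are made up for illustration), a segmentation map is simply one integer class label per pixel:

import torch

# made-up class indices: 0 = road, 1 = vehicle, 2 = pedestrian
segmentation_map = torch.tensor([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 0, 0],
    [0, 2, 0, 0],
])
print(segmentation_map.shape)  # torch.Size([4, 4]): one class label per pixel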
Transforming the original color image into a segmentation map (as shown in the following diagram) can be considered an image-to-image translation problem, which is a much larger field and includes style transfer, image colorization, and more. Image style transfer is about moving the iconic textures and colors from one image to another, such as combining a photo with a Vincent van Gogh painting to create a unique artistic portrait. Image colorization is a task where we feed a 1-channel grayscale image to the model and let it predict the color information for each pixel, which leads to a 3-channel color image.
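To make the channel bookkeeping of colorization concrete, here is a hedged sketch in which a single convolution stands in for a real colorization model (the layer is only a placeholder, not an actual colorization network):

import torch
import torch.nn as nn

gray = torch.randn(1, 1, 256, 256)  # a batch with one 1-channel grayscale image
toy_colorizer = nn.Conv2d(1, 3, kernel_size=3, padding=1)  # placeholder model
color = toy_colorizer(gray)
print(color.shape)  # torch.Size([1, 3, 256, 256]): a 3-channel color image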
GANs can be used in image-to-image translation as well. In this section, we will use a classic image-to-image translation model, pix2pix, to transform images from one domain to another.
The authors of the paper have kindly provided the source code of pix2pix, which we will refer to throughout this section.
Generator architecture
The architecture of the generator network of pix2pix is as follows:
Here, we assume that both the input and output data are 3-channel images.
We can see that the first half of the layers of this network gradually transform the input image into smaller and smaller feature maps (the encoder part), while the second half gradually expands them back to an image of the original size (the decoder part), with feature maps from the encoder concatenated to the mirrored decoder layers through skip connections. This is the defining trait of the U-Net architecture.
The pix2pix model is defined in the models.pix2pix_model.Pix2PixModel class, which is derived from an abstract base class (ABC) known as models.base_model.BaseModel.
Note: An abstract base class in Python is a class containing at least one abstract method (one that is declared but not implemented). It cannot be instantiated. We can only create objects from its subclasses after providing implementations for all the abstract methods.
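For illustration, here is a minimal, self-contained sketch of this mechanism (the class and method names are made up for the example and are not from the pix2pix code):

from abc import ABC, abstractmethod

class AbstractGreeter(ABC):
    @abstractmethod
    def greet(self):
        """Declared but not implemented; subclasses must override it."""

class FriendlyGreeter(AbstractGreeter):
    def greet(self):
        return "hello"

# AbstractGreeter() raises TypeError because greet() is abstract;
# FriendlyGreeter() works because it implements every abstract method.
print(FriendlyGreeter().greet())  # hello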
A generator network, netG, is created by the models.networks.define_G method. By default, it takes unet_256 as the netG argument value.
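As a rough sketch of how this call might look (the argument names follow the public pix2pix repository, but treat the exact signature as an assumption rather than a guarantee):

from models import networks

# input_nc=3, output_nc=3: 3-channel input and output images
# ngf=64: number of filters in the outermost conv layer
# netG='unet_256': selects the U-Net generator for 256x256 images
netG = networks.define_G(input_nc=3, output_nc=3, ngf=64, netG='unet_256',
                         norm='batch', use_dropout=True)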
Therefore, models.networks.UnetGenerator is used to create the U-Net. In order to show how the U-Net is created in a recursive manner, we replace the arguments with their actual values, as shown in the following code:
import torch.nn as nn

class UnetGenerator(nn.Module):
    def __init__(self):
        super(UnetGenerator, self).__init__()
        # innermost block: the layers in the middle of the U-Net
        unet_block = UnetSkipConnectionBlock(64 * 8, 64 * 8, submodule=None, innermost=True)
        # three intermediate blocks with dropout, each wrapping the previous one
        for i in range(8 - 5):
            unet_block = UnetSkipConnectionBlock(64 * 8, 64 * 8, submodule=unet_block, use_dropout=True)
        # blocks that gradually reduce the channel count toward the outside
        unet_block = UnetSkipConnectionBlock(64 * 4, 64 * 8, submodule=unet_block)
        unet_block = UnetSkipConnectionBlock(64 * 2, 64 * 4, submodule=unet_block)
        unet_block = UnetSkipConnectionBlock(64, 64 * 2, submodule=unet_block)
        # outermost block: maps between 3-channel images and 64 feature channels
        self.model = UnetSkipConnectionBlock(3, 64, input_nc=3, submodule=unet_block, outermost=True)

    def forward(self, input):
        return self.model(input)
The first UnetSkipConnectionBlock created in the preceding code snippet (the one with innermost=True) is the innermost block, which creates the layers in the middle of the U-Net. The innermost block is defined as follows. Note that the following code should be treated as pseudocode, since it only shows how the different blocks are designed.
import torch
import torch.nn as nn

class UnetSkipConnectionBlock(nn.Module):
    def __init__(self):
        super(UnetSkipConnectionBlock, self).__init__()
        # downsampling half: halves the width and height of the feature map
        down = [nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(64 * 8, 64 * 8, kernel_size=4, stride=2,
                          padding=1, bias=False)]
        # upsampling half: doubles the width and height back
        up = [nn.ReLU(inplace=True),
              nn.ConvTranspose2d(64 * 8, 64 * 8, kernel_size=4, stride=2,
                                 padding=1, bias=False),
              nn.BatchNorm2d(64 * 8)]
        model = down + up
        self.model = nn.Sequential(*model)

    def forward(self, x):
        # the skip connection: concatenate input and output along channels
        return torch.cat([x, self.model(x)], 1)
The nn.Conv2d layer in down transforms the input feature map into one with half the width and height: with a 4×4 kernel, a stride of 2, and a padding size of 1, each spatial dimension is halved. Conversely, the nn.ConvTranspose2d layer in up doubles the width and height of its input, and the forward method concatenates the block's input with its output along the channel dimension, forming the skip connection that gives the U-Net its name.
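The halving and doubling can be verified with a quick shape check (the 512 channels and the 2×2 spatial size below are chosen to match the innermost position of a unet_256 generator; the snippet is illustrative, not part of the pix2pix code):

import torch
import torch.nn as nn

down = nn.Conv2d(512, 512, kernel_size=4, stride=2, padding=1, bias=False)
up = nn.ConvTranspose2d(512, 512, kernel_size=4, stride=2, padding=1, bias=False)

x = torch.randn(1, 512, 2, 2)                 # feature map entering the innermost block
print(down(x).shape)                          # torch.Size([1, 512, 1, 1]): halved
print(up(down(x)).shape)                      # torch.Size([1, 512, 2, 2]): doubled back
print(torch.cat([x, up(down(x))], 1).shape)   # torch.Size([1, 1024, 2, 2]): skip connection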