Using Pixel-Wise Labels to Translate Images with pix2pix

Explore how to use pixel-wise labels to translate images with the pix2pix model.

Labels can be assigned to individual pixels; such labels are known as pixel-wise labels. Pixel-wise labels play an increasingly important role in deep learning. For example, one of the most famous online image classification contests, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), has not been hosted since its last event in 2017, whereas object detection and segmentation challenges such as COCO are receiving more attention.

How object detection works

Semantic segmentation

An iconic application of pixel-wise labeling is semantic segmentation. Semantic segmentation (also called image or object segmentation) is a task in which every pixel in the image is assigned to one object class. The most promising application of semantic segmentation is autonomous cars (or self-driving cars). If every pixel captured by the camera mounted on a self-driving car is correctly classified, all of the objects in the scene can be easily recognized. This makes it much easier for the vehicle to analyze its current environment and decide whether it should, for example, turn or slow down to avoid other vehicles and pedestrians.
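To make the idea of pixel-wise labels concrete, here is a minimal PyTorch sketch that treats segmentation as per-pixel classification. The tiny network and the four-class setup are illustrative assumptions for this lesson, not part of pix2pix.

```python
# A minimal sketch of pixel-wise labeling, assuming a toy 4-class segmentation task.
import torch
import torch.nn as nn

num_classes = 4                      # e.g. road, vehicle, pedestrian, background (assumed)
model = nn.Sequential(               # tiny fully-convolutional classifier
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, num_classes, kernel_size=1),      # one logit per class, per pixel
)

image = torch.randn(1, 3, 256, 256)                      # a 3-channel input image
labels = torch.randint(0, num_classes, (1, 256, 256))    # one class index per pixel

logits = model(image)                          # shape: (1, num_classes, 256, 256)
loss = nn.CrossEntropyLoss()(logits, labels)   # cross-entropy evaluated pixel by pixel
print(logits.shape, loss.item())
```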

Transforming a color image into a segmentation map

Transforming the original color image into a segmentation map (as shown in the preceding diagram) can be considered an image-to-image translation problem, which is a much larger field that also includes style transfer, image colorization, and more. Image style transfer is about moving the characteristic textures and colors from one image to another, such as combining a photo with a Vincent van Gogh painting to create a unique artistic portrait. Image colorization is a task where we feed a 1-channel grayscale image to the model and let it predict the color information for each pixel, which leads to a 3-channel color image.
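As a quick illustration of image colorization as an image-to-image mapping, the following toy network maps a 1-channel grayscale input to a 3-channel color output. The architecture here is an assumption for illustration only, not the pix2pix generator.

```python
# A minimal sketch of colorization as image-to-image translation: 1 channel in, 3 channels out.
import torch
import torch.nn as nn

colorizer = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 3, kernel_size=3, padding=1),
    nn.Tanh(),                         # color outputs scaled to [-1, 1]
)

gray = torch.randn(1, 1, 256, 256)     # 1-channel grayscale input
color = colorizer(gray)                # 3-channel color prediction
print(color.shape)                     # torch.Size([1, 3, 256, 256])
```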

GANs can be used for image-to-image translation as well. In this section, we will use a classic image-to-image translation model, pix2pix, to transform images from one domain to another. Pix2pix (Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-Image Translation with Conditional Adversarial Networks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134, 2017.) was designed to learn the mapping between paired collections of images, for example, transforming an aerial photo taken by a satellite into a regular map or a sketch image into a color image, and vice versa.
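Concretely, pix2pix trains its generator with a conditional adversarial loss plus an L1 reconstruction term (weighted by λ = 100 in the paper). The sketch below uses a placeholder discriminator and dummy tensors rather than the classes from the official repository.

```python
# A hedged sketch of the pix2pix generator objective: conditional GAN loss + lambda * L1.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()
lambda_l1 = 100.0                       # weighting used in the original paper

def generator_loss(discriminator, real_input, fake_output, real_output):
    # The discriminator sees the input image concatenated with a candidate output image.
    pred_fake = discriminator(torch.cat([real_input, fake_output], dim=1))
    adv = bce(pred_fake, torch.ones_like(pred_fake))    # try to fool the discriminator
    recon = l1(fake_output, real_output)                # stay close to the paired target
    return adv + lambda_l1 * recon

# Quick check with dummy tensors and a trivially small (assumed) discriminator.
disc = nn.Sequential(nn.Conv2d(6, 1, kernel_size=4, stride=4))
x = torch.randn(1, 3, 256, 256)         # input-domain image
y = torch.randn(1, 3, 256, 256)         # target-domain image
y_fake = torch.randn(1, 3, 256, 256)    # stand-in for generator(x)
print(generator_loss(disc, x, y_fake, y).item())
```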

The authors of the paper have kindly provided the full source code (https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.git) for their work, which is implemented in PyTorch and is well organized. Therefore, we will use the code directly to train and evaluate the pix2pix model and learn how to organize our models in a different way.

Generator architecture

The architecture of the generator network of pix2pix is as follows:

Generator architecture of pix2pix

Here, we assume that both the input and output data are 3-channel 256×256 images. To illustrate the generator structure of pix2pix, feature maps are represented by colored blocks and convolution operations are represented by gray and blue arrows: gray arrows are convolution layers that reduce the feature map sizes, and blue arrows are layers that double the feature map sizes. Identity mapping (including skip connections) is represented by black arrows.
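The two arrow types correspond to strided convolutions (downsampling) and transposed convolutions (upsampling). Below is a minimal sketch assuming the commonly used 4×4 kernels with stride 2; the channel counts in the full generator differ.

```python
# Downsampling halves the feature map size; upsampling doubles it.
import torch
import torch.nn as nn

down = nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)          # halves H and W
up = nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1)   # doubles H and W

x = torch.randn(1, 3, 256, 256)
h = down(x)            # (1, 64, 128, 128): smaller feature maps, wider channels
y = up(h)              # (1, 3, 256, 256): back to the input resolution
print(h.shape, y.shape)
```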

We can see that the first half of the layers gradually transform the input image into 1×1 feature maps (with wider channels), while the second half transforms these very small feature maps into an output image of the same size as the input image. The network compresses the input data into much lower dimensions and then expands it back to the original dimensions. Therefore, this U-shaped network structure is often known as a U-Net. There are also many skip connections in the U-Net that connect the mirrored layers in order to help information (including details coming from previous layers in the forward pass and gradients coming from the latter layers in the backward pass) ...
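The following is a minimal sketch of the U-Net idea with a single skip connection implemented via channel concatenation. The real pix2pix generator stacks eight such encoder/decoder levels, so treat this as an illustrative assumption rather than the repository's implementation.

```python
# A tiny U-Net-style generator: two downsampling steps, two upsampling steps, one skip connection.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down1 = nn.Conv2d(3, 64, 4, stride=2, padding=1)            # 256 -> 128
        self.down2 = nn.Conv2d(64, 128, 4, stride=2, padding=1)          # 128 -> 64
        self.up1 = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)   # 64 -> 128
        # the skip connection concatenates mirrored feature maps, so 64 + 64 channels go in
        self.up2 = nn.ConvTranspose2d(128, 3, 4, stride=2, padding=1)    # 128 -> 256

    def forward(self, x):
        d1 = torch.relu(self.down1(x))
        d2 = torch.relu(self.down2(d1))
        u1 = torch.relu(self.up1(d2))
        u1 = torch.cat([u1, d1], dim=1)      # skip connection to the mirrored layer
        return torch.tanh(self.up2(u1))

net = TinyUNet()
out = net(torch.randn(1, 3, 256, 256))
print(out.shape)                             # torch.Size([1, 3, 256, 256])
```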