Generative Image Inpainting

Explore the concept of generative image inpainting.

We know that GANs, if trained properly, are capable of learning the latent distribution of the data and using that information to create new samples. This extraordinary ability makes GANs perfect for applications such as image inpainting, that is, filling in the missing parts of an image with plausible pixels.

Generative image inpainting

In this section, we will learn how to train a GAN model to perform image inpainting based on the generative image inpainting paper (Yu, Jiahui, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S. Huang. "Generative image inpainting with contextual attention." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5505-5514. 2018), although an updated version of the paper (https://github.com/JiahuiYu/generative_inpainting) has also been published. Before we start working on image inpainting with GANs, there are a few fundamental concepts to understand, as they are crucial to comprehending the method.

Efficient convolution from im2col to nn.Unfold

If you have previously been curious enough to try implementing convolutional neural networks on your own (either with Python or C/C++), you must know that the most painful part of the work is the backpropagation of gradients, and the most time-consuming part is the convolutions (assuming that it is a plain CNN implementation such as LeNet).

There are several ways to perform the convolution in our code (apart from directly using deep learning tools such as PyTorch):

  1. Calculate the convolution directly as per definition, which is usually the slowest way.

  2. Use the Fast Fourier Transform (FFT), an algorithm that computes the discrete Fourier transform of a sequence or its inverse. The FFT approach is not ideal for CNNs, since the kernels are often far too small compared to the images.

  3. Treat the convolution as matrix multiplication (in other words, General Matrix Multiply, or GeMM) using im2col. This is the most common method, used by numerous software libraries and tools, and is a lot faster.

  4. Use the Winograd method, in which the input and kernel are sampled at a given set of points using transform matrices. It is faster than GeMM under certain circumstances.
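To make the FFT approach (method 2) concrete, here is a minimal NumPy sketch, not taken from the paper; the function name `conv2d_fft` is our own. It zero-pads both arrays to the size of the full convolution, multiplies their spectra, transforms back, and crops the "valid" region so that the output size matches the direct implementation:

```python
import numpy as np

def conv2d_fft(x, w):
    # Pad both arrays to the size of the full convolution, multiply
    # their spectra (convolution theorem), transform back, and crop
    # the "valid" region of size (rows-kh+1, cols-kw+1)
    rows, cols = x.shape
    kh, kw = w.shape
    full = (rows + kh - 1, cols + kw - 1)
    spec = np.fft.rfft2(x, s=full) * np.fft.rfft2(w, s=full)
    out = np.fft.irfft2(spec, s=full)
    return out[kh - 1:rows, kw - 1:cols]
```

Note that no kernel flip is needed here: pointwise multiplication in the frequency domain already corresponds to a true convolution, not a cross-correlation.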

In this section, we will only talk about the first three methods. To learn more about the Winograd method, feel free to check out this project (https://github.com/andravin/wincnn) and this paper (Lavin, Andrew, and Scott Gray. "Fast algorithms for convolutional neural networks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4013-4021. 2016).
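Before we dive in, here is a minimal NumPy sketch of the im2col idea behind method 3; the helper names `im2col` and `conv2d_gemm` are our own, hypothetical choices. Every kernel-sized patch of the image is unrolled into one column of a matrix, so the whole convolution collapses into a single matrix product:

```python
import numpy as np

def im2col(x, kh, kw):
    # Unroll every kh x kw patch of x into one column of a matrix
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    cols = np.empty((kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_gemm(x, w):
    kh, kw = w.shape
    # Flip the kernel (true convolution), flatten it, and reduce the
    # whole convolution to a single matrix-vector product (GeMM)
    w = np.flip(np.flip(w, 0), 1).ravel()
    out = w @ im2col(x, kh, kw)
    return out.reshape(x.shape[0] - kh + 1, x.shape[1] - kw + 1)
```

PyTorch exposes the same patch extraction as `torch.nn.Unfold` (or `torch.nn.functional.unfold`), which is where the title of this section comes from: the framework hands you the unrolled patches, and the convolution itself becomes a batched matrix multiplication.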


Python code for 2D convolution

Here, we will use Python code for 2D convolution with different methods. Let's follow these steps:

  1. Directly calculate the convolution. Note that all of the following convolution implementations have a stride size of 1 and a padding size of 0, which means that the output size is image_size - kernel_size + 1:

import numpy as np

def conv2d_direct(x, w):
    # Flip the kernel so that we compute a true convolution
    # rather than a cross-correlation
    w = np.flip(np.flip(w, 0), 1)
    rows = x.shape[0]
    cols = x.shape[1]
    kh = w.shape[0]
    kw = w.shape[1]
    rst = np.zeros((rows-kh+1, cols-kw+1))
    for i in range(rst.shape[0]):
        for j in range(rst.shape[1]):
            tmp = 0.
            for ki in range(kh):
                for kj in range(kw):
                    tmp += x[i+ki][j+kj] * w[ki][kj]
            rst[i][j] = tmp
    return rst

As we said before, directly calculating the convolution as per its definition is extremely slow. Here is the elapsed time when convolving a 512 x 512 image with a ...