What is 2D Convolution?

Learn the logic behind 2D convolution.

2D convolution

The 2D convolution operation slides a 2D window across a 2D input matrix. At each position, it multiplies the overlapping elements pairwise and sums the products to produce one output value.

The following example shows how a 3x3 window travels across a 5x5 matrix:

[Slideshow, slide 1 of 6: First iteration with convolution window]
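The sliding pattern can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a stride of one and a window that must fit entirely inside the matrix; the 5x5 values and the names `image`, `window_size`, and `positions` are hypothetical and chosen only for the example.

```python
import numpy as np

# Hypothetical 5x5 input; the values are chosen only for illustration.
image = np.arange(25).reshape(5, 5)
window_size = 3

# Top-left corner of every position where the 3x3 window fits entirely
# inside the 5x5 matrix (stride of one is an assumption of this sketch).
positions = [
    (row, col)
    for row in range(image.shape[0] - window_size + 1)
    for col in range(image.shape[1] - window_size + 1)
]

# The patch of the input covered by the window in the first iteration.
first_patch = image[0:window_size, 0:window_size]

print(positions)
print(first_patch)
```

The window visits the positions row by row, from the top-left corner to the bottom-right corner of the region where it still fits.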

Correlation vs. convolution

2D correlation slides the window across the image and applies the same multiply-and-sum operation as convolution. The only difference is that in 2D convolution, we first flip the kernel horizontally and vertically (a 180-degree rotation) to obtain the final convolution window before traveling the image. In correlation, we use the initial kernel as it is.

[Figure: 2D correlation vs. convolution]

In the literature, these two terms are often used interchangeably, mostly because built-in functions hide the flip and every other step is identical. Nevertheless, it’s important to know the difference.
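To make the flip concrete, here is a minimal NumPy sketch. The helper names `correlate2d_valid` and `convolve2d_valid` are hypothetical, not library functions, and the sketch assumes no padding and a stride of one. It uses a deliberately asymmetric kernel so that the two operations give different results.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel as-is over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Multiply overlapping elements and sum the products.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d_valid(image, kernel):
    """Flip the kernel vertically and horizontally, then correlate."""
    return correlate2d_valid(image, np.flip(kernel))

# Hypothetical input and an asymmetric kernel, chosen for illustration.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1, 0, 0],
                   [0, 0, 0],
                   [0, 0, -1]], dtype=float)

corr = correlate2d_valid(image, kernel)
conv = convolve2d_valid(image, kernel)

print(corr)
print(conv)
```

With a kernel that is unchanged by the flip, such as a 3x3 matrix of ones, both functions return the same result, which is one reason the two terms are easy to confuse.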

When to use convolution?

The effect of a convolution depends on the kernel type and the values inside it. We can use convolution to blur or sharpen an image, or to detect edges. For these tasks, we usually choose the kernel values by hand to suit the goal. In convolutional layers, however, we use convolution to extract local features and don’t set the kernel values manually; they are learned during training.
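As an illustration of hand-picked kernels, the sketch below applies three commonly used 3x3 kernels (a box blur, a sharpening kernel, and a horizontal Sobel kernel) to a tiny image containing one vertical edge. The specific kernels, the test image, and the helper `convolve2d_valid` are assumptions made for this example, not something the lesson prescribes.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """2D convolution with no padding and stride 1 (hypothetical helper)."""
    flipped = np.flip(kernel)  # flip vertically and horizontally
    kh, kw = flipped.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

# Commonly used hand-designed kernels.
blur = np.full((3, 3), 1 / 9)                      # box blur: local average
sharpen = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]], dtype=float)      # boosts local contrast
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)      # responds to vertical edges

# Hypothetical 5x5 image: dark on the left, bright in the last two columns.
image = np.zeros((5, 5))
image[:, 3:] = 1.0

blurred = convolve2d_valid(image, blur)
sharpened = convolve2d_valid(image, sharpen)
edges = convolve2d_valid(image, sobel_x)

print(blurred)
print(sharpened)
print(edges)
```

The blur output ramps gradually across the edge, the sharpened output overshoots on both sides of it, and the Sobel output is zero in the flat region and nonzero near the edge.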

Now let’s look at how we calculate the output value while traveling with our convolution window.

[Slideshow, slide 1 of 2: Calculation of first pixel in output matrix]

The main idea is to multiply the overlapping elements of the input matrix and the convolution window, then sum these products to obtain one output element. Each window position gives one output value, so without padding the output matrix is smaller than the input matrix: a 3x3 window on a 5x5 input produces a 3x3 output.
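The calculation of the first output element can be written out step by step. In this sketch, the input values and the `window` are hypothetical, and `window` is assumed to be the final convolution window (that is, already flipped). The output shape follows from counting the positions where the window fits, assuming a stride of one.

```python
import numpy as np

# Hypothetical 5x5 input with values 1..25.
image = np.arange(1, 26).reshape(5, 5)

# Assumed final convolution window (already flipped).
window = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# Step 1: take the part of the input under the window (top-left corner).
patch = image[0:3, 0:3]

# Step 2: multiply overlapping elements.
products = patch * window

# Step 3: sum the products to get the first output element.
first_output = products.sum()

# Without padding, each dimension shrinks by (window size - 1).
output_shape = (image.shape[0] - window.shape[0] + 1,
                image.shape[1] - window.shape[1] + 1)

print(products)
print(first_output)
print(output_shape)
```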

Zero padding

What if we want to apply convolution but obtain an output matrix the same size as our input matrix? We use padding.

Padding adds extra pixels around the borders of the input matrix to expand the image. The most common technique fills these new rows and columns with zeros, which is called zero padding.
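A short sketch of zero padding with NumPy's `np.pad` follows. It assumes a 3x3 window with a stride of one, for which one ring of zeros is enough to keep the output the same size as the input; the helper `correlate2d_valid` and the input values are hypothetical. The window here is all ones, so flipping it changes nothing and correlation equals convolution.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 5x5 input with values 1..25.
image = np.arange(1, 26, dtype=float).reshape(5, 5)
window = np.ones((3, 3))

# Assumption: for an odd window size k and stride 1, padding (k - 1) // 2
# pixels on every side keeps the output the same size as the input.
pad = (window.shape[0] - 1) // 2

# Add a border of zeros: one extra row and column on each side here.
padded = np.pad(image, pad_width=pad, mode="constant", constant_values=0)

output = correlate2d_valid(padded, window)

print(padded.shape)
print(output.shape)
```

The padded matrix is 7x7, and sliding the 3x3 window over it yields a 5x5 output that matches the original input size.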

Padding size (m,n) ...