Appendix C: Convolution Examples
Revise the concept of convolution and different convolution filters.
Convolution is common in neural networks that work with images, either as classifiers or as generators. When designing such convolutional neural networks, the shape of data emerging from each convolution layer needs to be worked out.
In this appendix, we’ll see how this can be done step-by-step with configurations of convolution that we’re likely to see working with images.
Transposed convolutions in particular are often seen as difficult to grasp. Here we’ll show that they’re not difficult at all by working through some examples that all follow a very simple recipe.
Example 1: Convolution with stride 1, no padding
In this first simple example, we apply a 2 by 2 kernel to an input of size 6 by 6, with stride 1.
The picture shows how the kernel moves along the image in steps of size 1. The areas covered by the kernel do overlap but this is not a problem. Across the top of the image, the kernel can take 5 positions, which is why the output is 5 wide. Down the image, the kernel can also take 5 positions, which is why the output is a 5 by 5 square. Easy!
The PyTorch module for this convolution is:
nn.Conv2d(in_channels, out_channels, kernel_size=2, stride=1)
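The counting above can be turned into a small formula. The sketch below is plain Python rather than PyTorch, and the helper name conv_output_size is our own. It assumes a square kernel and no dilation, and it counts the positions the kernel can take along one dimension.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of positions a kernel can take along one dimension."""
    # Floor division drops any leftover cells the kernel cannot reach.
    return (input_size + 2 * padding - kernel_size) // stride + 1

# Example 1: 6 wide input, 2 wide kernel, stride 1, no padding.
width = conv_output_size(6, kernel_size=2, stride=1)
print(width)  # 5
```

The same count applies down the image, which gives the 5 by 5 output.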
Example 2: Convolution with stride 2, no padding
This second example is the same as the previous one, but we now have a stride of 2.
We can see the kernel moves along the image in steps of size 2. This time the areas covered by the kernel don’t overlap. In fact, because the kernel size is the same as the stride, the image is covered without overlaps or gaps. The kernel can take 3 positions across and down the image, so the output is 3 by 3.
The PyTorch module for this convolution is:
nn.Conv2d(in_channels, out_channels, kernel_size=2, stride=2)
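To see why there are no overlaps or gaps, we can list the cells the kernel covers at each position along one dimension. This is an illustrative sketch in plain Python. The helper kernel_windows is our own name, and each window is written as (start, end) with the end cell excluded.

```python
def kernel_windows(input_size, kernel_size, stride):
    """Return the (start, end) cell range covered at each kernel position."""
    last_start = input_size - kernel_size
    return [(start, start + kernel_size)
            for start in range(0, last_start + 1, stride)]

# Example 2: 6 wide input, 2 wide kernel, stride 2.
windows = kernel_windows(6, kernel_size=2, stride=2)
print(windows)       # [(0, 2), (2, 4), (4, 6)]
print(len(windows))  # 3
```

Each window starts exactly where the previous one ends, so the 6 cells are covered once each. With stride 1 the same helper returns 5 windows that overlap by one cell.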
Example 3: Convolution with stride 2, with padding
This third example is the same as the previous one, but this time we use a padding of 1.
By setting padding to 1, we extend all the image edges by 1 pixel, with values set to 0. That means the image width has grown by 2, from 6 to 8. We apply the kernel to this extended image. The picture shows the kernel can take 4 positions across the image, and the same is true down the image. This is why the output is 4 by 4.
The PyTorch module for this convolution is:
nn.Conv2d(in_channels, out_channels, kernel_size=2, stride=2, padding=1)
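The padding step can be sketched directly on a small grid. This is plain Python, not how PyTorch implements padding internally, and the helper zero_pad is a name we've made up for illustration.

```python
def zero_pad(image, padding):
    """Surround a 2D list with `padding` rows and columns of zeros."""
    width = len(image[0]) + 2 * padding
    top = [[0] * width for _ in range(padding)]
    middle = [[0] * padding + list(row) + [0] * padding for row in image]
    bottom = [[0] * width for _ in range(padding)]
    return top + middle + bottom

# Example 3: a 6 by 6 image padded by 1, then a 2 wide kernel with stride 2.
image = [[1] * 6 for _ in range(6)]
padded = zero_pad(image, padding=1)
padded_size = len(padded)
positions = (padded_size - 2) // 2 + 1
print(padded_size, positions)  # 8 4
```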
Example 4: Convolution with coverage gaps
This example illustrates the case where the chosen kernel size and stride mean it doesn’t reach the end of the image.
Here, the 2 by 2 kernel moves with a step size of 2 over the 5 by 5 image. The last column of the image is not covered by the kernel, and neither is the last row.
The easiest thing to do is to just ignore the uncovered cells, and this is in fact the approach taken by many implementations, including PyTorch. That’s why the output is 2 by 2.
For medium to large images, the loss of information from the very edge of the image is rarely a problem, as the meaningful content is usually in the middle of the image. Even if it weren’t, the fraction of the information lost is very small.
If we really wanted to avoid any information being lost, we’d adjust some of the options. We could add padding to ensure no part of the input image was missed, or we could adjust the kernel and stride sizes so that the kernel positions reach the far edge of the image exactly.
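A quick way to check a configuration is to count how many cells at the far edge are left over. The sketch below is plain Python with a helper name of our own, uncovered_cells. It only counts trailing cells along one dimension and assumes no padding.

```python
def uncovered_cells(input_size, kernel_size, stride):
    """Count trailing cells along one dimension that no kernel position reaches."""
    positions = (input_size - kernel_size) // stride + 1
    covered = (positions - 1) * stride + kernel_size
    return input_size - covered

print(uncovered_cells(5, kernel_size=2, stride=2))  # 1, as in Example 4
print(uncovered_cells(6, kernel_size=2, stride=2))  # 0, as in Example 2
print(uncovered_cells(5, kernel_size=3, stride=2))  # 0, a larger kernel reaches the edge
```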
Example 5: Transposed convolution with stride 2, no padding
The transposed convolution is commonly used to expand a tensor to a larger tensor. This is the opposite of a normal convolution, which is typically used to reduce a tensor to a smaller tensor.
In this example, we use a 2 by 2 kernel again, set to stride 2, applied to a 3 by 3 input.
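As a quick check on sizes before walking through the steps, the sketch below applies the output-size formula given in the PyTorch documentation for nn.ConvTranspose2d, simplified by assuming no dilation and no output padding. The helper name transposed_output_size is our own.

```python
def transposed_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output size along one dimension, assuming dilation 1 and no output padding."""
    return (input_size - 1) * stride - 2 * padding + kernel_size

# Example 5: 3 wide input, 2 wide kernel, stride 2, no padding.
size = transposed_output_size(3, kernel_size=2, stride=2)
print(size)  # 6
```

Under these assumptions the 3 by 3 input expands to a 6 by 6 output, the reverse of the size change in Example 2.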
The process for transposed convolution has a few extra steps but is not complicated.
First, we create an intermediate grid which has the original input’s cells spaced apart with a step size set to the stride. In the picture above, we can see the pink cells spaced apart with a step size of 2. The new in-between cells have value 0.
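This first step can be sketched in a few lines of plain Python. The helper space_out and the sample values 1 to 9 are our own, chosen only so the original cells are easy to spot in the result.

```python
def space_out(grid, stride):
    """Place the input cells `stride` apart, filling the in-between cells with 0."""
    rows, cols = len(grid), len(grid[0])
    spaced = [[0] * ((cols - 1) * stride + 1)
              for _ in range((rows - 1) * stride + 1)]
    for r in range(rows):
        for c in range(cols):
            spaced[r * stride][c * stride] = grid[r][c]
    return spaced

grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
spaced = space_out(grid, stride=2)
for row in spaced:
    print(row)
# [1, 0, 2, 0, 3]
# [0, 0, 0, 0, 0]
# [4, 0, 5, 0, 6]
# [0, 0, 0, 0, 0]
# [7, 0, 8, 0, 9]
```

With stride 2, the 3 by 3 input becomes a 5 by 5 intermediate grid.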
Next, we extend the edges of the intermediate image with additional cells with value 0. We add the maximum amount of these so ...