CNN Building Blocks

Get to know the most commonly used CNN building blocks.

Convolution layer

A neural network wouldn’t be called convolutional if it didn’t have at least one convolution layer. A two-dimensional convolution layer slides a small tensor (the convolution kernel) over the height and width of the input tensor. The kernel covers only a small window in height and width, but it has the same number of channels as the input tensor. At each pixel position, the layer computes the scalar product between the kernel and the neighborhood centered on that pixel. The result is a tensor with high activations wherever the input locally resembles the convolution kernel.
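The scalar-product description can be checked numerically. The following is a minimal sketch, assuming a random toy input and a random 3x3 kernel (both hypothetical, chosen only for illustration): it compares one output value of torch.nn.functional.conv2d with the scalar product computed by hand over the corresponding neighborhood.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy input: a batch of one 3-channel 8x8 "image"
image = torch.rand(1, 3, 8, 8)
# One 3x3 kernel with the same number of channels as the input
kernel = torch.rand(1, 3, 3, 3)  # (out_channels, in_channels, kH, kW)

# padding=1 keeps the output the same height and width as the input
output = F.conv2d(image, kernel, padding=1)  # shape (1, 1, 8, 8)

# Recompute the activation at one pixel by hand: the scalar product of
# its 3x3 neighborhood (across all channels) with the kernel
row, col = 4, 5
neighborhood = image[0, :, row - 1: row + 2, col - 1: col + 2]  # (3, 3, 3)
manual_value = torch.sum(neighborhood * kernel[0])

matches = torch.allclose(output[0, 0, row, col], manual_value, atol=1e-5)
print(tuple(output.shape), matches)
```

Note that the kernel is applied without flipping, so the operation is a cross-correlation in the strict signal-processing sense. That is what makes the "looks like the kernel" intuition hold directly.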

To illustrate the functioning of a convolution layer, let’s design one by hand.

Consider the following image:

The bottom of a Raspberry Pi board

Let’s assume we need to highlight the areas where a green surface on the left meets a yellow surface on the right along a vertical boundary. To do that, we’ll convolve the image with the following kernel:

The 5x5 convolution kernel

We can design the convolution kernel manually by building the tensors that go into the weight and bias fields of a torch.nn.Conv2d object:

import torch
import cv2

weight = torch.zeros(1, 3, 5, 5)  # (N_kernels, N_channels, H, W)
bias = torch.zeros(1)
weight[0, 0, :, 0:2] = 90  # The green band on the left (channel order is BGR)
weight[0, 1, :, 0:2] = 120
weight[0, 2, :, 0:2] = 35
weight[0, 0, :, 2:] = 75  # The yellow band on the right
weight[0, 1, :, 2:] = 160
weight[0, 2, :, 2:] = 160
cv2.imwrite("./output/0_kernel_weight.png", cv2.resize(torch.moveaxis(weight.squeeze(0), 0, 2).to(torch.uint8).numpy(), dsize=(300, 300), interpolation=cv2.INTER_NEAREST))  # ./output must already exist

# Standardize the weight
weight = (weight - torch.mean(weight)) / torch.std(weight)
bias[0] = 0.

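The six slice assignments set the BGR values of the weight tensor: the two leftmost columns of the kernel get the green color, and the three remaining columns get the yellow color. After the kernel image is saved, the weight is standardized to zero mean and unit standard deviation. The last statement sets the bias value to 0.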
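Before applying the kernel to a photo, a quick sanity check on a synthetic image can confirm that it responds as intended. The following is a minimal sketch under stated assumptions: the 20x20 test image (green left half, yellow right half), the padding value, and the names green, yellow, conv, and image are hypothetical and not part of the lesson’s code. It copies the standardized kernel into a torch.nn.Conv2d object and locates the strongest response along one row.

```python
import torch

# Rebuild the hand-designed kernel from the listing above (BGR order)
green = torch.tensor([90., 120., 35.])
yellow = torch.tensor([75., 160., 160.])
weight = torch.zeros(1, 3, 5, 5)  # (N_kernels, N_channels, H, W)
weight[0, :, :, 0:2] = green.view(3, 1, 1)   # green band on the left
weight[0, :, :, 2:] = yellow.view(3, 1, 1)   # yellow band on the right
weight = (weight - torch.mean(weight)) / torch.std(weight)

# Copy the tensors into the weight and bias fields of a Conv2d object.
# padding=2 (an assumption) keeps the output the same size as the input.
conv = torch.nn.Conv2d(in_channels=3, out_channels=1, kernel_size=5, padding=2)
with torch.no_grad():
    conv.weight.copy_(weight)
    conv.bias.zero_()

# Hypothetical synthetic BGR image: green left half, yellow right half
image = torch.zeros(1, 3, 20, 20)
image[0, :, :, :10] = green.view(3, 1, 1)
image[0, :, :, 10:] = yellow.view(3, 1, 1)

with torch.no_grad():
    response = conv(image)[0, 0]  # shape (20, 20)

# Inspect a middle row, away from the zero padding at the top and bottom
row = response[10]
peak_col = int(torch.argmax(row))
print(peak_col, float(row[3]), float(row[peak_col]), float(row[16]))
```

In this setup, the strongest response in the row falls on the column where green meets yellow, and it exceeds the response inside either uniform region. Because the kernel mean is subtracted over all channels at once, a uniformly colored region does not necessarily give a zero response; only the relative height of the peak matters here.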

We can now perform a convolution ...