Understanding CNNs: Convolution Operations

Learn about the different kinds of convolution operations.

Now that we understand the high-level concepts governing CNNs, let’s walk through the technical details of a CNN. First, we’ll discuss the convolution operation and introduce some terminology, such as filter size, stride, and padding. In brief, filter size refers to the window size of the convolution operation, stride refers to the distance between two movements of the convolution window, and padding refers to the way we handle the boundaries of the input. We’ll also discuss an operation that’s known as deconvolution or transposed convolution. Then, we’ll discuss the details of the pooling operation. Finally, we’ll discuss how to add fully connected layers, which produce the classification or regression output.
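To see how these three terms interact before we dive into the details, here is a minimal sketch that computes the spatial size of a convolution output from the input size, filter size, stride, and padding. The function name and the assumption of symmetric zero padding are illustrative, not part of the lesson:

```python
def conv_output_size(n, m, stride=1, padding=0):
    """Output width of a convolution over an n x n input with an m x m
    filter, a given stride, and symmetric zero padding on each side."""
    return (n + 2 * padding - m) // stride + 1

# A 7 x 7 input with a 3 x 3 filter, stride 1, no padding -> 5 x 5 output.
print(conv_output_size(7, 3))             # 5
# The same input with stride 2 -> 3 x 3 output.
print(conv_output_size(7, 3, stride=2))   # 3
# Padding of 1 preserves the 7 x 7 size at stride 1.
print(conv_output_size(7, 3, padding=1))  # 7
```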

Convolution operation

In this section, we’ll discuss the convolution operation in detail. First, we’ll look at the convolution operation without stride and padding, then the convolution operation with stride, and then the convolution operation with padding. Finally, we’ll discuss transposed convolution (also called deconvolution). For all of these operations, we assume indices start from one, not zero.

Standard convolution operation

The convolution operation is a central part of CNNs. For an input of size $n \times n$ and a weight patch (also known as a filter or a kernel) of size $m \times m$, where $n \geq m$, the convolution operation slides the patch of weights over the input. Let’s denote the input by $X$, the patch of weights by $W$, and the output by $H$. Then, at each location $(i, j)$, the output is calculated as follows:
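With one-based indexing and no stride or padding, one common way to write this (reconstructed here from the standard definition used in CNNs) is

$$h_{i,j} = \sum_{k=1}^{m} \sum_{l=1}^{m} w_{k,l}\, x_{i+k-1,\, j+l-1}, \qquad 1 \leq i, j \leq n - m + 1.$$

In other words, each output entry is the sum of elementwise products between the filter and the $m \times m$ window of the input it currently covers. The following NumPy sketch implements this directly; it uses the notation above ($X$, $W$, $H$) and is an illustration, not a reference implementation from the lesson:

```python
import numpy as np

def conv2d(X, W):
    """Stride-1, unpadded 2-D convolution as used in CNNs: slide the
    m x m weight patch W over the n x n input X and, at each location,
    sum the elementwise products of the patch and the covered window."""
    n, m = X.shape[0], W.shape[0]
    out = n - m + 1
    H = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            H[i, j] = np.sum(X[i:i + m, j:j + m] * W)
    return H

X = np.arange(16, dtype=float).reshape(4, 4)  # 4 x 4 input
W = np.array([[1.0, 0.0], [0.0, -1.0]])       # 2 x 2 filter
print(conv2d(X, W))                           # 3 x 3 output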
