Convolution in Practice
Find out why convolutional and pooling layers are the building blocks of Convolutional Neural Networks.
In real-life applications, most images are in fact 3D tensors with width, height, and 3 channels (R, G, B) as dimensions.
In that case, the kernel should also be a 3D tensor (kernel height × kernel width × input channels). Remember that the sliding happens only across width and height. At each position, we take the dot product between the kernel and the image patch underneath it across all input channels, which gives a single number. Each kernel therefore produces one 2D feature map, that is, 1 output channel.
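To make the "slide over width and height, sum over channels" idea concrete, here is a minimal pure-Python sketch. The function name conv2d_single_kernel and the toy data are hypothetical and chosen only for illustration. It assumes no padding and a stride of 1, and like most deep learning libraries it computes a cross-correlation (the kernel is not flipped).

```python
def conv2d_single_kernel(image, kernel):
    """Apply one multi-channel kernel to a multi-channel image.

    image:  list of C channels, each an H x W list of lists
    kernel: list of C channels, each a k x k list of lists
    Returns one 2D feature map of size (H - k + 1) x (W - k + 1).
    """
    channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    k = len(kernel[0])
    out_h, out_w = height - k + 1, width - k + 1
    feature_map = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):              # slide across height
        for j in range(out_w):          # slide across width
            total = 0.0
            for c in range(channels):   # no sliding here: sum over all channels
                for di in range(k):
                    for dj in range(k):
                        total += image[c][i + di][j + dj] * kernel[c][di][dj]
            feature_map[i][j] = total
    return feature_map


# Toy 3-channel 4x4 "image": channel c is filled with the value c + 1.
image = [[[float(c + 1)] * 4 for _ in range(4)] for c in range(3)]
# One 3-channel 2x2 kernel filled with ones.
kernel = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(3)]

feature_map = conv2d_single_kernel(image, kernel)
print(len(feature_map), len(feature_map[0]))  # 3 3
print(feature_map[0][0])                      # 24.0 = 4 * (1 + 2 + 3)
```

Three input channels go in, but only a single 2D map comes out, because the channel dimension is summed away at every position.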
In practice, we tend to use more than one kernel in order to capture different kinds of features at the same time. With N kernels, the layer produces N feature maps, that is, N output channels.
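As a rough illustration of several kernels working side by side, the sketch below stacks 4 random kernels into one weight tensor and applies them with torch.nn.functional.conv2d. The sizes (a 32 x 32 image, 5 x 5 kernels, 4 kernels) are arbitrary choices for the example, not values from the lesson.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

image = torch.randn(1, 3, 32, 32)   # (batch, channels, height, width)
kernels = torch.randn(4, 3, 5, 5)   # (out_channels, in_channels, k, k)

# Default settings: stride 1, no padding.
feature_maps = F.conv2d(image, kernels)
print(feature_maps.shape)  # torch.Size([1, 4, 28, 28])

# Each output channel comes from exactly one kernel.
second_map = F.conv2d(image, kernels[1:2])
print(torch.allclose(feature_maps[:, 1:2], second_map, atol=1e-4))
```

Each of the 4 kernels spans all 3 input channels, and each contributes one of the 4 output channels.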
As you may have guessed, our learnable weights are now the values of our filters (kernels), and they can be trained with backpropagation, as usual. We can add a bias term to each filter as well.
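The following sketch inspects the learnable parameters of a PyTorch convolutional layer and checks that gradients reach them. The layer sizes and the mean-based "loss" are placeholders chosen for the example. A real model would use a proper loss function.

```python
import torch
from torch import nn

# 8 filters, each spanning 3 input channels with a 3x3 window.
# nn.Conv2d adds one bias per filter by default (bias=True).
layer = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)

print(layer.weight.shape)  # torch.Size([8, 3, 3, 3])
print(layer.bias.shape)    # torch.Size([8])

num_params = sum(p.numel() for p in layer.parameters())
print(num_params)          # 224 = 8 * 3 * 3 * 3 weights + 8 biases

# Backpropagation fills in a gradient for every filter value.
x = torch.randn(2, 3, 16, 16)
loss = layer(x).mean()     # stand-in scalar loss, for illustration only
loss.backward()
print(layer.weight.grad.shape)  # torch.Size([8, 3, 3, 3])
```

Note that the parameter count depends only on the number of channels and the kernel size, not on the width and height of the input image.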
Convolutional layers can be stacked on top of each other. Since convolutions are linear operators, we include non-linear activation functions in between, just as we did in fully connected layers.
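A stack of two convolutional layers with a non-linearity after each one might look like the sketch below. The use of ReLU and the channel counts (16 and 32) are assumptions made for the example. Any non-linear activation would play the same role.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3),
    nn.ReLU(),   # non-linearity between the two linear convolutions
    nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3),
    nn.ReLU(),
)

x = torch.randn(1, 3, 32, 32)
out = model(x)
print(out.shape)  # torch.Size([1, 32, 28, 28])
```

The output channels of one layer must match the input channels of the next (16 here). Without padding, each 3 x 3 convolution also trims 2 pixels from the width and height, so 32 becomes 30 and then 28.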
To recap, you have to think in terms of input channels, output channels, and kernel size. And that is exactly how we are going to define a convolutional layer in PyTorch.
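In PyTorch, a 2D convolutional layer is defined with nn.Conv2d, which takes the number of input channels, the number of output channels, and the kernel size as arguments.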
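A minimal sketch of such a definition is shown below. The specific values (3 input channels, 5 output channels, a 5 x 5 kernel, and a 50 x 50 input) are illustrative assumptions rather than values prescribed by the lesson.

```python
import torch
from torch import nn

# An RGB image has 3 input channels. We ask for 5 feature maps.
conv_layer = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=5)

# PyTorch expects input of shape (batch, channels, height, width).
batch = torch.rand(1, 3, 50, 50)
out = conv_layer(batch)
print(out.shape)  # torch.Size([1, 5, 46, 46])
```

With the default stride of 1 and no padding, each spatial dimension shrinks by kernel_size - 1, so 50 becomes 46.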