Max pooling is a layer used in convolutional neural networks (CNNs), which are neural network models commonly used for image classification and other computer vision tasks. This layer typically follows a convolution layer and reduces the spatial dimensions of the feature maps that the convolution layer outputs while retaining the most prominent features.
As shown in the diagram below, the max pooling layer takes in the feature map from the convolution layer. It applies the max pooling operation to the feature map and generates a reduced feature map, which is passed on to the rest of the convolutional neural network.
Now that we have understood why the max pooling layer is used, let's look at the two parameters that we must define for our max pooling layer.
As we see in the diagram above, the max pooling layer uses a window, a two-dimensional region that slides over the feature map and applies the pooling operation to the values it covers. When we define the max pooling layer, we need to define the window size, which depends on our implementation. A window size of 2x2, as shown in the diagram above, is a common choice.
The other parameter that we must define is the stride, which is the number of pixels the window moves across the feature map after each max pooling operation. A larger stride means the window jumps more pixels per operation, and a smaller stride means it jumps fewer. For example, if we define a stride of 2 pixels, the pooling window will move two units at a time after each operation.
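To see how the window size and stride together determine the size of the reduced feature map, here is a minimal sketch. It assumes no padding and a window that must fit entirely inside the feature map. The function name pooled_size is hypothetical and used only for illustration.

```python
def pooled_size(input_size: int, window: int, stride: int) -> int:
    """Number of window positions along one axis (assumes no padding)."""
    if window > input_size:
        raise ValueError("window must not be larger than the input")
    # The window starts at position 0 and moves `stride` pixels at a time
    # for as long as it still fits inside the input.
    return (input_size - window) // stride + 1


# A 4x4 feature map with a 2x2 window and a stride of 2 gives a 2x2 output.
print(pooled_size(4, 2, 2))  # 2

# With a stride of 1 the windows overlap, so the output shrinks less.
print(pooled_size(4, 2, 1))  # 3
```

The same calculation applies independently to the height and the width of the feature map.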
Now that we have looked at the max pooling layer, we will look at how it sub-samples the feature map, reducing its spatial dimensions, with the help of a slideshow example.
In the example, we define a max pooling layer with a window size of 2x2 and a stride value of 2. The window slides over the feature map and takes the maximum value in the area that it covers. In the first operation, the window contains the values 9, 12, 18, and 7, so it outputs the maximum value, which is 18, and then moves on to the rest of the pixels in the feature map.
After each operation, the window jumps 2 pixels, as we defined the stride value as 2. This continues until the window has slid over the whole feature map and retrieved the maximum value in each window area. These values form the reduced feature map that the max pooling layer outputs.
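The forward pass described above can be sketched in plain Python. This is an illustrative sketch rather than a library implementation: the function name max_pool_2d is hypothetical, and only the first window (9, 12, 18, 7) comes from the example. The remaining values in the 4x4 feature map are made up for the demonstration.

```python
def max_pool_2d(feature_map, window=2, stride=2):
    """Max pooling over a 2D list of numbers (no padding)."""
    rows, cols = len(feature_map), len(feature_map[0])
    out_rows = (rows - window) // stride + 1
    out_cols = (cols - window) // stride + 1

    pooled = []
    for i in range(out_rows):
        pooled_row = []
        for j in range(out_cols):
            # Top-left corner of the current window.
            top, left = i * stride, j * stride
            # Collect every value that the window covers.
            values = [
                feature_map[r][c]
                for r in range(top, top + window)
                for c in range(left, left + window)
            ]
            pooled_row.append(max(values))
        pooled.append(pooled_row)
    return pooled


# Only the top-left 2x2 block matches the example; the rest is assumed.
feature_map = [
    [9, 12, 3, 5],
    [18, 7, 1, 4],
    [2, 6, 11, 8],
    [0, 10, 13, 14],
]

print(max_pool_2d(feature_map))  # [[18, 5], [10, 14]]
```

The 4x4 input is reduced to a 2x2 output, and the first output value is 18, matching the first operation in the example.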
Note: We explained the forward propagation steps of the max pooling layer. To learn about backpropagation in a max pooling layer, visit this answer.
Let's look at some benefits of using a max pooling layer in our CNN architecture.
Reduced computation: The reduced feature map gives later layers fewer values to process, which helps control the computational cost of the convolutional neural network and makes it faster and more efficient.
Feature selection: Since we select the max value in a specified region, we ensure that the most salient feature in that region is preserved as the features propagate forward through the network.
Improved generalization: Because max pooling keeps only the maximum value in each window, it can suppress small, noisy activations in the feature map and let the network focus on the stronger, more important features.
Reduced overfitting: The max pooling layer can also help reduce overfitting in a neural network model, since it discards the weaker activations in the input feature map, including some of the noise, and leaves later layers with fewer values to fit.
The max pooling layer is a key component of many convolutional neural network architectures. It helps the network extract important features from the input while simultaneously reducing the dimensions of the data. Adding max pooling layers makes the network more resilient to fluctuations in the input data and helps reduce overfitting, a common issue that affects machine learning models.