In a max-pooling layer of a convolutional neural network, we don’t apply any filter to the input matrix. Rather, we select the feature with the maximum value inside a kernel window of a given size and pass it to the output of that layer.
A forward pass through a max-pooling layer is fairly simple to compute. We move the kernel along the input matrix and pass the maximum-valued feature in each kernel window to the output. The following animation shows a forward pass of an input matrix through a max-pooling layer with a stride of 2.
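The forward pass described above can be sketched as follows in NumPy (the function name, default kernel size, and example matrix are illustrative, not from the original):

```python
import numpy as np

def maxpool_forward(X, k=2, stride=2):
    """Slide a k x k window over X with the given stride and keep
    the maximum value in each window."""
    out_h = (X.shape[0] - k) // stride + 1
    out_w = (X.shape[1] - k) // stride + 1
    Y = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Window of the input covered by output element (i, j)
            window = X[i*stride:i*stride+k, j*stride:j*stride+k]
            Y[i, j] = window.max()
    return Y

X = np.array([[1., 3., 2., 4.],
              [5., 6., 7., 8.],
              [9., 2., 1., 0.],
              [3., 4., 5., 6.]])
print(maxpool_forward(X))  # [[6. 8.] [9. 6.]]
```

Each output element records only the maximum of its window; the positions of those maxima are what the backward pass will need.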
Since the max-pooling layer doesn’t have any weights, for backpropagation we only need to find the gradient of the error $E$ with respect to the input matrix $X$.
Therefore, we need to find the following matrix:

$$\frac{\partial E}{\partial X} = \begin{bmatrix} \dfrac{\partial E}{\partial x_{11}} & \cdots & \dfrac{\partial E}{\partial x_{1n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial E}{\partial x_{m1}} & \cdots & \dfrac{\partial E}{\partial x_{mn}} \end{bmatrix}$$
Let's look at one of these matrix elements, where $y_{kl}$ denotes an element of the output matrix $Y$:

$$\frac{\partial E}{\partial x_{ij}} = \sum_{k}\sum_{l}\frac{\partial E}{\partial y_{kl}}\,\frac{\partial y_{kl}}{\partial x_{ij}}$$
This equation is obtained by the chain rule of differentiation. Since $x_{ij}$ affects only one output element $y_{kl}$ in our particular example, the equation becomes:

$$\frac{\partial E}{\partial x_{ij}} = \delta_{kl}\,\frac{\partial y_{kl}}{\partial x_{ij}}$$
Where we have defined $\delta_{kl} = \dfrac{\partial E}{\partial y_{kl}}$.
Since $y_{kl}$ is the maximum over its kernel window, $\dfrac{\partial y_{kl}}{\partial x_{ij}}$ is nonzero only if $x_{ij}$ is the maximum-valued feature in the kernel. Let's assume that $x_{ij}$ is the maximum; then $y_{kl} = x_{ij}$, and

$$\frac{\partial y_{kl}}{\partial x_{ij}} = 1 \quad\Rightarrow\quad \frac{\partial E}{\partial x_{ij}} = \delta_{kl}$$
If $x_{ij}$ is not the maximum, then

$$\frac{\partial y_{kl}}{\partial x_{ij}} = 0 \quad\Rightarrow\quad \frac{\partial E}{\partial x_{ij}} = 0$$
Hence, the gradient of the error with respect to the input features is nonzero only if the input feature has the maximum value in the pooling kernel.
Using this strategy, we can compute the full backward pass as follows:
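A minimal NumPy sketch of this strategy (the function name, kernel size, and example values are illustrative assumptions): each upstream gradient $\delta_{kl}$ is routed to the position of the maximum feature in the corresponding window, and every other input position receives zero gradient.

```python
import numpy as np

def maxpool_backward(X, dY, k=2, stride=2):
    """Route each upstream gradient dE/dy_kl to the position of the
    maximum feature in its k x k window; all other input positions
    get zero gradient, as derived above."""
    dX = np.zeros_like(X)
    for i in range(dY.shape[0]):
        for j in range(dY.shape[1]):
            # Window of the input that produced output element (i, j)
            window = X[i*stride:i*stride+k, j*stride:j*stride+k]
            # Index of the maximum feature inside this window
            p, q = np.unravel_index(np.argmax(window), window.shape)
            # Only the maximum feature receives the gradient delta_kl
            dX[i*stride + p, j*stride + q] += dY[i, j]
    return dX

X = np.array([[1., 3., 2., 4.],
              [5., 6., 7., 8.],
              [9., 2., 1., 0.],
              [3., 4., 5., 6.]])
dY = np.ones((2, 2))  # assumed upstream gradient, one per output element
print(maxpool_backward(X, dY))
```

With an upstream gradient of all ones, the resulting `dX` is nonzero exactly at the four positions that held the window maxima, matching the derivation above.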