How to backpropagate through max-pooling layers

Overview

In a max-pooling layer of a convolutional neural network, we don’t apply any filter to the input matrix. Rather, we select the feature with the maximum value in a kernel of a given size and pass it to the output of that layer.

A forward pass through a max-pooling layer is fairly simple to compute. We move the kernel along the input matrix and pass the maximum-valued feature at each kernel position to the output. The following animation performs a forward pass on a $4 \times 4$ input matrix through a max-pooling layer with a kernel of size $2 \times 2$ and a stride of 2.

[Animation: forward pass through the max-pooling layer, shown in four steps]
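As a concrete illustration, here is a minimal NumPy sketch of this forward pass. The input matrix `X` and the helper `maxpool_forward` are hypothetical names chosen for this example, not part of any particular framework.

```python
import numpy as np

def maxpool_forward(X, k=2, stride=2):
    """Slide a k x k kernel over X and keep the maximum of each window."""
    h_out = (X.shape[0] - k) // stride + 1
    w_out = (X.shape[1] - k) // stride + 1
    Y = np.empty((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            window = X[i*stride:i*stride+k, j*stride:j*stride+k]
            Y[i, j] = window.max()
    return Y

# Made-up 4 x 4 input, matching the setup described above
X = np.array([[1., 3., 2., 1.],
              [4., 2., 0., 5.],
              [6., 1., 2., 2.],
              [0., 7., 3., 4.]])
print(maxpool_forward(X))   # [[4. 5.]
                            #  [7. 4.]]
```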

Backward pass

Since the max-pooling layer doesn’t have any weights, for backpropagation we only need to find the gradient of the error with respect to the input matrix, that is, $\frac{\partial E}{\partial X}$.

Therefore, we need to find the following matrix:

$$\frac{\partial E}{\partial X} = \begin{pmatrix} \frac{\partial E}{\partial x_{11}} & \frac{\partial E}{\partial x_{12}} & \frac{\partial E}{\partial x_{13}} & \frac{\partial E}{\partial x_{14}} \\ \frac{\partial E}{\partial x_{21}} & \frac{\partial E}{\partial x_{22}} & \frac{\partial E}{\partial x_{23}} & \frac{\partial E}{\partial x_{24}} \\ \frac{\partial E}{\partial x_{31}} & \frac{\partial E}{\partial x_{32}} & \frac{\partial E}{\partial x_{33}} & \frac{\partial E}{\partial x_{34}} \\ \frac{\partial E}{\partial x_{41}} & \frac{\partial E}{\partial x_{42}} & \frac{\partial E}{\partial x_{43}} & \frac{\partial E}{\partial x_{44}} \end{pmatrix}$$

Let's look at one of these matrix elements, say $\frac{\partial E}{\partial x_{11}}$.

Calculation

$$\frac{\partial E}{\partial x_{11}} = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{\partial E}{\partial y_{ij}}\,\frac{\partial y_{ij}}{\partial x_{11}}$$

This equation is obtained by the chain rule of differentiation. Since $x_{11}$ affects only $y_{11}$ in our particular example, the equation becomes:

$$\frac{\partial E}{\partial x_{11}} = \frac{\partial E}{\partial y_{11}}\,\frac{\partial y_{11}}{\partial x_{11}} = D_{11}\,\frac{\partial y_{11}}{\partial x_{11}},$$

where we have defined $D_{11} = \frac{\partial E}{\partial y_{11}}$.

Since $y_{11} = \max(x_{11}, x_{12}, x_{21}, x_{22})$, the derivative $\frac{\partial y_{11}}{\partial x_{11}}$ is nonzero only if $x_{11}$ is the maximum-valued feature in the kernel. Let’s assume that $x_{11} \neq \max(x_{11}, x_{12}, x_{21}, x_{22})$; then

$$\frac{\partial y_{11}}{\partial x_{11}} = 0 \quad \text{and} \quad \frac{\partial E}{\partial x_{11}} = 0.$$

If $x_{12} = \max(x_{11}, x_{12}, x_{21}, x_{22})$, then $\frac{\partial y_{11}}{\partial x_{12}} = 1$ and

$$\frac{\partial E}{\partial x_{12}} = D_{11}\,\frac{\partial y_{11}}{\partial x_{12}} = D_{11}.$$

Hence, the gradient of the error with respect to an input feature is nonzero only if that feature has the maximum value in its pooling kernel.
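As a quick numeric check (with made-up values): if the top-left $2 \times 2$ window of the input is $\begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix}$, the maximum is $x_{21} = 4$, so $\frac{\partial E}{\partial x_{21}} = D_{11}$, while the gradients with respect to $x_{11}$, $x_{12}$, and $x_{22}$ are all zero.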

Using this strategy, we can compute the full backward pass as follows:

[Animation: backward pass through the max-pooling layer, shown in four steps]
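Continuing the sketch from the forward pass, a minimal backward pass can route each upstream gradient to the position of the maximum in the corresponding window. The matrix `D` of upstream gradients $\frac{\partial E}{\partial y_{ij}}$ is made up for illustration; ties are broken by taking the first maximum, which is a common convention.

```python
import numpy as np

def maxpool_backward(X, D, k=2, stride=2):
    """Route each upstream gradient D[i, j] = dE/dy_ij to the position of
    the maximum in the corresponding window; all other positions get zero."""
    dX = np.zeros_like(X)
    for i in range(D.shape[0]):
        for j in range(D.shape[1]):
            window = X[i*stride:i*stride+k, j*stride:j*stride+k]
            # Row/column of the maximum inside this k x k window
            r, c = np.unravel_index(window.argmax(), window.shape)
            dX[i*stride + r, j*stride + c] += D[i, j]
    return dX

X = np.array([[1., 3., 2., 1.],
              [4., 2., 0., 5.],
              [6., 1., 2., 2.],
              [0., 7., 3., 4.]])
D = np.array([[0.1, 0.2],          # hypothetical upstream gradients dE/dY
              [0.3, 0.4]])
print(maxpool_backward(X, D))
# Only the four window maxima receive a gradient; every other entry of dX is 0.
```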
