How to backpropagate through max-pooling layers

Overview

In a max-pooling layer of a convolutional neural network, we don’t apply any filter to the input matrix. Rather, we select the feature with the maximum value in a kernel of a given size and pass it to the output of that layer.

A forward pass through a max-pooling layer is fairly simple to compute. We move the kernel along the input matrix and pass the maximum-valued feature at each kernel position to the output. The following animation performs a forward pass on a $4 \times 4$ input matrix through a max-pooling layer with a kernel of size $2 \times 2$ and a stride of 2.

[Animation: forward pass through the max-pooling layer, shown in four steps]
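As a concrete illustration, here is a minimal NumPy sketch of this forward pass. The input matrix `X` and the helper `maxpool_forward` are hypothetical names chosen for this example, not part of any particular framework.

```python
import numpy as np

def maxpool_forward(X, k=2, stride=2):
    """Slide a k x k kernel over X and keep the maximum of each window."""
    h_out = (X.shape[0] - k) // stride + 1
    w_out = (X.shape[1] - k) // stride + 1
    Y = np.empty((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            window = X[i*stride:i*stride+k, j*stride:j*stride+k]
            Y[i, j] = window.max()
    return Y

# Made-up 4 x 4 input, matching the setup described above
X = np.array([[1., 3., 2., 1.],
              [4., 2., 0., 5.],
              [6., 1., 2., 2.],
              [0., 7., 3., 4.]])
print(maxpool_forward(X))   # [[4. 5.]
                            #  [7. 4.]]
```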

Backward pass

Since the max-pooling layer doesn’t have any weights, for backpropagation we only need to find the gradient of the error with respect to the input matrix, that is, $\frac{\partial E}{\partial X}$.

Therefore, we need to find the following matrix:

$$\frac{\partial E}{\partial X} = \begin{pmatrix} \frac{\partial E}{\partial x_{11}} & \frac{\partial E}{\partial x_{12}} & \frac{\partial E}{\partial x_{13}} & \frac{\partial E}{\partial x_{14}} \\ \frac{\partial E}{\partial x_{21}} & \frac{\partial E}{\partial x_{22}} & \frac{\partial E}{\partial x_{23}} & \frac{\partial E}{\partial x_{24}} \\ \frac{\partial E}{\partial x_{31}} & \frac{\partial E}{\partial x_{32}} & \frac{\partial E}{\partial x_{33}} & \frac{\partial E}{\partial x_{34}} \\ \frac{\partial E}{\partial x_{41}} & \frac{\partial E}{\partial x_{42}} & \frac{\partial E}{\partial x_{43}} & \frac{\partial E}{\partial x_{44}} \end{pmatrix}$$

Let's look at one of these matrix elements, say $\frac{\partial E}{\partial x_{11}}$.

Calculation

$$\frac{\partial E}{\partial x_{11}} = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{\partial E}{\partial y_{ij}}\,\frac{\partial y_{ij}}{\partial x_{11}}$$

This equation is obtained by the chain rule of differentiation. Since $x_{11}$ affects only $y_{11}$ in our particular example, the equation becomes:

$$\frac{\partial E}{\partial x_{11}} = \frac{\partial E}{\partial y_{11}}\,\frac{\partial y_{11}}{\partial x_{11}} = D_{11}\,\frac{\partial y_{11}}{\partial x_{11}},$$

where we have defined $D_{11} = \frac{\partial E}{\partial y_{11}}$.

Since $y_{11} = \max(x_{11}, x_{12}, x_{21}, x_{22})$, the derivative $\frac{\partial y_{11}}{\partial x_{11}}$ is nonzero only if $x_{11}$ is the maximum-valued feature in the kernel. Let’s assume that $x_{11} \neq \max(x_{11}, x_{12}, x_{21}, x_{22})$; then

$$\frac{\partial y_{11}}{\partial x_{11}} = 0 \quad \text{and} \quad \frac{\partial E}{\partial x_{11}} = 0.$$

If $x_{12} = \max(x_{11}, x_{12}, x_{21}, x_{22})$, then $\frac{\partial y_{11}}{\partial x_{12}} = 1$ and

$$\frac{\partial E}{\partial x_{12}} = D_{11}\,\frac{\partial y_{11}}{\partial x_{12}} = D_{11}.$$

Hence, the gradient of the error with respect to an input feature is nonzero only if that feature has the maximum value in its pooling kernel.
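As a quick numeric check (with made-up values): if the top-left $2 \times 2$ window of the input is $\begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix}$, the maximum is $x_{21} = 4$, so $\frac{\partial E}{\partial x_{21}} = D_{11}$, while the gradients with respect to $x_{11}$, $x_{12}$, and $x_{22}$ are all zero.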

Using this strategy, we can compute the full backward pass as follows:

[Animation: backward pass through the max-pooling layer, shown in four steps]
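Continuing the sketch from the forward pass, a minimal backward pass can route each upstream gradient to the position of the maximum in the corresponding window. The matrix `D` of upstream gradients $\frac{\partial E}{\partial y_{ij}}$ is made up for illustration; ties are broken by taking the first maximum, which is a common convention.

```python
import numpy as np

def maxpool_backward(X, D, k=2, stride=2):
    """Route each upstream gradient D[i, j] = dE/dy_ij to the position of
    the maximum in the corresponding window; all other positions get zero."""
    dX = np.zeros_like(X)
    for i in range(D.shape[0]):
        for j in range(D.shape[1]):
            window = X[i*stride:i*stride+k, j*stride:j*stride+k]
            # Row/column of the maximum inside this k x k window
            r, c = np.unravel_index(window.argmax(), window.shape)
            dX[i*stride + r, j*stride + c] += D[i, j]
    return dX

X = np.array([[1., 3., 2., 1.],
              [4., 2., 0., 5.],
              [6., 1., 2., 2.],
              [0., 7., 3., 4.]])
D = np.array([[0.1, 0.2],          # hypothetical upstream gradients dE/dY
              [0.3, 0.4]])
print(maxpool_backward(X, D))
# Only the four window maxima receive a gradient; every other entry of dX is 0.
```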
