Understanding CNNs: Pooling Operations

Learn about CNN pooling operations.

Pooling operation

The pooling operation, sometimes known as the subsampling operation, was introduced to CNNs mainly to reduce the size of the intermediate outputs and to make the CNN invariant to small translations in the input. This is preferred over the natural dimensionality reduction caused by convolution without padding because a pooling layer lets us decide where to reduce the size of the output, rather than forcing a reduction at every layer. Forcing the dimensionality to decrease at every unpadded convolution would strictly limit the number of layers our CNN models can have.
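To make the size argument concrete, here is a small sketch of the output-size arithmetic (the specific numbers are our own illustration, not from the lesson):

```python
# Output-size arithmetic for an n x n input (illustrative values).
n, m, s = 32, 3, 2  # input size, kernel size, pooling stride

# Unpadded ("valid") convolution: n -> n - m + 1, forced at every conv layer.
conv_out = n - m + 1         # 30

# Pooling with an m x m kernel and stride s: n -> (n - m) // s + 1,
# applied only where we choose to insert the pooling layer.
pool_out = (n - m) // s + 1  # 15

print(conv_out, pool_out)    # 30 15
```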

We define the pooling operation mathematically in the following sections. More precisely, we’ll discuss two types of pooling: max pooling and average pooling. First, however, we’ll define the notation. For an input of size $n \times n$ and a kernel (analogous to the filter of a convolution layer) of size $m \times m$, where $n \geq m$, the pooling operation slides the kernel over the input, just as the convolution operation slides its patch of weights. Let’s denote the input by $X$, the kernel by $W$, and the output by $H$. Then let’s use $x_{i,j}$, $w_{i,j}$, and $h_{i,j}$ to denote the value at the $(i,j)^{th}$ location of $X$, $W$, and $H$, respectively. We’ll now look at specific implementations of pooling commonly used.

Max pooling

The max pooling operation picks the maximum element within the defined kernel of an input to produce the output. The operation shifts the kernel window over the input and takes the maximum at each step. Mathematically, we define the max pooling equation as follows:

$$h_{i,j} = \max\left(\left\{\, x_{i+k-1,\, j+l-1} \;:\; 1 \leq k \leq m,\ 1 \leq l \leq m \,\right\}\right)$$
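To see the definition in code, here is a minimal NumPy sketch of max pooling for a single 2-D input. The function name and the stride parameter are our own additions; the equation above corresponds to stride 1, while pooling layers commonly use a stride equal to the kernel size.

```python
import numpy as np

def max_pool2d(x: np.ndarray, m: int, stride: int = 1) -> np.ndarray:
    """Max pooling of a 2-D array x with an m x m kernel and the given stride."""
    n = x.shape[0]
    out = (n - m) // stride + 1  # output size per spatial dimension
    h = np.empty((out, out), dtype=x.dtype)
    for i in range(out):
        for j in range(out):
            # h[i, j] is the maximum over the m x m window anchored at (i*stride, j*stride).
            window = x[i * stride : i * stride + m, j * stride : j * stride + m]
            h[i, j] = window.max()
    return h

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(x, m=2, stride=2))
# [[ 5.  7.]
#  [13. 15.]]
```

With `m=2` and `stride=2`, as in the example call, the windows tile the input without overlap and halve each spatial dimension, which is the most common setting in practice.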
