Optimizing Feature Map Pooling

Uncover the effectiveness of tailored maximum likelihood estimators for pooling in convolutional networks.

A feature map follows a distribution. The distribution differs with samples. For example, an object with sharp edges at the center of an image will have a different feature map distribution compared to an object with smudgy edges or located at a corner.

The distribution’s maximum likelihood estimator (MLE) makes the most efficient pooling statistic. Here are a few distributions that feature maps typically follow and their MLEs.

Uniform distribution

A uniform distribution belongs to the symmetric location probability distribution family. It describes a process where the random variable takes any value within an interval $(α, β)$ with equal probability. Its pdf is

$$f(x)=\begin{cases} \frac{1}{β-α}, & \text{if } α < x < β \\ 0, & \text{otherwise} \end{cases}$$
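As a minimal sketch, the piecewise pdf above can be written directly in NumPy (the function name `uniform_pdf` is illustrative, not from any library):

```python
import numpy as np

def uniform_pdf(x, alpha, beta):
    """Density of U(alpha, beta): 1/(beta - alpha) inside the interval, 0 outside."""
    x = np.asarray(x, dtype=float)
    inside = (x > alpha) & (x < beta)
    return np.where(inside, 1.0 / (beta - alpha), 0.0)

print(uniform_pdf([0.5, 2.0], alpha=0.0, beta=1.0))  # -> [1. 0.]
```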

Different shapes of the uniform distribution are shown in the following illustration as examples. Feature maps can follow a uniform distribution under some circumstances, such as if the object of interest is scattered in an image.

[Illustration: Shapes of the uniform distribution]

However, the uniform distribution’s relevance lies in its being the maximum entropy probability distribution for a random variable. This implies that if nothing is known about the distribution except that the feature map lies within some boundary (with unknown limits) and belongs to a certain class, then the uniform distribution is the appropriate assumption.

Besides, the maximum likelihood estimator of the uniform distribution’s upper bound $β$ is

$$\hat{β} = \max_i X_i$$

Therefore, if the feature map is uniformly distributed, or its distribution is unknown, $\max_i X_i$ is the best pooling statistic. The latter claim also reaffirms the reasoning behind max-pooling’s superiority.
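This connection can be sketched in NumPy: the max of each pooling window is exactly the MLE $\hat{β}$ of a uniform distribution fitted to that window’s activations (the helper name `max_pool2d` and the 2×2 window size are illustrative choices, not from any library):

```python
import numpy as np

def max_pool2d(fmap, k=2):
    """k x k max pooling: each window's max is the MLE beta_hat of a
    uniform distribution fitted to that window's activations."""
    h, w = fmap.shape
    # crop so both dimensions divide evenly by the window size
    fmap = fmap[: h - h % k, : w - w % k]
    windows = fmap.reshape(fmap.shape[0] // k, k, fmap.shape[1] // k, k)
    return windows.max(axis=(1, 3))

fmap = np.array([[1., 2., 0., 1.],
                 [3., 0., 2., 4.],
                 [0., 1., 1., 0.],
                 [2., 2., 0., 3.]])
print(max_pool2d(fmap))
# [[3. 4.]
#  [2. 3.]]
```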

Normal distribution

A normal distribution, also known as a Gaussian, is a continuous distribution from the exponential location family. It is characterized by its mean $μ$ and standard deviation $σ$ parameters. Examples are shown in the illustration below, followed by its pdf.

[Illustration: Examples of the normal distribution]

$$f(x) = \frac{1}{\sqrt{2πσ^2}}\exp\left(-\frac{(x - μ)^2}{2σ^2}\right).$$

The MLEs of the normal distribution are

$$\hat{μ} = \frac{\sum_i X_i}{n}, \qquad \hat{σ}^2 = \frac{\sum_i (X_i - \bar{X})^2}{n}.$$

Note that the MLE of $σ^2$ divides by $n$; dividing by $n-1$ gives the unbiased sample variance instead.
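By the same reasoning as before, average pooling computes the MLE $\hat{μ}$ of a normal distribution fitted to each window. A minimal NumPy sketch (the helper name `avg_pool2d` and the 2×2 window are illustrative assumptions):

```python
import numpy as np

def avg_pool2d(fmap, k=2):
    """k x k average pooling: each window's mean is the MLE mu_hat of a
    normal distribution fitted to that window's activations."""
    h, w = fmap.shape
    # crop so both dimensions divide evenly by the window size
    fmap = fmap[: h - h % k, : w - w % k]
    windows = fmap.reshape(fmap.shape[0] // k, k, fmap.shape[1] // k, k)
    return windows.mean(axis=(1, 3))

fmap = np.array([[1., 2., 0., 1.],
                 [3., 0., 2., 4.],
                 [0., 1., 1., 0.],
                 [2., 2., 0., 3.]])
print(avg_pool2d(fmap))
# [[1.5  1.75]
#  [1.25 1.  ]]
```

Whether the max or the mean is the better statistic thus depends on which distribution the feature map actually follows.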

A normal distribution has support $-∞ < x < ∞$, that is, $x ∈ \mathbb{R}$, and is symmetric. But most nonlinear activations either distort the symmetry of the feature map or bound it. For example, ReLU bounds the feature map below at 0 ...