Optimizing Feature Map Pooling
Uncover the effectiveness of tailored maximum likelihood estimators for pooling in convolutional networks.
A feature map follows a distribution, and that distribution varies from sample to sample. For example, an object with sharp edges at the center of an image will produce a different feature map distribution than an object with blurred edges or one located at a corner.
The distribution’s maximum likelihood estimator (MLE) makes the most efficient pooling statistic. Here are a few distributions that feature maps typically follow and their MLEs.
Uniform distribution
A uniform distribution belongs to the symmetric location probability distribution family. A uniform distribution describes a process where the random variable takes any outcome within a bounded interval $[a, b]$ with the same probability. Its pdf is

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{otherwise.} \end{cases}$$
Different shapes of the uniform distribution are shown in the following illustration as examples. Feature maps can follow a uniform distribution under some circumstances, such as if the object of interest is scattered in an image.
However, the uniform distribution's broader relevance lies in its being the maximum entropy probability distribution for a bounded random variable. This implies that if nothing is known about the distribution except that the feature map lies within some interval (with unknown limits) and belongs to a certain class, then the uniform distribution is the appropriate assumption.
Besides, the maximum likelihood estimators of the uniform distribution's parameters are

$$\hat{a} = \min_i x_i, \qquad \hat{b} = \max_i x_i.$$
Therefore, if the feature map is uniformly distributed, or if its distribution is unknown, $\max_i x_i$ is the best pooling statistic. The latter claim also reaffirms the reasoning behind max-pool's superiority.
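As a minimal sketch of the idea above, the following computes the MLE of the uniform upper bound over each pooling window, which is exactly max-pooling. The function name and the 4×4 feature map are illustrative, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4x4 feature map whose activations are roughly uniform on [0, 1).
feature_map = rng.uniform(low=0.0, high=1.0, size=(4, 4))

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling: each output element is the MLE
    of the uniform distribution's upper bound over its window."""
    h, w = fmap.shape
    # Split into (size x size) windows and take the max of each.
    return fmap.reshape(h // size, size, w // size, size).max(axis=(1, 3))

pooled = max_pool(feature_map)
print(pooled.shape)  # (2, 2): one MLE per 2x2 window
```

Each output value equals the maximum of its 2×2 window, i.e., the estimated upper limit $\hat{b}$ of that window's uniform distribution.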
Normal distribution
A normal distribution, also known as Gaussian, is a continuous distribution from the exponential location family. It is characterized by its mean $\mu$ and standard deviation $\sigma$. Its pdf is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$

Examples are shown in the illustration below.
The MLEs of the normal distribution's parameters are

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \hat{\mu}\right)^2.$$
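The MLE of the mean can be sketched as a pooling statistic the same way: computing $\hat{\mu}$ over each window is average-pooling. This is a minimal illustration, assuming NumPy; the function name and the 4×4 map are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 4x4 feature map with roughly Gaussian activations.
feature_map = rng.normal(loc=0.5, scale=0.1, size=(4, 4))

def avg_pool(fmap, size=2):
    """Non-overlapping average-pooling: each output element is the MLE
    of the normal distribution's mean over its window."""
    h, w = fmap.shape
    return fmap.reshape(h // size, size, w // size, size).mean(axis=(1, 3))

pooled = avg_pool(feature_map)
print(pooled.shape)  # (2, 2): one mean estimate per 2x2 window
```

If the feature map in a window is approximately Gaussian, the window mean is the most efficient pooling statistic, which is the rationale for average-pooling under this assumption.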
A normal distribution has support $(-\infty, \infty)$, that is, $x \in \mathbb{R}$, and is symmetric. But most nonlinear activations either distort the feature map's symmetry or bound it. For example, ReLU lower bounds the feature map at ...