Optimizing Feature Map Pooling

Uncover the effectiveness of tailored maximum likelihood estimators for pooling in convolutional networks.

A feature map follows a distribution. The distribution differs with samples. For example, an object with sharp edges at the center of an image will have a different feature map distribution compared to an object with smudgy edges or located at a corner.

The distribution’s maximum likelihood estimator (MLE) makes the most efficient pooling statistic. Here are a few distributions that feature maps typically follow and their MLEs.

Uniform distribution

A uniform distribution belongs to the symmetric location probability distribution family. It describes a process where the random variable takes any value within an interval $(α, β)$ with equal probability. Its pdf is

$$f(x)=\begin{cases} \frac{1}{β-α}, & \text{if } α < x < β \\ 0, & \text{otherwise} \end{cases}$$
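As a minimal sketch, the piecewise pdf above can be written directly in NumPy (the function name `uniform_pdf` is illustrative, not from any library):

```python
import numpy as np

def uniform_pdf(x, alpha, beta):
    """Density of U(alpha, beta): 1/(beta - alpha) inside the interval, 0 outside."""
    x = np.asarray(x, dtype=float)
    inside = (x > alpha) & (x < beta)
    return np.where(inside, 1.0 / (beta - alpha), 0.0)

print(uniform_pdf([0.5, 2.0], alpha=0.0, beta=1.0))  # -> [1. 0.]
```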

Different shapes of the uniform distribution are shown in the following illustration as examples. Feature maps can follow a uniform distribution under some circumstances, such as if the object of interest is scattered in an image.

[Illustration: Shapes of the uniform distribution]

However, the uniform distribution’s relevance lies in its being the maximum entropy probability distribution for a random variable. This implies that if nothing is known about the distribution except that the feature map lies within some boundary (with unknown limits) and belongs to a certain class, then the uniform distribution is the appropriate assumption.

Besides, the maximum likelihood estimator of the uniform distribution’s upper bound $β$ is

$$\hat{β} = \max_i X_i$$

Therefore, if the feature map is uniformly distributed, or its distribution is unknown, $\max_i X_i$ is the best pooling statistic. The latter claim also reaffirms the reasoning behind max-pooling’s superiority.
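This connection can be sketched in NumPy: the max of each pooling window is exactly the MLE $\hat{β}$ of a uniform distribution fitted to that window’s activations (the helper name `max_pool2d` and the 2×2 window size are illustrative choices, not from any library):

```python
import numpy as np

def max_pool2d(fmap, k=2):
    """k x k max pooling: each window's max is the MLE beta_hat of a
    uniform distribution fitted to that window's activations."""
    h, w = fmap.shape
    # crop so both dimensions divide evenly by the window size
    fmap = fmap[: h - h % k, : w - w % k]
    windows = fmap.reshape(fmap.shape[0] // k, k, fmap.shape[1] // k, k)
    return windows.max(axis=(1, 3))

fmap = np.array([[1., 2., 0., 1.],
                 [3., 0., 2., 4.],
                 [0., 1., 1., 0.],
                 [2., 2., 0., 3.]])
print(max_pool2d(fmap))
# [[3. 4.]
#  [2. 3.]]
```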

Normal distribution

A normal distribution, also known as a Gaussian, is a continuous distribution from the exponential location family. It is characterized by its mean $μ$ and standard deviation $σ$ parameters. Examples are shown in the illustration below, followed by its pdf.

[Illustration: Examples of the normal distribution]

$$f(x) = \frac{1}{\sqrt{2πσ^2}}\exp\left(-\frac{(x - μ)^2}{2σ^2}\right).$$

The MLEs of the normal distribution are

$$\hat{μ} = \frac{\sum_i X_i}{n}, \qquad \hat{σ}^2 = \frac{\sum_i (X_i - \bar{X})^2}{n}.$$

Note that the MLE of $σ^2$ divides by $n$; dividing by $n-1$ gives the unbiased sample variance instead.
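By the same reasoning as before, average pooling computes the MLE $\hat{μ}$ of a normal distribution fitted to each window. A minimal NumPy sketch (the helper name `avg_pool2d` and the 2×2 window are illustrative assumptions):

```python
import numpy as np

def avg_pool2d(fmap, k=2):
    """k x k average pooling: each window's mean is the MLE mu_hat of a
    normal distribution fitted to that window's activations."""
    h, w = fmap.shape
    # crop so both dimensions divide evenly by the window size
    fmap = fmap[: h - h % k, : w - w % k]
    windows = fmap.reshape(fmap.shape[0] // k, k, fmap.shape[1] // k, k)
    return windows.mean(axis=(1, 3))

fmap = np.array([[1., 2., 0., 1.],
                 [3., 0., 2., 4.],
                 [0., 1., 1., 0.],
                 [2., 2., 0., 3.]])
print(avg_pool2d(fmap))
# [[1.5  1.75]
#  [1.25 1.  ]]
```

Whether the max or the mean is the better statistic thus depends on which distribution the feature map actually follows.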

A normal distribution has support $-∞ < x < ∞$, that is, $x ∈ \mathbb{R}$, and is symmetric. But most nonlinear activations either distort the symmetry of the feature map or bound it. For example, ReLU bounds the feature map below at 0 ...