Efficient Pooling Strategies
Explore how efficient summary statistics in pooling layers can optimize feature extraction in convolutional networks.
The strength of a convolutional network is its ability to simplify feature extraction. Pooling plays a critical role in this by removing extraneous information. A pooling operation reduces a set of features to a summary statistic, so whether pooling preserves the relevant information or loses it depends on the efficiency of that statistic.
What’s an efficient summary statistic?
A summary statistic is a construct from principles of data reduction. It summarizes a set of observations to preserve the largest amount of information as succinctly as possible.
Therefore, an efficient summary statistic is one that concisely contains the most information about a sample, such as the sample mean or maximum. Other statistics, like the sample skewness or sample size, do not contain as much relevant information and are therefore not efficient for pooling. This lesson lays out the theory of summary statistics to identify which statistics are efficient for pooling.
“An experimenter might wish to summarize the information in a sample by determining a few key features of the sample values. This is usually done by computing (summary) statistics—functions of the sample.” (Casella and Berger 2002)
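To make "functions of the sample" concrete, here is a minimal sketch that computes the four statistics mentioned above for two small windows of values. The window values and the helper names (`sample_skewness`, `summarize`) are illustrative assumptions, not part of the lesson.

```python
from statistics import fmean


def sample_skewness(xs):
    """Biased sample skewness: the third standardized central moment."""
    n = len(xs)
    mu = fmean(xs)
    m2 = sum((x - mu) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mu) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def summarize(xs):
    """Reduce a sample to a handful of summary statistics."""
    return {
        "mean": fmean(xs),
        "max": max(xs),
        "skewness": sample_skewness(xs),
        "size": len(xs),
    }


# Two hypothetical 2x2 pooling windows, flattened into samples.
window_a = [0.1, 0.2, 0.1, 0.9]  # one strong activation
window_b = [0.5, 0.6, 0.4, 0.5]  # uniformly moderate activations

summary_a = summarize(window_a)
summary_b = summarize(window_b)

for name, summary in (("a", summary_a), ("b", summary_b)):
    print(name, summary)
```

In this sketch the sample size is identical for both windows, so it cannot distinguish them at all, while the mean and the maximum each reflect how strongly the feature is activated.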
Understanding how pooling depends on the efficiency of summary statistics, and the theory behind them, is rewarding. It provides answers to questions like:
- Currently, max-pool and average-pool are the most common. Could there be other equally or more effective pooling statistics?
- Max-pool is found to be robust and, therefore, better than others in most problems. What is the cause of max pooling’s robustness?
- Can more than one pooling statistic be used together? If yes, how to find the best combination of statistics?
This lesson goes deeper into the theory of extracting meaningful features in the pooling layer. In doing so, the above questions are answered. Moreover, the theory behind summary statistics explains how to choose a single statistic or a set of statistics for pooling.
Note: A pooling operation computes a summary statistic, and its efficacy relies on the efficiency of that statistic.
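The note above can be illustrated with a minimal sketch in which pooling is written as "apply a summary statistic to each window." The function `pool2d`, the non-overlapping window layout, and the 4x4 feature map are assumptions made for illustration; deep learning libraries implement pooling differently and far more efficiently.

```python
def pool2d(feature_map, size, statistic):
    """Apply `statistic` to each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Flatten the window into a sample of observations.
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(statistic(window))
        pooled.append(pooled_row)
    return pooled


def mean(window):
    """Sample mean of a flattened window."""
    return sum(window) / len(window)


# Hypothetical 4x4 feature map from a convolutional layer.
feature_map = [
    [1.0, 3.0, 0.0, 2.0],
    [5.0, 7.0, 4.0, 2.0],
    [0.0, 0.0, 8.0, 6.0],
    [2.0, 2.0, 4.0, 2.0],
]

max_pooled = pool2d(feature_map, 2, max)   # [[7.0, 4.0], [2.0, 8.0]]
avg_pooled = pool2d(feature_map, 2, mean)  # [[4.0, 2.0], [1.0, 5.0]]

print(max_pooled)
print(avg_pooled)
```

Because the statistic is passed in as an argument, swapping max-pool for average-pool, or for any other function of the window, changes only which summary statistic is computed. That is why the choice of statistic determines what information survives pooling.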
In the following, summary statistics applicable to pooling are explained in three categories:
- Sufficient (minimal) statistics
- Complete statistics
- Ancillary statistics
Definitions
The feature map output by a convolutional layer is the input to a pooling layer. The feature map is a random variable
where is the feature map size.
An observation of the random variable is denoted as:
Describing properties of random variables is beyond the scope of this course, but it suffices to know that their true underlying distribution and parameters are unknown.
The distribution function, that is, the