...

Maximizing Efficiency with Complete Statistics

Delve into the role of complete and ancillary statistics in optimizing pooling layers for more effective convolutional network models.

Complete statistics

When several minimal sufficient statistics are available, choosing among them can be confusing. This section introduces complete statistics, which narrow the choice of pooling statistic down to the maximum likelihood estimator (MLE) of the feature map distribution.

A complete statistic is a bridge between minimal sufficient statistics and maximum likelihood estimators (MLEs). MLEs derived from complete minimal sufficient statistics have the essential attributes of unbiasedness and minimum variance, along with the minimality and completeness properties. MLEs therefore become the natural choice for pooling, which removes most of the ambiguity around pooling statistic selection.
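
To make this concrete, here is a minimal sketch of MLE-based pooling, assuming the activations in each pooling window are modeled as i.i.d. draws from a Gaussian $N(\mu, \sigma^2)$: under that assumption, the MLE of $\mu$ is the window's sample mean, so MLE pooling reduces to average pooling. The function `mle_pool` and its window layout are illustrative, not taken from any library.

```python
import numpy as np

def mle_pool(feature_map, window=2):
    """Pool each non-overlapping window with the MLE of its assumed
    feature distribution.

    Illustrative sketch: modeling window activations as i.i.d. draws
    from N(mu, sigma^2) makes the sample mean the MLE of mu, so this
    coincides with average pooling. A different distributional
    assumption would dictate a different pooled statistic.
    """
    h, w = feature_map.shape
    h_out, w_out = h // window, w // window
    # Split the map into (window x window) blocks, then take each
    # block's sample mean -- the Gaussian MLE of the location parameter.
    blocks = feature_map[: h_out * window, : w_out * window]
    blocks = blocks.reshape(h_out, window, w_out, window)
    return blocks.mean(axis=(1, 3))

# Example: pool a 4x4 feature map down to 2x2.
fmap = np.arange(16, dtype=float).reshape(4, 4)
print(mle_pool(fmap))  # [[ 2.5  4.5]
                       #  [10.5 12.5]]
```

The point is that the assumed distribution of the window activations, rather than an ad hoc choice, dictates the pooled value.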

Next, we lay out the definitions and the path that leads from complete minimal sufficient statistics to the MLE.

Completeness

Let $f(t \mid \theta)$ be a family of pdfs or pmfs for a statistic $T(X)$. The family of probability distributions is called complete if, for every measurable, real-valued function $g$, $E_\theta(g(T)) = 0$ for all $\theta \in \Omega$ implies $g(T) = 0$ almost surely, that is, $P_\theta(g(T) = 0) = 1$ for all $\theta$. The statistic $T$ is boundedly complete if this holds for every bounded $g$.

In simple words, a statistic $T(X)$ computed from an observed sample $X = X_1, \ldots, X_n$ is complete if it admits no nontrivial unbiased estimator of zero: whenever $E_\theta(g(T)) = 0$ for every $\theta$, the function $g$ must vanish wherever $T$ occurs with non-zero probability.

It becomes clearer by considering a discrete case. Here, completeness means that $E_\theta(g(T)) = \sum_t g(t) P_\theta(T = t) = 0$ for all $\theta$ implies $g(t) = 0$ on the support of $T$, because $P_\theta(T = t)$ is non-zero there: the weighted sum can vanish for every $\theta$ only if each $g(t)$ is zero.
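
To see this argument in action, the sketch below (illustrative; it relies on sympy, and all names are ours) takes $X_i \sim \text{Bernoulli}(p)$, so $T = \sum X_i \sim \text{Binomial}(n, p)$. Requiring $E_p[g(T)] = 0$ for every $p$ turns the expectation into a polynomial in $p$ that must vanish identically, and solving for $g(0), \ldots, g(n)$ shows that $g \equiv 0$ is the only solution:

```python
import sympy as sp

# T = sum of n Bernoulli(p) draws, so T ~ Binomial(n, p).
n = 3
p = sp.Symbol('p')
g = sp.symbols(f'g0:{n + 1}')  # unknown values g(0), ..., g(n)

# E_p[g(T)] = sum_t g(t) * C(n, t) * p^t * (1 - p)^(n - t)
expectation = sum(
    g[t] * sp.binomial(n, t) * p**t * (1 - p)**(n - t)
    for t in range(n + 1)
)

# Vanishing for ALL p in (0, 1) means every coefficient of the
# polynomial in p is zero; solve that linear system for g(0..n).
coeffs = sp.Poly(sp.expand(expectation), p).all_coeffs()
print(sp.solve(coeffs, g))  # {g0: 0, g1: 0, g2: 0, g3: 0}
```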

For example, suppose $X_1, \ldots, X_n$ is observed from a normal distribution $N(\mu, 1)$, and consider the statistic $T(X) = \sum X_i$, which is distributed as $N(n\mu, n)$. The density $f_\mu(t)$ of $T$ is strictly positive for every $t$, and $E_\mu(g(T)) = \int g(t) f_\mu(t)\,dt = 0$ for all $\mu$ forces $g(t) = 0$ almost everywhere; formally, this follows because the $N(n\mu, n)$ family is a full-rank exponential family, so the condition amounts to a Laplace transform vanishing identically. Therefore, $T = \sum X_i$ is complete.
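
As a numerical illustration rather than a proof, the sketch below picks an arbitrary nonzero function, $g(t) = \sin t$, and checks that $E_\mu[g(T)]$ fails to vanish at some $\mu$, exactly as completeness demands; the sample size $n = 5$ and the integration limits are our choices.

```python
import numpy as np
from scipy import integrate, stats

n = 5  # sample size, so T = sum(X_i) ~ N(n*mu, n)

def expected_g(mu, g):
    """Numerically compute E_mu[g(T)] for T ~ N(n*mu, n)."""
    density = stats.norm(loc=n * mu, scale=np.sqrt(n)).pdf
    lo, hi = n * mu - 10 * np.sqrt(n), n * mu + 10 * np.sqrt(n)
    val, _ = integrate.quad(lambda t: g(t) * density(t), lo, hi)
    return val

# Closed form: E_mu[sin(T)] = exp(-n/2) * sin(n * mu). It is zero at
# mu = 0 but not at mu = 0.3, so g = sin cannot witness incompleteness.
for mu in [0.0, 0.3, 1.0]:
    print(f"mu = {mu:3.1f}   E_mu[sin(T)] = {expected_g(mu, np.sin): .6f}")
```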

This is an important property because it confirms that a statistic $T$, if complete, will span the whole sample space. The statistic will contain some information from every observed sample $X_i$ ...