...

Maximizing Efficiency with Complete Statistics

Delve into the role of complete and ancillary statistics in optimizing pooling layers for more effective convolutional network models.

Complete statistics

When several minimal sufficient statistics are available, choosing among them can be confusing. This section introduces complete statistics, which narrow the choice of pooling statistic down to the maximum likelihood estimator (MLE) of the feature map distribution.

A complete statistic is a bridge between minimal sufficient statistics and maximum likelihood estimators (MLEs). MLEs derived from complete minimal sufficient statistics have the essential attributes of unbiasedness and minimum variance, along with the minimality and completeness properties. MLEs therefore become the natural choice for pooling, which removes most of the ambiguity around pooling statistic selection.
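
To make this concrete, here is a minimal sketch of MLE-based pooling, assuming the activations in each pooling window are modeled as i.i.d. draws from a Gaussian $N(\mu, \sigma^2)$: under that assumption, the MLE of $\mu$ is the window's sample mean, so MLE pooling reduces to average pooling. The function `mle_pool` and its window layout are illustrative, not taken from any library.

```python
import numpy as np

def mle_pool(feature_map, window=2):
    """Pool each non-overlapping window with the MLE of its assumed
    feature distribution.

    Illustrative sketch: modeling window activations as i.i.d. draws
    from N(mu, sigma^2) makes the sample mean the MLE of mu, so this
    coincides with average pooling. A different distributional
    assumption would dictate a different pooled statistic.
    """
    h, w = feature_map.shape
    h_out, w_out = h // window, w // window
    # Split the map into (window x window) blocks, then take each
    # block's sample mean -- the Gaussian MLE of the location parameter.
    blocks = feature_map[: h_out * window, : w_out * window]
    blocks = blocks.reshape(h_out, window, w_out, window)
    return blocks.mean(axis=(1, 3))

# Example: pool a 4x4 feature map down to 2x2.
fmap = np.arange(16, dtype=float).reshape(4, 4)
print(mle_pool(fmap))  # [[ 2.5  4.5]
                       #  [10.5 12.5]]
```

The point is that the assumed distribution of the window activations, rather than an ad hoc choice, dictates the pooled value.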

Next, we lay out the definitions and the path that leads from complete minimal sufficient statistics to the MLE.

Completeness

Let $f(t \mid \theta)$ be a family of pdfs or pmfs for a statistic $T(X)$. The family of probability distributions is called complete if, for every measurable, real-valued function $g$, $E_\theta(g(T)) = 0$ for all $\theta \in \Omega$ implies $g(T) = 0$ almost surely, that is, $P_\theta(g(T) = 0) = 1$ for all $\theta$. The statistic $T$ is boundedly complete if this holds for every bounded $g$.

In simple words, a statistic $T(X)$ computed from an observed sample $X = X_1, \ldots, X_n$ is complete if it admits no nontrivial unbiased estimator of zero: whenever $E_\theta(g(T)) = 0$ for every $\theta$, the function $g$ must vanish wherever $T$ occurs with non-zero probability.

It becomes clearer by considering a discrete case. Here, completeness means that $E_\theta(g(T)) = \sum_t g(t) P_\theta(T = t) = 0$ for all $\theta$ implies $g(t) = 0$ on the support of $T$, because $P_\theta(T = t)$ is non-zero there: the weighted sum can vanish for every $\theta$ only if each $g(t)$ is zero.
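
To see this argument in action, the sketch below (illustrative; it relies on sympy, and all names are ours) takes $X_i \sim \text{Bernoulli}(p)$, so $T = \sum X_i \sim \text{Binomial}(n, p)$. Requiring $E_p[g(T)] = 0$ for every $p$ turns the expectation into a polynomial in $p$ that must vanish identically, and solving for $g(0), \ldots, g(n)$ shows that $g \equiv 0$ is the only solution:

```python
import sympy as sp

# T = sum of n Bernoulli(p) draws, so T ~ Binomial(n, p).
n = 3
p = sp.Symbol('p')
g = sp.symbols(f'g0:{n + 1}')  # unknown values g(0), ..., g(n)

# E_p[g(T)] = sum_t g(t) * C(n, t) * p^t * (1 - p)^(n - t)
expectation = sum(
    g[t] * sp.binomial(n, t) * p**t * (1 - p)**(n - t)
    for t in range(n + 1)
)

# Vanishing for ALL p in (0, 1) means every coefficient of the
# polynomial in p is zero; solve that linear system for g(0..n).
coeffs = sp.Poly(sp.expand(expectation), p).all_coeffs()
print(sp.solve(coeffs, g))  # {g0: 0, g1: 0, g2: 0, g3: 0}
```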

For example, suppose $X_1, \ldots, X_n$ is observed from a normal distribution $N(\mu, 1)$, and consider the statistic $T(X) = \sum X_i$, which is distributed as $N(n\mu, n)$. The density $f_\mu(t)$ of $T$ is strictly positive for every $t$, and $E_\mu(g(T)) = \int g(t) f_\mu(t)\,dt = 0$ for all $\mu$ forces $g(t) = 0$ almost everywhere; formally, this follows because the $N(n\mu, n)$ family is a full-rank exponential family, so the condition amounts to a Laplace transform vanishing identically. Therefore, $T = \sum X_i$ is complete.
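
As a numerical illustration rather than a proof, the sketch below picks an arbitrary nonzero function, $g(t) = \sin t$, and checks that $E_\mu[g(T)]$ fails to vanish at some $\mu$, exactly as completeness demands; the sample size $n = 5$ and the integration limits are our choices.

```python
import numpy as np
from scipy import integrate, stats

n = 5  # sample size, so T = sum(X_i) ~ N(n*mu, n)

def expected_g(mu, g):
    """Numerically compute E_mu[g(T)] for T ~ N(n*mu, n)."""
    density = stats.norm(loc=n * mu, scale=np.sqrt(n)).pdf
    lo, hi = n * mu - 10 * np.sqrt(n), n * mu + 10 * np.sqrt(n)
    val, _ = integrate.quad(lambda t: g(t) * density(t), lo, hi)
    return val

# Closed form: E_mu[sin(T)] = exp(-n/2) * sin(n * mu). It is zero at
# mu = 0 but not at mu = 0.3, so g = sin cannot witness incompleteness.
for mu in [0.0, 0.3, 1.0]:
    print(f"mu = {mu:3.1f}   E_mu[sin(T)] = {expected_g(mu, np.sin): .6f}")
```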

This is an important property because it confirms that a statistic $T$, if complete, will span the whole sample space. The statistic will contain some information from every observed sample $X_i$ ...