...

Estimating a Single Data Point

Learn how we can estimate a single data point.

Before we calculate Norm, let’s look at a straightforward case first. We’ll apply a variational method to approximate the distribution of a hidden variable analytically. Say we have two binary variables, A and B, which we know are not independent: A is the parent node, and B is the child node whose conditional probability we want to estimate.

The following figure depicts this Bayesian network.

[Figure: The dataset with missing values]
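
To keep the parameters of the network explicit, here is one minimal way to encode them in Python. The dictionary layout and the name theta are our own convention, and the probability values are placeholders, not estimates; estimating them from the data is exactly the task ahead.

theta = {
    "p_a": 0.5,                       # P(A = 1); placeholder value
    "p_b_given_a": {0: 0.5, 1: 0.5},  # P(B = 1 | A = a); placeholder values
}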

The following listing specifies the data in Python.

data = [
    # Each tuple is one (A, B) observation; None marks the missing value of B
    (1, 1), (1, 1), (0, 0), (0, 0), (0, 0), (0, None), (0, 1), (1, 0)
]

The data have to be missing at random for the methods we apply to be helpful; that is, the values must not be missing for a systematic reason. For instance, if we only knew the age of survivors because they were asked after their rescue, and not the age of passengers who died aboard, the data would not be missing at random. We would have biased data and wouldn’t be able to reliably infer the age of the victims from it. However, if we took the ages from a passenger list but couldn’t read some of them because of bad handwriting, we can assume the data are missing at random.

Before we start filling in the missing value, we need an evaluation function: a measure that tells us how well we are doing.

Let’s use a likelihood function. A likelihood function expresses how likely a given model is to produce the observed data. There are two well-known variants.

The likelihood function is defined as the product of all individual probability estimates; maximizing it over θ yields the maximum likelihood estimate.

L(\theta)=\prod_{i=1}^{n} f_i(y_i|\theta)
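
To make the formula concrete, here is a small sketch that evaluates it for our network, reusing the hypothetical theta dictionary and the data list from the listings above (the pair with the missing value is simply skipped for now):

def point_probability(a, b, theta):
    # Joint probability P(A = a, B = b) = P(A = a) * P(B = b | A = a)
    p_a = theta["p_a"] if a == 1 else 1.0 - theta["p_a"]
    p_b = theta["p_b_given_a"][a] if b == 1 else 1.0 - theta["p_b_given_a"][a]
    return p_a * p_b

def likelihood(data, theta):
    # Product of the point probabilities over the fully observed pairs
    result = 1.0
    for a, b in data:
        if b is None:  # leave the missing value out for now
            continue
        result *= point_probability(a, b, theta)
    return result

print(likelihood(data, theta))  # 0.25 ** 7 ≈ 6.1e-05 with the placeholder theta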

The log-likelihood function takes the natural logarithm of each estimate and sums them. Because the logarithm turns the product into a sum, it also avoids the numerical underflow that multiplying many small probabilities can cause.

F(\theta)=\sum_{i=1}^{n} \ln{f_i(y_i|\theta)}
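
The same sketch in log space reuses point_probability from above; note how the product becomes a sum:

import math

def log_likelihood(data, theta):
    # Sum of the log point probabilities over the fully observed pairs
    total = 0.0
    for a, b in data:
        if b is None:
            continue
        total += math.log(point_probability(a, b, theta))
    return total

print(log_likelihood(data, theta))  # 7 * ln(0.25) ≈ -9.70 with the placeholder theta

...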