...


Probability Distribution Function

In this lesson, we will begin with an overview of probability distribution functions and then move on to discussing continuous distributions.

We’ve been mostly looking at small, discrete distributions in this course, but we started this series by looking at continuous distributions. Now that we have some understanding of how to solve probability problems on simple discrete distributions and Markov processes, let’s go back to continuous distributions and see whether we can apply some of those lessons to them.


What is a Probability Distribution Function?

Let’s start with the basics. What exactly do we mean by a probability distribution function? So far in this course, we’ve mostly looked at the discrete analog, a non-normalized probability mass function. That is, for an IDiscreteDistribution<T> we have a weight function that gives us a value for each T. The probability of sampling a particular T is the weight divided by the total weight of all Ts.

Of course, had we decided to go with double weights instead of integers, we could have made a normalized probability mass function: that is, the “weight” of each particular T is automatically divided by the total weight, and we get the probability out. In this scenario, the total weight adds up to 1.0.
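To make that concrete, here is a minimal sketch of the idea. The member names below are assumptions modeled on the weight function described above, not necessarily the course’s exact interface:

using System.Collections.Generic;
using System.Linq;

// Sketch of a weighted discrete distribution; member names are illustrative.
public interface IDiscreteDistribution<T>
{
    T Sample();
    IEnumerable<T> Support();
    int Weight(T t);
}

public static class DiscreteDistributionExtensions
{
    // The probability of sampling t is its weight divided by the total weight;
    // summed over the support, these probabilities add up to 1.0.
    public static double Probability<T>(this IDiscreteDistribution<T> d, T t) =>
        (double)d.Weight(t) / d.Support().Sum(s => (long)d.Weight(s));
}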

We know from our exploration of the weighted integer distribution that we can think of our probability distributions as making a rectangle where various sub-rectangles are associated with particular values; we then “throw a dart” at the rectangle to sample from the distribution; where it lands gives us the sample.
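One hedged sketch of that dart-throwing picture in code: lay the sub-rectangles end to end, pick a uniform point along the total width, and see which sub-rectangle it falls in. The class name and details here are illustrative, not the course’s actual implementation:

using System;

static class WeightedSampler
{
    static readonly Random Rng = new Random();

    // weights[i] is the width of the sub-rectangle for value i.
    public static int Sample(int[] weights)
    {
        int total = 0;
        foreach (int w in weights)
            total += w;
        // Throw the dart: a uniform integer in [0, total).
        int dart = Rng.Next(total);
        // Walk the sub-rectangles until we find the one the dart hit.
        for (int i = 0; i < weights.Length; i += 1)
        {
            dart -= weights[i];
            if (dart < 0)
                return i;
        }
        throw new InvalidOperationException("unreachable for positive total weight");
    }
}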

We will be abbreviating “Probability Distribution Function” as PDF for the rest of the course.

Continuous Distribution

Continuous distributions can be thought of in much the same way. Suppose we have a function from double to double, always non-negative, such that the total area under the curve is 1.0. Here are some examples:

// Assumes: using static System.Math; (for Exp, Sqrt, and PI)
double PDF1(double x) => x < 0.0 | x >= 1.0 ? 0.0 : 1.0;
double PDF2(double x) => x < 0.0 | x >= 1.0 ? 0.0 : 2 * x;
double PDF3(double x) => Exp(-(x * x)) / Sqrt(PI);
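As a quick sanity check on the “total area is 1.0” claim, here is a hedged sketch that approximates each area with a midpoint Riemann sum. PDF3 is nonzero over the whole real line, so we integrate over a wide interval that captures essentially all of its area:

using System;
using static System.Math;

static class AreaCheck
{
    static double PDF1(double x) => x < 0.0 | x >= 1.0 ? 0.0 : 1.0;
    static double PDF2(double x) => x < 0.0 | x >= 1.0 ? 0.0 : 2 * x;
    static double PDF3(double x) => Exp(-(x * x)) / Sqrt(PI);

    // Midpoint Riemann sum of f over [lo, hi] with n slices.
    static double Area(Func<double, double> f, double lo, double hi, int n = 100000)
    {
        double width = (hi - lo) / n;
        double sum = 0.0;
        for (int i = 0; i < n; i += 1)
            sum += f(lo + (i + 0.5) * width) * width;
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(Area(PDF1, 0.0, 1.0));    // ≈ 1.0
        Console.WriteLine(Area(PDF2, 0.0, 1.0));    // ≈ 1.0
        Console.WriteLine(Area(PDF3, -10.0, 10.0)); // ≈ 1.0
    }
}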

What is the meaning of these as probability distributions? Plainly the “higher” the function, the “more likely” any particular value is, but what does it even mean to say that in our PDF2 and PDF3 distributions, 0.5 is “less likely” than 0.6, but they are “equally likely” in our PDF1 distribution?

One way to think of it is that again, we “throw a dart” at the area under the curve. Given any subset of that area, the probability of the dart landing inside it is proportional to the area of the subset, and the value sampled is the x coordinate of the dart.
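This dart-throwing picture can be turned directly into a sampling technique, commonly called rejection sampling: throw uniform darts at a bounding rectangle that encloses the curve, discard any dart that lands above the curve, and return the x coordinate of the first dart that lands under it. A minimal sketch, assuming we know the support and a bound on the curve’s height:

using System;

static class RejectionSampler
{
    static readonly Random Rng = new Random();

    // Sample from a PDF supported on [lo, hi] whose height never exceeds maxHeight.
    public static double Sample(Func<double, double> pdf, double lo, double hi, double maxHeight)
    {
        while (true)
        {
            // Throw a dart uniformly at the bounding rectangle.
            double x = lo + Rng.NextDouble() * (hi - lo);
            double y = Rng.NextDouble() * maxHeight;
            // Keep the dart only if it landed under the curve.
            if (y < pdf(x))
                return x;
        }
    }
}

For example, RejectionSampler.Sample(PDF2, 0.0, 1.0, 2.0) produces samples distributed according to PDF2; the bound 2.0 is the maximum of 2 * x on [0, 1).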

We can make this a little bit more formal by restricting our areas to little rectangles:

  1. Take a value, say 0.5.
  2. Now take a tiny offset, call it ϵ. Doesn’t matter what it is, so long as it is “pretty small”.
  3. The probability of getting a sample value between 0.5 and 0.5 + ϵ is approximately PDF(0.5) × ϵ, and the approximation gets better as ϵ gets smaller, as the quick check after this list illustrates.
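Here is a hedged numeric sketch of that rectangle approximation for PDF2, where the exact probability of landing between 0.5 and 0.5 + ϵ can be computed in closed form as (0.5 + ϵ)² − 0.5²:

using System;

static class EpsilonCheck
{
    static double PDF2(double x) => x < 0.0 | x >= 1.0 ? 0.0 : 2 * x;

    static void Main()
    {
        foreach (double eps in new[] { 0.1, 0.01, 0.001 })
        {
            // Exact area under PDF2 from 0.5 to 0.5 + eps.
            double exact = (0.5 + eps) * (0.5 + eps) - 0.25;
            // The rectangle approximation: height at 0.5 times width eps.
            double approx = PDF2(0.5) * eps;
            Console.WriteLine($"eps={eps}: exact={exact}, PDF(0.5)*eps={approx}");
        }
    }
}

The error is exactly ϵ², so halving ϵ quarters the error: the smaller the rectangle, the better the approximation.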
...