Probability Distribution Function
In this lesson, we will begin with an overview of probability distribution functions and then move on to discussing continuous distributions.
We’ve been mostly looking at small, discrete distributions in this course, but we started this series by looking at continuous distributions. Now that we have some understanding of how to solve probability problems on simple discrete distributions and Markov processes, let’s go back to continuous distributions and see if we can apply some of those techniques to them.
What is a Probability Distribution Function?
Let’s start with the basics. What exactly do we mean by a probability distribution function? So far in this course, we’ve mostly looked at the discrete analog, a non-normalized probability mass function. That is, for an IDiscreteDistribution<T> we have a weight function that gives us a value for each T. The probability of sampling a particular T is the weight divided by the total weight of all Ts.
Of course, had we decided to go with double weights instead of integers, we could have made a normalized probability mass function: that is, the “weight” of each particular T is automatically divided by the total weight, and we get the probability out. In this scenario, the total weight adds up to 1.0.
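For instance, here is a minimal sketch of that normalization; the Normalize helper is hypothetical, not part of the course’s library:

using System.Linq;

// Hypothetical helper: divide each integer weight by the total
// to get a normalized probability mass function.
static double[] Normalize(int[] weights)
{
    double total = weights.Sum();
    return weights.Select(w => w / total).ToArray();
}

// Normalize(new[] { 1, 2, 3 }) is approximately { 0.167, 0.333, 0.5 },
// and those probabilities add up to 1.0.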
We know from our exploration of the weighted integer distribution that we can think of our probability distributions as making a rectangle where various sub-rectangles are associated with particular values; we then “throw a dart” at the rectangle to sample from the distribution; where it lands gives us the sample.
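To make the dart metaphor concrete, here is one way to sample by dart-throwing; this sketch takes a plain array of positive integer weights rather than the course’s IDiscreteDistribution<T>:

using System;
using System.Linq;

// Throw a dart at a rectangle of width totalWeight; each index i owns
// a sub-rectangle of width weights[i]. Return the index the dart hit.
static int SampleIndex(int[] weights, Random random)
{
    int dart = random.Next(weights.Sum());
    for (int i = 0; i < weights.Length; i += 1)
    {
        if (dart < weights[i])
            return i;
        dart -= weights[i];
    }
    throw new InvalidOperationException("Weights must be positive.");
}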
We will be abbreviating “Probability Distribution Function” as PDF for the rest of the course.
Continuous Distribution
Continuous distributions can be thought of in much the same way. Suppose we have a function from double to double, always non-negative, such that the total area under the curve is 1.0. Here are some examples:
using static System.Math; // for Exp, Sqrt, and PI

double PDF1(double x) => x < 0.0 || x >= 1.0 ? 0.0 : 1.0;   // flat on [0, 1)
double PDF2(double x) => x < 0.0 || x >= 1.0 ? 0.0 : 2 * x; // rising ramp on [0, 1)
double PDF3(double x) => Exp(-(x * x)) / Sqrt(PI);          // bell curve centered on 0
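None of these claims about area need to be taken on faith; a quick midpoint Riemann sum can check them. The Area helper below is just an illustration, not part of the course’s code:

using System;

// Approximate the area under pdf between low and high with a
// midpoint Riemann sum over many thin rectangles.
static double Area(Func<double, double> pdf, double low, double high)
{
    const int buckets = 100000;
    double width = (high - low) / buckets;
    double sum = 0.0;
    for (int i = 0; i < buckets; i += 1)
        sum += pdf(low + (i + 0.5) * width);
    return sum * width;
}

// Area(PDF1, 0.0, 1.0) and Area(PDF2, 0.0, 1.0) are approximately 1.0;
// so is Area(PDF3, -10.0, 10.0), since PDF3's tails beyond ±10 are negligible.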
What is the meaning of these as probability distributions? Plainly the “higher” the function, the “more likely” any particular value is, but what does it even mean to say that 0.5 is “less likely” than 0.6 in our PDF2 distribution and “more likely” than 0.6 in our PDF3 distribution, yet the two are “equally likely” in our PDF1 distribution?
One way to think of it is that again, we “throw a dart” at the area under the curve. Given any subset of that area, the probability of the dart landing inside it is proportional to the area of the subset, and the value sampled is the x coordinate of the dart.
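That description translates directly into code as the standard rejection sampling technique. Here is a sketch for PDF2; it assumes we know the curve never exceeds a height of 2.0 on [0, 1):

using System;

// Throw darts at the bounding rectangle [0, 1) x [0, 2) until one
// lands under the curve; the x coordinate of that dart is the sample.
static double SampleFromPDF2(Random random)
{
    while (true)
    {
        double x = random.NextDouble();
        double y = 2.0 * random.NextDouble();
        if (y < PDF2(x))
            return x;
    }
}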
We can make this a little bit more formal by restricting our areas to little rectangles:
- Take a value, say x.
- Now take a tiny offset, call it ε. Doesn’t matter what it is, so long as it is “pretty small”.
- The probability of getting a sample value between x and x + ε is approximately the area of the thin rectangle under the curve there: PDF(x) · ε. (A worked example follows this list.)
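To see this with concrete numbers, take PDF2 with x = 0.5 and ε = 0.01. The exact probability of a sample landing between 0.5 and 0.51 is the area under 2x there, which is 0.51² − 0.5² = 0.0101; the rectangle approximation PDF2(0.5) · ε = 1.0 · 0.01 = 0.01 is off by only about one percent, and it gets better as ε shrinks.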