Non-uniform Discrete Distribution
In this lesson, we are going to extend the Bernoulli distribution to more than two possible outcomes, and discuss some mechanisms for efficiently implementing the Sample() method on this distribution.
A few lessons back we made a version of the Bernoulli distribution that took integer “odds”. That is, the distribution randomly produces either zero or one, and we can say “produce on average zeros for every ones”.
Extending Bernoulli’s Distribution
An obvious extension of this would be to produce , , or randomly, similarly with some ratios: say, average zeros, ones, twos out of each samples in the long run. Of course, there’s no need to stop at three outputs; basically, we want the distribution of any number from to , with any integer ratios between them. This is a non-uniform discrete distribution (also called a “categorical distribution”); if a Bernoulli distribution is flipping an unfair coin, this is rolling an unfair -sided die.
Non-uniform Discrete Distribution
We’ve seen that we can do that by making an array with zeros, ones and twos, and then calling ToUniform
on that, but what if we wanted weights that were or something? It seems wasteful to make an array with two thousand elements just to solve this problem.
Let’s try to write the code to solve this problem without thinking too hard about it, and see where we get stuck:
public sealed class WeightedInteger : IDiscreteDistribution<int>
{
private readonly List<int> weights;
public static IDiscreteDistribution<int> Distribution(params int[] weights) => Distribution((IEnumerable<int>)weights);
public static IDiscreteDistribution<int> Distribution(IEnumerable<int> weights)
{
List<int> w = weights.ToList();
if (w.Any(x => x < 0) || !w.Any(x => x > 0))
throw new ArgumentException();
if (w.Count == 1)
return Singleton<int>.Distribution(0);
if (w.Count == 2)
return Bernoulli.Distribution(w[0], w[1]);
return new WeightedInteger(w);
}
private WeightedInteger(List<int> weights)
{
this.weights = weights;
}
public IEnumerable<int> Support() => Enumerable.Range(0, weights.Count).Where(x => weights[x] != 0);
public int Weight(int i) => 0 <= i && i < weights.Count ? weights[i] : 0
}
No surprises here! This is all straightforward code like we’ve seen before.
Exercise: There are a few places where we could be a little smarter; for instance, if all the weights are zero except one, then we could make a singleton distribution for that case as well. If the last weight is zero, we can discard that weight. Can you give it a shot?
Implementing Sample
The interesting question is: how do we implement Sample
? It is by no means obvious. Over the next few lessons, we’ll look at several techniques for solving this problem. In this lesson, we will learn the “inverse” technique.
Inverse Technique
How do you generate a non-uniform random number when given a standard uniform distribution? In brief:
- Get the probability distribution function.
- Integrate it to get the cumulative distribution — that is the function where
c(x)
is the probability that a sample is less than . - The cumulative distribution is monotone nondecreasing and always between and , so it has an inverse which we call the quantile function. Find it.
- If you sample from a standard continuous uniform distribution and then apply the quantile function to the result, the sample will conform to the desired distribution.
Read up more on this in the lesson on generating non-uniform data.
This technique works well with discrete distributions. (For discrete quantities we call the “probability distribution” function the “probability mass” function). In our case, we can simply treat the mass function as a non-normalized distribution. Here’s the non-normalized mass function for weights .
Get hands-on with 1200+ tech skills courses.