More on Fixing Random

In this lesson, we sum up some of the content we have covered in this course and provide more food for thought regarding fixing random.

Summing Up

Let’s sum up the last few lessons:

Suppose we have a distribution of doubles, p, and a function f from double to double. We often want to answer the question “what is the average value of f when it is given samples from p?” This quantity is called the expected value.

The obvious (or “naive”) way to do it is: take a bunch of samples, evaluate the function on those samples, take the average. Easy! However, this can lead to problems if there are “black swans”: values that are rarely sampled, but massively affect the value of the average when running through f. We want to get a good estimate without having to increase the number of samples in our average massively.
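The naïve estimator, and the way a black swan defeats it, can be sketched in a few lines. Since the course's C# Distribution<double> objects aren't available here, this is a hypothetical Python stand-in with an invented p (a standard normal) and an invented f:

```python
import random

random.seed(1)

def sample_p():
    # Stand-in for p.Sample(): a standard normal distribution.
    return random.gauss(0.0, 1.0)

def f(x):
    # A "black swan": astronomically large on a sliver that p almost
    # never samples; ordinary (x squared) everywhere else.
    return 1.0e30 if -10.001 < x < -10.0 else x * x

def naive_expected_value(sample, f, n=100_000):
    # Take a bunch of samples, run each through f, average the results.
    return sum(f(sample()) for _ in range(n)) / n

estimate = naive_expected_value(sample_p, f)
# The true expected value is roughly 77,000 (the sliver contributes about
# 1e30 * 7.7e-26), but the naive estimate comes out near 1.0 because the
# sliver is essentially never sampled.
```

No reasonable increase in the sample count rescues the naïve estimate here; that's the motivation for the techniques below.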

Techniques to Estimate Expected Value:

We developed two techniques to estimate the expected value:

Abandon Sampling and Perform Numerical Integral Calculus

  • Use quadrature to compute two areas: the area under f(x) * p.Weight(x) and the area under p.Weight(x) (which is the normalization constant of p).
  • Their quotient is an extremely accurate estimate of the expected value.
  • But we have to know what region to do quadrature over.
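As a sketch of the quadrature approach, here is a simple midpoint rule applied to a hypothetical unnormalized weight function (standing in for p.Weight) over an assumed range:

```python
import math

def midpoint_area(g, lo, hi, buckets=1000):
    # Midpoint-rule quadrature: split [lo, hi] into equal buckets and
    # sum bucket-width times the height at each bucket's midpoint.
    width = (hi - lo) / buckets
    return sum(g(lo + (i + 0.5) * width) for i in range(buckets)) * width

def weight(x):
    # Hypothetical unnormalized weight of p: the standard normal shape.
    return math.exp(-x * x / 2.0)

def f(x):
    return x * x

# We have to know a range that contains essentially all of the area.
lo, hi = -10.0, 10.0
numerator = midpoint_area(lambda x: f(x) * weight(x), lo, hi)
normalizer = midpoint_area(weight, lo, hi)  # the normalization constant of p
expected = numerator / normalizer           # extremely close to the true value, 1.0
```

For this f and p the true expected value is exactly 1, and the quotient of the two areas lands within a tiny fraction of a percent of it.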

Use Importance Sampling

  • Find a helper distribution q whose weight is large wherever f(x) * p.Weight(x) has a lot of area.
  • Use the naïve algorithm to estimate the expected value of x => f(x) * p.Weight(x) / q.Weight(x) from samples of q.
  • That is proportional to the expected value of f with respect to p.
  • We gave a technique for estimating the proportionality constant by sampling from q also.
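Putting those bullet points together as a Python sketch, with a uniform helper q and invented weight functions (the true expected value here happens to be 1):

```python
import random, math

random.seed(2)

def p_weight(x):
    # Hypothetical unnormalized weight of p: the standard normal shape.
    return math.exp(-x * x / 2.0)

def f(x):
    return x * x

# Helper distribution q: uniform on [-10, 10], so its weight is nonzero
# everywhere that f(x) * p_weight(x) has appreciable area.
q_lo, q_hi = -10.0, 10.0

def sample_q():
    return random.uniform(q_lo, q_hi)

def q_weight(x):
    return 1.0  # constant on its support

n = 200_000
samples = [sample_q() for _ in range(n)]

# Naive estimate of the expected value of f(x) * p_weight(x) / q_weight(x)
# over samples from q; this is proportional to the quantity we want.
numerator = sum(f(x) * p_weight(x) / q_weight(x) for x in samples) / n

# Estimate the proportionality constant by sampling from q as well.
normalizer = sum(p_weight(x) / q_weight(x) for x in samples) / n

expected = numerator / normalizer  # close to 1.0
```

Dividing the two estimates cancels the unknown proportionality constant, which is exactly the trick described in the last bullet.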

Problems with Importance Sampling

The problem with importance sampling then is finding a good q. We discussed some techniques:

  • If you know the range, use a uniform distribution over that range.
  • Stretch and shift p so that the transformed PDF doesn’t have a “black swan”, but the normalization constant is the same.
  • Use the Metropolis Algorithm to generate a helper PDF from Abs(f * p), though in our experiments this worked poorly.
  • If we know the range of interest, we can use the VEGAS algorithm. It makes cheap, naïve estimates of the area of subranges, and then uses that information to gradually refine a piecewise-uniform helper PDF that targets spikes and avoids flat areas of f * p.
  • However, the VEGAS algorithm is complex, and we did not attempt to implement it for this course.
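The "stretch" part of the second technique above can be checked numerically. In this hypothetical sketch, widening the standard normal shape by a factor while dividing by that same factor moves weight into the tails but leaves the normalization constant unchanged:

```python
import math

STRETCH = 5.0

def p_weight(x):
    # Unnormalized weight of p: the standard normal shape.
    return math.exp(-x * x / 2.0)

def q_weight(x):
    # Stretched by STRETCH horizontally and squashed by STRETCH vertically:
    # the tails get much more weight, but the total area stays the same.
    return p_weight(x / STRETCH) / STRETCH

def area(g, lo, hi, buckets=20_000):
    # Simple midpoint-rule quadrature, just to verify the claim.
    width = (hi - lo) / buckets
    return sum(g(lo + (i + 0.5) * width) for i in range(buckets)) * width

p_area = area(p_weight, -100.0, 100.0)
q_area = area(q_weight, -100.0, 100.0)
# Both areas come out to sqrt(2 * pi), about 2.5066.
```

Because the normalization constants match, such a q can stand in for p's "shape" while sampling black-swan regions far more often.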

Food for Thought

The question you may have been asking yourself these past few lessons is:

If quadrature is an accurate and cheap way to estimate the expected value of f over samples from p, then why are we even considering doing sampling at all? Surely we typically know at least approximately the range over which f * p has some area. What’s the point of all this?

A simple quadrature algorithm just splits up the range into some number — say, a thousand — equally-sized pieces, evaluates f * p on each of them, and takes the average. That sure seems cheaper and easier than all this mucking around with sampling. Have we just been wasting your time these past few lessons? And why has there been so much research and effort put into finding techniques for estimating expected value?

This course is called “Fixing Random” because the built-in base class library tools we have in C# for representing probabilities are weak. We’ve approached everything in this course from the perspective of “We want to have an object that represents probabilities in our business domain, and we want to use that object to solve our business problems”.

“What is the expected value of this function given this distribution?” is a very natural question to ask when solving business problems that involve probabilities, and as we’ve seen, you can answer that question by simulating integral calculus through quadrature.

But, as we keep on saying: things equal to the same are equal to each other. Flip the script. Suppose our business domain involves solving integral calculus problems. And suppose there is an integral calculus problem that we cannot efficiently solve with quadrature. What do we do?

  • We can solve expected value problems with integral calculus techniques such as quadrature.
  • We can solve expected value problems with sampling techniques.
  • Things equal to the same are equal to each other.
  • Therefore we can solve integral calculus problems with sampling techniques.
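Here is that syllogism in action as a Python sketch: a quarter-circle integral solved with the naïve sampling technique, no quadrature required. Treating the uniform distribution on [0, 1] as p means the expected value of g over its samples is exactly the integral of g over that range:

```python
import random, math

random.seed(3)

def g(x):
    # Quarter of the unit circle; its area over [0, 1] is pi / 4.
    return math.sqrt(1.0 - x * x)

# The naive technique: sample uniformly, evaluate, average. Because the
# uniform distribution on [0, 1] has width 1, the average IS the integral.
n = 200_000
estimate = sum(g(random.random()) for _ in range(n)) / n
# estimate is close to pi / 4, about 0.7854
```

The same flip works for the fancier techniques: importance sampling and its relatives become ways to compute hard integrals, not just expected values.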

That is why there has been so much research into computing expected values: the expected value is the area under the function f(x) * p.Weight(x), so if we can compute the expected value by sampling, then we can compute that area and solve the integral calculus problem without doing quadrature!

We said above “if quadrature is accurate and cheap”, but there are many scenarios in which quadrature is not a cheap way to compute an area. (There ...