Bayesian Statistics
What Is Bayesian Statistics?
What we have learned so far about probability falls into the category of Frequentist Statistics. But there is another, more powerful form of statistics as well. It is called Bayesian Statistics, or sometimes Bayesian Inference. Bayesian Statistics is a more general approach to statistics: it describes the probability of an event based on prior knowledge of the conditions that might be related to the event. It allows us to answer questions like:
- Has this happened before?
- Is it likely, based on my knowledge of the situation, that it will happen?
Let’s look at an example. Ever wonder how a spam filter could be designed?
Say an email containing “You won the lottery” gets marked as spam. The question is, how can a computer work out that emails containing certain words are likely to be spam? Bayesian Statistics does the magic here!
A spam filter based only on a blacklist would be too rigid and would have a high false-negative rate, that is, spam that goes undetected. Bayesian filtering helps by allowing the spam filter to learn from previous instances of spam. As we analyze the words in a message, we can compute its probability of being spam using Bayes’ Theorem. As the filter is trained on more and more messages, it updates the probability that a given word indicates spam. In this way, Bayesian Statistics takes previous evidence into account.
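To make the idea concrete, here is a minimal sketch of how a filter might estimate the probability that a message is spam given that it contains a single word. Everything in it is an assumption made for illustration: the four training messages, the function names `train` and `p_spam_given_word`, and the add-one smoothing that avoids zero probabilities. A real filter would combine many words and train on far more data.

```python
from collections import Counter

def train(messages):
    """Count how many spam and non-spam messages contain each word."""
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for text, is_spam in messages:
        words = set(text.lower().split())  # each word counted once per message
        if is_spam:
            n_spam += 1
            spam_counts.update(words)
        else:
            n_ham += 1
            ham_counts.update(words)
    return spam_counts, ham_counts, n_spam, n_ham

def p_spam_given_word(word, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | word) from the counts using Bayes' Theorem."""
    # Add-one smoothing (an assumption of this sketch) so that an unseen
    # word never produces a probability of exactly 0 or 1.
    p_word_given_spam = (spam_counts[word] + 1) / (n_spam + 2)
    p_word_given_ham = (ham_counts[word] + 1) / (n_ham + 2)
    # Prior: the fraction of training messages that were spam.
    p_spam = n_spam / (n_spam + n_ham)
    p_ham = 1 - p_spam
    # Total probability of seeing the word in any message.
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / p_word

# Hypothetical training data: (message text, is_spam)
messages = [
    ("you won the lottery", True),
    ("claim your lottery prize now", True),
    ("meeting moved to monday", False),
    ("lunch on monday", False),
]

model = train(messages)
p_lottery = p_spam_given_word("lottery", *model)
p_monday = p_spam_given_word("monday", *model)
print(f"P(spam | 'lottery') = {p_lottery:.2f}")  # 0.75
print(f"P(spam | 'monday')  = {p_monday:.2f}")   # 0.25
```

With this toy data, "lottery" appears only in spam, so its estimated spam probability is high, while "monday" appears only in legitimate mail, so its estimate is low. Calling `train` again on a larger list of messages changes the counts, and the probabilities update with them.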
Bayesian Statistics is based on Bayes’ Theorem, which is a way of finding a probability when we know certain other probabilities. The magical Bayes’ formula looks like this:
...