Bayes' Theorem

Get introduced to the concept of Bayes' Theorem and calculate the posterior probability.

Bayes’ Theorem describes a way of finding a conditional probability when we know certain other probabilities. The following equation mathematically denotes Bayes’ Theorem:

P(Hypothesis|Evidence) = P(Hypothesis) \cdot \frac{P(Evidence|Hypothesis)}{P(Evidence)}

Bayes’ Theorem says we can calculate the posterior probability from a prior probability and some evidence-related modifier.
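As a minimal sketch of this formula in code (the function name and example numbers are illustrative, not part of the lesson):

```python
def bayes_posterior(prior, p_evidence_given_hypothesis, p_evidence):
    """P(H|E) = P(H) * P(E|H) / P(E) -- illustrative helper."""
    return prior * p_evidence_given_hypothesis / p_evidence

# Prior of 0.3; the evidence is twice as likely in a world where the
# hypothesis is true (0.8) as on its own (0.4), so the belief doubles
# to roughly 0.6.
print(bayes_posterior(0.3, 0.8, 0.4))
```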

The posterior denotes what we believe about the Hypothesis after gathering the new information about the Evidence. It is a conditional probability, as discussed above. The prior probability denotes what we believed about the Hypothesis before we gathered the new information. It is the overall probability of our Hypothesis.

The modifier of the new information denotes the relative change in our belief about the Hypothesis caused by the Evidence.

This modifier is the quotient of the backward probability, P(Evidence|Hypothesis), and the probability of the new piece of information, P(Evidence). The backward probability, which is the numerator of the modifier, answers the question: what is the probability of observing this evidence in a world where our hypothesis is true? The denominator is the probability of observing the evidence on its own.

When we see the evidence often in a world where the hypothesis is true but rarely on its own, it supports the hypothesis. Conversely, if we see the evidence everywhere but not in a world where the hypothesis is true, then the evidence opposes the hypothesis.

The farther the modifier is from 1, the more it changes the probability. A modifier of exactly 1 would not change the probability at all. Let's define informativeness as the modifier's distance from 1:

Informativeness = \left|\frac{P(Evidence|Hypothesis)}{P(Evidence)} - 1\right|
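This definition translates directly into a small helper function (a sketch; the name `informativeness` is ours, not from the lesson's code):

```python
def informativeness(p_evidence_given_hypothesis, p_evidence):
    """Distance of the Bayesian modifier from 1."""
    modifier = p_evidence_given_hypothesis / p_evidence
    return abs(modifier - 1)

# A modifier of exactly 1 carries no information.
print(informativeness(0.5, 0.5))  # → 0.0
# Evidence seen twice as often under the hypothesis is informative.
print(informativeness(0.8, 0.4))  # → 1.0
```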

If we have one hypothesis H and multiple pieces of evidence E_1, E_2, …, E_n, then we have n modifiers M_1, M_2, …, M_n. Assuming the pieces of evidence are independent of each other, we can chain them:

P(H|E_1, E_2, \ldots, E_n) = \frac{P(E_1|H)}{P(E_1)} \cdot \frac{P(E_2|H)}{P(E_2)} \cdots \frac{P(E_n|H)}{P(E_n)} \cdot P(H)
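Chaining the modifiers can be sketched as a loop over the evidence (an illustrative helper, not the lesson's own code):

```python
def posterior(prior, likelihoods, evidence_probs):
    """Chain Bayesian modifiers over independent pieces of evidence.

    likelihoods[i]    = P(E_i | H)
    evidence_probs[i] = P(E_i)
    """
    result = prior
    for p_e_given_h, p_e in zip(likelihoods, evidence_probs):
        result *= p_e_given_h / p_e  # apply modifier M_i
    return result

# With a single piece of evidence this reduces to plain Bayes' Theorem.
print(posterior(0.3, [0.8], [0.4]))
# Two pieces of evidence: modifiers 1.2 and 1.5 scale the prior of 0.2.
print(posterior(0.2, [0.6, 0.9], [0.5, 0.6]))
```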

What does that mean in practice?

Our Hypothesis is that a passenger survived the Titanic shipwreck. We have two pieces of evidence: Female and SecondClass.

  • P(Survived) is the overall probability of a passenger surviving.
  • P(Female) is the probability of a passenger being female.
  • P(SecondClass) is the probability of a passenger holding a second-class ticket.
  • P(Female|Survived) denotes how likely a passenger who survived is to be female.
  • And P(SecondClass|Survived) denotes how likely a passenger who survived is to hold a second-class ticket.

The following equation depicts how to calculate the probability that a female passenger with a second-class ticket survived:

P(Survived|SecCl, Female) = \frac{P(SecCl|Survived)}{P(SecCl)} \cdot \frac{P(Female|Survived)}{P(Female)} \cdot P(Survived)

Let’s have a look at the Python code.
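The lesson's own code is not reproduced here; as a sketch of what the calculation might look like, the probability values below are illustrative placeholders, not figures computed from the real Titanic data:

```python
# Illustrative placeholder probabilities (not real Titanic figures).
p_survived = 0.38                  # P(Survived): the prior
p_female = 0.35                    # P(Female)
p_secclass = 0.21                  # P(SecondClass)
p_female_given_survived = 0.68     # P(Female | Survived)
p_secclass_given_survived = 0.25   # P(SecondClass | Survived)

# One modifier per piece of evidence.
modifier_secclass = p_secclass_given_survived / p_secclass
modifier_female = p_female_given_survived / p_female

# Chain the modifiers and apply them to the prior.
p_posterior = modifier_secclass * modifier_female * p_survived
print(f"P(Survived | SecCl, Female) = {p_posterior:.2f}")
```

Both modifiers are greater than 1 here, so each piece of evidence raises our belief that the passenger survived above the prior.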
