Towards Naïve Bayes
Get introduced to Naïve Bayes and calculate the conditional probabilities.
We'll cover the following...
- Introduction
- Loading the raw data
- Towards Naïve Bayes
- Calculating the probability of survival
- Calculating the conditional probability of survival
- Calculating the probability to survive if the passenger was female
- Calculating the conditional probability of females with second-class tickets
- Counting passengers
Introduction
In our first simple variational hybrid quantum-classical binary classification algorithm, which we developed in the previous chapter, we used a parameterized quantum circuit (PQC) that measured a quantum state. While quantum systems bring inherent randomness and allow us to work with probabilities, we did not yet use this characteristic because we determined the resulting probability of measuring either 0 or 1 upfront in a classical program.
In the following two chapters, we go one step further. We create a probabilistic binary classifier that calculates the resulting likelihood inside the PQC. We build a variational hybrid quantum-classical Naïve Bayes classifier. It builds upon Bayes’ Theorem. Starting with an initial prior probability, we update the resulting probability inside the PQC based on the evidence given by the passenger data.
Don’t worry if you’re not familiar with Bayes’ Theorem and the Naïve Bayes classifier. We’ll cover all the basics in this chapter.
We use the Titanic shipwreck data to discover Bayes’ Theorem and the Naïve Bayes classifier with actual data. We load the original data here because it is easier to work with manually.
Loading the raw data
```python
import pandas as pd

train = pd.read_csv('train.csv')
print(train.head())
```
The output depicts the first five rows of the data in the train Pandas DataFrame.
Note: See the lesson Get and Look on the Dataset for more details on the dataset.
Towards Naïve Bayes
“Did a passenger survive the Titanic shipwreck?”
A probabilistic classifier predicts the label of a thing based on its probability. In order to answer the question above, we need to know the overall chance of survival.
Let’s calculate it. In the following snippet, we create a list of all survivors in line 2. First, we use the Pandas chaining operation, train.Survived, to access a column. Then, we use the eq() function from Pandas and chain it to the column. It selects the rows whose values match the provided value (1 for survival).
In line 5, the survival probability is the number of survivors divided by the total number of passengers.
Calculating the probability of survival
```python
# list of all survivors
survivors = train[train.Survived.eq(1)]

# calculate the probability
prob_survival = len(survivors)/len(train)
print('P(Survival) is {:.2f}'.format(prob_survival))
```
Given our dataset, the probability of survival is 0.38. Since this is below 0.5, we would predict that the passenger died.
This is a probabilistic classifier already. It’s the predict_death classifier we created in the lesson Baseline and discussed in the lesson Unmask the Hypocrite Classifier. Even though it’s a hypocrite classifier because it does not consider the individual passenger when predicting survival, this classifier yields a higher precision than a purely random classifier does.
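As a reminder, such a baseline classifier ignores its input entirely and always predicts the same label. A minimal sketch of what predict_death looks like (the exact signature from the earlier lesson may differ):

```python
def predict_death(passenger):
    # A "hypocrite" classifier: it ignores the passenger's data
    # entirely and always predicts 0 (did not survive).
    return 0

# It predicts death for every passenger, whatever their attributes.
print(predict_death({'Pclass': 2, 'Sex': 'female'}))
```

Because roughly 62% of the passengers in the training data died, always predicting death is correct more often than a coin flip, which is exactly what makes it a tempting but misleading baseline.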
What if the passenger had a second-class ticket? What was this passenger’s probability of surviving?
Let’s take a look. In the following snippet, in line 2, we create a list of passengers with a second-class ticket (train.Pclass.eq(2)). We divide the survivors of this subset (secondclass.Survived.eq(1)) by the total number of passengers with a second-class ticket in line 4.
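Since the snippet itself is not shown in this excerpt, here is a sketch of what it might look like. To keep it runnable on its own, it uses a tiny made-up DataFrame in place of the real train data loaded from train.csv; the selection and division steps mirror the description above.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic train DataFrame; the real
# lesson works on the data loaded from train.csv instead.
train = pd.DataFrame({
    'Survived': [1, 0, 1, 0, 1, 0, 0, 1],
    'Pclass':   [2, 2, 2, 2, 1, 3, 2, 1],
})

# passengers holding a second-class ticket
secondclass = train[train.Pclass.eq(2)]

# survivors of this subset divided by all second-class passengers
prob_survival_secondclass = len(secondclass[secondclass.Survived.eq(1)])/len(secondclass)
print('P(Survival|SecondClass) is {:.2f}'.format(prob_survival_secondclass))
```

With the real data, this yields the conditional probability of survival given a second-class ticket, which is exactly the kind of evidence-based update Bayes’ Theorem formalizes.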