Pre-Processing

Learn how to calculate the ticket class modifier and the gender modifier as part of pre-processing.

Pre-processing consists of calculating the modifiers. We start with the ticket class.

Calculating the ticket class modifier

```python
# get the modifier given the passenger's pclass
# (train, survivors, cnt_survivors and cnt_all must already be defined)
def get_modifier_pclass(pclass):
    # number of survivors with the given pclass
    cnt_surv_pclass = len(survivors[survivors.Pclass.eq(pclass)])
    # backward probability P(Pclass|Survived)
    p_cl_surv = cnt_surv_pclass/cnt_survivors
    # probability of the evidence P(Pclass)
    p_cl = len(train[train.Pclass.eq(pclass)])/cnt_all
    return p_cl_surv/p_cl
```

We define a function that takes the passenger’s pclass as input. The Pclass column in our dataset is the ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).

We calculate the backward probability $P(Pclass|Survived)$ by dividing the number of survivors who held the given ticket class (cnt_surv_pclass) by the number of all survivors (cnt_survivors). Then, we calculate the probability $P(Pclass)$ that a passenger holds the given ticket class: the number of passengers with that ticket class is divided by the total number of passengers (cnt_all).
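To check the arithmetic without the course's train DataFrame, here is a minimal pandas-free sketch of the same calculation. It assumes a hypothetical toy dataset of (pclass, survived) pairs, not the real Titanic data, and the function name get_modifier_pclass_plain is likewise made up for illustration.

```python
# Hypothetical toy data: (pclass, survived) pairs, NOT the real Titanic train set
passengers = [
    (1, 1), (1, 1), (1, 0),
    (2, 1), (2, 0), (2, 0),
    (3, 1), (3, 0), (3, 0), (3, 0),
]

def get_modifier_pclass_plain(pclass, rows):
    """Hypothetical pure-Python version of P(Pclass|Survived) / P(Pclass)."""
    survivors = [r for r in rows if r[1] == 1]
    cnt_survivors = len(survivors)
    cnt_all = len(rows)
    # number of survivors who held the given ticket class
    cnt_surv_pclass = sum(1 for r in survivors if r[0] == pclass)
    # backward probability P(Pclass|Survived)
    p_cl_surv = cnt_surv_pclass / cnt_survivors
    # probability of the evidence P(Pclass)
    p_cl = sum(1 for r in rows if r[0] == pclass) / cnt_all
    return p_cl_surv / p_cl

for pclass in (1, 2, 3):
    print(pclass, round(get_modifier_pclass_plain(pclass, passengers), 3))
# prints:
# 1 1.667
# 2 0.833
# 3 0.625
```

On this toy data, first class gets a modifier above 1, so it raises the estimated chance of survival. Second and third class get modifiers below 1, so they lower it.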

The modifier is the evidence's backward probability divided by the probability of observing the evidence. For the given ticket class, the modifier is $m_{Pclass}=\frac{P(Pclass|Survived)}{P(Pclass)}$ ...
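By Bayes' theorem, multiplying the prior by this modifier gives the posterior: $P(Survived|Pclass) = m_{Pclass} \cdot P(Survived)$. The sketch below checks this on a hypothetical toy dataset of (pclass, survived) pairs. It uses exact fractions so that the comparison with the directly counted conditional probability is exact. The data and the helper name modifier are assumptions made for illustration only.

```python
from fractions import Fraction

# Hypothetical toy data: (pclass, survived) pairs, NOT the real Titanic train set
passengers = [
    (1, 1), (1, 1), (1, 0),
    (2, 1), (2, 0), (2, 0),
    (3, 1), (3, 0), (3, 0), (3, 0),
]

def modifier(pclass, rows):
    """Hypothetical helper: exact P(Pclass|Survived) / P(Pclass)."""
    survivors = [r for r in rows if r[1] == 1]
    p_cl_surv = Fraction(sum(r[0] == pclass for r in survivors), len(survivors))
    p_cl = Fraction(sum(r[0] == pclass for r in rows), len(rows))
    return p_cl_surv / p_cl

# prior P(Survived): share of all passengers who survived
prior = Fraction(sum(r[1] for r in passengers), len(passengers))

for pclass in (1, 2, 3):
    in_class = [r for r in passengers if r[0] == pclass]
    # P(Survived|Pclass) counted directly within the ticket class
    direct = Fraction(sum(r[1] for r in in_class), len(in_class))
    # the same probability obtained as prior * modifier
    via_modifier = prior * modifier(pclass, passengers)
    assert direct == via_modifier
    print(pclass, via_modifier)
# prints:
# 1 2/3
# 2 1/3
# 3 1/4
```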