Gaussian Naïve Bayes

Get introduced to the concept of Gaussian Naïve Bayes and calculate the mean and the standard deviation of the passenger age.

So far, we’ve considered categorical data. There are two genders in our dataset. There are three classes of tickets. Features like these, which take only a few distinct values, can be treated as categorical.

But what about features such as the age of the passenger? One way is to transform numerical features into their categorical counterparts. The question is how and where to separate the categories from each other. For instance, a 29-year-old passenger is only one year younger than a 30-year-old passenger, and both are much closer to each other than either is to a 39-year-old passenger. Yet if we split by tens, we put the 30-year-old and the 39-year-old passengers together and separate them from the 29-year-old passenger.
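The boundary problem described above can be reproduced in a few lines. This is only a minimal sketch: the helper name `decade_bin` is made up for illustration, and the three ages are the ones from the example, not values taken from the dataset.

```python
def decade_bin(age):
    """Return the lower bound of the ten-year bin an age falls into."""
    return int(age // 10) * 10


# The three ages from the example above (not rows from the dataset).
ages = [29, 30, 39]

# Map each age to its ten-year bin.
bins = {age: decade_bin(age) for age in ages}
print(bins)  # {29: 20, 30: 30, 39: 30}
```

The 29-year-old lands in the 20s bin, while the 30-year-old and the 39-year-old share the 30s bin, even though 29 and 30 are only one year apart.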

The other option we have is to treat numerical features as continuous distributions. A continuous distribution cannot be expressed in tabular form because the variable can take infinitely many values. Instead, we use an equation to describe a continuous probability distribution. A common practice is to assume a normal (Gaussian) distribution for numerical variables. The following equation denotes the general form of the Gaussian density function:

$$P(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

The parameter $\mu$ is the mean of the evidence’s probability distribution. The parameter $\sigma$ is its standard deviation.
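To make the density function and its two parameters concrete, here is a minimal sketch in plain Python. Several things in it are assumptions rather than part of the lesson: the function name `gaussian_pdf` is made up, the list of ages is hypothetical and does not come from the dataset, and the population standard deviation (`statistics.pstdev`) is one possible choice. The sample standard deviation (`statistics.stdev`) would be an equally reasonable one.

```python
import math
import statistics


def gaussian_pdf(x, mu, sigma):
    """Evaluate the Gaussian density with mean mu and standard deviation sigma at x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)


# Hypothetical passenger ages, for illustration only (not from the dataset).
ages = [22.0, 38.0, 26.0, 35.0, 54.0]

# Estimate the two parameters of the distribution from the data.
mu = statistics.mean(ages)       # mean of the ages
sigma = statistics.pstdev(ages)  # population standard deviation (an assumption)

# Density of the fitted distribution at age 29.
density_at_29 = gaussian_pdf(29.0, mu, sigma)
print(mu, sigma, density_at_29)
```

Note that the returned value is a density, not a probability. It can be compared across ages or across groups, but it only becomes a probability when integrated over a range of ages.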

The following image depicts such a distribution.
