
Naïve Bayes explained

13 min read
Mar 04, 2024
Contents
Overview
Bayes’ theorem
Naïve Bayes
How Naïve Bayes works
The Naïve Bayes algorithm
Example
Pros and cons of using Naïve Bayes
Conclusion and next steps


Overview#

You might be familiar with the growing excitement around machine learning and its various applications. Amidst this frenzy, one algorithm stands out for its simplicity and effectiveness in classification tasks: the Naïve Bayes classifier. Its versatility makes it applicable in numerous real-world scenarios, including the following:

  • Spam email detection: Based on the presence of specific words or phrases, Naïve Bayes is used to separate spam emails from valid ones.

  • Sentiment analysis: This is a technique used in natural language processing applications, such as customer reviews or social media posts, to classify text into positive, negative, or neutral attitudes.

  • Medical diagnosis: Based on test findings and patient symptoms, Naïve Bayes is used to forecast the likelihood that a specific disease is present.

  • Recommendation systems: Naïve Bayes is used in recommendation engines to predict user preferences and make relevant product or service recommendations based on user activity.

  • Document classification: News articles, academic papers, and legal documents are among the specified categories that can be classified using Naïve Bayes.

As seen from the examples above, the Naïve Bayes algorithm can work well with textual data as well as structured (or tabular) data. In this blog, our focus will be on structured data. We will see how Naïve Bayes works, along with exploring some of its advantages and disadvantages.

Bayes’ theorem#

Before delving further, let’s first take a look at Bayes’ theorem, upon which the Naïve Bayes algorithm is based.

Bayes’ theorem is a key idea in probability theory and statistics that explains how to update the probability of a hypothesis (or event) in light of fresh information or data.

Bayes’ theorem mathematically expresses this relationship between event A and event B as follows:

P(A|B) = P(B|A) * P(A) / P(B)

Here:

  • P(A|B), known as the posterior probability, denotes the conditional probability of event A occurring, given that event B has already occurred.

Note: Conditional probability is defined as the probability of an event (A) occurring, given that another event (B) has already occurred.

  • P(B|A), also called the likelihood, is the conditional probability of event B happening, given that event A has already occurred.

  • P(A), also known as the prior probability, is the probability of event A occurring based on prior knowledge.

  • P(B), known as the evidence probability, is the probability of event B occurring.

Let’s explain Bayes’ theorem using a coin toss example.

Assume you have two coins: one is a fair coin (A), and the other is a biased coin (B) with a higher probability of landing on tails. You randomly pick one of the coins and toss it. Now, you want to find the probability that the coin is the biased one (B), given that it landed on tails.

Let’s define it as follows:

  • A: The event that coin A is chosen.

  • B: The event that coin B is chosen.

  • T: The event that the coin lands on tails.

We’ll assume the following probabilities:

  • P(A) = 0.5 (the prior probability of choosing coin A).

  • P(B) = 0.5 (the prior probability of choosing coin B).

  • P(T|A) = 0.5 (the conditional probability of getting tails with coin A).

  • P(T|B) = 0.7 (the conditional probability of getting tails with coin B).

Using Bayes’ theorem, we derive the following equation:

P(B|T) = P(T|B) * P(B) / P(T)

Note: We can calculate P(T) using the law of total probability: P(T) = P(T|B) * P(B) + P(T|A) * P(A) = 0.7 * 0.5 + 0.5 * 0.5 = 0.6.

We have all the values that can be plugged into the above equation. Therefore,

P(B|T) = (0.7 * 0.5) / 0.6 ≈ 0.58

In other words, given that the toss landed on tails, there is roughly a 58% chance that the biased coin was picked.
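For readers who want to verify the arithmetic, here is a minimal Python sketch of the calculation above (the variable names are our own):

```python
# Bayes' theorem for the coin example: P(B | T) = P(T | B) * P(B) / P(T)
p_a, p_b = 0.5, 0.5      # priors: probability of picking coin A or coin B
p_t_given_a = 0.5        # P(tails | fair coin A)
p_t_given_b = 0.7        # P(tails | biased coin B)

# Law of total probability: P(T) = P(T | B) * P(B) + P(T | A) * P(A)
p_t = p_t_given_b * p_b + p_t_given_a * p_a   # 0.6

p_b_given_t = p_t_given_b * p_b / p_t         # 0.35 / 0.6 ≈ 0.583
print(f"P(biased coin | tails) = {p_b_given_t:.3f}")
```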

Now that we have a brief understanding of Bayes’ theorem, let’s take a look at the Naïve Bayes algorithm—a probabilistic algorithm that is based on the assumption that features are independent of one another. The assumption of independence among features means that the occurrence or value of one feature does not affect or depend on the occurrence or value of another feature. However, this assumption might not hold true in real-world applications.

Naïve Bayes classifier

How Naïve Bayes works#

Assume we have k attributes, A_1 through A_k, each with a set of distinct values. The class is C and can take multiple distinct values. Let’s further suppose that a test example d with observed attribute values a_1 through a_k is provided, where a_i is the value of attribute A_i for i = 1, ..., k. In essence, classification involves calculating the posterior probability of each class. The prediction is the class c_j such that

P(C = c_j | A_1 = a_1, ..., A_k = a_k)

is maximum.

By applying Bayes’ theorem, we get:

P(C = c_j | A_1 = a_1, ..., A_k = a_k) = P(A_1 = a_1, ..., A_k = a_k | C = c_j) * P(C = c_j) / P(A_1 = a_1, ..., A_k = a_k)

The class prior probability, P(C = c_j), is easily determined. It is the probability of a class without taking any attributes into account, and it is computed as the fraction of examples in the dataset that belong to that class.

The denominator, P(A_1 = a_1, A_2 = a_2, ..., A_k = a_k), can be ignored because it is the same for every class.

Therefore, we only need to calculate:

P(A_1 = a_1, ..., A_k = a_k | C = c_j) * P(C = c_j)

This can be written as:

P(C = c_j) * P(A_1 = a_1 | C = c_j) * P(A_2 = a_2 | C = c_j) * ... * P(A_k = a_k | C = c_j)

Note: The factorization above is only possible because of the Naïve Bayes assumption that all features are independent of each other given the class.

Now, how do we calculate P(A_i = a_i | C = c_j)? Simple! We calculate it as follows:

P(A_i = a_i | C = c_j) = (number of training examples with A_i = a_i and class c_j) / (number of training examples with class c_j)

This can be easily computed by counting occurrences in the dataset.

Finally, we have all the details. Therefore, given a test example d, we determine the most likely class by computing:

c = argmax over c_j of P(C = c_j) * P(A_1 = a_1 | C = c_j) * ... * P(A_k = a_k | C = c_j)

The Naïve Bayes algorithm#

Let’s now take a look at the algorithm step by step, followed by a working example. A short from-scratch code sketch of these steps appears after the list.

  1. Calculate prior probabilities: Calculate the prior probabilities of each class in the given dataset.

  2. Calculate likelihoods: For each feature and each class, calculate the likelihood of observing a specific feature value given the class. This involves counting the occurrences of feature values for each class in the training data.

  3. Calculate posterior probabilities: For a new, unseen data point, calculate the posterior probabilities for each class using Bayes’ theorem. The posterior probability of a class given the features is proportional to the prior probability of the class and the product of the likelihoods for each feature.

  4. Make a prediction: Select the class with the highest posterior probability as the predicted class for the new data point. This is the class that the algorithm believes is most likely given the observed features.

Naïve Bayes in steps
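For concreteness, here is a minimal from-scratch Python sketch of these four steps for categorical features. It is only an illustrative sketch (the function and variable names are our own, not from any particular library), not a production implementation; it uses plain counting exactly as described above.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Steps 1 and 2: estimate class priors and per-feature likelihoods by counting."""
    total = len(labels)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / total for c in class_counts}

    # likelihoods[c][i][v] = P(attribute i takes value v | class c)
    likelihoods = {c: defaultdict(Counter) for c in class_counts}
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            likelihoods[label][i][value] += 1
    for c, feature_counts in likelihoods.items():
        for i, value_counts in feature_counts.items():
            feature_counts[i] = {v: n / class_counts[c] for v, n in value_counts.items()}
    return priors, likelihoods

def predict(priors, likelihoods, row):
    """Steps 3 and 4: score each class with prior * product of likelihoods, pick the best."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(row):
            score *= likelihoods[c][i].get(value, 0.0)  # unseen value -> 0 (see smoothing later)
        scores[c] = score
    return max(scores, key=scores.get)
```

The example in the next section performs the same calculations by hand on a small medical dataset.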

Example#

Let’s assume that we have the following dataset.

| Fever | Fatigue | Cough | Disease     |
|-------|---------|-------|-------------|
| Yes   | No      | No    | Influenza   |
| No    | Yes     | No    | Common cold |
| No    | No      | Yes   | Influenza   |
| Yes   | No      | Yes   | Other       |
| No    | No      | No    | Influenza   |
| Yes   | Yes     | Yes   | Common cold |
| Yes   | No      | Yes   | Influenza   |
| No    | Yes     | No    | Other       |
| Yes   | Yes     | No    | Influenza   |
| No    | No      | Yes   | Common cold |

Step 1: Calculate prior probabilities

Based on the given data, we have the following prior probabilities:

  • P(Disease = Influenza) = 5/10

  • P(Disease = Common cold) = 3/10

  • P(Disease = Other) = 2/10

Step 2: Calculate likelihoods

For each feature, we calculate the likelihood of observing “Yes” or “No” for each class as follows (a short code check after this list reproduces a couple of these counts from the table):

  • P(Fever = Yes | Disease = Influenza) = 3/5

  • P(Fever = Yes | Disease = Common cold) = 1/3

  • P(Fever = Yes | Disease = Other) = 1/2

  • P(Fever = No | Disease = Influenza) = 2/5

  • P(Fever = No | Disease = Common cold) = 2/3

  • P(Fever = No | Disease = Other) = 1/2

  • P(Fatigue = Yes | Disease = Influenza) = 1/5

  • P(Fatigue = Yes | Disease = Common cold) = 2/3

  • P(Fatigue = Yes | Disease = Other) = 1/2

  • P(Fatigue = No | Disease = Influenza) = 4/5

  • P(Fatigue = No | Disease = Common cold) = 1/3

  • P(Fatigue = No | Disease = Other) = 1/2

  • P(Cough = Yes | Disease = Influenza) = 2/5

  • P(Cough = Yes | Disease = Common cold) = 2/3

  • P(Cough = Yes | Disease = Other) = 1/2

  • P(Cough = No | Disease = Influenza) = 3/5

  • P(Cough = No | Disease = Common cold) = 1/3

  • P(Cough = No | Disease = Other) = 1/2
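These likelihoods are straightforward to reproduce in code. Here is a minimal sketch (the tuples mirror the table above; the variable names are our own):

```python
from collections import Counter

# (Fever, Fatigue, Cough, Disease) rows, copied from the table above
data = [
    ("Yes", "No",  "No",  "Influenza"),   ("No",  "Yes", "No",  "Common cold"),
    ("No",  "No",  "Yes", "Influenza"),   ("Yes", "No",  "Yes", "Other"),
    ("No",  "No",  "No",  "Influenza"),   ("Yes", "Yes", "Yes", "Common cold"),
    ("Yes", "No",  "Yes", "Influenza"),   ("No",  "Yes", "No",  "Other"),
    ("Yes", "Yes", "No",  "Influenza"),   ("No",  "No",  "Yes", "Common cold"),
]

disease_counts = Counter(row[3] for row in data)

# P(Fever = Yes | Disease = Influenza) = 3/5
fever_yes_flu = sum(1 for row in data if row[0] == "Yes" and row[3] == "Influenza")
print(fever_yes_flu / disease_counts["Influenza"])      # 0.6

# P(Cough = Yes | Disease = Common cold) = 2/3
cough_yes_cold = sum(1 for row in data if row[2] == "Yes" and row[3] == "Common cold")
print(cough_yes_cold / disease_counts["Common cold"])   # 0.666...
```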

Step 3: Calculate posterior probabilities

Assume we have a new data point: d = (Fever = Yes, Fatigue = No, Cough = Yes).

We need to calculate the following three probabilities:

  • P(Disease = Influenza | Fever = Yes, Fatigue = No, Cough = Yes)

  • P(Disease = Common cold | Fever = Yes, Fatigue = No, Cough = Yes)

  • P(Disease = Other | Fever = Yes, Fatigue = No, Cough = Yes)

Here is the (unnormalized) posterior score for Disease = Influenza:

P(Disease = Influenza | Fever = Yes, Fatigue = No, Cough = Yes) ∝ P(Disease = Influenza) * P(Fever = Yes | Disease = Influenza) * P(Fatigue = No | Disease = Influenza) * P(Cough = Yes | Disease = Influenza)

= 5/10 * 3/5 * 4/5 * 2/5 = 0.096

Here is the (unnormalized) posterior score for Disease = Common cold:

P(Disease = Common cold | Fever = Yes, Fatigue = No, Cough = Yes) ∝ P(Disease = Common cold) * P(Fever = Yes | Disease = Common cold) * P(Fatigue = No | Disease = Common cold) * P(Cough = Yes | Disease = Common cold)

= 3/10 * 1/3 * 1/3 * 2/3 ≈ 0.022

Here is the (unnormalized) posterior score for Disease = Other:

P(Disease = Other | Fever = Yes, Fatigue = No, Cough = Yes) ∝ P(Disease = Other) * P(Fever = Yes | Disease = Other) * P(Fatigue = No | Disease = Other) * P(Cough = Yes | Disease = Other)

= 2/10 * 1/2 * 1/2 * 1/2 = 0.025

(These scores are proportional to the true posteriors because the common denominator is ignored, as noted earlier.)

Step 4: Make a prediction

Because the posterior score for Disease = Influenza is the highest, we assign the new data point d to the influenza class.
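As a quick sanity check, the three unnormalized scores above can be reproduced with a few lines of Python:

```python
# Unnormalized posterior scores for d = (Fever=Yes, Fatigue=No, Cough=Yes)
scores = {
    "Influenza":   5/10 * 3/5 * 4/5 * 2/5,   # 0.096
    "Common cold": 3/10 * 1/3 * 1/3 * 2/3,   # ≈ 0.022
    "Other":       2/10 * 1/2 * 1/2 * 1/2,   # 0.025
}
print(max(scores, key=scores.get))           # Influenza
```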

Pros and cons of using Naïve Bayes#

Let’s now take a look at a few advantages and disadvantages of using the Naïve Bayes algorithm.

Advantages

  • The algorithm is easy to implement.
  • Naïve Bayes is very efficient and generally gives good results in many applications.
  • Naïve Bayes can be applied to both small and large datasets and is not significantly impacted by the number of features.
  • It tends to be less prone to overfitting, especially when the dataset is small.

Disadvantages

  • Naïve Bayes assumes all attributes are categorical, so numerical attributes need to be discretized first.
  • A specific attribute value may never occur with a class in the training set, which makes the estimated probability for that combination 0 and wipes out the entire product. To avoid this, we introduce a smoothing factor, usually a small constant value (see the sketch after this list).
  • The assumption that the features are independent of one another is not valid in most cases, so accuracy can be very low when the assumption is seriously violated.
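To illustrate the smoothing fix mentioned above, here is a minimal sketch of add-one (Laplace) smoothing, one common choice of smoothing factor; the function name and arguments are our own:

```python
def smoothed_likelihood(value_and_class_count, class_count, num_values, alpha=1.0):
    """Estimate P(A_i = a_i | C = c_j) with add-alpha (Laplace) smoothing.

    value_and_class_count: training examples of class c_j where attribute A_i equals a_i
    class_count:           training examples of class c_j
    num_values:            number of distinct values attribute A_i can take
    alpha:                 smoothing constant (alpha=1 is classic Laplace smoothing)
    """
    return (value_and_class_count + alpha) / (class_count + alpha * num_values)

# An attribute value never seen with a class no longer forces the whole product to 0:
print(smoothed_likelihood(0, 5, 2))   # 1/7 ≈ 0.143 instead of 0.0
```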

Conclusion and next steps#

This blog has provided a quick introduction to the Naïve Bayes algorithm. We started with a brief introduction to Bayes’ theorem, mentioned some use cases, walked through a worked example step by step, and explored the advantages and disadvantages of using Naïve Bayes for classification.

However, your journey does not end here! To create models that are more reliable and accurate, you might want to experiment with various approaches and frameworks. We recommend that you look into the following courses offered by Educative:

A Practical Guide to Machine Learning with Python


This course teaches you how to code basic machine learning models. The content is designed for beginners with general knowledge of machine learning, including common algorithms such as linear regression, logistic regression, SVM, KNN, decision trees, and more. If you need a refresher, we have summarized key concepts from machine learning, and there are overviews of specific algorithms dispersed throughout the course.


Machine Learning with Python Libraries


Machine learning helps software applications generate more accurate predictions. It is a type of artificial intelligence used worldwide and offers high-paying careers. This path provides a hands-on guide to multiple Python libraries that play an important role in machine learning. It also teaches you about neural networks, PyTorch tensors, PyCaret, and GANs. By the end of this module, you’ll have hands-on experience using Python libraries to automate your applications.


Mastering Machine Learning Theory and Practice


The machine learning field is rapidly advancing today due to the availability of large datasets and the ability to process big data efficiently. Moreover, several new techniques have produced groundbreaking results for standard machine learning problems. This course provides a detailed description of different machine learning algorithms and techniques, including regression, deep learning, reinforcement learning, Bayes nets, support vector machines (SVMs), and decision trees. The course also offers sufficient mathematical details for a deeper understanding of how different techniques work. An overview of the Python programming language and the fundamental theoretical aspects of ML, including probability theory and optimization, is also included. The course contains several practical coding exercises as well. By the end of the course, you will have a deep understanding of different machine-learning methods and the ability to choose the right method for different applications.


Written By:
Kamran Lodhi