You might be familiar with the growing excitement around machine learning and its various applications. Amidst this frenzy, one algorithm stands out for its simplicity and effectiveness in classification tasks: the Naïve Bayes classifier. Its versatility makes it applicable in numerous real-world scenarios, including the following:
Spam email detection: Based on the presence of specific words or phrases, Naïve Bayes is used to separate spam emails from valid ones.
Sentiment analysis: In natural language processing, Naïve Bayes is used to classify text, such as customer reviews or social media posts, as positive, negative, or neutral.
Medical diagnosis: Based on test findings and patient symptoms, Naïve Bayes is used to forecast the likelihood that a specific disease is present.
Recommendation systems: Naïve Bayes is used in recommendation engines to predict user preferences and suggest relevant products or services based on user activity.
Document classification: Naïve Bayes can assign documents such as news articles, academic papers, and legal documents to predefined categories.
As seen from the examples above, the Naïve Bayes algorithm can work well with textual data as well as structured (or tabular) data. In this blog, our focus will be on structured data. We will see how Naïve Bayes works, along with exploring some of its advantages and disadvantages.
Before delving further, let’s first take a look at Bayes’ theorem, upon which the Naïve Bayes algorithm is based.
Bayes’ theorem is a key idea in probability theory and statistics that explains how to update the probability of a hypothesis (or event) in light of fresh information or data.
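Bayes’ theorem mathematically expresses this relationship between events $A$ and $B$ as follows:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Here:
$P(A \mid B)$ is the posterior probability of $A$ given $B$.
$P(B \mid A)$ is the likelihood of $B$ given $A$.
$P(A)$ is the prior probability of $A$.
$P(B)$ is the evidence probability.
Note: Conditional probability is defined as the probability of an event ($A$) occurring, given that another event ($B$) has already occurred.
Evidence probability, or $P(B)$, is the overall probability of observing event $B$.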
Let’s explain Bayes’ theorem using a coin toss example.
Assume you have two coins, one is a fair coin ($C_1$), and the other is a biased coin ($C_2$) with a higher probability of landing on tails. You randomly pick one of the coins and toss it. Now, you want to find the probability that the coin is the biased one ($C_2$), given that it landed on tails.
Let’s define the events as follows:
$C_1$: The event that the fair coin is chosen.
$C_2$: The event that the biased coin is chosen.
$T$: The event that the coin lands on tails.
We’ll assume the following probabilities:
$P(C_1)$ (the prior probability of choosing the fair coin).
$P(C_2)$ (the prior probability of choosing the biased coin).
$P(T \mid C_1)$ (the conditional probability of getting tails with the fair coin).
$P(T \mid C_2)$ (the conditional probability of getting tails with the biased coin).
Using Bayes’ theorem, we derive the following equation:

$$P(C_2 \mid T) = \frac{P(T \mid C_2)\,P(C_2)}{P(T)}$$

Note: We can calculate $P(T)$ as follows:

$$P(T) = P(T \mid C_1)\,P(C_1) + P(T \mid C_2)\,P(C_2)$$

Plugging these values into the above equation gives the posterior probability $P(C_2 \mid T)$.
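As a quick numerical check of this kind of calculation, here is a minimal Python sketch of the coin example. The specific values are assumptions made only for illustration: each coin is equally likely to be picked, the fair coin lands on tails half the time, and the biased coin is assumed to land on tails 80% of the time. The function name `posterior_biased` is also hypothetical.

```python
from fractions import Fraction


def posterior_biased(p_fair, p_biased, p_tails_fair, p_tails_biased):
    """Return P(biased coin | tails) using Bayes' theorem."""
    # Evidence: total probability of tails over both coins.
    p_tails = p_tails_fair * p_fair + p_tails_biased * p_biased
    # Posterior = likelihood * prior / evidence.
    return p_tails_biased * p_biased / p_tails


# Assumed values for illustration only (not taken from the text above).
result = posterior_biased(
    p_fair=Fraction(1, 2),
    p_biased=Fraction(1, 2),
    p_tails_fair=Fraction(1, 2),
    p_tails_biased=Fraction(4, 5),
)
print(result, float(result))
```

Under these assumed values, the posterior works out to 8/13, or roughly 0.62: seeing tails raises the probability that the biased coin was picked from 0.5 to about 0.62.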
Now that we have a brief understanding of Bayes’ theorem, let’s take a look at the Naïve Bayes algorithm, a probabilistic algorithm based on the assumption that features are independent of one another given the class. This independence assumption means that, within a class, the occurrence or value of one feature does not affect or depend on the occurrence or value of another feature. However, this assumption might not hold true in real-world applications.
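Assume we have a dataset in which every example is described by $n$ features $x_1, x_2, \dots, x_n$ and belongs to one of several classes $y$. Given a test example, we want to find the class $y$ for which the posterior probability

$$P(y \mid x_1, x_2, \dots, x_n)$$

is maximum.
By applying Bayes’ theorem, we get:

$$P(y \mid x_1, \dots, x_n) = \frac{P(x_1, \dots, x_n \mid y)\,P(y)}{P(x_1, \dots, x_n)}$$

The class prior probability is $P(y)$, which can be calculated from the dataset as the fraction of training examples that belong to class $y$.
The denominator $P(x_1, \dots, x_n)$ is the same for every class, so it does not affect which class has the maximum posterior probability.
Therefore, we only need to calculate:

$$P(x_1, \dots, x_n \mid y)\,P(y)$$

This can be written as:

$$P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

Note: The above calculation is made significantly easier due to the Naïve Bayes assumption that all features are independent of each other given the class.
Now, how do we calculate $P(x_i \mid y)$? This can also be easily calculated by looking at the dataset: it is the fraction of training examples of class $y$ in which the $i$-th feature takes the value $x_i$.
Finally, we have all the details. Therefore, to determine the most likely class for a test example $(x_1, \dots, x_n)$, we compute:

$$\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$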
Let’s now take a look at the algorithm step by step, followed by a worked example.
Calculate prior probabilities: Calculate the prior probabilities of each class in the given dataset.
Calculate likelihoods: For each feature and each class, calculate the likelihood of observing a specific feature value given the class. This involves counting the occurrences of feature values for each class in the training data.
Calculate posterior probabilities: For a new, unseen data point, calculate the posterior probability of each class using Bayes’ theorem. The posterior probability of a class given the features is proportional to the prior probability of the class multiplied by the product of the likelihoods of the observed feature values.
Make a prediction: Select the class with the highest posterior probability as the predicted class for the new data point. This is the class that the algorithm considers most likely given the observed features.
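The four steps above can be sketched in a few lines of Python. This is a minimal illustration for categorical features, not a reference implementation: the function names `fit` and `predict` and the tiny two-feature dataset are made up for the example, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def fit(rows, labels):
    """Steps 1 and 2: estimate priors and likelihoods by counting."""
    class_counts = Counter(labels)
    total = len(labels)
    priors = {c: n / total for c, n in class_counts.items()}

    # value_counts[(class, feature_index)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1

    def likelihood(value, i, c):
        # Fraction of class-c examples whose i-th feature equals value.
        return value_counts[(c, i)][value] / class_counts[c]

    return priors, likelihood


def predict(priors, likelihood, x):
    """Steps 3 and 4: score every class and return the best one."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(x):
            score *= likelihood(value, i, c)
        scores[c] = score  # proportional to the posterior, not normalized
    return max(scores, key=scores.get), scores


# Tiny made-up dataset: two categorical features, two classes.
rows = [("yes", "no"), ("yes", "yes"), ("no", "yes"), ("no", "no")]
labels = ["a", "a", "b", "b"]
priors, likelihood = fit(rows, labels)
best, scores = predict(priors, likelihood, ("yes", "yes"))
print(best, scores)
```

The scores are left unnormalized because dividing every class by the same evidence term does not change which class has the highest score.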
Let’s assume that we have the following dataset.
| Fever | Fatigue | Cough | Disease |
| --- | --- | --- | --- |
| Yes | No | No | Influenza | 
| No | Yes | No | Common cold | 
| No | No | Yes | Influenza | 
| Yes | No | Yes | Other | 
| No | No | No | Influenza | 
| Yes | Yes | Yes | Common cold | 
| Yes | No | Yes | Influenza | 
| No | Yes | No | Other | 
| Yes | Yes | No | Influenza | 
| No | No | Yes | Common cold | 
Step 1: Calculate prior probabilities
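Based on the given data, we have the following prior probabilities:

$$P(\text{Influenza}) = \frac{5}{10} = 0.5, \quad P(\text{Common cold}) = \frac{3}{10} = 0.3, \quad P(\text{Other}) = \frac{2}{10} = 0.2$$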
Step 2: Calculate likelihoods
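For each feature, we calculate the likelihood of observing “Yes” or “No” for each class as follows. Each cell is the fraction of examples of that class that have the given feature value:
| Feature value | Influenza | Common cold | Other |
| --- | --- | --- | --- |
| Fever = Yes | 3/5 | 1/3 | 1/2 |
| Fever = No | 2/5 | 2/3 | 1/2 |
| Fatigue = Yes | 1/5 | 2/3 | 1/2 |
| Fatigue = No | 4/5 | 1/3 | 1/2 |
| Cough = Yes | 2/5 | 2/3 | 1/2 |
| Cough = No | 3/5 | 1/3 | 1/2 |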
Step 3: Calculate posterior probabilities
Assume we have a new data point: 
We need to calculate the following three probabilities:
Here is the posterior probability for 
Here is the posterior probability for  
Here is the posterior probability for  
Step 4: Make a prediction
Because the posterior probability for 
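To make the four steps concrete, the following Python sketch runs them on the ten-row dataset from the table. The new data point used here (Fever = Yes, Fatigue = No, Cough = Yes) is a hypothetical choice made for illustration only, and the variable names are likewise assumptions.

```python
from collections import Counter
from fractions import Fraction

# Training data copied from the table: (Fever, Fatigue, Cough, Disease).
DATA = [
    ("Yes", "No", "No", "Influenza"),
    ("No", "Yes", "No", "Common cold"),
    ("No", "No", "Yes", "Influenza"),
    ("Yes", "No", "Yes", "Other"),
    ("No", "No", "No", "Influenza"),
    ("Yes", "Yes", "Yes", "Common cold"),
    ("Yes", "No", "Yes", "Influenza"),
    ("No", "Yes", "No", "Other"),
    ("Yes", "Yes", "No", "Influenza"),
    ("No", "No", "Yes", "Common cold"),
]

# Hypothetical new data point (an assumption for illustration only).
new_point = ("Yes", "No", "Yes")  # Fever, Fatigue, Cough

class_counts = Counter(row[-1] for row in DATA)

scores = {}
for disease, count in class_counts.items():
    # Step 1: prior probability of the class.
    score = Fraction(count, len(DATA))
    # Step 2: multiply in the likelihood of each observed feature value.
    for i, value in enumerate(new_point):
        matches = sum(1 for row in DATA if row[-1] == disease and row[i] == value)
        score *= Fraction(matches, count)
    scores[disease] = score

# Step 3: normalize the scores so the posteriors sum to 1.
total = sum(scores.values())
posteriors = {d: s / total for d, s in scores.items()}

# Step 4: pick the class with the highest posterior.
prediction = max(posteriors, key=posteriors.get)
for d, p in posteriors.items():
    print(f"{d}: {float(p):.3f}")
print("Prediction:", prediction)
```

For this assumed data point, the unnormalized scores are 12/125 for Influenza, 1/45 for Common cold, and 1/40 for Other, so the sketch predicts Influenza with a normalized posterior of about 0.67.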
Let’s now take a look at a few advantages and disadvantages of using the Naïve Bayes algorithm.
Advantages
Disadvantages
The categorical form of Naïve Bayes described here assumes that all attributes are categorical, so numerical attributes need to be discretized first.
It is possible that a specific attribute value never occurs with a class in the training set. The likelihood for that value is then 0, which makes the entire product for that class 0 regardless of the other features. To avoid that, we introduce a smoothing factor, which is usually a small constant added to the counts.
The assumption that the features are independent of one another is not valid in most cases. Therefore, accuracy can be very low when the assumption is seriously violated.
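The zero-probability problem and its fix can be illustrated with a short Python sketch. The text does not say which smoothing scheme is meant, so this assumes additive (Laplace) smoothing, one common choice; the function name `smoothed_likelihood` and its parameters are hypothetical.

```python
def smoothed_likelihood(value_count, class_count, n_values, alpha=1.0):
    """Estimate P(value | class) with additive (Laplace) smoothing.

    value_count: times the value occurs within the class
    class_count: number of training examples in the class
    n_values:    number of distinct values the feature can take
    alpha:       smoothing constant (assumed default of 1.0)
    """
    # alpha is added to every count so unseen values never get probability 0.
    return (value_count + alpha) / (class_count + alpha * n_values)


# A binary feature value that never occurs among 5 examples of a class.
unsmoothed = 0 / 5
smoothed = smoothed_likelihood(value_count=0, class_count=5, n_values=2)
print(unsmoothed, smoothed)
```

With `alpha=1.0`, the unseen value gets probability 1/7 instead of 0, and the likelihoods of the two possible values still sum to 1.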
This blog has provided a quick introduction to the Naïve Bayes algorithm. We mentioned some use cases, gave a brief introduction to Bayes’ theorem, and explored the advantages and disadvantages of using the Naïve Bayes algorithm for classification. We also walked through a worked example step by step.
However, your journey does not end here! To create models that are more reliable and accurate, you might want to experiment with various approaches and frameworks. We recommend that you look into the following courses offered by Educative:
A Practical Guide to Machine Learning with Python
This course teaches you how to code basic machine learning models. The content is designed for beginners with general knowledge of machine learning, including common algorithms such as linear regression, logistic regression, SVM, KNN, decision trees, and more. If you need a refresher, we have summarized key concepts from machine learning, and there are overviews of specific algorithms dispersed throughout the course.
Machine Learning with Python Libraries
Machine learning helps software applications generate more accurate predictions. It is a branch of artificial intelligence that is used worldwide and offers high-paying careers. This path provides a hands-on guide to multiple Python libraries that play an important role in machine learning. This path also teaches you about neural networks, PyTorch Tensor, PyCaret, and GAN. By the end of this path, you’ll have hands-on experience in using Python libraries to automate your applications.
Mastering Machine Learning Theory and Practice
The machine learning field is rapidly advancing today due to the availability of large datasets and the ability to process big data efficiently. Moreover, several new techniques have produced groundbreaking results for standard machine learning problems. This course provides a detailed description of different machine learning algorithms and techniques, including regression, deep learning, reinforcement learning, Bayes nets, support vector machines (SVMs), and decision trees. The course also offers sufficient mathematical details for a deeper understanding of how different techniques work. An overview of the Python programming language and the fundamental theoretical aspects of ML, including probability theory and optimization, is also included. The course contains several practical coding exercises as well. By the end of the course, you will have a deep understanding of different machine-learning methods and the ability to choose the right method for different applications.