Naive Bayes
Learn about naive Bayes and building a discriminative model.
Representing the problem
In the previous example, we used 2-dimensional feature vectors so that the classification problems could be illustrated with 2-dimensional plots. However, most machine learning applications work with high-dimensional feature vectors. We will now discuss naive Bayes, an important method for generative models that is often used with high-dimensional data. We will illustrate it with a text-processing example, following Andrew Ng's example of a spam filter that classifies email messages as either spam or non-spam. To do this, we first need a suitable way to represent the problem. Here we choose to represent a text (in this case, an email) as a vocabulary vector.
Note: A vocabulary vector is simply a list of all possible words that we’ll consider.
A text can be represented by a vector with one entry per vocabulary word: an entry of 1 if the word can be found in the text, or an entry of 0 if not. This is shown as follows:
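As a minimal sketch of this representation in Python, the snippet below maps a text to a binary vector over a small vocabulary. The function name `text_to_vocabulary_vector`, the six-word vocabulary, the sample email, and the simple lowercase tokenization are all assumptions made for illustration. A real spam filter would use a vocabulary of many thousands of words.

```python
import re

def text_to_vocabulary_vector(text, vocabulary):
    """Return a list with 1 where the vocabulary word occurs in the text, else 0."""
    # Assumed tokenization: lowercase the text and keep runs of letters, digits, and apostrophes.
    words_in_text = set(re.findall(r"[a-z0-9']+", text.lower()))
    return [1 if word in words_in_text else 0 for word in vocabulary]

# Hypothetical toy vocabulary, chosen only for this example.
vocabulary = ["a", "buy", "free", "meeting", "now", "zebra"]
email = "Buy now and get a FREE gift!"

vector = text_to_vocabulary_vector(email, vocabulary)
print(vector)  # [1, 1, 1, 0, 1, 0]
```

The vector has as many entries as the vocabulary has words, regardless of the length of the email. This is why the feature vectors in this setting are high-dimensional. Words that are not in the vocabulary (here "and", "get", and "gift") are ignored, and repeated words still produce a single 1.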