
Introduction to SVM

Gain an understanding of SVM, and the concepts of signed and unsigned distance.

Support vector machine (SVM) is a popular and powerful supervised learning algorithm for classification and regression problems. It works by finding the best possible boundary between different classes of data points. In this lesson, we’ll cover the basic concepts and principles behind SVMs and see how they can be applied in practice.

What is SVM?

Suppose a person works for a bank, and their job is to decide whether to approve or reject loan applications based on the applicant’s financial history. They have a loan dataset with various features such as credit score, income, and debt-to-income ratio, along with past approval and rejection records. The task is to use SVM to build a predictive model for future loan applications.

First, they map each loan application into a feature space based on its features and label each loan application as either “approved” or “rejected,” which creates two different classes in the dataset. Next, they try to find a decision boundary that will separate the data linearly.
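
As a rough sketch of this workflow, the snippet below fits a linear SVM on a small, made-up loan dataset using scikit-learn. The feature values, labels, and the `SVC` hyperparameters are illustrative assumptions, not part of the lesson's dataset; scaling the features first is a common practice for SVMs because the margin is distance-based.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical loan applications: [credit_score, income (k$), debt_to_income_ratio]
X = np.array([
    [720, 85, 0.20],   # approved
    [690, 60, 0.25],   # approved
    [710, 95, 0.30],   # approved
    [580, 40, 0.55],   # rejected
    [600, 35, 0.60],   # rejected
    [560, 45, 0.50],   # rejected
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = approved, 0 = rejected

# Scale features, then fit a linear SVM (maximum-margin hyperplane)
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
model.fit(X, y)

# Predict a new, hypothetical application
new_applicant = np.array([[700, 80, 0.25]])
print(model.predict(new_applicant))  # expected: array([1]) -> approved
```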

SVM finds the best hyperplane that separates the two classes. A hyperplane is simply a decision boundary: a line in 2D, a plane in 3D, or an $(N-1)$-dimensional flat subspace in an $N$-dimensional feature space.

The best hyperplane is the one that maximizes the margin, which is the distance between the hyperplane and the closest data points from each class. These closest points, which lie on the margin boundaries, are called the support vectors. This means the goal is to position the hyperplane so it is as far away as possible from the nearest approved and rejected loan applications.

Maximizing the margin

Many lines (or planes) can separate the data, but only one is the maximum-margin classifier. Maximizing this margin achieves two crucial goals:

  1. Increased generalization: A larger margin provides a safety buffer. If the hyperplane is too close to a data point, a small change in a new applicant’s features could cause the model to misclassify them. A large margin ensures the decision boundary is robust and makes the most confident prediction possible for unseen data.

  2. Focus on support vectors: We don’t need to consider every loan application when determining the hyperplane. Only the support vectors—the loan applications that lie closest to the hyperplane—are used to determine its precise position and orientation. All other points can be removed, and the final decision boundary wouldn’t change, as shown in the sketch below. This approach makes SVM memory-efficient and allows us to create models that are driven by the most critical, hardest-to-classify data points.
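
A minimal sketch of this property, using a tiny synthetic 2-D dataset (an illustrative assumption, not the loan data), shows how scikit-learn exposes the support vectors and that refitting on only those points reproduces the same boundary:

```python
import numpy as np
from sklearn.svm import SVC

# Small synthetic 2-D dataset (made-up values for illustration)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear", C=1.0).fit(X, y)

print(svm.support_vectors_)  # only these points pin down the hyperplane
print(svm.support_)          # their indices in the training set
print(svm.n_support_)        # count of support vectors per class

# Refit using only the support vectors: the decision boundary stays the same,
# because non-support-vector points do not influence the hyperplane.
svm_sv_only = SVC(kernel="linear", C=1.0).fit(X[svm.support_], y[svm.support_])
print(np.allclose(svm.coef_, svm_sv_only.coef_))  # expected: True
```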

SVM vs. other classifier

The plot above shows two classifiers that separate the positive and negative classes of a dataset. The blue line represents the SVM classifier, whereas the green line represents the other classifier. The points on the dotted lines are called support vectors because they’re the closest to the hyperplane, and the distance between the blue dotted lines is called the margin, which is what we want to maximize in SVM to get the best possible classifier. The green line is not an SVM hyperplane because it doesn’t maximize the margin.

Note: SVM can be thought of as a generalized linear discriminant with maximum margin.
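
A plot of this kind can be sketched in a few lines of code. The synthetic blobs and the particular green line below are illustrative assumptions, chosen only to contrast the maximum-margin boundary (and its margin lines) with an arbitrary alternative separator:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two linearly separable 2-D blobs (synthetic data)
X = np.vstack([rng.normal([2, 2], 0.5, (20, 2)),
               rng.normal([5, 5], 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

svm = SVC(kernel="linear", C=1e3).fit(X, y)
w, b = svm.coef_[0], svm.intercept_[0]

xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)

plt.scatter(X[:, 0], X[:, 1], c=y)
# SVM hyperplane (blue solid) and its margin boundaries (blue dotted)
plt.plot(xs, -(w[0] * xs + b) / w[1], "b-")
plt.plot(xs, -(w[0] * xs + b - 1) / w[1], "b:")
plt.plot(xs, -(w[0] * xs + b + 1) / w[1], "b:")
# Some other separating line (green) that does not maximize the margin
plt.plot(xs, -1.2 * xs + 8.5, "g-")
plt.show()
```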

Signed & unsigned distance

In SVM, the hyperplane is defined by a weight vector $\bold w$ and a bias term $b$. The hyperplane equation can be written as $\bold w^T\bold x + b = 0$. Here, $\bold x$ represents a data point, $\bold w$ represents the normal vector to the hyperplane, and $b$ represents the offset of the hyperplane from the origin. The signed distance of a point $\bold x_i$ from the hyperplane is the distance between $\bold x_i$ and the hyperplane, taking into account the direction of the normal vector: $\frac{\bold w^T\bold x_i + b}{\|\bold w\|}$. This distance is signed because it can be positive or negative depending on which side of the hyperplane the point lies on.

If the vector from the hyperplane to the point points in the same direction as the normal vector $\bold w$, the distance is positive; if they point in opposite directions, the distance is negative. All the points above the hyperplane have a positive distance, while all the points below the hyperplane have a negative distance, as shown in the figure below.

Note: Unless stated otherwise, we assume the bias parameter $b$ is part of the vector $\bold w$, and we append $1$ to the feature vectors.

Signed and unsigned distance
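
As a minimal sketch of these definitions (with made-up values for $\bold w$, $b$, and the points), the snippet below computes signed and unsigned distances, absorbing the bias into the weight vector and appending $1$ to each feature vector as described in the note above:

```python
import numpy as np

# Hand-picked hyperplane parameters: w^T x + b = 0
w = np.array([2.0, -1.0])
b = -1.0

# Absorb the bias into the weight vector and append 1 to each feature vector
w_aug = np.append(w, b)                      # [2, -1, -1]
points = np.array([[2.0, 1.0],               # above the hyperplane
                   [0.0, 0.0],               # below the hyperplane
                   [1.0, 1.0]])              # on the hyperplane
points_aug = np.hstack([points, np.ones((len(points), 1))])

# Signed distance: (w^T x + b) / ||w||  -- sign depends on the side of the hyperplane
signed = points_aug @ w_aug / np.linalg.norm(w)
# Unsigned distance: the absolute value of the signed distance
unsigned = np.abs(signed)

print(signed)    # approx. [ 0.894, -0.447, 0. ]
print(unsigned)  # approx. [ 0.894,  0.447, 0. ]
```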

A hyperplane in the feature space defined by the mapping $\phi$ can be written as $\bold w^T\phi(\bold x) = 0$. Given a binary classification dataset $D = \{(\bold x_1, y_1), (\bold x_2, y_2), \dots, (\bold x_n, y_n)\}$ ...