Support Vector Machine

In this lesson, we introduce a very popular model: the Support Vector Machine.

What is a Support Vector Machine?

Support Vector Machine (SVM) is widely used for classification (it also supports regression tasks). In general, SVM finds the hyperplane that separates the data points with the largest margin.

The core idea of SVM is to find the maximum-margin hyperplane that divides the dataset. For a data set with two linearly separable classes, you can find an infinite number of hyperplanes that separate them. SVM picks exactly one of these hyperplanes: the one with the maximum margin.

As you can see in the image above, the black and white circles can be separated by multiple lines. Both H2 and H3 separate this data set. However, the distance from each line to its nearest points differs, which means the two lines have different margins. According to the definition of SVM, we want the line with the largest margin: H3.

What are support vectors? They are the data points closest to the hyperplane. Intuitively, they define the margin lines, shown as the dotted lines in the figure above. The red line is the hyperplane we have been talking about, which separates the data points perfectly. The margin is the gap between the two dotted lines, measured at the closest points of each class.
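To make the margin concrete with one line of standard, widely documented math: if the hyperplane is written as w · x + b = 0 and the support vectors lie on the two parallel lines w · x + b = ±1, then the width of the margin is 2 / ||w||. Maximizing the margin is therefore equivalent to minimizing ||w||.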

SVM is a method with a rich theoretical basis and a very elegant mathematical definition and derivation. We won't cover that in this course; if you're interested in the topic, you can refer to the Wikipedia article on support vector machines.

Differences between SVM and logistic regression

You may have already learned about logistic regression. SVM and logistic regression are both classifiers, so what are the differences between them? The main points are listed below, and the sketch after the list fits both models on the same data so you can compare them directly.

  • SVM is a geometric method; logistic regression is a statistical approach.
  • SVM is less prone to overfitting, because its decision boundary depends only on the support vectors.
  • SVM works well on unstructured and semi-structured data.
  • Logistic regression works well on large-scale data sets.
  • SVM generally delivers strong classification accuracy, especially on small and medium-sized data sets.
  • In theory, SVM only needs to keep the support vectors, so it consumes very little memory.
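Here is a minimal sketch of that comparison, reusing the scikit-learn breast cancer dataset that the demo below also uses. The max_iter=5000 setting is just a choice to let logistic regression converge on this unscaled data, not a required value.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

# Load a small binary classification dataset and split it.
X, y = load_breast_cancer(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Fit both classifiers and print their test accuracy.
for model in (LinearSVC(dual=False), LogisticRegression(max_iter=5000)):
    model.fit(train_x, train_y)
    print(type(model).__name__, model.score(test_x, test_y))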

Linear SVM

In the first demo, we show a simple example with a linear hyperplane. SVM also has a powerful feature, the kernel function, which is used to separate non-linearly separable data; we demonstrate it in the second demo.

As usual, we skip the data loading and splitting part and jump straight to building the model; the complete code is shown later. As the code below shows, we create a LinearSVC from the svm module. LinearSVC stands for Linear Support Vector Classification. You can also build an SVC from the same module with the kernel='linear' parameter; the two behave very similarly, although they rely on different underlying solvers (liblinear versus libsvm), so their results are not guaranteed to be identical. A sketch of the SVC variant follows the snippet below.

from sklearn.svm import LinearSVC

svm = LinearSVC(dual=False)
# train_x and train_y are training samples and labels.
svm.fit(train_x, train_y)
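If you prefer the SVC class, a roughly equivalent model can be built like this (a sketch reusing the same train_x and train_y as above):

from sklearn.svm import SVC

# A linear-kernel SVC; behaves much like LinearSVC on this data,
# but is backed by libsvm rather than liblinear.
svm = SVC(kernel="linear")
svm.fit(train_x, train_y)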

Here is the complete code so you can try it yourself.

import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
import sklearn.metrics as metrics

# Load the breast cancer dataset and split it into training and test sets.
cancer = datasets.load_breast_cancer()
print("The data shape of breast cancer is {}".format(cancer.data.shape))
print("There are {} classes in this dataset".format(cancer.target_names.size))
train_x, test_x, train_y, test_y = train_test_split(cancer.data,
                                                    cancer.target,
                                                    test_size=0.2,
                                                    random_state=42)
print("The first five samples {}".format(train_x[:5]))
print("The first five targets {}".format(train_y[:5]))

# Create and train a linear SVM classifier.
svm = LinearSVC(dual=False)
svm.fit(train_x, train_y)

# Predict on the test set and evaluate with a confusion matrix.
pred_y = svm.predict(test_x)
print("The first five predictions {}".format(pred_y[:5]))
print("The first five true labels {}".format(test_y[:5]))
cm = metrics.confusion_matrix(test_y, pred_y)
print(cm)
  • The breast cancer dataset is loaded with load_breast_cancer and split into training and test parts with train_test_split.

  • A LinearSVC object is created with dual=False; when the number of samples is larger than the number of features, dual=False is preferred. The model is then trained with fit.

  • We use confusion_matrix to evaluate the performance of our model on the test set.
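If you want a single summary number in addition to the confusion matrix, a small optional addition to the demo above (reusing its metrics, test_y, pred_y, and cm variables) could look like this:

# Accuracy: the fraction of test samples predicted correctly.
print("Accuracy: {:.3f}".format(metrics.accuracy_score(test_y, pred_y)))

# scikit-learn orders a binary confusion matrix as
# [[true negatives, false positives],
#  [false negatives, true positives]].
tn, fp, fn, tp = cm.ravel()
print("TN={}, FP={}, FN={}, TP={}".format(tn, fp, fn, tp))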

Kernel SVM

What should SVM do when the data is not linearly separable? Here we introduce a very powerful feature of SVM, the kernel function. In simple terms, the kernel can convert data ...