Support Vector Machine
In this lesson, we introduce a very popular model, Support Vector Machine.
What is Support Vector Machine?
Support Vector Machine (SVM) is widely used for classification (SVM also supports regression tasks). In general, SVM finds a hyperplane that separates the data points with the greatest amount of margin.

The core idea of SVM is to find a maximum marginal hyperplane that divides the dataset. For a dataset with two linearly separable classes, there are infinitely many hyperplanes that separate them. SVM picks exactly one of these: the maximum marginal hyperplane.
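To make the idea concrete, here is a small sketch (the toy 2-D points below are made up for illustration) that fits a linear SVC on a separable dataset and reads the hyperplane off the fitted weights. The margin width can be computed as 2/||w||:

```python
import numpy as np
from sklearn.svm import SVC

# A tiny linearly separable 2-D dataset: two clusters of points.
X = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8],
              [3.0, 3.0], [3.5, 3.2], [3.2, 2.8]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear SVC with a large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                 # normal vector of the hyperplane
b = clf.intercept_[0]
margin = 2 / np.linalg.norm(w)   # width of the maximum margin

print("hyperplane: {:.2f}*x1 + {:.2f}*x2 + {:.2f} = 0".format(w[0], w[1], b))
print("margin width: {:.2f}".format(margin))
```

Any other separating line would leave a smaller gap to its nearest points; the fitted `w` and `b` define the one line that maximizes that gap.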
As you can see in the image above, the black and white circles can be separated by multiple lines. Both lines H2 and H3 separate this dataset. However, the points closest to each line are different, which means the margins of the two lines are different. According to the definition of SVM, what we need is the line with the largest margin: H3.
What are support vectors? They are the data points closest to the hyperplane. In a more intuitive sense, they define the margin lines, the dotted lines in the figure above. The red line is the hyperplane we've been talking about, which separates the data points perfectly. The margin is the gap between the two lines drawn through the closest points of each class.
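In scikit-learn, a fitted SVC exposes exactly these points through its `support_vectors_` attribute. A minimal sketch (the 2-D points are made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two well-separated clusters.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.2],
              [3.0, 3.0], [4.0, 4.0], [3.5, 3.8]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

# The fitted model keeps only the points closest to the hyperplane.
print("support vectors:\n", clf.support_vectors_)
print("indices of support vectors:", clf.support_)
```

Only the printed points influence the decision boundary; moving any other point (without crossing the margin) would leave the fitted hyperplane unchanged.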
SVM is a method with a rich theoretical basis and an elegant mathematical definition and derivation. We're not going to cover that in this course; if you're interested in this topic, you can refer to Wikipedia.
Difference between SVM and logistic regression
You may have already learned about logistic regression. SVM and logistic regression are both classifiers, so what are the differences between them?
- SVM is a geometrical method, while logistic regression is a statistical approach.
- The risk of overfitting is lower with SVM, because the decision boundary depends only on the support vectors.
- SVM works well on unstructured and semi-structured data.
- Logistic regression works well on large-scale datasets.
- SVM generally performs well, especially on small and medium-sized datasets.
- Theoretically, SVM only needs to keep the support vectors, so it consumes very little memory.
Linear SVM
In the first demo, we will show a simple model with a linear hyperplane. SVM has a powerful feature, the kernel function, which is used to separate non-linear data; we will show it in the second demo.

As usual, we skip the data loading and splitting part and jump to building the model. The complete code will be shown later. As the code below shows, we create a LinearSVC (Linear Support Vector Classification) from the svm module. Alternatively, you can create an SVC from the same svm module with the kernel='linear' parameter; the two are nearly equivalent, although LinearSVC uses a slightly different loss and solver, so the results can differ a little.
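As a sketch, you can check the near-equivalence yourself by training both models on the same split and comparing their predictions (the exact agreement depends on the data and solver settings, so treat the number as illustrative):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC, SVC

cancer = load_breast_cancer()
train_x, test_x, train_y, test_y = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42)

# Two ways to get a linear SVM classifier.
linear_svc = LinearSVC(dual=False).fit(train_x, train_y)
svc_linear = SVC(kernel="linear").fit(train_x, train_y)

# Fraction of test samples on which the two models agree.
agreement = np.mean(linear_svc.predict(test_x) == svc_linear.predict(test_x))
print("fraction of identical predictions: {:.2f}".format(agreement))
```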
from sklearn.svm import LinearSVC
svm = LinearSVC(dual=False)
# train_x and train_y are training samples and labels.
svm.fit(train_x, train_y)
Here is the complete code so you can have a try.
import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
import sklearn.metrics as metrics
import matplotlib.pyplot as plt
from sklearn.svm import SVC
import numpy as np

cancer = datasets.load_breast_cancer()
print("The data shape of breast cancer is {}".format(cancer.data.shape))
print("There are {} classes in this dataset".format(cancer.target_names.size))

train_x, test_x, train_y, test_y = train_test_split(cancer.data,
                                                    cancer.target,
                                                    test_size=0.2,
                                                    random_state=42)

print("The first five samples {}".format(train_x[:5]))
print("The first five targets {}".format(train_y[:5]))

svm = LinearSVC(dual=False)
svm.fit(train_x, train_y)

pred_y = svm.predict(test_x)
print("The first five prediction {}".format(pred_y[:5]))
print("The real first five labels {}".format(test_y[:5]))

cm = metrics.confusion_matrix(test_y, pred_y)
print(cm)
- From line 9 to line 13, the breast cancer dataset is loaded by load_breast_cancer and split into two parts.
- An SVM object is created at line 21 with dual=False. When the number of samples is larger than the number of features, we prefer dual=False. It is then fit at line 22.
- In this example, we use confusion_matrix to evaluate the performance of our model at line 28.
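As a side note, here is a hedged sketch of how to read such a 2x2 confusion matrix (the toy labels below are made up for illustration; they are not the breast cancer results):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels to illustrate how the matrix is laid out.
test_y = np.array([0, 0, 1, 1, 1, 0, 1, 0])
pred_y = np.array([0, 1, 1, 1, 0, 0, 1, 0])

cm = confusion_matrix(test_y, pred_y)
print(cm)  # [[3 1]
           #  [1 3]]
# Row i = true class i, column j = predicted class j:
# cm[0, 0] true negatives, cm[0, 1] false positives,
# cm[1, 0] false negatives, cm[1, 1] true positives.

# Accuracy is the sum of the diagonal over all samples.
accuracy = np.trace(cm) / cm.sum()
print("accuracy: {:.2f}".format(accuracy))  # accuracy: 0.75
```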
Kernel SVM
What should SVM do when the data is not linearly separable? Here we are going to introduce a very powerful feature of SVM, the kernel function. In simple terms, the kernel can convert data ...