Linear Separability

Understand the concept of linear separability and learn how to change the feature space to make data linearly separable.

Linear separability

If the data isn’t linearly separable, then for every possible w\bold w, at least one point (x,y)(\bold x, y) exists in the dataset for which the constraint yϕ(x)Tw1y\phi(\bold x)^T\bold w\ge1 is not satisfied. In this case, the optimization problem is infeasible, as shown in the following example:

Press + to interact
Linearly non-separable
Linearly non-separable

Suppose a medical researcher is working on a project to classify patient data as either “healthy” or “diseased.” They have collected data on various health parameters, such as age, blood pressure, and cholesterol levels, for a large number of patients. Their goal is to develop a model that can accurately classify new patients as either healthy or diseased based on these features. When we plot it on a graph with two features, healthy and diseased patients form clusters in different areas of the graph. No straight line can separate these two groups completely, indicating that the data isn’t linearly separable, as shown in the figure. Some of the healthy data points violate the constraints and make the optimization problem infeasible.

In cvxpy, we can check the infeasibility in several ways. The following code tests the linear separation of a given dataset:

Press + to interact
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
import cvxpy as cp
X, Y = datasets.make_classification(n_samples=100, n_features=2,
n_classes=2, n_clusters_per_class=2,
n_informative=2, n_redundant=0,n_repeated=0,random_state=1)
def phi(X):
return np.hstack((X, np.ones((X.shape[0], 1))))
phi_X = phi(X)
Y[Y==0] = -1
x1_samples = X[Y==1,:]
x2_samples = X[Y==-1,:]
fig = plt.figure()
plt.scatter(x1_samples[:,0],x1_samples[:,1],c='red', marker='+')
plt.scatter(x2_samples[:,0],x2_samples[:,1], c= 'green', marker='o')
w = cp.Variable((phi_X.shape[1],1))
objective = cp.Minimize(cp.norm(w,2))
constraints = [cp.multiply(Y[:,np.newaxis], phi_X @ w) >= 1]
prob = cp.Problem(objective, constraints)
prob.solve()
# check for infeasibility
if w.value is None:
plt.title('Not linearly separable in the feature space defined by $\phi$')
else:
plt.title('Linearly separable in the feature space defined by $\phi$')
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
np.arange(y_min, y_max, 0.02))
Y_hat = phi(np.c_[xx.ravel(), yy.ravel()]).dot(w.value).reshape(xx.shape)
p2 = plt.contour(xx,yy,Y_hat, [0], colors='blue', linewidths = 1.5, linestyles='--')
plt.xlabel('$x_1$'); plt.ylabel('$x_2$')
# Support Vectors
V = phi_X.dot(w.value)*Y[:,np.newaxis]-1
idx = np.where(abs(np.squeeze(V))<1e-10)
plt.scatter(X[idx,0],X[idx,1],facecolors='none', edgecolors='k',s=100)

Here is the explanation for the code above:

  • Lines 29–32: The highlighted code checks for the feasibility of the optimization problem solved in the above code. If the optimal value of w is None, the optimization problem is infeasible; that is, the data points are not linearly separable in the feature space defined by phi. It means we couldn’t find a weight vector to classify all the data points correctly, and we can’t draw a hyperplane in this feature space. Otherwise, if the optimal value of w is not None, the optimization problem is feasible, and the data points are linearly separable in the feature space defined by phi.

Changing the feature space

If data isn’t linearly separable in a feature space defined by a mapping ϕ\phi, we may try another feature space in which the data becomes linearly separable. For example, consider the following transformation of the features:

ϕ([x1x2 ...