Linear Separability
Understand the concept of linear separability and learn how to change the feature space to make data linearly separable.
Linear separability
If the data isn’t linearly separable, then for every possible $\mathbf{w}$, at least one point exists in the dataset for which the constraint is not satisfied. In this case, the optimization problem is infeasible, as shown in the following example:
Suppose a medical researcher is working on a project to classify patient data as either “healthy” or “diseased.” They have collected data on various health parameters, such as age, blood pressure, and cholesterol levels, for a large number of patients. Their goal is to develop a model that accurately classifies new patients as healthy or diseased based on these features. When we plot the data using two of these features, the healthy and diseased patients form clusters in different areas of the graph, but no straight line can separate the two groups completely, indicating that the data isn’t linearly separable, as shown in the figure. Some of the healthy data points violate the constraints and make the optimization problem infeasible.
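Concretely, the hard-margin problem that the code below solves with cvxpy can be written as follows (stated here to match the code, where the feature map $\phi$ appends a constant feature so that the bias term is absorbed into $\mathbf{w}$); it is infeasible exactly when no single $\mathbf{w}$ satisfies every constraint:

$$
\min_{\mathbf{w}} \; \lVert \mathbf{w} \rVert_2
\quad \text{subject to} \quad
y_i \, \mathbf{w}^{\top} \phi(\mathbf{x}_i) \ge 1, \qquad i = 1, \dots, n.
$$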
In cvxpy, we can check for infeasibility in several ways. The following code tests whether a given dataset is linearly separable:
```python
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
import cvxpy as cp

X, Y = datasets.make_classification(n_samples=100, n_features=2,
                                     n_classes=2, n_clusters_per_class=2,
                                     n_informative=2, n_redundant=0,
                                     n_repeated=0, random_state=1)

def phi(X):
    return np.hstack((X, np.ones((X.shape[0], 1))))

phi_X = phi(X)
Y[Y == 0] = -1

x1_samples = X[Y == 1, :]
x2_samples = X[Y == -1, :]

fig = plt.figure()
plt.scatter(x1_samples[:, 0], x1_samples[:, 1], c='red', marker='+')
plt.scatter(x2_samples[:, 0], x2_samples[:, 1], c='green', marker='o')

w = cp.Variable((phi_X.shape[1], 1))
objective = cp.Minimize(cp.norm(w, 2))
constraints = [cp.multiply(Y[:, np.newaxis], phi_X @ w) >= 1]
prob = cp.Problem(objective, constraints)
prob.solve()
# check for infeasibility
if w.value is None:
    plt.title(r'Not linearly separable in the feature space defined by $\phi$')
else:
    plt.title(r'Linearly separable in the feature space defined by $\phi$')
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                         np.arange(y_min, y_max, 0.02))
    Y_hat = phi(np.c_[xx.ravel(), yy.ravel()]).dot(w.value).reshape(xx.shape)
    p2 = plt.contour(xx, yy, Y_hat, [0], colors='blue',
                     linewidths=1.5, linestyles='--')
    plt.xlabel('$x_1$'); plt.ylabel('$x_2$')
    # Support Vectors
    V = phi_X.dot(w.value) * Y[:, np.newaxis] - 1
    idx = np.where(abs(np.squeeze(V)) < 1e-10)
    plt.scatter(X[idx, 0], X[idx, 1], facecolors='none', edgecolors='k', s=100)

plt.show()
```
Here is the explanation for the code above:
- Lines 29–32: The highlighted code checks the feasibility of the optimization problem solved above. If the optimal value of `w` is `None`, the optimization problem is infeasible; that is, the data points are not linearly separable in the feature space defined by `phi`. This means no weight vector classifies all the data points correctly, so we can't draw a separating hyperplane in this feature space. Otherwise, if the optimal value of `w` is not `None`, the optimization problem is feasible, and the data points are linearly separable in the feature space defined by `phi`.
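Because cvxpy offers several ways to check infeasibility, here is a minimal, self-contained sketch of one alternative: inspecting the problem's `status` attribute instead of testing whether `w.value` is `None`. The tiny two-blob dataset below is hypothetical and used only for illustration; it is not the lesson's dataset.

```python
import numpy as np
import cvxpy as cp

# Hypothetical toy data: two well-separated clouds, so the hard-margin
# problem below should be feasible.
X_toy = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
Y_toy = np.array([1, 1, -1, -1])
phi_toy = np.hstack((X_toy, np.ones((X_toy.shape[0], 1))))  # same phi as above

w = cp.Variable((phi_toy.shape[1], 1))
prob = cp.Problem(cp.Minimize(cp.norm(w, 2)),
                  [cp.multiply(Y_toy[:, np.newaxis], phi_toy @ w) >= 1])
prob.solve()

# cvxpy exposes the outcome as a status string after solve().
if prob.status == cp.INFEASIBLE:
    print("Infeasible: no separating hyperplane in this feature space.")
elif prob.status == cp.OPTIMAL:
    print("Feasible: separating weight vector found, ||w|| =", prob.value)
else:
    print("Solver returned status:", prob.status)
```

If the dataset generated in the code above is indeed not linearly separable, the same status check reports infeasibility, agreeing with the `w.value is None` test.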
Changing the feature space
If data isn’t linearly separable in a feature space defined by a mapping $\phi$, we may try another feature space in which the data becomes linearly separable. For example, consider the following transformation of the features:
...