Visualize the Working of K-Nearest Neighbors

Learn to visualize the working principle behind k-nearest neighbors.

Let’s move on and put what we have learned so far into practice. As always, we need to import some basic libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(font_scale=1.5)     # set the font size for the whole notebook
sns.set_style('whitegrid')  # set the style (optional) -- after sns.set(), which resets it

Let's generate a dataset with two classes and see how the KNN algorithm assigns a class to a new data point in practice.

The dataset

We can use make_biclusters() from scikit-learn to create a simple dataset with two features (columns) and 50 observations (data points). We can also add Gaussian noise while creating clusters and assign them a class. Let's do this.

## Generate 2 random clusters and create a dataframe
from sklearn.datasets import make_biclusters  # to generate the data
X, rows, cols = make_biclusters(shape=(50, 2),    # (n_rows, n_cols) -- observations and features
                                n_clusters=2,     # number of classes we want
                                noise=50,         # standard deviation of the Gaussian noise
                                random_state=101) # to regenerate the same data every time
# Create the dataframe; rows[0] holds True/False cluster membership
df = pd.DataFrame(X, columns=['feature_2', 'feature_1'])
df['target'] = rows[0]
# Instead of True/False, let's replace with 1/0 targets -- a practice for map and lambda!
df['target'] = df['target'].map(lambda t: '0' if t else '1')
print(df.tail(2))  # tail this time!

Let's check the class distribution.

print(df.target.value_counts())

As the code output above shows, our data has two features and a target column.

Visualize the training and test data

Let's create a scatter plot to visualize the distribution of the data points, using the hue parameter to show the classes in different colors. In a second plot (on the right), we'll add a test point whose class is unknown and that we want KNN to predict.

fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(16, 8))
# Figure 1 (left) -- the training data
sns.scatterplot(x='feature_1', y='feature_2', data=df, hue='target', ax=ax1, s=150)
ax1.set_title("The data -- two classes")
ax1.set_xlabel('Feature 1')
ax1.set_ylabel('Feature 2')
ax1.legend().set_title('Target')
# Our new (unknown) point
test_point = [[10, 50]]
# Figure 2 (right) -- the training data plus the test point
sns.scatterplot(x='feature_1', y='feature_2', data=df, hue='target', ax=ax2, s=150)
ax2.scatter(x=test_point[0][0], y=test_point[0][1], color="red", marker="*", s=1000)
ax2.set_title('Red star is a test (unknown) point')
ax2.set_xlabel('Feature 1')
ax2.set_ylabel('Feature 2')
ax2.legend().set_title('Target')
plt.show()

The red star is a new, unknown data point whose class we want our KNN algorithm to predict, and for this purpose, we need to perform the following ...
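At its core, predicting the class of such a point means computing the distance from the test point to every training point, picking the k nearest ones, and taking a majority vote among their labels. The sketch below illustrates this with a small, hypothetical toy dataset (the `knn_predict` helper and the data are illustrative, not part of the lesson's code):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Majority vote among the labels of those neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# A tiny illustrative dataset: two well-separated clusters
X_train = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5],   # class '0'
                    [8.0, 9.0], [9.0, 8.0], [8.5, 8.5]])  # class '1'
y_train = ['0', '0', '0', '1', '1', '1']

print(knn_predict(X_train, y_train, np.array([1.2, 1.8]), k=3))  # -> '0'
print(knn_predict(X_train, y_train, np.array([8.8, 8.2]), k=3))  # -> '1'
```

Note that the choice of k matters: an odd k avoids ties in a two-class vote, and a very large k would let the majority class dominate regardless of distance.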