Visualize the Working of K-Nearest Neighbors
Learn to visualize the working principle behind k-nearest neighbors.
Let’s move on and practically do what we have learned so far. As always, we need to import some basic libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(font_scale=1.5)     # setting the font size for the whole notebook
sns.set_style('whitegrid')  # just optional! setting the style
Let's generate a dataset with two classes and see how the KNN algorithm works in reality for any new data points while assigning the class.
The dataset
We can use make_biclusters() from scikit-learn to create a simple dataset with two features (columns) and 50 observations (data points). We can also add Gaussian noise while creating the clusters and assign each cluster a class. Let's do this.
## Generate 2 random clusters, create dataframe
from sklearn.datasets import make_biclusters  # to generate data

X, classes, cols = make_biclusters(shape=(50, 2),     # (n_rows, n_cols) -- 50 observations, 2 features
                                   n_clusters=2,      # number of classes we want
                                   noise=50,          # the standard deviation of the Gaussian noise
                                   random_state=101)  # to re-generate the same data every time

# Creating the dataframe
df = pd.DataFrame(X, columns=['feature_2', 'feature_1'])
df['target'] = classes[0]  # boolean membership in the first cluster

# Well, instead of True/False, let's replace with 1/0 targets -- a practice for map and lambda!
df['target'] = df['target'].map(lambda t: '1' if t == 0 else '0')
print(df.tail(2))  # tail this time!
Let's check the class distribution.
print(df.target.value_counts())
As seen from the code output above, we have the data with two features and a target column.
Visualize the training and test data
Let's create a scatterplot and visualize the distribution of the data points. We can use the hue parameter to show the classes in different colors.
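Here is a minimal sketch of that plot. It assumes a simple 70/30 train/test split made with scikit-learn's train_test_split; the split ratio, marker styles, and figure size are our own choices for illustration, not prescribed by the lesson.

from sklearn.model_selection import train_test_split

# split features and target from the dataframe built above (the 70/30 split is an assumption)
X_train, X_test, y_train, y_test = train_test_split(df[['feature_1', 'feature_2']],
                                                    df['target'],
                                                    test_size=0.3,
                                                    random_state=101)

# training points colored by class via the hue parameter
fig, ax = plt.subplots(figsize=(8, 6))
sns.scatterplot(x=X_train['feature_1'], y=X_train['feature_2'], hue=y_train, ax=ax)

# test points overlaid as black crosses so we can see where they fall
ax.scatter(X_test['feature_1'], X_test['feature_2'], c='black', marker='x', label='test data')
ax.set_xlabel('feature_1')
ax.set_ylabel('feature_2')
ax.legend()
plt.show()

With the two classes shown in different colors, we can now reason about any test point by looking at the classes of its k closest training points, which is exactly what KNN does when assigning a class.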