Visualize the Working of K-Nearest Neighbors

Learn to visualize the working principle behind k-nearest neighbors.

Let’s move on and practically do what we have learned so far. As always, we need to import some basic libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(font_scale=1.5)        # set the font size for the whole notebook
sns.set_style('whitegrid')     # set the plot style (optional)

Let's generate a dataset with two classes and see how the KNN algorithm assigns a class to any new data point in practice.
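
Before building the dataset, it helps to see the decision rule itself. The sketch below is an illustrative from-scratch version (the helper knn_predict is hypothetical, not part of any library; in practice we would use scikit-learn's KNeighborsClassifier): a query point is assigned the majority class among its k nearest training points.

from collections import Counter

def knn_predict(X_train, y_train, query, k=5):
    # Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - query, axis=1)
    # indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # majority vote among those neighbors' labels decides the class
    return Counter(y_train[nearest]).most_common(1)[0][0]

With numpy arrays for X_train and y_train, knn_predict(X_train, y_train, query, k=5) returns the predicted label for query.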

The dataset

We can use make_biclusters() from scikit-learn to create a simple dataset with two features (columns) and 50 observations (data points). We can also add Gaussian noise while creating the clusters and assign each cluster a class. Let's do this.

## Generate two random clusters and create a dataframe
from sklearn.datasets import make_biclusters  # to generate the data
X, classes, cols = make_biclusters(shape=(50, 2),    # (n_rows, n_cols) -- 50 observations, 2 features
                                   n_clusters=2,     # number of classes we want
                                   noise=50,         # standard deviation of the Gaussian noise
                                   random_state=101) # to regenerate the same data every time
# Create the dataframe
df = pd.DataFrame(X, columns=['feature_1', 'feature_2'])
df['target'] = classes[0]
# Instead of True/False, let's use 1/0 targets -- a practice for map and lambda!
df['target'] = df['target'].map(lambda t: 1 if t == 0 else 0)
print(df.tail(2))  # tail this time!

Let's check the class distribution.

print(df.target.value_counts())

As the output above shows, we now have a dataset with two features and a target column.

Visualize the training and test data

Let's create a scatterplot and visualize the distribution of data points. We can use the hue parameter for classes to show in ...
