Visualize the Working of K-Nearest Neighbors
Learn to visualize the working principle behind k-nearest neighbors.
Let’s move on and put into practice what we have learned so far. As always, we start by importing some basic libraries.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(font_scale=1.5)     # set the font size for the whole notebook
sns.set_style('whitegrid')  # set the plot style -- optional!
```
Let's generate a dataset with two classes and see how the KNN algorithm assigns a class to a new data point.
The dataset
We can use make_biclusters() from scikit-learn to create a simple dataset with two features (columns) and 50 observations (data points). We can also add Gaussian noise while creating the clusters and assign each observation a class. Let's do this.
```python
# Generate 2 random clusters, create dataframe
from sklearn.datasets import make_biclusters  # to generate data

X, classes, cols = make_biclusters(
    shape=(50, 2),     # (n_rows, n_cols) -- observations and features
    n_clusters=2,      # number of classes we want
    noise=50,          # the standard deviation of the Gaussian noise
    random_state=101,  # to re-generate the same data every time
)

# Creating dataframe
df = pd.DataFrame(X, columns=['feature_2', 'feature_1'])
df['target'] = classes[0]

# Well, instead of True/False, let's replace with 1/0 targets -- a practice for map and lambda!
df['target'] = df['target'].map(lambda t: '1' if t == 0 else '0')
print(df.tail(2))  # tail this time!
```
Let's check the class distribution.
```python
print(df.target.value_counts())
```
As seen in the code output above, we have the data with two features and a target column.
Visualize the training and test data
Let's create a scatterplot to visualize the distribution of the data points, using the hue parameter to show the classes in different colors. In a second plot (right side), we add a test point whose class is unknown and which we want KNN to classify.
```python
# Figure 1 (left)
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(16, 8))
sns.scatterplot(x='feature_1', y='feature_2', data=df, hue='target', ax=ax1, s=150)
ax1.set_title("The data -- two classes")
ax1.set_xlabel('Feature 1')
ax1.set_ylabel('Feature 2')
ax1.legend().set_title('Target')

# Our new (test) point
test_point = [[10, 50]]

# Figure 2 (right)
sns.scatterplot(x='feature_1', y='feature_2', data=df, hue='target', ax=ax2, s=150)
ax2.scatter(x=test_point[0][0], y=test_point[0][1], color="red", marker="*", s=1000)
ax2.set_title('Red star is a test (unknown) point')
ax2.set_xlabel('Feature 1')
ax2.set_ylabel('Feature 2')
ax2.legend().set_title('Target')
```
The red star is a new, unknown data point whose class we want our KNN algorithm to predict, and for this purpose, we need to perform the following ...
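Before walking through those steps on our dataset, here is a minimal sketch of the core idea behind a KNN prediction: compute the distance from the test point to every training point, pick the k nearest, and take a majority vote of their labels. The function name and the tiny toy arrays below are hypothetical stand-ins, not our actual dataframe.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, test_point, k=3):
    # Euclidean distance from the test point to every training point
    distances = np.sqrt(((X_train - test_point) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among the labels of the k nearest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups
X_train = np.array([[1, 1], [2, 1], [1, 2], [8, 8], [9, 8], [8, 9]])
y_train = np.array(['0', '0', '0', '1', '1', '1'])

print(knn_predict(X_train, y_train, np.array([2, 2]), k=3))  # -> '0'
print(knn_predict(X_train, y_train, np.array([9, 9]), k=3))  # -> '1'
```

In practice we would use scikit-learn's KNeighborsClassifier rather than hand-rolling this, but the sketch shows exactly what happens under the hood when the red star gets its class.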