Working Principle of K-Nearest Neighbors
Learn about the working principle behind k-nearest neighbors.
We have explored evaluation methods for linear regression (continuous targets) and logistic regression (class prediction). In a classification problem, we generally care about misclassifications: we want the algorithm to predict as many test observations correctly as possible so that it generalizes well. KNN (k-nearest neighbors) is another simple and widely used algorithm, typically applied to classification, though it can be used for regression tasks as well.
KNN review and distance functions
As discussed in the previous lesson, KNN considers how many observations belong to a particular class within the selected number of nearest neighbors, k, and assigns a new observation to the class that holds the majority among those neighbors. The sketch below makes this voting idea concrete.
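The following is a minimal sketch of KNN classification in plain NumPy, not a production implementation. The function name `knn_predict` and the toy dataset are our own, chosen purely for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training observation
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority class among those k neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two features, two classes (hypothetical values)
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y_train = np.array([0, 0, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.2, 1.5]), k=3))  # -> 0
```

Here, the three neighbors nearest to the query point belong to classes 0, 0, and 1, so the majority vote assigns class 0.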
Distance functions for KNN
Typically, the KNN algorithm uses the Euclidean or Manhattan distance function; other distance metrics are rarely used. We can also create our own distance function for the algorithm, depending on our customers' needs. A few standard distance functions for KNN are as follows:
Euclidean distance
The Euclidean distance is the straight-line distance between two points. It is the most common choice and can be calculated with the Pythagorean theorem from the points' Cartesian coordinates. For this reason, the Euclidean distance is sometimes also called the Pythagorean distance.
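For two points $p = (p_1, \dots, p_n)$ and $q = (q_1, \dots, q_n)$ in an $n$-dimensional feature space, the Euclidean distance is:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

For example, the distance between the points $(1, 2)$ and $(4, 6)$ is $\sqrt{3^2 + 4^2} = 5$.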