KNN

Learn how to use the k-nearest neighbors (KNN) algorithm for classification and regression tasks.

The k-nearest neighbors (KNN) algorithm makes predictions for a new observation by looking at the most similar observations among the data it has already seen. The same idea can be used to fill in missing values: for a row with a missing value, KNN compares the values of the other features to find the most similar rows in the dataset, and it uses their values to estimate what the missing value should be. The k in KNN refers to the number of neighbors the algorithm considers when making its estimate.
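To make the prediction side of this concrete, here is a minimal sketch of KNN classification using scikit-learn's KNeighborsClassifier on a small, made-up dataset (the data values and the choice of scikit-learn are assumptions for illustration, not part of the lesson):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy dataset: each row is an observation with two
# features, and y holds the class label for each row.
X = np.array([[1.0, 2.0],
              [1.5, 1.8],
              [5.0, 8.0],
              [6.0, 9.0],
              [1.2, 0.5],
              [5.5, 8.5]])
y = np.array([0, 0, 1, 1, 0, 1])

# k = 3: each prediction is decided by the three most similar
# (nearest) training observations.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# A new observation close to the first cluster of rows gets class 0,
# because its three nearest neighbors all have label 0.
print(knn.predict([[1.1, 1.5]]))  # -> [0]
```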

For example, suppose we have a dataset in which a particular row has a missing value for one of its features. In this case, we can use KNN to estimate the missing value from the other available feature values. Specifically, if we set k (the number of neighbors) to three, KNN identifies the three rows in the dataset that are most similar to the row in question based on their feature values, and it imputes the missing value from those three rows' values for that feature (for example, by averaging them), as sketched below.
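One common way to do this in practice is scikit-learn's KNNImputer; the sketch below mirrors the k = 3 example, with the dataset values chosen purely for illustration:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical dataset with a missing value (np.nan) in the last
# row's second feature.
X = np.array([[1.0, 2.0, 3.0],
              [1.1, 2.1, 2.9],
              [0.9, 1.9, 3.1],
              [8.0, 9.0, 7.5],
              [1.0, np.nan, 3.0]])

# With n_neighbors=3, the imputer finds the three rows most similar
# to the incomplete row (using the features that are present) and
# fills the gap with the average of their values for that feature.
imputer = KNNImputer(n_neighbors=3)
X_filled = imputer.fit_transform(X)

# The nan is replaced by roughly (2.0 + 2.1 + 1.9) / 3 = 2.0.
print(X_filled[-1])
```

Because the fourth row is far from the incomplete row, it is not among the three nearest neighbors and has no influence on the imputed value.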
