

Machine Learning Algorithms II

6. K-Nearest Neighbors (KNN)

The KNN algorithm is a simple and popular technique. It is based on a familiar idea from real life: you are the average of the five people you most associate with!

KNN classifies an object by searching the entire training set for the k most similar instances (the k neighbors) and summarizing the output variable across those k instances. The figure below shows a classification example. The test sample (green dot) should be classified as either a blue square or a red triangle. If k = 3 (solid-line circle), it is assigned to the red triangles, because there are 2 triangles and only 1 square inside the inner circle. If k = 5 (dashed-line circle), it is assigned to the blue squares (3 squares vs. 2 triangles inside the outer circle):

Example of k-NN classification
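To make the voting procedure concrete, here is a minimal from-scratch sketch in plain Python (not the course's own code); the toy coordinates are invented to mimic the figure, so the k = 3 vote goes to the triangles and the k = 5 vote to the squares:

```python
from collections import Counter
import math

def euclidean(a, b):
    """Ordinary straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, test_point, k):
    """Majority vote among the k training points nearest to test_point.

    train is a list of (features, label) pairs; features are tuples of floats.
    """
    neighbors = sorted(train, key=lambda item: euclidean(item[0], test_point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data echoing the figure: blue squares vs. red triangles around a test point.
train = [((1.0, 1.0), "square"), ((1.2, 0.8), "square"), ((2.5, 2.4), "square"),
         ((2.0, 2.0), "triangle"), ((2.1, 1.9), "triangle")]
print(knn_classify(train, (2.0, 1.9), k=3))  # -> triangle (2 triangles vs. 1 square)
print(knn_classify(train, (2.0, 1.9), k=5))  # -> square (3 squares vs. 2 triangles)
```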

The selection of k is critical here: a small value makes the prediction sensitive to noise and can give inaccurate results, while a very large value smooths over local structure (and is more expensive to compute), which defeats the purpose of the algorithm.
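In practice, k is often chosen by cross-validation. The sketch below is one common recipe (not the course's own code): it assumes scikit-learn and its bundled iris dataset, and scans a range of odd k values:

```python
# Picking k by 5-fold cross-validation over a range of odd values.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

best_k, best_score = None, 0.0
for k in range(1, 22, 2):  # odd k helps avoid tied votes
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(f"best k = {best_k}, mean CV accuracy = {best_score:.3f}")
```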

Although mostly used for classification, this technique can also be applied to regression problems. For a regression task, the output is typically the mean of the target values of the k nearest instances, while for classification problems it is usually the mode (most common) class value.
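As a sketch of the regression case, the following reuses the same list-of-pairs data layout as the classification example above; the house-size/price numbers are made up purely for illustration:

```python
import math

def knn_regress(train, test_point, k):
    """Average the target values of the k training points nearest to test_point."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], test_point))[:k]
    return sum(target for _, target in neighbors) / k

# Made-up data: house size in square meters -> price; predict for an 80 m^2 house.
train = [((50.0,), 150.0), ((70.0,), 200.0), ((90.0,), 250.0), ((120.0,), 320.0)]
print(knn_regress(train, (80.0,), k=3))  # mean of the 3 nearest targets -> 200.0
```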

Common distance functions for assessing similarity between instances are the Euclidean, Manhattan, and Minkowski distances. Euclidean distance, the most commonly used, is simply the ordinary straight-line distance between two points: the square root of the sum of the squared differences between their coordinates.
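All three are related: the Minkowski distance of order p reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2. A minimal sketch in plain Python:

```python
def minkowski(a, b, p):
    """Minkowski distance of order p; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (1.0, 2.0), (4.0, 6.0)
print(minkowski(a, b, p=1))  # Manhattan: |1-4| + |2-6| = 7.0
print(minkowski(a, b, p=2))  # Euclidean: sqrt(3**2 + 4**2) = 5.0
```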
