You will probably have heard all the buzz about machine learning and its applications. And if you have, you’ve probably heard about the k-nearest neighbors (k-NN) algorithm. Some of its common use cases include:
Classification problems: The k-NN algorithm assigns a new data point the class that is most common among its nearest neighbors.
Recommender systems: The k-NN algorithm can suggest items by finding users or products most similar to a given one.
Regression analysis: The k-NN algorithm can predict a numeric value by averaging the values of the nearest neighbors.
Healthcare and medicine: The k-NN algorithm can help classify patient data based on similarity to past cases.
In this blog, our focus will be only on classification problems. We’ll take a look at a numerical example running the k-NN algorithm step by step.
Now, let’s explore the basics of the k-NN algorithm.
What is the k-NN algorithm?
Let’s take a look at the steps of the k-NN algorithm.
Step 1: To classify a test instance, define the value of k, i.e., the number of nearest neighbors to consider.
How to define the value of k:
The value of k is usually an odd number to avoid a 50% split in a binary classification problem. A low value of k is more error-prone (it might not give accurate results). In practice, the value of k is usually set to be around sqrt(n), where n is the number of instances in the dataset.
Step 2: Calculate the similarity (often using Euclidean distance or other distance metrics) between the new item and all data points in the training dataset. Based on the similarity values, extract the k nearest neighbors.
Step 3: Assuming multiple classes, count how many of the k nearest neighbors belong to each class.
Step 4: Assign the test instance the most frequent (majority) class among its k nearest neighbors.
Assume that we have the following dataset:
| Data point | Class |
|------------|-------|
| (2, 3) | A |
| (3, 4) | A |
| (5, 6) | B |
| (7, 8) | B |
| (1, 2) | A |
| (6, 7) | B |
| (4, 5) | A |
| (8, 9) | B |
| (2, 2) | A |
| (9, 9) | B |
We also have the following test data point: (6, 5)
Step 1: To classify the test instance (6, 5), define the value of k.
As mentioned above, the value of k is usually set to sqrt(n), where n is the number of training instances. In our example dataset, n is 10, so sqrt(10) ≈ 3.16, and k is set to 3 (the nearest odd number).
Step 2: We calculate the similarity between the new instance and all data points in the training dataset.
Note: We’ll be using the Euclidean distance formula in our calculations.
Recall the formula for Euclidean distance between two points:

d = sqrt((x2 - x1)^2 + (y2 - y1)^2)

Here, (x1, y1) and (x2, y2) are the coordinates of the two points.
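For example, the distance between the training point (2, 3) and our test point (6, 5) works out as follows:

```python
import math

x1, y1 = 2, 3  # training point
x2, y2 = 6, 5  # test point

d = math.sqrt((x2 - x1)**2 + (y2 - y1)**2)
print(round(d, 2))  # sqrt(16 + 4) = sqrt(20) ≈ 4.47
```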
After calculating distances, here is a table of distances from each point to our test point, i.e., (6, 5):
| Data point | Distance from (6, 5) |
|------------|----------------------|
| (2, 3) | 4.47 |
| (3, 4) | 3.16 |
| (5, 6) | 1.41 |
| (7, 8) | 3.16 |
| (1, 2) | 5.83 |
| (6, 7) | 2 |
| (4, 5) | 2 |
| (8, 9) | 4.47 |
| (2, 2) | 5 |
| (9, 9) | 5 |
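To extract the nearest neighbors from these distances, we can sort the (point, distance) pairs in ascending order of distance and slice off the first k. A minimal sketch using the table's values:

```python
import math

test_point = (6, 5)
points = [(2, 3), (3, 4), (5, 6), (7, 8), (1, 2),
          (6, 7), (4, 5), (8, 9), (2, 2), (9, 9)]

# Pair each point with its Euclidean distance to the test point
distances = [(p, math.dist(p, test_point)) for p in points]
distances.sort(key=lambda pair: pair[1])  # ascending by distance

k = 3
nearest = [p for p, _ in distances[:k]]
print(nearest)  # [(5, 6), (6, 7), (4, 5)] — the tie at distance 2 keeps input order
```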
Step 3: Count the number of instances of each class among the k = 3 nearest neighbors. The three closest points are ranked below:
| Data point | Class | Distance from (6, 5) | Rank |
|------------|-------|----------------------|------|
| (2, 3) | A | 4.47 | |
| (3, 4) | A | 3.16 | |
| (5, 6) | B | 1.41 | 1 |
| (7, 8) | B | 3.16 | |
| (1, 2) | A | 5.83 | |
| (6, 7) | B | 2 | 2 |
| (4, 5) | A | 2 | 3 |
| (8, 9) | B | 4.47 | |
| (2, 2) | A | 5 | |
| (9, 9) | B | 5 | |
Step 4: Assign the test instance (6, 5) the class that’s the most frequent or the majority class.
From the above table, we can see that in the neighborhood of the point (6, 5), two of the three closest instances belong to class B, whereas one instance belongs to class A. Therefore, the test instance (6, 5) is assigned class B.
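The majority vote over the three neighbors can also be expressed compactly with the standard library's collections.Counter:

```python
from collections import Counter

# Classes of the 3 nearest neighbors of (6, 5), taken from the table above:
# (5, 6) -> B, (6, 7) -> B, (4, 5) -> A
neighbor_classes = ['B', 'B', 'A']

# most_common(1) returns the (label, count) pair with the highest count
majority_class = Counter(neighbor_classes).most_common(1)[0][0]
print(majority_class)  # B
```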
The following code implements the k-NN algorithm from scratch. No external libraries are needed (only Python’s built-in math library) in this implementation example.

```python
import math

# Sample dataset
data = [(2, 3, 'A'), (3, 4, 'A'), (5, 6, 'B'), (7, 8, 'B'), (1, 2, 'A'),
        (6, 7, 'B'), (4, 5, 'A'), (8, 9, 'B'), (2, 2, 'A'), (9, 9, 'B')]

# Function to calculate Euclidean distance between two points
def euclidean_distance(point1, point2):
    distance = 0
    distance = math.sqrt((point1[0] - point2[0])**2 + (point1[1] - point2[1])**2)
    return distance

# k-NN algorithm
def k_nearest_neighbors(data, query_point, k):
    distances = []

    # Calculate distances from the query point to all data points
    for data_point in data:
        distance = euclidean_distance(query_point, data_point)
        distances.append((data_point, distance))

    # Sort distances in ascending order
    distances.sort(key=lambda x: x[1])

    # Get the k-nearest neighbors
    neighbors = [item[0] for item in distances[:k]]

    # Count the occurrences of each class among the neighbors
    class_counts = {}
    for neighbor in neighbors:
        label = neighbor[2]
        if label in class_counts:
            class_counts[label] += 1
        else:
            class_counts[label] = 1

    # Determine the majority class
    sorted_class_counts = sorted(class_counts.items(), key=lambda x: x[1], reverse=True)
    return sorted_class_counts[0][0]

# Test the k-NN algorithm
query = (6, 5)
k = 3
result = k_nearest_neighbors(data, query, k)
print(f"The query point {query} belongs to class: {result}")
```
The code is explained below:
Line 1: We start by importing the math library, which is later used to calculate square roots while computing Euclidean distances between two points.
Lines 7–11: We define the euclidean_distance function, which calculates the Euclidean distance between two points.
Line 14: We start defining the k_nearest_neighbors algorithm, which takes three arguments: data (the dataset), query_point (the point we want to classify), and k (the number of neighbors).
Line 15: We initialize an empty list called distances to store distances between query_point and the data points.
Lines 18–20: We initialize a loop to iterate through each data point in the dataset. We then call the euclidean_distance function, which calculates the distance between query_point and the current data_point, and store it in the distance variable. Finally, we append a tuple containing the data point and its distance to the distances list.
Line 23: We then sort the distances list in ascending order based on the distances. This step identifies the nearest neighbors.
Line 26: Next, we select the first k data points from the sorted distances list and store them in the neighbors list.
Lines 29–35: We start by initializing an empty dictionary called class_counts to count the occurrences of each class among the neighbors. Then, we initiate a loop to iterate through each neighbor in the neighbors list, storing its class label in the label variable. We then have an if-else condition: we first check if the class label already exists in the dictionary. If it does, we increment its count by 1. Otherwise, we add it to our dictionary with a count of 1.
Lines 38–39: We now need to determine the majority class. For this, we sort the class_counts dictionary items in descending order of count, then return the class label with the highest count (the majority class).
Lines 42–45: Now, we test our algorithm. We set query_point to (6, 5) and the value of k to 3. We then call the k_nearest_neighbors function and print the result.
Let’s now take a look at a few advantages and disadvantages of using the k-NN algorithm.
Advantages
The algorithm is simple to understand and implement.
Unlike other classification algorithms, there’s no training involved. Learning is instance-based.
The algorithm can easily adapt to new data because computation is deferred until prediction time (this is known as lazy learning).
No assumptions are made about data distribution.
The predictions made by the k-NN algorithm are easy to interpret because they are based on concrete neighboring examples.
Disadvantages
The algorithm can be computationally expensive with large datasets.
The algorithm has limited ability to capture complex relationships.
The algorithm is sensitive to the choice of k and the distance metric, as well as to noisy and irrelevant features.
Scalability can be an issue with large datasets.
This blog has provided a thorough answer to what the k-NN algorithm is. We mentioned some use cases, along with advantages and disadvantages of using the k-NN algorithm for classification. We also demonstrated our example with code without using any Python libraries.
Don’t stop here! You can explore and practice different techniques and libraries to build more accurate and robust models. We encourage you to check out the following courses on Educative:
A Practical Guide to Machine Learning with Python
This course teaches you how to code basic machine learning models. The content is designed for beginners with general knowledge of machine learning, including common algorithms such as linear regression, logistic regression, SVM, KNN, decision trees, and more. If you need a refresher, we have summarized key concepts from machine learning, and there are overviews of specific algorithms dispersed throughout the course.
Machine Learning with Python Libraries
Machine learning is used in software applications to help them generate more accurate predictions. It is a widely used form of artificial intelligence that offers high-paying careers. This path will provide a hands-on guide on multiple Python libraries that play an important role in machine learning. This path also teaches you about neural networks, PyTorch Tensor, PyCaret, and GAN. By the end of this module, you’ll have hands-on experience in using Python libraries to automate your applications.
Mastering Machine Learning Theory and Practice
The machine learning field is rapidly advancing today due to the availability of large datasets and the ability to process big data efficiently. Moreover, several new techniques have produced groundbreaking results for standard machine learning problems. This course provides a detailed description of different machine learning algorithms and techniques, including regression, deep learning, reinforcement learning, Bayes nets, support vector machines (SVMs), and decision trees. The course also offers sufficient mathematical details for a deeper understanding of how different techniques work. An overview of the Python programming language and the fundamental theoretical aspects of ML, including probability theory and optimization, is also included. The course contains several practical coding exercises as well. By the end of the course, you will have a deep understanding of different machine-learning methods and the ability to choose the right method for different applications.
Free Resources