K-Means Clustering
Learn about the k-means algorithm, its initialization, NP-hardness, and variance computation with examples.
Traditionally, in machine learning, we start with the popular partitional clustering algorithm called -means clustering. This algorithm divides the data into clusters based on a similarity score. The objective is to minimize the total variance of the clusters. The number of clusters, , must be specified.
Note: The choice of similarity score is a hyperparameter.
Objective
Given a set of data points in , the goal is to partition into the given partitions, say, such that is minimum. The using Euclidean distance can be defined as follows:
Here, is the
Note: The variance of a partition is defined in terms of Euclidean distance here; however, other distance/similarity measures can also be used.
Variance computation code example
Let’s compute the variance of a set of points using numpy
:
Get hands-on with 1400+ tech skills courses.