K-Means Clustering

Learn about the k-means algorithm, its initialization, NP-hardness, and variance computation with examples.

Traditionally, in machine learning, we start with the popular partitional clustering algorithm called kk-means clustering. This algorithm divides the data into kk clusters based on a similarity score. The objective is to minimize the total variance of the kk clusters. The number of clusters, kk, must be specified.

Note: The choice of similarity score is a hyperparameter.

Objective

Given a set D={x1,x2,,xn}D=\{\bold x_1, \bold x_2, \dots, \bold x_n \} of nn data points in Rd\R^d, the goal is to partition D ...