K-means Clustering

In this lesson, we introduce a model that is used to group items based on some metrics.

Clustering is an unsupervised Machine Learning model that groups similar items in some groups based on some kind of metric. Clustering is a common way to explore your data when your data is unlabeled. Clustering itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to efficiently find them.

Clustering is a group of algorithms that focuses on grouping similar items. Among them, the most famous algorithm is k-means. In this lesson, we will focus on this algorithm, and do some extended discussion.

What is k-means

Clustering is a big topic. k-means is one of those simple cases trying to separate data points to k groups and minimize a metric known as the inertia or within-cluster sum-of-squares. One of the features of this algorithm is that you need to specify the number of clusters, which is k.

This algorithm groups a set of N samples into k disjoint clusters, C. Each group is described by the mean μj\mu_{j} of the data points in the cluster. The μj\mu_{j} ...

Access this course and 1400+ top-rated courses and projects.