K-means Clustering

In this lesson, we introduce k-means, an algorithm that groups items according to a similarity metric.

We'll cover the following:

- What is k-means
- Cluster on generated data
- The impact of k on performance
- What is Spectral Clustering?

Clustering is an unsupervised machine learning task that groups similar items according to some metric. It is a common way to explore data that is unlabeled. Clustering is not one specific algorithm but a general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and in how they find clusters efficiently.

Among clustering algorithms, the most famous is k-means. This lesson focuses on k-means and then extends the discussion to related topics.

What is `k-means`

Clustering is a big topic. k-means is one of the simpler algorithms: it tries to separate the data points into k groups while minimizing a metric known as the inertia, or within-cluster sum-of-squares. A notable feature of this algorithm is that you must specify the number of clusters, k, in advance.

This algorithm groups a set of N samples into k disjoint clusters, C. Each cluster is described by the mean $\mu_{j}$ of the data points in it. The means $\mu_{j}$ are referred to as the cluster centroids.
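To make the idea of centroids concrete, here is a minimal sketch using scikit-learn's `KMeans`. The six toy points, the choice `k = 2`, and the variable names are assumptions made for illustration and are not part of the lesson's data. The sketch fits the model and then recomputes each centroid as the mean of the samples assigned to that cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six 2-D points forming two obvious groups (toy values chosen for illustration).
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)

k = 2  # the number of clusters must be chosen before fitting
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

# labels[i] is the index j of the cluster that sample i was assigned to.
labels = kmeans.labels_

# Recompute each centroid as the mean of the samples assigned to that cluster.
manual_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print("labels:", labels)
print("fitted centroids:\n", kmeans.cluster_centers_)
print("cluster means:\n", manual_centroids)
```

On data this cleanly separated, the fitted centroids and the recomputed cluster means should agree up to floating-point error. The order of the clusters (which group gets index 0) is arbitrary.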

The goal of this algorithm is to minimize the following objective function (the inertia, or within-cluster sum-of-squares), where $x_{i}$ is the $i$-th sample:

$$\sum_{i=1}^{N} \min _{\mu_{j} \in C}\left(\left\|x_{i}-\mu_{j}\right\|^{2}\right)$$
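The objective can be evaluated directly with NumPy and compared against the `inertia_` attribute that scikit-learn's `KMeans` exposes after fitting. This is a sketch under assumptions: the toy points and variable names are illustrative, not taken from the lesson.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six 2-D points forming two obvious groups (toy values chosen for illustration).
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centroids = kmeans.cluster_centers_

# Squared Euclidean distance from every sample to every centroid, shape (N, k).
sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

# For each sample keep the distance to its nearest centroid, then sum over samples.
manual_inertia = sq_dists.min(axis=1).sum()

print("manual inertia: ", manual_inertia)
print("kmeans.inertia_:", kmeans.inertia_)
```

For these points, each cluster contributes 0 + 4 + 4 = 8 to the sum, so both values should come out to 16 up to floating-point error.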
