Clustering Metrics

Learn the main metrics used to evaluate clustering algorithms and explore how to implement them.

We'll cover the following...

Silhouette score
Calinski-Harabasz index
Davies-Bouldin index
Conclusion

Clustering metrics evaluate the performance of clustering algorithms by assessing how well they group similar data points. Because clustering tasks usually have no ground-truth labels to serve as a baseline, the choice of metric is more subjective than in supervised learning.

Classification metrics are designed to assess the correctness of class assignments, so they are of little use for measuring clustering performance. In clustering, the focus is on the intrinsic structure of the data and the degree to which similar data points are grouped together, which is fundamentally different from predicting explicit classes and checking them against known answers. That’s why clustering needs its own metrics.

Let’s look at some of the most common clustering metrics and what they try to measure.

Silhouette score

The silhouette score measures how similar each sample is to its own cluster compared with the nearest neighboring cluster. It combines compactness (how close cluster members are to each other) and separation (how far members of different clusters are from each other) into a single value ranging from $-1$ to $1$. A higher score indicates better-defined clusters, values near $0$ suggest overlapping clusters, and negative values suggest samples that may have been assigned to the wrong cluster.
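To make the definition concrete, here is a minimal sketch in plain Python. It assumes Euclidean distance and the standard per-sample formula $s = (b - a) / \max(a, b)$, where $a$ is the mean distance from a sample to the other members of its own cluster and $b$ is the mean distance to the members of the nearest other cluster. The function names `silhouette_values` and `mean_silhouette` and the toy data are illustrative assumptions, not part of the lesson.

```python
import math


def silhouette_values(points, labels):
    """Return one silhouette value per point, using Euclidean distance."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    values = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Common convention: a point alone in its cluster scores 0.
            values.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster (compactness).
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster (separation).
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        # Guard against division by zero when all points coincide.
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def mean_silhouette(points, labels):
    """Average the per-point silhouette values into a single score."""
    values = silhouette_values(points, labels)
    return sum(values) / len(values)


# Toy data (assumed for illustration): two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good = mean_silhouette(points, good_labels)
bad = mean_silhouette(points, bad_labels)
print(f"good labeling: {good:.3f}")
print(f"bad labeling:  {bad:.3f}")
```

The labeling that matches the two groups should score close to $1$, while the mixed labeling should score much lower. In practice, a library routine such as scikit-learn's `sklearn.metrics.silhouette_score(X, labels)` is the usual choice. The hand-written version above only serves to show what the number represents.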
