Hierarchical Clustering

Learn all about hierarchical clustering and how to cluster data with it using scikit-learn.

Hierarchical clustering is another popular unsupervised clustering algorithm that groups data points into clusters based on similarity. Rather than producing a single flat partition, it builds a hierarchy of clusters. In its most common, bottom-up form, it starts with individual data points and gradually merges them into larger clusters.

There are two types of hierarchical clustering: agglomerative and divisive.

Agglomerative clustering

Agglomerative clustering is a hierarchical clustering algorithm that groups data points based on their pairwise distances or similarities. Unlike k-means, agglomerative clustering doesn’t require fixing the number of clusters in advance (DBSCAN shares this property, although it needs density parameters instead). Instead, it builds a hierarchy of clusters by iteratively merging the most similar or nearby data points or clusters, and that hierarchy can then be cut at whatever level gives a useful grouping.
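As a rough illustration of leaving the number of clusters open, the sketch below uses scikit-learn's AgglomerativeClustering with n_clusters=None and a distance_threshold, so merging stops once the remaining clusters are farther apart than the threshold. The toy data, the threshold of 5.0, and the choice of average linkage are assumptions made for this example, not values from the lesson.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy data (assumed for illustration): two small, well-separated groups.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])

# n_clusters=None together with distance_threshold tells scikit-learn to
# stop merging once the closest pair of clusters is at least this far apart.
model = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=5.0,  # assumed value, chosen to suit the toy data
    linkage="average",
)
labels = model.fit_predict(X)

print("clusters found:", model.n_clusters_)
print("labels:", labels)
```

The threshold plays the role that k plays in k-means: a smaller value yields more, tighter clusters, while a larger value yields fewer, broader ones.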

The algorithm starts by considering each data point as a separate cluster. It then repeatedly merges the two closest clusters based on a chosen linkage criterion, which determines the distance or similarity between clusters. The most commonly used linkage criteria are as follows:

  • Ward: This merges the pair of clusters whose union produces the smallest increase in total within-cluster variance.

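  • Complete: This measures the distance between two clusters as the largest distance between any pair of points, one from each cluster, and merges the pair of clusters for which that distance is smallest.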

  • Average: This uses the average distance between all pairs of points in the two clusters being merged.
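To make the merge loop and the three linkage criteria concrete, here is a minimal pure-Python sketch. It is a naive, deliberately slow illustration, not how scikit-learn implements the algorithm; the function names (complete_linkage, average_linkage, ward_linkage, agglomerate) and the toy points are hypothetical and chosen only for this example. Ward linkage is written here as the increase in within-cluster sum of squares caused by a merge.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def complete_linkage(a, b):
    """Largest distance between a point in cluster a and a point in cluster b."""
    return max(dist(p, q) for p in a for q in b)


def average_linkage(a, b):
    """Mean distance over all cross-cluster pairs of points."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def ward_linkage(a, b):
    """Increase in within-cluster sum of squares if a and b were merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * dist(centroid(a), centroid(b)) ** 2


def agglomerate(points, n_clusters, linkage):
    """Start with one cluster per point and repeatedly merge the closest pair
    (according to `linkage`) until only `n_clusters` clusters remain."""
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > n_clusters:
        # Find the indices (i < j) of the two clusters with the smallest
        # linkage distance between them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data (assumed for illustration): two small, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

for name, fn in [("ward", ward_linkage),
                 ("complete", complete_linkage),
                 ("average", average_linkage)]:
    print(name, agglomerate(points, 2, fn))
```

On data this cleanly separated, all three criteria recover the same two groups. They tend to diverge on elongated, noisy, or unevenly sized clusters.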

The choice of linkage criterion can have a significant impact on the clustering results, ...
