...
Agglomerative Clustering Walk-Through Example
Practice the agglomerative clustering algorithm with a step-by-step walk-through example in this lesson.
In this example, we use the MIN method (also known as single linkage) to compute cluster dissimilarity, and Euclidean distance as the distance measure.
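To make the MIN method concrete, here is a minimal sketch of how the dissimilarity between two clusters could be computed. The helper names `euclidean` and `min_dissimilarity` are illustrative and not part of the lesson's code, and the two clusters are hypothetical groupings of four points from the dataset.

```python
import numpy as np

def euclidean(p, q):
    """Euclidean distance between two points given as coordinate sequences."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.linalg.norm(p - q))

def min_dissimilarity(cluster_a, cluster_b):
    """MIN dissimilarity: the smallest distance over all cross-cluster pairs."""
    return min(euclidean(p, q) for p in cluster_a for q in cluster_b)

# Hypothetical clusters, used only for illustration
cluster_a = [[1, 2], [2, 1]]
cluster_b = [[2, 1.5], [4, 3.5]]

d = min_dissimilarity(cluster_a, cluster_b)
print(d)  # distance between the closest cross-cluster pair, (2, 1) and (2, 1.5)
```

Only the closest pair of points across the two clusters matters under this method; all other pairs are ignored.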
Dry running an example
We’ll use the same dataset as before for consistency and clarity. It consists of 15 two-dimensional points, which are defined in the code later in this lesson.
Creation of clusters
The first step is to create the initial clusters. This is easy enough: every data point starts as its own cluster, so the 15 points give us 15 distinct clusters.
In the code below, we’ll use the matplotlib library to plot all data points as distinct clusters:
# Importing required packages
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Generating synthetic dataset
x = np.array([1, 2, 2, 2.5, 3, 4, 4, 5, 5, 5.5, 6, 6, 6, 6.5, 7])
y = np.array([2, 1, 1.5, 3.5, 4, 3.5, 7.5, 6, 7, 2, 1.5, 3, 5.5, 5, 2.5])
data_points = [[x[i], y[i]] for i in range(len(x))]

# Creating a DataFrame to assign unique cluster IDs to each data point
df = pd.DataFrame({'Data points': data_points,
                   'Cluster ID': np.arange(len(data_points))})

# One color per data point, because each point starts as its own cluster
colors = ['silver', 'tomato', 'skyblue', 'yellow', 'olive',
          'yellowgreen', 'green', 'red', 'cyan', 'lightblue',
          'plum', 'purple', 'hotpink', 'pink', 'blue']

# Plotting data points
plt.scatter(x, y, c=colors, alpha=0.6, s=150)
print(df)
Computing the distance matrix
Because each step merges the two clusters with the smallest MIN dissimilarity, it’s handy to compute the distance matrix for all pairs of points in advance.
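One way to build such a matrix is sketched below, using NumPy broadcasting on the same `x` and `y` arrays as the plotting code. The names `points`, `dist_matrix`, and `masked` are assumptions made for this illustration, and the lesson's own implementation may differ.

```python
import numpy as np

# Same synthetic dataset as in the plotting code
x = np.array([1, 2, 2, 2.5, 3, 4, 4, 5, 5, 5.5, 6, 6, 6, 6.5, 7])
y = np.array([2, 1, 1.5, 3.5, 4, 3.5, 7.5, 6, 7, 2, 1.5, 3, 5.5, 5, 2.5])
points = np.column_stack((x, y))  # shape (15, 2)

# Pairwise coordinate differences via broadcasting: shape (15, 15, 2)
diff = points[:, None, :] - points[None, :, :]

# Euclidean distance between every pair of points: shape (15, 15)
dist_matrix = np.sqrt((diff ** 2).sum(axis=-1))

# Ignore the zero diagonal (a point's distance to itself) when
# searching for the closest pair
masked = dist_matrix.copy()
np.fill_diagonal(masked, np.inf)
i, j = np.unravel_index(np.argmin(masked), masked.shape)

print(np.round(dist_matrix, 2))
print(f"Closest pair: points {i} and {j}, distance {dist_matrix[i, j]}")
```

For this dataset, the closest pair is points 1 and 2, that is (2, 1) and (2, 1.5), at a distance of 0.5. With every point in its own cluster, the MIN dissimilarity between two clusters is just the corresponding entry of this matrix.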
Once ...