...

Agglomerative Clustering Walk-Through Example

Practice the agglomerative clustering algorithm with a step-by-step walk-through example in this lesson.

In this example, we use the MIN method (also known as single linkage) to compute cluster dissimilarity, with Euclidean distance as the distance measure.
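To make the MIN method concrete, here is a minimal sketch of how the dissimilarity between two clusters could be computed: it takes the smallest Euclidean distance over all pairs of points, one from each cluster. The function name `min_dissimilarity` and the two small example clusters are illustrative assumptions, not part of the lesson's own code.

```python
import numpy as np

def min_dissimilarity(cluster_a, cluster_b):
    """Hypothetical helper: MIN (single-linkage) dissimilarity between two clusters."""
    a = np.asarray(cluster_a, dtype=float)
    b = np.asarray(cluster_b, dtype=float)
    # Differences between every point in a and every point in b
    diffs = a[:, None, :] - b[None, :, :]
    # Euclidean distance for each cross-cluster pair
    pairwise = np.sqrt((diffs ** 2).sum(axis=-1))
    # MIN method: the closest pair defines the cluster dissimilarity
    return pairwise.min()

# Two illustrative clusters (assumed grouping, chosen only for this sketch)
cluster_a = [[1, 2], [2, 1]]
cluster_b = [[2, 1.5], [2.5, 3.5]]
d = min_dissimilarity(cluster_a, cluster_b)
print(d)
```

Here the closest cross-cluster pair is (2, 1) and (2, 1.5), so the dissimilarity is their Euclidean distance.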

Dry running an example

We’ll use the same dataset as before for consistency and clarity. It’s as follows:

[Figure: Actual dataset]

Creation of N clusters

The first step is to create N clusters, one for each of the N data points, so that every point starts as its own cluster. The graph then looks like this:

[Figure: Actual dataset with N clusters]

In the code below, we use the matplotlib library to plot every data point as its own cluster:

# Importing required packages
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Generating synthetic dataset
x = np.array([1, 2, 2, 2.5, 3, 4, 4, 5, 5, 5.5, 6, 6, 6, 6.5, 7])
y = np.array([2, 1, 1.5, 3.5, 4, 3.5, 7.5, 6, 7, 2, 1.5, 3, 5.5, 5, 2.5])
data_points = [[x[i], y[i]] for i in range(len(x))]

# Creating a DataFrame to assign unique cluster IDs to each data point
df = pd.DataFrame({'Data points': data_points,
                   'Cluster ID': np.arange(len(data_points))})

# One color per data point, since every point is its own cluster
colors = ['silver', 'tomato', 'skyblue', 'yellow', 'olive',
          'yellowgreen', 'green', 'red', 'cyan',
          'lightblue', 'plum', 'purple', 'hotpink', 'pink', 'blue']

# Plotting data points
plt.scatter(x, y, c=colors, alpha=0.6, s=150)
print(df)
# Display the figure when running as a standalone script
plt.show()

Computing the distance matrix

Because each step merges the two clusters with the minimum dissimilarity, it’s handy to compute the distance matrix for all pairs of points in advance.
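As a rough sketch of this step, the pairwise Euclidean distance matrix for the dataset above can be built with NumPy broadcasting. The variable names `points`, `dist_matrix`, and `masked` are assumptions made for this illustration; the lesson's own code may organize this differently.

```python
import numpy as np

# Same synthetic dataset as in the plotting code
x = np.array([1, 2, 2, 2.5, 3, 4, 4, 5, 5, 5.5, 6, 6, 6, 6.5, 7])
y = np.array([2, 1, 1.5, 3.5, 4, 3.5, 7.5, 6, 7, 2, 1.5, 3, 5.5, 5, 2.5])
points = np.column_stack((x, y))

# Pairwise Euclidean distances: entry [i, j] is the distance
# between point i and point j
diffs = points[:, None, :] - points[None, :, :]
dist_matrix = np.sqrt((diffs ** 2).sum(axis=-1))

# Ignore the zero diagonal (distance of each point to itself)
# before searching for the closest pair
masked = dist_matrix.copy()
np.fill_diagonal(masked, np.inf)
i, j = np.unravel_index(np.argmin(masked), masked.shape)

print(np.round(dist_matrix, 2))
print("Closest pair:", points[i], points[j], "distance:", masked[i, j])
```

With N singleton clusters, the MIN dissimilarity between two clusters is just the distance between their two points, so the smallest off-diagonal entry identifies the first pair to merge. For this dataset, that pair is the points at indices 1 and 2, (2, 1) and (2, 1.5).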

Once ...