Hierarchical Clustering

Let’s look at another famous clustering algorithm called hierarchical clustering, which involves grouping instances into a hierarchy or tree of clusters.

Hierarchical clustering

Hierarchical clustering involves creating a hierarchy of clusters. Representing data objects in the form of a hierarchy is useful for data summarization and visualization. For example, consider we want to organize the people in an organization into major groups such as executives, managers, and staff. We can further partition these above into smaller sub-groups. This is the basic idea of hierarchical clustering.

Agglomerative hierarchical clustering

Agglomerative hierarchical clustering uses a bottom-up strategy. In this strategy, each instance is initially treated as a cluster. At each successive iteration of this clustering algorithm, a cluster is merged with other most similar clusters until only one large cluster is formed.

Basic steps

Agglomerative hierarchical clustering works as follows.

  1. We compute the similarities between the instances and represent these in the form of a matrix, which is also called similarity or proximity matrix.

  2. Now we treat each instance as a single cluster.

  3. We merge the closest two clusters and update the proximity matrix. Updating the matrix involves computing the cluster distance between the new merged cluster and every other cluster. This is so that in the next iteration, this new updated matrix is used, and we perform the merge operation using the updated matrix again.

  4. We keep on repeating step 3 until a condition is met or one cluster is formed. The condition might be the minimum clusters remaining once the merge operation has been performed.

An example

g A A ABCDEF ABCDEF A->ABCDEF B B BC BC B->BC BCDEF BCDEF BC->BCDEF C C C->BC D D DE DE D->DE DEF DEF DE->DEF E E E->DE F F F->DEF DEF->BCDEF BCDEF->ABCDEF
Agglomerative clustering
  1. In the above diagram we treat each point (A,B,C,D,E,F)(A, B, C, D, E, F) ...