Similarity and dissimilarity measures are core components of clustering algorithms, which group similar data points into the same cluster and place dissimilar (distant) data points into different clusters. Although the right choice of measure depends on the task, it’s good to know the common ones.

Note: The measures involve two data points, say $\mathbf{x}$ and $\mathbf{y}$, in $\mathbb{R}^d$.

Minkowski distance

The Minkowski distance $d_{\text{mink}}$ between points $\mathbf{x}$ and $\mathbf{y}$ is defined as follows:

$$d_{\text{mink}}(\mathbf{x}, \mathbf{y}, p) = \left(\sum_{i=1}^{d} |x_i - y_i|^p\right)^{\frac{1}{p}}$$

Here, $p \in \mathbb{Z}^+$; that is, $p$ is a positive integer. The code below implements the Minkowski distance between two points x and y for a given value of p.
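A minimal sketch in plain Python, assuming the points are given as equal-length sequences of numbers. The function name minkowski_distance and the sample points are illustrative, not taken from any particular library. It follows the formula term by term: sum $|x_i - y_i|^p$ over the $d$ coordinates, then take the $p$-th root, and it restricts $p$ to positive integers as in the definition above.

```python
from typing import Sequence


def minkowski_distance(x: Sequence[float], y: Sequence[float], p: int) -> float:
    """Return the Minkowski distance of order p between points x and y.

    Illustrative helper (hypothetical name), not a library API.
    """
    # Both points must live in the same space R^d.
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension d")
    # The definition above takes p to be a positive integer.
    if not isinstance(p, int) or p < 1:
        raise ValueError("p must be a positive integer")
    # Sum |x_i - y_i|^p over all d coordinates.
    total = sum(abs(xi - yi) ** p for xi, yi in zip(x, y))
    # Take the p-th root of the sum.
    return total ** (1.0 / p)


# Usage: hypothetical sample points in R^2.
x = [1.0, 2.0]
y = [4.0, 6.0]
for p in (1, 2, 3):
    print(p, minkowski_distance(x, y, p))
```

For these sample points, p = 1 gives 7.0 (the sum of absolute differences 3 + 4) and p = 2 gives 5.0 (the square root of 9 + 16). The sketch stays dependency-free; for large arrays, a vectorized routine such as SciPy's scipy.spatial.distance.minkowski is likely a better fit.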
