Similarity and Dissimilarity Measures
Similarity and dissimilarity measures are core components of clustering algorithms, which group similar data points into the same cluster and place dissimilar, or distant, data points into different clusters. Although the choice of a similarity/dissimilarity measure is task-dependent, it’s good to know the common ones.
Note: The measures involve two data points, say $x$ and $y$, in $\mathbb{R}^d$.
Minkowski distance
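The Minkowski distance between points $x$ and $y$ is defined as follows:
$$D_p(x, y) = \left( \sum_{i=1}^{d} |x_i - y_i|^p \right)^{1/p}$$
Here, $p \in \mathbb{Z}^+$, that is, $p$ is a positive integer. The code below implements the Minkowski distance between two points x and y for a given value of p: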
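import numpy as np

def Minkowski_distance(x, y, p=2):
    # Sum the p-th powers of the absolute coordinate differences, then take the p-th root
    return np.sum(np.abs(x - y)**p)**(1./p)

# d is the dimension of the points, p is the order of the distance
d, p = 20, 3
x, y = np.random.rand(d), np.random.rand(d)
# Pass the chosen p instead of a hard-coded value
print(f'The Minkowski distance between x and y is {Minkowski_distance(x, y, p=p)}')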
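To make the role of p concrete, here is a minimal sketch on a small hand-checkable pair of points. With p=1 the Minkowski distance is the Manhattan distance, and with p=2 it is the Euclidean distance. The helper name minkowski_distance and the sample points are assumptions chosen for illustration. The final loop compares the helper against NumPy's np.linalg.norm applied to the difference vector, which should agree for these values of p.

```python
import numpy as np

def minkowski_distance(x, y, p=2):
    """Minkowski distance of order p between two 1-D points (illustrative helper)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

# Sample points chosen so the results are easy to verify by hand (assumed values)
x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])

manhattan = minkowski_distance(x, y, p=1)  # |3| + |4|
euclidean = minkowski_distance(x, y, p=2)  # sqrt(3^2 + 4^2)

print(f'p=1 (Manhattan): {manhattan}')
print(f'p=2 (Euclidean): {euclidean}')

# Cross-check against the vector p-norm of the difference x - y
for p in (1, 2, 3):
    assert np.isclose(minkowski_distance(x, y, p), np.linalg.norm(x - y, ord=p))
```

For integer p, larger values give more weight to the largest coordinate difference, which is one reason the choice of p depends on the task.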
The p-norm
The p-norm of a vector , denoted by ...