Similarity and Dissimilarity Measures

We'll cover the following...

Minkowski distance
- The p-norm
Euclidean distance
Manhattan distance
Mahalanobis distance
Geodesic distance

Similarity and dissimilarity measures are core components of clustering algorithms, which group similar data points into the same cluster and place dissimilar (distant) data points into different clusters. Although the choice of measure is task-dependent, it’s good to know the common ones.

Note: The measures involve two data points, say $\bold x$ and $\bold y$, in $\R^d$.

Minkowski distance

The Minkowski distance $d_{mink}$ between points $\bold x$ and $\bold y$ is defined as follows:

$$d_{mink}(\bold x, \bold y, p)=\bigg(\sum_{i=1}^d|x_i - y_i|^p \bigg)^{\frac{1}{p}}$$

Here, $p \in \Z^+$, that is, $p$ is a positive integer.
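The formula above can be sketched in Python as follows. This is a minimal illustration that assumes NumPy is available; the function name `minkowski_distance`, the input checks, and the sample points are illustrative choices, not necessarily the course's own code.

```python
import numpy as np


def minkowski_distance(x, y, p):
    """Return the Minkowski distance between points x and y for a positive integer p.

    Hypothetical helper written for illustration; it follows the definition
    (sum_i |x_i - y_i|^p)^(1/p) directly.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    if p < 1:
        raise ValueError("p must be a positive integer")
    # Sum the p-th powers of the absolute coordinate differences,
    # then take the p-th root of the total.
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


# Sample points chosen for illustration only.
x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]

print(minkowski_distance(x, y, p=1))  # 3 + 4 + 0 = 7.0
print(minkowski_distance(x, y, p=2))  # sqrt(9 + 16 + 0) = 5.0
```

For the sample points, the coordinate differences are 3, 4, and 0, so $p = 1$ adds them up to 7, while $p = 2$ gives $\sqrt{9 + 16} = 5$.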
