Distance measures are an essential part of many machine learning algorithms. Machine learning algorithms fall primarily into two groups: classification and regression.
Regression algorithms learn from the training data by assigning weights to the different features of the dataset, and then use those weights to predict continuous values for the test data.
Classification algorithms, on the other hand, are used to differentiate between different objects. These algorithms use the training data to assign groups to the test data, typically by computing the distance between the training points and the test points. A popular example of a classification algorithm is the KNN algorithm.
KNN algorithms find the distance between each test point and the training points, and then use these distances to assign the test point a label. The closer the test point is to a certain training point or cluster of training points, the higher the probability that it shares their label.
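The labeling step described above can be sketched as a majority vote among the nearest neighbours. This is a minimal illustration, not a full KNN implementation; the function name `knn_label` and the toy data are made up for this example, and Manhattan distance is used here since it is the measure discussed below:

```python
from collections import Counter
import numpy as np

def knn_label(train_points, train_labels, test_point, k=3):
    """Label a test point by majority vote among its k nearest training points."""
    # Manhattan distance from the test point to every training point
    distances = np.abs(train_points - test_point).sum(axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # The most common label among the neighbours wins
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

train_points = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
train_labels = ["blue", "blue", "green", "green"]
print(knn_label(train_points, train_labels, np.array([1, 0])))  # "blue"
```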
In the figure above, the red data point is the test point and the blue data points are the training points. A distance measure is used to calculate the distance between the test point and each training point.
Manhattan distance is a distance measure calculated by taking the sum of the absolute differences between the coordinates of two points.
The Manhattan distance is also known as Manhattan length, taxicab distance, or city-block distance. In other words, it is the distance between two points measured along axes at right angles.
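For two points p and q with n coordinates, the definition above amounts to summing |p_i - q_i| over all coordinates. A minimal sketch (the helper name `manhattan_distance` is chosen for this example):

```python
import numpy as np

def manhattan_distance(p, q):
    """Manhattan distance: sum of absolute coordinate differences."""
    return np.abs(np.asarray(p) - np.asarray(q)).sum()

# |1 - 4| + |2 - 6| = 3 + 4 = 7
print(manhattan_distance([1, 2], [4, 6]))  # 7
```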
Manhattan distance works well for high-dimensional datasets. Because it takes no squares, it does not amplify the difference in any single feature, and every feature contributes to the distance.
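A quick comparison illustrates the no-amplification point: with a Euclidean distance, squaring lets one large feature difference dominate the result, while under Manhattan distance every feature contributes linearly. The toy vectors here are made up for the illustration:

```python
import numpy as np

a = np.array([0.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 10.0])

# Each feature contributes linearly: 1 + 1 + 10
manhattan_dist = np.abs(a - b).sum()            # 12.0
# Squaring lets the third feature dominate: sqrt(1 + 1 + 100)
euclidean_dist = np.sqrt(((a - b) ** 2).sum())  # ~10.1, almost entirely from one feature
```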
```python
import numpy as np

def manhattan(train, test, k=10):
    """Return, for each test row, the indices of its k nearest training rows
    under the Manhattan distance. Expects pandas DataFrames."""
    dist = []
    train = train.to_numpy()
    for ind, r in test.iterrows():
        r = r.to_numpy()
        # Sum of absolute feature differences to every training row
        distance = np.abs(train - r).sum(-1)
        # argpartition places the k smallest distances first (in no particular order)
        idx = np.argpartition(distance, k)
        dist.append(idx[:k])
    return dist
```