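Cosine similarity measures how similar two vectors are by computing the cosine of the angle between them. It is often used in text mining and recommendation systems. Its formula is:
$$\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|}$$
where $A \cdot B$ is the dot product of vectors $A$ and $B$, and $\|A\|$ and $\|B\|$ are their magnitudes.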
To further understand this formula, let's solve a simple example.
Suppose we have two vectors, $A$ and $B$:
$$A = (1, 2, 3), \quad B = (4, 5, 6)$$
Let's start by computing the dot product of these vectors. The dot product of two vectors is obtained by multiplying their corresponding elements and summing the products, which can be written mathematically as:
$$A \cdot B = \sum_{i=1}^{n} A_i B_i$$
Here, the dot product is:
$$A \cdot B = (1)(4) + (2)(5) + (3)(6) = 4 + 10 + 18 = 32$$
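To double-check this arithmetic, the summation can be written directly in plain Python. This is a minimal sketch using only built-ins (`zip` and `sum`); the variable names mirror the example and are otherwise our own choice.

```python
# Example vectors from the worked example
A = [1, 2, 3]
B = [4, 5, 6]

# Multiply corresponding elements, then sum the products: sum of A_i * B_i
products = [a * b for a, b in zip(A, B)]
dot_product = sum(products)

print(products)     # [4, 10, 18]
print(dot_product)  # 32
```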
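Next, we calculate the magnitude of each vector. The magnitude (Euclidean norm) of an arbitrary vector $v$ with $n$ elements is:
$$\|v\| = \sqrt{\sum_{i=1}^{n} v_i^2}$$
Here, the magnitudes of vectors $A$ and $B$ are:
$$\|A\| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14} \approx 3.742$$
$$\|B\| = \sqrt{4^2 + 5^2 + 6^2} = \sqrt{77} \approx 8.775$$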
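The magnitude formula can be checked the same way. This sketch uses Python's standard `math` module and assumes Python 3.8 or later, where `math.hypot` accepts any number of coordinates and returns the same Euclidean norm.

```python
import math

A = [1, 2, 3]
B = [4, 5, 6]

# Magnitude: square root of the sum of the squared elements
mag_A = math.sqrt(sum(a * a for a in A))
mag_B = math.sqrt(sum(b * b for b in B))

# Cross-check against math.hypot, which computes the same norm (Python 3.8+)
assert math.isclose(mag_A, math.hypot(*A))
assert math.isclose(mag_B, math.hypot(*B))

print(round(mag_A, 3), round(mag_B, 3))  # 3.742 8.775
```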
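Finally, we can calculate the cosine similarity between vectors $A$ and $B$:
$$\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{32}{\sqrt{14}\,\sqrt{77}} = \frac{32}{\sqrt{1078}} \approx 0.9746$$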
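Since the result is the cosine of the angle between the vectors, applying the inverse cosine recovers that angle. The following sketch, which uses only the standard `math` module and the values from the worked example, is an illustrative extension rather than part of the original calculation; it shows that $A$ and $B$ are roughly 12.9 degrees apart.

```python
import math

dot_product = 32        # A . B from the worked example
mag_A = math.sqrt(14)   # ||A||
mag_B = math.sqrt(77)   # ||B||

# Cosine similarity: dot product divided by the product of the magnitudes
cos_theta = dot_product / (mag_A * mag_B)

# Inverse cosine gives the angle in radians; convert it to degrees
theta_deg = math.degrees(math.acos(cos_theta))

print(round(cos_theta, 4))  # 0.9746
print(round(theta_deg, 2))  # 12.93
```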
Note: The cosine similarity ranges from $-1$ to $1$, where:
$\cos(\theta) = 1$ implies that the vectors point in the same direction.
$\cos(\theta) = 0$ means that the vectors are orthogonal (no similarity).
$\cos(\theta) = -1$ implies that the vectors point in opposite directions.
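The three cases can be reproduced with a few simple vectors. In this sketch, `cos_sim` is a hypothetical helper that applies the formula with NumPy. Note that `[2, 4, 6]` is just `A` scaled by 2, which illustrates that cosine similarity depends only on direction, not on length.

```python
import numpy as np

def cos_sim(a, b):
    # Dot product divided by the product of the magnitudes
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

A = [1, 2, 3]
print(cos_sim(A, [2, 4, 6]))     # same direction (scaled copy of A): about 1
print(cos_sim([1, 0], [0, 1]))   # orthogonal: 0.0
print(cos_sim(A, [-1, -2, -3]))  # opposite direction: about -1
```

Because of floating-point rounding, the first and last results may differ from exactly 1.0 and -1.0 in the last few digits.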
The example above can be implemented in Python using the NumPy library as follows:
```python
import numpy as np

# Define vectors
A = np.array([1, 2, 3])
B = np.array([4, 5, 6])

# Compute the dot product
dot_product = np.dot(A, B)

# Compute magnitudes
mag_A = np.linalg.norm(A)
mag_B = np.linalg.norm(B)

# Calculate the cosine similarity
cosine_similarity = dot_product / (mag_A * mag_B)
print(cosine_similarity)
```
In the code above:
Line 1: Imports the numpy package, which provides the required mathematical operations.
Lines 4–5: Define the vectors A and B.
Line 8: Computes the dot product of A and B with np.dot.
Lines 11–12: Compute the magnitudes of A and B with np.linalg.norm.
Lines 15–16: Divide the dot product by the product of the magnitudes to obtain the cosine similarity, then print the result.
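When many vectors need to be compared at once, for example before clustering, the same computation can be vectorized: divide each row by its magnitude, and a single matrix product then yields every pairwise cosine similarity. The following NumPy sketch assumes the vectors are stacked as rows of a 2-D array `X` (a name chosen for illustration) and that none of them is a zero vector, since the formula is undefined when a magnitude is zero.

```python
import numpy as np

# Each row is one vector; the first two rows are A and B from the example
X = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [-1, -2, -3],
], dtype=float)

# Normalize each row to unit length (assumes no row is all zeros)
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms

# Entry (i, j) is the cosine similarity between row i and row j
similarity = X_unit @ X_unit.T

print(np.round(similarity, 4))
```

The resulting matrix is symmetric, and its diagonal is 1 because every vector points in the same direction as itself. If scikit-learn is available, `sklearn.metrics.pairwise.cosine_similarity` computes the same kind of matrix.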
In conclusion, evaluating this metric enables comparisons and clustering of data points in a multidimensional space. Its intuitive interpretation, ranging from