How to calculate the cosine similarity

Cosine similarity measures how similar two vectors are by taking the cosine of the angle between them. It is often used in text mining and recommendation systems. Its formula is given as:

$$\text{cosine similarity} = \cos(\theta) = \frac{A \cdot B}{\Vert A \Vert \, \Vert B \Vert}$$

where $A \cdot B$ is the dot product of arbitrary vectors $A$ and $B$, and $\Vert A \Vert$ and $\Vert B \Vert$ represent their magnitudes.

To further understand this formula, let's solve a simple example.

Example

Suppose we have two vectors, $A$ and $B$, defined as:

$$A = [1, 2, 3], \quad B = [4, 5, 6]$$

Let's start off by computing the dot product of these vectors.

Step 1: Compute the dot product of vectors

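The dot product of any two vectors of equal length can be calculated by multiplying their corresponding elements and summing the products, which can mathematically be written as:

$$A \cdot B = \sum_{i=1}^{n} A_i B_i$$

where $n$ is the number of elements in each vector. Here, the dot product can be calculated as:

$$A \cdot B = (1)(4) + (2)(5) + (3)(6) = 4 + 10 + 18 = 32$$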
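As a quick check on this step, here is a minimal pure-Python sketch of the multiply-and-sum definition. It assumes the example vectors are $A = [1, 2, 3]$ and $B = [4, 5, 6]$ (the values used in the article's code), and the helper name `dot` is chosen only for illustration.

```python
# Example vectors, taken from the article's NumPy code
A = [1, 2, 3]
B = [4, 5, 6]


def dot(u, v):
    """Multiply corresponding elements and sum the products (illustrative helper)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))


print(dot(A, B))  # 1*4 + 2*5 + 3*6 = 32
```

The length check reflects that the dot product is only defined for vectors with the same number of elements.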

Next, we will calculate the magnitudes of the vectors $A$ and $B$.

Step 2: Compute vector magnitudes

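Now, we will evaluate the magnitude of each vector. The magnitude of an arbitrary vector $A$ is calculated as the square root of the sum of the squares of its elements, mathematically represented as:

$$\Vert A \Vert = \sqrt{\sum_{i=1}^{n} A_i^2}$$

Here, the magnitudes of vectors $A$ and $B$ can be written as:

$$\Vert A \Vert = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14} \approx 3.742$$

$$\Vert B \Vert = \sqrt{4^2 + 5^2 + 6^2} = \sqrt{77} \approx 8.775$$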
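The magnitude calculation can also be sketched with only the standard library. This again assumes the example vectors $A = [1, 2, 3]$ and $B = [4, 5, 6]$ from the article's code; `magnitude` is a hypothetical helper name, not part of any library.

```python
import math

# Example vectors, taken from the article's NumPy code
A = [1, 2, 3]
B = [4, 5, 6]


def magnitude(v):
    """Euclidean norm: square root of the sum of squared elements."""
    return math.sqrt(sum(x * x for x in v))


mag_A = magnitude(A)  # sqrt(1 + 4 + 9) = sqrt(14)
mag_B = magnitude(B)  # sqrt(16 + 25 + 36) = sqrt(77)
print(mag_A, mag_B)
```

This is the same quantity that `np.linalg.norm` returns by default for a 1-D array.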

Step 3: Compute final answer

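Finally, we can calculate the cosine similarity between the vectors $A$ and $B$, which turns out to be:

$$\cos(\theta) = \frac{A \cdot B}{\Vert A \Vert \, \Vert B \Vert} = \frac{32}{\sqrt{14} \cdot \sqrt{77}} \approx 0.9746$$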

Note: The value of the cosine similarity ranges from $-1$ to $1$, where:

  • $1$ implies that the vectors are in the same direction.

  • $0$ means that the vectors are orthogonal (no similarity).

  • $-1$ implies that the vectors are in opposite directions.
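The three boundary cases in the note can be illustrated with a small sketch. The vectors below are made up for the illustration, and `cos_sim` is a hypothetical helper that assumes both inputs are non-zero.

```python
import numpy as np


def cos_sim(u, v):
    """Cosine similarity of two non-zero vectors (illustrative helper)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


same = cos_sim([1, 2, 3], [2, 4, 6])          # same direction, different length
orthogonal = cos_sim([1, 0], [0, 1])          # perpendicular vectors
opposite = cos_sim([1, 2, 3], [-1, -2, -3])   # opposite direction
print(same, orthogonal, opposite)
```

Because of floating-point rounding, the first and last results may differ from $1$ and $-1$ in the last few digits. Note also that scaling a vector does not change the result: only direction matters, not length.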

Code

The example computed above can be implemented in Python using the numpy library as shown below:

import numpy as np

# Define vectors
A = np.array([1, 2, 3])
B = np.array([4, 5, 6])

# Compute the dot product
dot_product = np.dot(A, B)

# Compute magnitudes
mag_A = np.linalg.norm(A)
mag_B = np.linalg.norm(B)

# Calculate the cosine similarity
cosine_similarity = dot_product / (mag_A * mag_B)
print(cosine_similarity)

Code explanation

In the code above:

  • Line 1: Importing the numpy package for performing the relevant mathematical operations.

  • Lines 4–5: Initializing vectors A and B.

  • Line 8: Computing the dot product between the vectors A and B.

  • Lines 11–12: Computing the magnitudes of the vectors A and B.

  • Lines 15–16: Calculating the cosine similarity by taking the magnitudes and dot product of the vectors A and B calculated above and printing the result.
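The steps explained above could also be wrapped in a reusable function. The following is only a sketch under stated assumptions: the name `safe_cosine_similarity`, the choice to raise an error for a zero vector (where the formula divides by zero and the similarity is undefined), and the final clipping step are illustrative decisions, not part of the original example.

```python
import numpy as np


def safe_cosine_similarity(a, b):
    """Cosine similarity of two 1-D vectors, with basic input checks (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("expected two 1-D vectors of equal length")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # A zero vector has no direction, so the ratio is undefined
        raise ValueError("cosine similarity is undefined for a zero vector")
    # Clip to guard against tiny floating-point overshoot outside [-1, 1]
    return float(np.clip(np.dot(a, b) / denom, -1.0, 1.0))


result = safe_cosine_similarity([1, 2, 3], [4, 5, 6])
print(round(result, 4))  # 0.9746
```

Raising an error is one reasonable design choice. Some applications instead return 0 for zero vectors, so the behavior should be chosen to fit the use case.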

Conclusion

In conclusion, cosine similarity enables comparison and clustering of data points in a multidimensional space based on their direction rather than their magnitude. Its intuitive interpretation, ranging from $-1$ (opposite directions) to $1$ (same direction), makes it a valuable method for measuring the similarity between items or documents.

Copyright ©2024 Educative, Inc. All rights reserved