Cosine Similarity

Implement normalized cosine similarity to evaluate the embedding model.

Chapter Goals:

  • Learn about cosine similarity and how it's used to compare embedding vectors
  • Create a function that computes cosine similarities for a given word

A. Vector comparison

In mathematics, the standard way to compare vector similarity is cosine similarity. Since word embeddings are just vectors of real numbers, we can also use cosine similarity to compare the embeddings of different words.

For two vectors, u and v, the equation for cosine similarity is

$$\text{cos sim} = \frac{\mathbf{u}}{||\mathbf{u}||_2} \cdot \frac{\mathbf{v}}{||\mathbf{v}||_2}$$

where $||\mathbf{v}||_2$ represents the L2-norm of vector $\mathbf{v}$, and $\cdot$ represents the dot product operation.

We refer to the quantity $\frac{\mathbf{v}}{||\mathbf{v}||_2}$ as the L2-normalization of vector $\mathbf{v}$.
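
To make the formula concrete, here is a minimal NumPy sketch (the cosine_similarity function and the example vectors are our own illustration, not code provided by this chapter) that L2-normalizes two vectors and then takes their dot product:

```python
import numpy as np

def cosine_similarity(u, v):
    # L2-normalize each vector, then take the dot product
    u_normalized = u / np.linalg.norm(u)
    v_normalized = v / np.linalg.norm(v)
    return np.dot(u_normalized, v_normalized)

# Example: v is a scaled copy of u, so the similarity should be ~1.0
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.0])
print(cosine_similarity(u, v))  # 1.0 (up to floating-point error)
```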

B. Correlation

The cosine similarity measures the correlation between two vectors, i.e. how closely related the two vectors are. The range of values for cosine similarity is [-1, 1]. A value of 1 means the vectors are perfectly correlated (they point in exactly the same direction), a value of -1 means they are perfectly anti-correlated (they point in opposite directions), and a value of 0 means the vectors are uncorrelated (orthogonal).
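
As a rough illustration of this range (again using our own cosine_similarity sketch, not the chapter's code), the three extreme cases look like this:

```python
import numpy as np

def cosine_similarity(u, v):
    # Same helper as in the sketch above: dot product of L2-normalized vectors
    return np.dot(u / np.linalg.norm(u), v / np.linalg.norm(v))

a = np.array([1.0, 2.0])
print(cosine_similarity(a,  3.0 * a))               #  1.0: same direction (perfectly correlated)
print(cosine_similarity(a, -2.0 * a))               # -1.0: opposite direction (anti-correlated)
print(cosine_similarity(a, np.array([-2.0, 1.0])))  #  0.0: orthogonal (uncorrelated)
```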