Advanced Semantic Similarity Methods
Let's discover advanced semantic similarity methods for word, phrase, and sentence similarity.
We've already learned how to calculate semantic similarity with spaCy's similarity method and obtained some scores. But what do these scores mean? How are they calculated? Before we look at more advanced methods, we'll first learn how semantic similarity is calculated.
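As a reminder of the calls that produced those scores, here is a minimal sketch, assuming the medium English model en_core_web_md (which includes word vectors) is installed:

```python
import spacy

# Assumes the medium English model, which ships with word vectors,
# has been installed: python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")

doc1 = nlp("I visited the zoo with my dog.")
doc2 = nlp("I went to the park with my cat.")

# Doc-level similarity score between the two sentences.
print(doc1.similarity(doc2))

# Token-level similarity between the words "dog" and "cat".
dog = nlp("dog")[0]
cat = nlp("cat")[0]
print(dog.similarity(cat))
```

The scores these calls return are what we'll unpack in the rest of this section.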
Understanding semantic similarity
When we collect text data (or any sort of data), we want to see how examples are similar, different, or related. We measure how similar two pieces of text are by calculating a similarity score. This is where the term semantic similarity comes into the picture: semantic similarity is a metric defined over texts, where the distance between two texts is based on their meaning.
A metric in mathematics is basically a distance function, and every metric induces a topology on the vector space. Since word vectors live in a vector space, we can calculate the distance between them and use it as a similarity score: the smaller the distance, the more similar the two words are.
Now, we'll learn about two commonly used distance functions: Euclidean distance and cosine distance. Let's start with Euclidean distance.
Euclidean distance
The Euclidean distance between two points in a k-dimensional space is the length of the straight-line path between them, and it's calculated with the Pythagorean theorem: we sum the squares of the differences between corresponding coordinates and then take the square root of that sum. The following diagram shows the Euclidean distance between two vectors, dog and cat:
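To make the calculation concrete, here is a minimal sketch using NumPy with hypothetical two-dimensional vectors standing in for dog and cat; real word vectors have hundreds of dimensions, but the formula is the same:

```python
import numpy as np

# Hypothetical 2-D vectors standing in for "dog" and "cat";
# spaCy's word vectors typically have 300 dimensions.
dog = np.array([1.0, 4.0])
cat = np.array([4.0, 8.0])

# Euclidean distance: square the coordinate-wise differences,
# sum them, and take the square root (the Pythagorean theorem).
euclidean = np.sqrt(np.sum((dog - cat) ** 2))
print(euclidean)                  # 5.0

# The same result via NumPy's built-in norm.
print(np.linalg.norm(dog - cat))  # 5.0
```

Here the coordinate differences are -3 and -4, their squares sum to 25, and the square root gives a distance of 5.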