Edit- and Substring-Based Similarity
Get an overview of edit- and substring-based similarity functions for texts.
As humans, we understand that “Robert Schwarz,” “Rob Shwarts,” “Bob Shvarts,” and “Schwaz, Robert” are suspiciously similar. Can we also compute, programmatically, scores that reflect this human perception?
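As a first taste, here is a minimal sketch that scores the names above against “Robert Schwarz” in two ways. The first is an edit-based score: the Levenshtein distance (the minimum number of single-character insertions, deletions, and substitutions), normalized by the length of the longer string. Normalizing this way is one common choice and an assumption here. The second is a substring-based score: the `ratio()` of Python’s standard-library `difflib.SequenceMatcher`, which repeatedly matches longest common substrings. The helper names `levenshtein`, `edit_similarity`, and `substring_similarity` are hypothetical and are not necessarily the functions used in the rest of this lesson.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions turning a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Hypothetical helper: map the distance into [0, 1], 1.0 = identical.
    Dividing by the longer length is an assumed normalization."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def substring_similarity(a: str, b: str) -> float:
    """Hypothetical helper: difflib's ratio 2*M/T, where M counts characters
    in the matching blocks and T is the total length of both strings."""
    return SequenceMatcher(None, a, b).ratio()


names = ["Robert Schwarz", "Rob Shwarts", "Bob Shvarts", "Schwaz, Robert"]
reference = names[0]
for other in names[1:]:
    print(f"{other!r:18} edit={edit_similarity(reference, other):.2f} "
          f"substring={substring_similarity(reference, other):.2f}")
```

Both scores lie between 0 and 1, with 1.0 for identical strings, so they can be compared side by side even though they measure similarity differently.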
Let’s explore several text similarity functions based on edit distances or common substrings. A third class, similarity based on vectorization, is out of scope for this lesson. Throughout, we use the following toy dataset: