Removing and Replacing Tokens
Learn to remove stopwords and normalize word variations, such as synonyms, to improve the matching quality.
A text consists of one or more words and other tokens. Some of those are more informative than others. Words can vary in spelling, grammar, language, and more. Let’s discuss which types of words should be removed and which should be replaced to improve the matching quality.
Remove tokens aka stopwords
Stopwords are text tokens that are not informative. They can do more harm than good in an entity resolution task. Let’s take the restaurants open dataset and three of its records as an example (see the Glossary for attribution and references for open data).
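To see why stopwords can hurt matching, here is a minimal Python sketch. The three restaurant names and the stopword list are made up for illustration and are not taken from the restaurants dataset. The helper names `tokenize`, `keep_informative`, and `jaccard` are hypothetical as well. The sketch compares names by token overlap (Jaccard similarity) before and after removing stopwords.

```python
import re

# Hypothetical stopword list for restaurant names; adapt it to your own data.
STOPWORDS = {"the", "restaurant", "cafe", "bar", "and", "of"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keep_informative(tokens, stopwords=STOPWORDS):
    """Drop every token that appears in the stopword list."""
    return [t for t in tokens if t not in stopwords]


def jaccard(tokens_a, tokens_b):
    """Share of distinct tokens that two token lists have in common."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Made-up names: the first two describe the same place, the third does not.
name_a = "The Golden Dragon Restaurant"
name_b = "Golden Dragon"
name_c = "The Blue Lagoon Restaurant"

for left, right in [(name_a, name_b), (name_a, name_c)]:
    raw = jaccard(tokenize(left), tokenize(right))
    cleaned = jaccard(
        keep_informative(tokenize(left)),
        keep_informative(tokenize(right)),
    )
    print(f"{left!r} vs {right!r}: raw={raw:.2f}, cleaned={cleaned:.2f}")
```

In this toy example, the uninformative tokens "the" and "restaurant" make the two different restaurants look partly similar and the two matching names look less similar than they are. After removing them, the similarity scores reflect only the informative tokens. A stopword list that works well is usually domain-specific, so treat the one above as a starting point only.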