Introduce Similarity
Explore how to engineer similarity features for entity resolution by comparing record pairs from multiple angles. Understand how to transform distance measures into similarity scores and apply efficient methods for indexing and scoring. This lesson helps you develop a strong foundation in similarity feature engineering to improve binary classification of record matches.
Entity resolution is about identifying records that belong to the same real-world entity. We compare candidate pairs of records and decide, for each pair, whether it is a match or a no-match. In other words, we have to solve a binary classification problem.
Features for binary classification
Let’s introduce feature engineering in the context of entity resolution.
We feed the model with vectors of numeric values.
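To make the idea of a numeric feature vector concrete, here is a minimal sketch of how one candidate pair could be turned into similarity scores. The field names (`name`, `year`, `zip`), the helper functions, and the choice of `1 / (1 + distance)` to turn a distance into a similarity are all assumptions made for illustration; they are not prescribed by this lesson. The string comparison relies on `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher
from typing import List


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] based on SequenceMatcher.ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def distance_to_similarity(distance: float) -> float:
    """Map a non-negative distance to (0, 1]; a distance of 0 gives 1.0.

    This is one possible transform, chosen here as an assumption.
    """
    return 1.0 / (1.0 + distance)


def pair_features(left: dict, right: dict) -> List[float]:
    """Compare one candidate pair from several angles; one score per angle."""
    return [
        string_similarity(left["name"], right["name"]),             # fuzzy name match
        distance_to_similarity(abs(left["year"] - right["year"])),  # numeric closeness
        1.0 if left["zip"] == right["zip"] else 0.0,                 # exact match flag
    ]


# Hypothetical records, invented for this example.
a = {"name": "Acme Corp", "year": 1999, "zip": "10001"}
b = {"name": "ACME Corp", "year": 2000, "zip": "10001"}

print(pair_features(a, b))  # [1.0, 0.5, 1.0]
```

Each position in the resulting list is one similarity feature. A binary classifier would receive one such vector per candidate pair and predict match or no-match.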