Binary Classification in Entity Resolution
Get an overview of binary classification in entity resolution.
For every pair of records, we must decide whether both records refer to the same real-world entity. That’s a binary classification problem with the classes “match” and “no-match.” However, a typical real-world entity resolution task differs from standard textbook classification examples in several ways:
A huge number of pairs, growing quadratically with the number of records. Most of them are trivial to classify.
A heavy class imbalance, typically with fewer than 0.1% of pairs being actual matches.
Very few available labels (if any).
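To make the first two points concrete, here is a minimal sketch of how the number of candidate pairs grows and how small the share of matches can become. It assumes deduplication within a single dataset, so pairs are unordered. The helper names and the figure of 2,000 matching pairs are hypothetical, chosen only for illustration.

```python
from math import comb


def pair_count(n_records: int) -> int:
    """Number of unordered record pairs: n * (n - 1) / 2."""
    return comb(n_records, 2)


def match_rate(n_records: int, n_matches: int) -> float:
    """Share of all pairs that are true matches."""
    return n_matches / pair_count(n_records)


# Growing the dataset tenfold grows the pair count roughly a hundredfold.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} records -> {pair_count(n):>13,} pairs")

# Hypothetical example: 10,000 records containing 2,000 matching pairs.
rate = match_rate(10_000, 2_000)
print(f"Share of matches among all pairs: {rate:.4%}")
```

Even with thousands of true duplicates, matches are a tiny fraction of all pairs, which is why accuracy alone says little about the quality of a matching model.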
Let’s discuss some challenges and opportunities when dealing with binary classification for entity resolution.
Class imbalance and performance evaluation
Let