Evaluate the Match Quality

Review classification errors and learn how to improve a matching model by example.

We'll cover the following...

Evaluation metrics
False positives
False negatives

Python 3.8

from itertools import combinations
from typing import List, Union

import pandas as pd


def cross_ref_to_index(df: pd.DataFrame, id_column: str, match_key_columns: Union[str, List[str]]) -> pd.MultiIndex:
    # Collect the IDs of all records that share a match key, highest ID first.
    match_lists = df.sort_values(id_column, ascending=False).groupby(match_key_columns)[id_column].apply(list)
    # Keep only groups with at least two records; a single record yields no pairs.
    match_lists = match_lists.loc[match_lists.apply(len) > 1]

    # Enumerate every unordered pair of IDs within each group.
    match_pairs = []
    for match_list in match_lists:
        match_pairs += list(combinations(match_list, 2))

    return pd.MultiIndex.from_tuples(match_pairs)


# `classes` is assumed to be a DataFrame of ground-truth labels loaded earlier in the lesson (not shown here).
true_matches = cross_ref_to_index(df=classes, id_column='customer_id', match_key_columns='class')
print('First three examples:')
print(true_matches[:3])

Evaluation metrics
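
Once the true match pairs are available as a `pd.MultiIndex`, they can be compared with the pairs a model predicts. The following is a minimal sketch of that comparison, not the course's own implementation: `normalize_pairs`, `evaluate_pairs`, and the toy indexes are hypothetical names and data introduced only for illustration. It assumes both inputs are pair indexes like the one returned by `cross_ref_to_index`.

```python
import pandas as pd


def normalize_pairs(pairs: pd.MultiIndex) -> set:
    """Turn a pair index into a set of order-independent pairs."""
    # (3, 1) and (1, 3) describe the same match, so sort each pair.
    return {tuple(sorted(pair)) for pair in pairs}


def evaluate_pairs(true_pairs: pd.MultiIndex, predicted_pairs: pd.MultiIndex) -> dict:
    """Compare predicted pairs with true pairs (hypothetical helper)."""
    truth = normalize_pairs(true_pairs)
    predicted = normalize_pairs(predicted_pairs)

    true_positives = truth & predicted    # predicted and correct
    false_positives = predicted - truth   # predicted but not a true match
    false_negatives = truth - predicted   # true match the model missed

    # Guard against division by zero when a set is empty.
    precision = len(true_positives) / len(predicted) if predicted else 0.0
    recall = len(true_positives) / len(truth) if truth else 0.0

    return {
        "true_positives": sorted(true_positives),
        "false_positives": sorted(false_positives),
        "false_negatives": sorted(false_negatives),
        "precision": precision,
        "recall": recall,
    }


# Toy data, made up for this sketch: records 1, 2, 3 form one entity; 4, 5 another.
toy_true = pd.MultiIndex.from_tuples([(3, 1), (3, 2), (2, 1), (5, 4)])
toy_predicted = pd.MultiIndex.from_tuples([(1, 3), (1, 2), (4, 6)])

result = evaluate_pairs(toy_true, toy_predicted)
for name, value in result.items():
    print(f"{name}: {value}")
```

Sorting each pair before comparing matters because `cross_ref_to_index` lists the higher ID first, while a model's output may use the opposite order. The false positive and false negative lists are returned in full so that individual errors can be reviewed, not just counted.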