An Introduction to Entity Resolution in Python

Similarity Features

Become familiar with the RecordLinkage API for engineering similarity features.

We'll cover the following...

All-in indexing
Measuring similarity
Explore scores
Key takeaway

Python 3.8

import recordlinkage as rl

# Compare candidate pairs in parallel on all available cores.
comparer = rl.Compare(n_jobs=-1)
print('Configuring one similarity function per attribute...')
# Jaro-Winkler similarity for names and cities.
for attribute in ['customer_name_c', 'customer_name_p', 'city_c', 'city_p']:
    comparer.string(left_on=attribute, right_on=attribute, method='jarowinkler', label=attribute + '_score')
# Damerau-Levenshtein similarity for street addresses.
for attribute in ['street_c', 'street_p']:
    comparer.string(left_on=attribute, right_on=attribute, method='damerau_levenshtein', label=attribute + '_score')
# Exact match (1 or 0) for phone numbers.
comparer.exact(left_on='phone_c', right_on='phone_c', label='phone_c_score')
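
To make the string scores configured above more concrete, here is a minimal pure-Python sketch of an edit-distance similarity in the spirit of the 'damerau_levenshtein' method. It is not the RecordLinkage implementation: the function names osa_distance and similarity are hypothetical, the sketch uses the simpler optimal string alignment variant, and it assumes the distance is normalized by the length of the longer string, so that 1.0 means identical strings and 0.0 means nothing in common.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (hypothetical helper).

    Counts insertions, deletions, substitutions, and swaps of two
    adjacent characters needed to turn `a` into `b`.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Swap of two adjacent characters counts as one edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - osa_distance(a, b) / longest


print(similarity('main', 'main'))  # 1.0
print(similarity('main', 'mian'))  # 0.75
```

A single swapped pair of letters, as in 'main' versus 'mian', costs only one edit, so a typo of that kind lowers the score only slightly instead of turning the comparison into a mismatch the way an exact comparison would.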

All-in indexing

Measuring similarity