Deep Matching

Understand how deep learning models change entity resolution workflows.

Entity resolution researchers have experimented with deep learning models for over a decade, and these models have outperformed classical approaches in both precision and recall. Adoption was low in the early years because training these models, with their millions of parameters, was computationally expensive. Today the situation has changed dramatically, thanks to several trends.

  • Google’s BERT paper brought a paradigm shift toward transfer learning: start from an open-source pretrained LLM and fine-tune it for a specific task, achieving superior performance at a fraction of the typical training cost (a sketch of this fine-tuning workflow follows this list).

  • PyTorch and the Hugging Face community share frameworks, datasets, pretrained LLMs, and tutorials, all open source and freely available under generous licenses.

  • Modern hardware (GPUs, TPUs, Apple’s M-series chips in MacBooks, etc.) is accessible to practically everybody, whether through cloud services or in personal computers, at much lower cost than it used to be.
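To make the transfer-learning idea concrete, here is a minimal sketch of fine-tuning a pretrained transformer from the Hugging Face Hub as a pair classifier for entity matching. The model name, the toy records, and the record serialization format are illustrative assumptions, not a reference implementation.

```python
# Minimal sketch: fine-tune a pretrained transformer to decide whether two
# records refer to the same entity. Model choice and records are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # any pretrained encoder from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Serialize each record into text and feed the pair to the model.
record_a = "name: ACME Corp. city: Berlin phone: +49 30 1234"
record_b = "name: Acme Corporation city: Berlin phone: 030 1234"
inputs = tokenizer(record_a, record_b, truncation=True, return_tensors="pt")

# One fine-tuning step: label 1 means "match". Only small updates to the
# pretrained weights and the new classification head are needed, which is
# what keeps transfer learning cheap compared to training from scratch.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**inputs, labels=torch.tensor([1]))
outputs.loss.backward()
optimizer.step()

# At inference time, the argmax over the two logits gives match / non-match.
prediction = outputs.logits.argmax(dim=-1).item()
print(prediction)
```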

Let’s explore what this all means for entity resolution.

Shallow vs. deep learning workflows

When we discuss deep learning, we always mean multilayer neural network architectures. Everything else is called shallow, which does not always mean simple: a decision tree ensemble, for example, is still called shallow even though it can grow arbitrarily complex. In other words, shallow vs. deep is not just a matter of architectural complexity (although deep models do tend to be more complex) but a shift in how we train models.

The image below illustrates a typical shallow learning workflow. We treat feature engineering and model training as two separate steps during experimentation. In each iteration, we usually reengineer the features by hand, drawing on domain expertise and on feedback from the previous iteration, which means a lot of manual work.
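As an illustration of that two-step workflow, here is a minimal sketch, assuming record pairs with hypothetical name, city, and zip fields: similarity features are engineered by hand in one step, and a shallow model (a random forest from scikit-learn) is trained on them in a separate step.

```python
# Minimal sketch of the shallow workflow: hand-crafted similarity features,
# then a shallow classifier. Field names and toy data are assumptions.
from difflib import SequenceMatcher
from sklearn.ensemble import RandomForestClassifier

def engineer_features(left, right):
    """Manual feature engineering: one similarity score per compared field."""
    return [
        SequenceMatcher(None, left["name"], right["name"]).ratio(),
        SequenceMatcher(None, left["city"], right["city"]).ratio(),
        float(left["zip"] == right["zip"]),
    ]

# Toy labeled pairs (1 = same entity, 0 = different entities).
pairs = [
    ({"name": "ACME Corp.", "city": "Berlin", "zip": "10115"},
     {"name": "Acme Corporation", "city": "Berlin", "zip": "10115"}),
    ({"name": "ACME Corp.", "city": "Berlin", "zip": "10115"},
     {"name": "Bolt GmbH", "city": "Munich", "zip": "80331"}),
]
labels = [1, 0]

# Step 1: feature engineering. Step 2: model training. In the shallow workflow,
# improving results means going back to step 1 and redesigning the features.
X = [engineer_features(left, right) for left, right in pairs]
model = RandomForestClassifier(n_estimators=100).fit(X, labels)
print(model.predict(X))
```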
