Explore entity resolution in Python, including use cases, semantic preprocessing, graph clustering, and weak supervision. Boost business value with hands-on coding and strategic decisions.

Hands-on Entity Resolution with Python-v3.png

dockerfile.tar.gz

conda-main

A typical business stores data across multiple systems, including ERPs for operations, a CRM for marketing, files, notebooks, and BI apps for other purposes. Records of the same customer (entity) exist in multiple places, likely not in sync across nor unique within sources. This inconsistent situation generates an opportunity for us to drive business value by cross-referencing and deduplicating records with entity resolution.

This course covers business acumen and hands-on coding. It starts with several business cases and a quick introduction to entity resolution in Python. Then, it explores semantic-preserving preprocessing, similarity feature engineering, graph clustering, weak supervision, confident learning, and integration.

As a developer, you’ll increase your company’s business value by developing and deploying entity resolution pipelines. As a decision-maker, you’ll know which solution best suits your business cases and how to negotiate the best value for your money.

An Introduction to Entity Resolution in Python

Get familiar with dbt and data transformation pipelines.

Introduction to Entity Resolution and Applications

A Quickstart Guide Using the RecordLinkage Package

Preprocessing

Indexing

Feature Engineering

Pairwise Matching

Clustering

Integration

Entity Resolution Fundamentals

Matching Products Across Two Online Shops

Conclusion

Appendix

Auto-Tagging System for Content Categorization

Local Stack with DuckDB and dbt

Configuring our data stack