An Introduction to Entity Resolution in Python

K-Nearest Embeddings Blocking

Become familiar with vector databases and their use for efficient index blocking.

We'll cover the following...

Concept
Proof of concept
Benchmarking vector search with LanceDB
Semantic similarity with SBERT
Key takeaway

Indexing addresses the main computational challenge in entity resolution: the number of candidate pairs, which grows quadratically with the number of records. Traditional methods rely on the following:

Lexical matching (also known as exact matching), such as standard blocking (SB)
Lexical sorting, as in the sorted neighborhood method (SN)

Both methods work within the original feature space. Let’s explore how to measure similarity in a vector space instead, and how to complement lexical search with semantic search.
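To make the idea of similarity in a vector space concrete before the formal treatment, here is a minimal brute-force sketch of k-nearest blocking. It is an illustration under stated assumptions, not the lesson's implementation: the embeddings are hand-made toy vectors, the helper names `cosine_similarity` and `knn_candidate_pairs` are hypothetical, and a vector database would normally replace the exhaustive comparison used here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_candidate_pairs(embeddings, k):
    """Pair each record with its k most similar records (hypothetical helper).

    Brute force: every record is compared with every other record, which is
    only practical for small datasets.
    """
    pairs = set()
    for i, u in enumerate(embeddings):
        # Rank all other records by similarity to record i, most similar first.
        others = [j for j in range(len(embeddings)) if j != i]
        others.sort(
            key=lambda j: cosine_similarity(u, embeddings[j]),
            reverse=True,
        )
        for j in others[:k]:
            # Store pairs in a canonical order so (i, j) and (j, i) coincide.
            pairs.add((min(i, j), max(i, j)))
    return pairs


# Toy 2-dimensional embeddings (assumed for illustration): records 0 and 1
# point in nearly the same direction, as do records 2 and 3.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
candidate_pairs = knn_candidate_pairs(toy_embeddings, k=1)
print(sorted(candidate_pairs))  # → [(0, 1), (2, 3)]
```

With `k=1`, only 2 of the 6 possible pairs survive as candidates. Each record contributes at most `k` pairs, so the candidate set grows linearly with the dataset rather than quadratically.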

Concept

Let $r_1,\ldots,r_n$ denote our dataset of size $n$ ...
