Transitive Clustering

Become familiar with transitive clustering and its shortcomings.

Transitive clustering is the mother of all clustering algorithms for entity resolution. Its logic is appealing, super fast to compute, and beneficial for recall. However, it is typically paid with a significant reduction in precision. Let’s begin by examining how the algorithm works on a small dataset.

Connected components

Clustering follows pairwise prediction. Let r1,,rnr_1,\ldots,r_n​ denote the original records, where cij=1c_{ij}=1 ...