Manual Review and Labeling
Explore manual review and labeling within the entity resolution clustering process. Understand how clustering makes review more efficient, helping you improve classification accuracy and cost-effectiveness. Discover practical approaches to interactive cluster review using Python and Streamlit, and learn when to consider mature solutions.
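Clustering is a critical step in every entity resolution pipeline. Most importantly, it resolves conflicts among pairwise match predictions (for example, A matches B and B matches C, but A is predicted not to match C) and enables us to build a cross-reference table that maps each record to the entity it belongs to.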
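To make this concrete, here is a minimal sketch that assumes connected-components clustering over the pairs a classifier predicted as matches. This is one common choice and not necessarily the method used later in this course; the function names `find` and `cluster_matches` and the record IDs are hypothetical. Taking connected components enforces transitivity, so an inconsistent set of pairwise predictions still yields one consistent cluster assignment per record.

```python
def find(parent, x):
    # Follow parent pointers to the cluster root, halving the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_matches(record_ids, matched_pairs):
    """Hypothetical helper: connected components over predicted match pairs.

    Returns a cross-reference table mapping each record ID to a cluster ID.
    """
    parent = {r: r for r in record_ids}
    for a, b in matched_pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    # Number the clusters in order of first appearance.
    cluster_ids = {}
    xref = {}
    for r in record_ids:
        root = find(parent, r)
        xref[r] = cluster_ids.setdefault(root, len(cluster_ids))
    return xref


records = ["a1", "a2", "b1", "b2", "c1"]
# The model matched a1-b1 and b1-b2 but never matched a1-b2 directly.
predicted_matches = [("a1", "b1"), ("b1", "b2")]
xref = cluster_matches(records, predicted_matches)
for record_id, cluster_id in xref.items():
    print(record_id, cluster_id)
```

Here `a1`, `b1`, and `b2` end up in cluster 0 even though the pair `a1`-`b2` was never predicted as a match, while `a2` and `c1` each form their own singleton cluster. A known drawback of this approach is that a single false-positive pair can chain two distinct entities together, which is one reason reviewing clusters by hand is worthwhile.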
We can stop after clustering if we are satisfied with the resolution quality, or we can start another training cycle with the help of manual review, which is the topic of this lesson.
Humans in the loop
The following figure shows one of many possible entity resolution workflows, with two (optional) spots for humans in the loop.
The training data consists of record pairs labeled as a match or no-match, which we can use to fit a binary classification model and predict classes of unlabeled ...