Learning from Labeled Examples

Learn how to fit a binary classification model using labeled examples.

Entity resolution is a binary classification problem at its heart. For every pair of records, we must decide if that pair is a “match” (positive class) or a “no-match” (negative class). This lesson is about training a machine learning model using examples of pairs where we know the outcome.

We assume that learners have some experience with machine learning, so this lesson can focus on the specifics of entity resolution: what the typical features look like, how to train and evaluate with class imbalances of 1 to 10,000 or worse, and how to incorporate monotonicity constraints into a classification model.
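To see why such an imbalance changes how we evaluate a model, it helps to work through the arithmetic. The sketch below uses made-up counts (100 matching pairs against 1,000,000 non-matching pairs, and an invented confusion matrix for a hypothetical model); none of these numbers come from the voters dataset.

```python
# Illustrative counts only: a 1-to-10,000 ratio of matches to non-matches.
n_pos = 100
n_neg = 1_000_000


def accuracy(tp, tn, fp, fn):
    """Fraction of all pairs classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision_recall(tp, fp, fn):
    """Precision and recall of the positive ("match") class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Trivial baseline: predict "no-match" for every pair.
acc_trivial = accuracy(tp=0, tn=n_neg, fp=0, fn=n_pos)
prec_trivial, rec_trivial = precision_recall(tp=0, fp=0, fn=n_pos)

# Hypothetical model: finds 90 of the 100 matches and raises 100 false alarms.
tp, fn, fp = 90, 10, 100
tn = n_neg - fp
acc_model = accuracy(tp, tn, fp, fn)
prec_model, rec_model = precision_recall(tp, fp, fn)

print(f"trivial: accuracy={acc_trivial:.5f} precision={prec_trivial:.2f} recall={rec_trivial:.2f}")
print(f"model:   accuracy={acc_model:.5f} precision={prec_model:.2f} recall={rec_model:.2f}")
```

With these counts, the trivial baseline scores a slightly higher accuracy than the hypothetical model even though it finds no matches at all. This is why precision and recall on the match class are usually more informative than accuracy in this setting.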

Preparing the North Carolina voters’ features

The dataset below is a lightweight version of the North Carolina voters’ open dataset. It comes with cross-references that identify which records truly refer to the same entity. We will use them to build the class labels for training and evaluating our classification model.
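As a minimal sketch of how cross-references can be turned into class labels, the snippet below labels every pair of records as a match (1) when both records share the same entity identifier and as a no-match (0) otherwise. The records, the field layout, and the names `record_id` and `entity_id` are hypothetical stand-ins, not the actual schema of the voters dataset.

```python
from itertools import combinations

# Hypothetical records: (record_id, entity_id, name).
# entity_id plays the role of the cross-reference to the true entity.
records = [
    ("r1", "e1", "DEBORAH SMITH"),
    ("r2", "e1", "DEBRA SMITH"),
    ("r3", "e2", "JOHN WILLIAMS"),
    ("r4", "e3", "JON WILLIAMS"),
    ("r5", "e2", "JOHN A WILLIAMS"),
]


def label_pairs(records):
    """Return (record_id_a, record_id_b, label) for every unordered pair.

    The label is 1 when both records point to the same entity, else 0.
    """
    labeled = []
    for (id_a, ent_a, _), (id_b, ent_b, _) in combinations(records, 2):
        labeled.append((id_a, id_b, int(ent_a == ent_b)))
    return labeled


pairs = label_pairs(records)
n_matches = sum(label for _, _, label in pairs)
print(f"{len(pairs)} pairs, {n_matches} matches")  # 10 pairs, 2 matches
```

Even in this toy example, most pairs are non-matches. The number of pairs grows quadratically with the number of records while the number of true matches grows far more slowly, which is where the heavy class imbalance comes from.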
