Similarity Features

Become familiar with the RecordLinkage API for engineering similarity features.

A RecordLinkage workflow has two main steps:

  1. Indexing: Select which pairs of records are duplicate candidates and therefore should be compared.
  2. Scoring: Configure and compute a vector of similarity functions for every pair in the index.
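To make the two steps concrete, here is a minimal, library-free sketch in plain Python. It does not use the RecordLinkage API itself. The toy records, the field names, and the helper functions `string_similarity` and `exact_match` are assumptions made up for illustration, and `difflib.SequenceMatcher` stands in for whichever string similarity you would actually configure.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; the ids and fields are invented for this sketch.
records = {
    0: {"name": "Jon Smith", "city": "Boston"},
    1: {"name": "John Smith", "city": "Boston"},
    2: {"name": "Mary Jones", "city": "Denver"},
}


def string_similarity(a, b):
    """Fuzzy similarity in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def exact_match(a, b):
    """Return 1.0 if both values are equal, else 0.0."""
    return 1.0 if a == b else 0.0


# Step 1, indexing: decide which record pairs are duplicate candidates.
# Here every unordered pair of distinct records is a candidate.
candidate_pairs = list(combinations(sorted(records), 2))

# Step 2, scoring: compute one similarity vector per candidate pair.
features = {
    (i, j): (
        string_similarity(records[i]["name"], records[j]["name"]),
        exact_match(records[i]["city"], records[j]["city"]),
    )
    for i, j in candidate_pairs
}

for pair, vector in features.items():
    print(pair, vector)
```

Each value in `features` is the similarity vector for one candidate pair: one entry per configured similarity function, in a fixed order.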

All-in indexing

We keep it simple here and add every possible pair to the index, which RecordLinkage calls a “full” index.
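As a rough illustration of what "every possible pair" means when deduplicating a single dataset, the sketch below builds such an index by hand with the standard library. The helpers `full_index` and `full_index_size` are hypothetical names for this example, not part of the RecordLinkage API, and the sketch assumes pairs are unordered and a record is never paired with itself.

```python
from itertools import combinations


def full_index(record_ids):
    """Return every unordered pair of distinct record ids."""
    return list(combinations(record_ids, 2))


def full_index_size(n):
    """Number of pairs in a full index over n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


pairs = full_index(range(5))
print(len(pairs), full_index_size(5))

# The pair count grows quadratically with the number of records.
for n in (100, 1_000, 10_000):
    print(n, full_index_size(n))
```

Because the number of pairs grows quadratically, a full index is practical mainly for small datasets like the one used here.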
