Similarity of Numeric Attributes
Become familiar with configuring numeric similarity functions with the RecordLinkage API.
Duplicate payments likely have similar amounts and transaction dates, and duplicate locations have similar geocodes. All three are numeric attributes: amounts and dates are scalars, while a geocode is a vector of longitude and latitude. Let’s use a tiny dataset to show how to configure similarity features for numeric attributes using the RecordLinkage API.
import pandas as pd

df = pd.DataFrame({
    'amount': [123.45, 110.66, 120.55],
    'date': pd.to_datetime(['2022-06-25', '2022-04-01', '2022-07-03']),
    'lon': [6.54321, 6.98765, 6.54398],
    'lat': [51.23456, 51.22222, 51.23111],
})
print(df)
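With only three records, every pair can be treated as a candidate. The sketch below, which uses plain pandas and the standard library rather than RecordLinkage itself, computes the raw quantities that numeric similarity features are built from: the absolute difference in amount and the distance between geocodes. The haversine formula, the 6371 km mean Earth radius, and the helper name `haversine_km` are illustrative assumptions, not part of the RecordLinkage API.

```python
import itertools
import math

import pandas as pd

df = pd.DataFrame({
    'amount': [123.45, 110.66, 120.55],
    'date': pd.to_datetime(['2022-06-25', '2022-04-01', '2022-07-03']),
    'lon': [6.54321, 6.98765, 6.54398],
    'lat': [51.23456, 51.22222, 51.23111],
})

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Hypothetical helper: great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Enumerate every candidate pair (i < j) and record the numeric differences.
rows = []
for i, j in itertools.combinations(df.index, 2):
    left, right = df.loc[i], df.loc[j]
    rows.append({
        'left': i,
        'right': j,
        'amount_diff': abs(left['amount'] - right['amount']),
        'geo_km': haversine_km(left['lat'], left['lon'], right['lat'], right['lon']),
    })

# One row per pair, indexed by a (left, right) MultiIndex.
pairs = pd.DataFrame(rows).set_index(['left', 'right'])
print(pairs.round(3))
```

Records 0 and 2 stand out as the likeliest duplicates: their amounts differ by under 3 and their geocodes lie well under a kilometre apart, while both other pairs are roughly 30 km apart.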
Note that dates can be interpreted either as strings or as numeric values, for example in units of days. Our focus here is on the numeric interpretation.
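A minimal sketch of the numeric interpretation: convert each date to a day count relative to a reference date (the Unix epoch, an arbitrary choice), so date differences become plain numbers, then map each absolute difference to a similarity in [0, 1]. The function `linear_sim` and its semantics (similarity 1 up to `offset`, decaying linearly to 0 at `offset + scale`) are a hypothetical illustration. RecordLinkage's `Compare.numeric` also takes `offset` and `scale` parameters, but its exact formulas may differ, so consult its documentation before relying on this shape.

```python
import itertools

import pandas as pd


def linear_sim(d, offset, scale):
    """Hypothetical linear similarity: 1 up to `offset`, falling to 0 at `offset + scale`."""
    d = abs(d)
    if d <= offset:
        return 1.0
    return float(max(0.0, 1.0 - (d - offset) / scale))


dates = pd.to_datetime(['2022-06-25', '2022-04-01', '2022-07-03'])

# Numeric interpretation: whole days since an arbitrary reference date.
days = (dates - pd.Timestamp('1970-01-01')).days

# Date similarity per candidate pair: full credit within 3 days,
# zero once the dates are more than 3 + 10 days apart.
date_sim = {
    (i, j): linear_sim(days[j] - days[i], offset=3, scale=10)
    for i, j in itertools.combinations(range(len(days)), 2)
}
print(date_sim)
```

Records 0 and 2 are 8 days apart, which gives a similarity of 0.5 under these settings; the other pairs are 85 and 93 days apart and score 0. The choice of `offset` and `scale` encodes domain knowledge, such as how late a duplicate payment is plausibly booked.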
Custom comparer
A similarity feature is limited to