Similarity of Numeric Attributes

Become familiar with configuring numeric similarity functions with the RecordLinkage API.

Duplicate payments are likely to have similar amounts and transaction dates, and duplicate locations are likely to have similar geocodes. All three are numeric attributes; a geocode is a two-component vector of longitude and latitude. Let’s use a tiny dataset to show how to configure similarity features for numeric attributes using the RecordLinkage API.

import pandas as pd

# Toy dataset: payment amounts, transaction dates, and geocodes (lon/lat)
df = pd.DataFrame({'amount': [123.45, 110.66, 120.55],
                   'date': pd.to_datetime(['2022-06-25', '2022-04-01', '2022-07-03']),
                   'lon': [6.54321, 6.98765, 6.54398],
                   'lat': [51.23456, 51.22222, 51.23111]})
print(df)

Note that dates can be interpreted as strings or numeric values in units of, for example, days. Our focus here is on the numeric interpretation.
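As a minimal sketch of the numeric interpretation, the dates from the dataset can be converted to a day count with plain pandas. The reference date `2022-01-01` and the variable names `reference`, `days`, and `gap` are assumptions chosen for illustration only; they are not part of the RecordLinkage API.

```python
import pandas as pd

# Same three transaction dates as in the toy dataset
dates = pd.Series(pd.to_datetime(['2022-06-25', '2022-04-01', '2022-07-03']))

# Hypothetical reference point; any fixed date works, because only
# differences between records matter for similarity
reference = pd.Timestamp('2022-01-01')

# Numeric interpretation: whole days elapsed since the reference date
days = (dates - reference).dt.days
print(days.tolist())

# Distance in days between the first and third record
gap = abs(days[0] - days[2])
print(gap)
```

Once dates are expressed in days, the first and third records turn out to be only about a week apart, while the second lies months away from both. A date similarity built on the numeric interpretation would rank the pairs in that order.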

Custom comparer

A similarity feature is limited to ...