Match vs. No-Match

Combine individual scores into a match vs. no-match prediction policy using plausible rules.

As humans, we bring intuition, prior experience with similar tasks, or a careful review of the data. We therefore know (or at least believe we know) how to tell a match from a no-match for any pair of customer records. Let’s translate this knowledge into a policy that combines a few plausible rules.

Below, we define four matching rules and predict a match if any of those applies.

rule_1 = scores['customer_name_c_score'].ge(0.8) & scores['street_c_score'].ge(0.8)
rule_2 = scores['customer_name_c_score'].ge(0.9) & scores['street_c_score'].ge(0.5) & scores['city_c_score'].ge(0.8)
rule_3 = scores['customer_name_p_score'].ge(0.9) & scores['street_p_score'].ge(0.9) & scores['city_p_score'].ge(0.9)
rule_4 = scores['phone_c_score'].eq(1.)
# Match if any individual rule is true, else no match:
predicted_matches = scores.loc[rule_1 | rule_2 | rule_3 | rule_4].index
print(predicted_matches[:3]) # Print 1st three matches as an example

In other words, we predict a match if any of the following rules applies:

  • Rule 1: The similarities of customer names and of streets are both high.

  • Rule 2: The similarity of customer names is very high, that of streets is at least moderate, and that of cities is high.

  • Rule 3: The phonetic similarities of customer names, streets, and cities are all very high.

  • Rule 4: Phone numbers match exactly.
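To see how the four rules interact, it can help to run them on a tiny table and record which rule fires for which pair. The sketch below is only an illustration: the `scores` values and the pair index (`id_1`, `id_2`) are invented, and collecting the rules in a helper `rules` DataFrame is an assumption of this example, not part of the lesson's code. The column names and thresholds are the ones used above.

```python
import pandas as pd

# Hypothetical toy scores for four candidate pairs (values invented for illustration).
index = pd.MultiIndex.from_tuples(
    [(0, 10), (1, 11), (2, 12), (3, 13)], names=["id_1", "id_2"]
)
scores = pd.DataFrame(
    {
        "customer_name_c_score": [0.85, 0.95, 0.40, 0.30],
        "street_c_score": [0.90, 0.55, 0.20, 0.10],
        "city_c_score": [0.30, 0.85, 0.50, 0.20],
        "customer_name_p_score": [0.50, 0.60, 0.30, 0.20],
        "street_p_score": [0.50, 0.60, 0.30, 0.20],
        "city_p_score": [0.50, 0.60, 0.30, 0.20],
        "phone_c_score": [0.00, 0.00, 1.00, 0.00],
    },
    index=index,
)

# One boolean column per rule, using the same thresholds as in the lesson.
rules = pd.DataFrame(
    {
        "rule_1": scores["customer_name_c_score"].ge(0.8)
        & scores["street_c_score"].ge(0.8),
        "rule_2": scores["customer_name_c_score"].ge(0.9)
        & scores["street_c_score"].ge(0.5)
        & scores["city_c_score"].ge(0.8),
        "rule_3": scores["customer_name_p_score"].ge(0.9)
        & scores["street_p_score"].ge(0.9)
        & scores["city_p_score"].ge(0.9),
        "rule_4": scores["phone_c_score"].eq(1.0),
    }
)

# A pair is a predicted match if at least one rule is true (logical OR across rules).
is_match = rules.any(axis=1)
predicted_matches = scores.index[is_match]

print(rules.assign(match=is_match))  # Which rule fired for which pair
print(rules.sum())                   # How many pairs each rule accepts
```

In this toy table, the first pair is accepted by Rule 1, the second by Rule 2, the third by Rule 4, and the last pair by no rule at all. Keeping one column per rule makes it easier to check whether a single rule is responsible for most predictions before tuning its thresholds.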

The literature calls such AND/OR combinations of threshold-based rules a similarity ...