Match vs. No-Match
Combine individual scores into a match vs. no-match prediction policy using plausible rules.
As humans, we have intuition. We may also have prior experience with similar tasks, or we prepared well by reviewing the data. So we believe we know how to decide whether any pair of customer records is a match or a no-match. Let’s turn this knowledge into a policy that combines a few plausible rules.
Below, we define four matching rules and predict a match if any of them applies.
# Rule 1: high name and street similarity
rule_1 = scores['customer_name_c_score'].ge(0.8) & scores['street_c_score'].ge(0.8)
# Rule 2: very high name similarity, moderate street similarity, high city similarity
rule_2 = scores['customer_name_c_score'].ge(0.9) & scores['street_c_score'].ge(0.5) & scores['city_c_score'].ge(0.8)
# Rule 3: very high phonetic similarity of name, street, and city
rule_3 = scores['customer_name_p_score'].ge(0.9) & scores['street_p_score'].ge(0.9) & scores['city_p_score'].ge(0.9)
# Rule 4: exact phone number match
rule_4 = scores['phone_c_score'].eq(1.)

# Match if any individual rule is true, else no match:
predicted_matches = scores.loc[rule_1 | rule_2 | rule_3 | rule_4].index
print(predicted_matches[:3])  # Print the first three matches as an example
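To see how the four boolean masks combine, here is a minimal self-contained sketch on a hypothetical toy `scores` DataFrame. The score column names mirror the ones above, but the record-pair index names (`rec_a`, `rec_b`) and all score values are made up for illustration. The sketch also collects the individual rule masks into one table, which makes it easy to check which rule triggered a match for each pair.

```python
import pandas as pd

# Hypothetical toy similarity scores for four candidate record pairs.
# Column names follow the text; index names and values are invented.
index = pd.MultiIndex.from_tuples(
    [(0, 10), (1, 11), (2, 12), (3, 13)], names=["rec_a", "rec_b"]
)
scores = pd.DataFrame(
    {
        "customer_name_c_score": [0.85, 0.95, 0.40, 0.20],
        "street_c_score":        [0.90, 0.60, 0.30, 0.10],
        "city_c_score":          [1.00, 0.85, 0.50, 0.00],
        "customer_name_p_score": [1.00, 1.00, 0.50, 0.00],
        "street_p_score":        [1.00, 0.50, 0.50, 0.00],
        "city_p_score":          [1.00, 1.00, 0.50, 0.00],
        "phone_c_score":         [0.00, 0.00, 1.00, 0.00],
    },
    index=index,
)

# Same four rules as above: each is a boolean Series aligned on the pair index.
rule_1 = scores["customer_name_c_score"].ge(0.8) & scores["street_c_score"].ge(0.8)
rule_2 = (scores["customer_name_c_score"].ge(0.9)
          & scores["street_c_score"].ge(0.5)
          & scores["city_c_score"].ge(0.8))
rule_3 = (scores["customer_name_p_score"].ge(0.9)
          & scores["street_p_score"].ge(0.9)
          & scores["city_p_score"].ge(0.9))
rule_4 = scores["phone_c_score"].eq(1.0)

# One column per rule: shows which rule(s) fired for each pair.
fired = pd.DataFrame({"rule_1": rule_1, "rule_2": rule_2,
                      "rule_3": rule_3, "rule_4": rule_4})
print(fired)

# OR across rules: a pair is a predicted match if at least one rule holds.
is_match = rule_1 | rule_2 | rule_3 | rule_4
predicted_matches = scores.loc[is_match].index
print(list(predicted_matches))
```

In this toy data, pair `(1, 11)` is caught only by Rule 2 and pair `(2, 12)` only by the exact phone match, while pair `(3, 13)` satisfies no rule and is predicted as a no-match.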
In other words, we predict a match if any of the following rules applies:
Rule 1: The customer names and streets are both highly similar (scores of at least 0.8).
Rule 2: The customer names are very similar (at least 0.9), and the address is moderately similar (street at least 0.5, city at least 0.8).
Rule 3: The customer names, streets, and cities are all phonetically very similar (at least 0.9).
Rule 4: The phone numbers match exactly (a score of 1).
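The list above has a clear structure: within a rule, all thresholds must hold (AND), and across rules, any one suffices (OR). One way to make that structure explicit is to store the rules as data and evaluate them with a single helper. The sketch below is hypothetical: `RULES` and `apply_rules` are illustrative names, not part of any library, and it assumes every score lies in [0, 1], so that Rule 4's "at least 1.0" is equivalent to the exact `.eq(1.)` check used above.

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical rule table: each rule maps a score column to its minimum value.
# Assumes scores never exceed 1, so ">= 1.0" behaves like "== 1.0" for phones.
RULES = [
    {"customer_name_c_score": 0.8, "street_c_score": 0.8},
    {"customer_name_c_score": 0.9, "street_c_score": 0.5, "city_c_score": 0.8},
    {"customer_name_p_score": 0.9, "street_p_score": 0.9, "city_p_score": 0.9},
    {"phone_c_score": 1.0},
]


def apply_rules(scores: pd.DataFrame, rules) -> pd.Series:
    """Return a boolean Series: True where at least one rule holds (OR),
    where a rule holds only if all of its thresholds are met (AND)."""
    rule_masks = [
        reduce(operator.and_, (scores[col].ge(thr) for col, thr in rule.items()))
        for rule in rules
    ]
    return reduce(operator.or_, rule_masks)


# Hypothetical toy scores for three candidate pairs (values are invented).
scores = pd.DataFrame(
    {
        "customer_name_c_score": [0.95, 0.40, 0.30],
        "street_c_score":        [0.60, 0.30, 0.20],
        "city_c_score":          [0.85, 0.50, 0.10],
        "customer_name_p_score": [1.00, 0.50, 0.40],
        "street_p_score":        [0.50, 0.50, 0.30],
        "city_p_score":          [1.00, 0.50, 0.20],
        "phone_c_score":         [0.00, 1.00, 0.00],
    },
    index=pd.MultiIndex.from_tuples([(1, 11), (2, 12), (3, 13)],
                                    names=["rec_a", "rec_b"]),
)

is_match = apply_rules(scores, RULES)
predicted_matches = scores.loc[is_match].index
print(list(predicted_matches))
```

Keeping the thresholds in a plain data structure makes it easier to tweak or extend the policy without touching the evaluation logic.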
The literature calls such AND/OR combinations of threshold-based rules a