Exercise: Indexing with Splink
Learn how to add index blocking to a Splink configuration.
Let’s take one extra step in our Splink deduplication setup.
Task
The solvers_kitchen/restaurants.csv dataset is available in the environment and contains duplicates. The aim of this exercise is to reduce the number of record comparisons by using index blocking.
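Splink accepts blocking rules in SQL syntax. For example, the requirement “matches on the first three digits of phone numbers or matches exactly on cities” translates into two rules: one that compares the first three characters of the phone number on both records, and one that requires equal city values.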
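A minimal sketch of how such rules might look is given here. The column names phone, city, and unique_id are assumptions, since the actual columns of restaurants.csv are not shown, and the toy rows are invented for illustration. In Splink's convention, l and r stand for the left and right record of a candidate pair. The sketch uses Python's built-in sqlite3 module only to count how many pairs survive the rules; it does not call Splink itself.

```python
import sqlite3

# Blocking rules as SQL expressions; `l` and `r` are the two records of a pair.
# Column names are assumed, not taken from the real dataset.
blocking_rules = [
    "substr(l.phone, 1, 3) = substr(r.phone, 1, 3)",  # same first three characters of phone
    "l.city = r.city",                                # exact match on city
]

# Fragment of a Splink settings dictionary that carries the rules.
settings = {
    "link_type": "dedupe_only",
    "blocking_rules_to_generate_predictions": blocking_rules,
}

# Invented toy rows standing in for the restaurants data.
rows = [
    (1, "310-246-1501", "los angeles"),
    (2, "310-246-1501", "los angeles"),
    (3, "212-555-0100", "new york"),
    (4, "212-555-0199", "brooklyn"),
    (5, "415-555-0123", "san francisco"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (unique_id INTEGER, phone TEXT, city TEXT)")
conn.executemany("INSERT INTO restaurants VALUES (?, ?, ?)", rows)


def count_pairs(condition):
    """Count distinct record pairs (l, r) that satisfy an SQL condition."""
    query = (
        "SELECT COUNT(*) FROM restaurants AS l "
        "JOIN restaurants AS r ON l.unique_id < r.unique_id "
        f"WHERE {condition}"
    )
    return conn.execute(query).fetchone()[0]


# Without blocking, every record is compared with every other record.
all_pairs = count_pairs("1 = 1")

# A pair is kept if it satisfies at least one rule, so the rules are OR-ed here.
blocked_pairs = count_pairs(" OR ".join(f"({rule})" for rule in blocking_rules))

print(all_pairs)      # 10
print(blocked_pairs)  # 2
```

On these five toy records, blocking keeps 2 of the 10 possible pairs: records 1 and 2 agree on both phone prefix and city, and records 3 and 4 share a phone prefix. Listing the two conditions as separate rules, rather than one combined expression, mirrors the "or" in the requirement.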