Exercise: Indexing with Splink

Learn how to add index blocking to a Splink configuration.


Let’s take one extra step in our Splink deduplication setup.

Task

The solvers_kitchen/restaurants.csv dataset is available in the environment and contains duplicates. The goal of this exercise is to reduce the number of record comparisons by using index blocking.
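A quick count shows why blocking matters. Without blocking, deduplicating n records compares every pair, n(n-1)/2 of them. With blocking, only records that share a blocking key are compared, so each block of size k contributes k(k-1)/2 pairs. The minimal sketch below, in plain Python without Splink, computes both counts. The city values are made-up toy data, not rows from restaurants.csv.

```python
from collections import Counter


def total_pairs(n: int) -> int:
    """Pairs compared without blocking: every record against every other (n choose 2)."""
    return n * (n - 1) // 2


def blocked_pairs(keys: list[str]) -> int:
    """Pairs compared when only records with the same blocking key are compared.

    Each block of size k contributes k choose 2 pairs.
    """
    return sum(total_pairs(k) for k in Counter(keys).values())


# Hypothetical blocking keys (city values) for six toy records.
cities = ["new york", "new york", "los angeles", "los angeles", "los angeles", "atlanta"]
print(total_pairs(len(cities)))  # 15
print(blocked_pairs(cities))     # 1 + 3 + 0 = 4
```

The gap grows quickly with size: 10,000 records without blocking already mean 49,995,000 pairs, while the blocked count depends only on how large the individual blocks are.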

Splink accepts blocking rules in SQL syntax. For example, to express “matches on the first three digits of the phone number or matches exactly on the city,” we set the following code:
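Here is a minimal sketch of such rules. It assumes the dataset has columns named `unique_id`, `phone`, and `city`. These names are assumptions, so check them against the CSV header. In Splink's SQL, `l` and `r` refer to the two records of a candidate pair. A list of blocking rules keeps every pair that satisfies at least one rule, which is how the "or" is expressed. To show what the rules select without installing Splink, the code runs them as a self-join in Python's built-in `sqlite3` on five made-up rows. The `settings` dict only mirrors the shape of a Splink settings dictionary and is not passed to Splink here.

```python
import sqlite3

# Blocking rules in Splink's SQL style: `l` and `r` alias the two records of a pair.
# Column names `phone` and `city` are assumptions about restaurants.csv.
BLOCKING_RULES = [
    "substr(l.phone, 1, 3) = substr(r.phone, 1, 3)",  # same first three phone characters
    "l.city = r.city",                                 # exact city match
]

# Illustrative settings fragment; keys follow Splink's settings naming.
settings = {
    "link_type": "dedupe_only",
    "blocking_rules_to_generate_predictions": BLOCKING_RULES,
}


def candidate_pairs(rows, rules):
    """Return the (id, id) pairs that satisfy at least one blocking rule."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE restaurants (unique_id INTEGER, phone TEXT, city TEXT)")
    con.executemany("INSERT INTO restaurants VALUES (?, ?, ?)", rows)
    # OR the rules together; `l.unique_id < r.unique_id` keeps each pair once.
    where = " OR ".join(f"({rule})" for rule in rules)
    sql = (
        "SELECT l.unique_id, r.unique_id FROM restaurants AS l "
        "JOIN restaurants AS r ON l.unique_id < r.unique_id "
        f"AND ({where}) ORDER BY 1, 2"
    )
    pairs = con.execute(sql).fetchall()
    con.close()
    return pairs


# Made-up toy rows, not taken from the real dataset.
rows = [
    (1, "212-555-0101", "new york"),
    (2, "212-555-0199", "new york city"),
    (3, "310-555-0123", "los angeles"),
    (4, "404-555-0111", "los angeles"),
    (5, "404-555-0177", "atlanta"),
]
print(candidate_pairs(rows, BLOCKING_RULES))  # [(1, 2), (3, 4), (4, 5)]
```

On the toy rows, 3 of the 10 possible pairs survive: (1, 2) and (4, 5) share a phone prefix, and (3, 4) share a city. Folding the rules into one OR condition selects the same pairs. As we understand it, Splink instead evaluates each rule as its own join and drops pairs already produced by earlier rules, which keeps the joins efficient on larger data.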
