Solution: Indexing with Splink

Learn how to solve the exercise posed in the previous lesson.

Let’s take one extra step in our Splink deduplication setup.

Task

The solvers_kitchen/restaurants.csv dataset is available in the environment and contains duplicates. The goal of this exercise is to reduce the number of record comparisons by blocking (indexing).

Splink accepts blocking rules in SQL syntax. For example, “matches on the first three digits of phone numbers or matches exactly on cities” can be written as two rules, one per condition, because a record pair is compared when it satisfies at least one rule.
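As a minimal sketch of that translation, the two rules might look like the SQL strings below, where `l` and `r` are the aliases Splink uses for the two records in a pair. The column names `phone` and `city` are assumptions about the dataset, the settings keys follow Splink's settings dictionary and may differ between versions, and the rows are made up for illustration. The pair counting with sqlite3 only imitates the effect of blocking; it is not how Splink runs the rules internally.

```python
import sqlite3

# Blocking rules as SQL strings; `l` and `r` refer to the two records in a pair.
# ASSUMPTION: the dataset has columns named `phone` and `city`.
blocking_rules = [
    "substr(l.phone, 1, 3) = substr(r.phone, 1, 3)",  # same first three phone digits
    "l.city = r.city",                                # exact match on city
]

# Fragment of a Splink settings dictionary (key names may vary by version).
settings = {
    "link_type": "dedupe_only",
    "blocking_rules_to_generate_predictions": blocking_rules,
}

# Toy rows, invented for illustration only (not taken from restaurants.csv).
rows = [
    (1, "310-555-0101", "los angeles"),
    (2, "310-555-0101", "los angeles"),
    (3, "212-555-0100", "new york"),
    (4, "212-555-0199", "new york city"),
    (5, "415-555-0123", "san francisco"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE restaurants (id INTEGER, phone TEXT, city TEXT)")
con.executemany("INSERT INTO restaurants VALUES (?, ?, ?)", rows)


def count_pairs(condition: str) -> int:
    """Count unordered record pairs (l.id < r.id) that satisfy a SQL condition."""
    query = (
        "SELECT COUNT(*) FROM restaurants AS l "
        "JOIN restaurants AS r ON l.id < r.id "
        f"WHERE {condition}"
    )
    return con.execute(query).fetchone()[0]


# Without blocking, every pair is compared.
all_pairs = count_pairs("1 = 1")
# With blocking, a pair is kept if it satisfies at least one rule.
blocked_pairs = count_pairs(" OR ".join(f"({rule})" for rule in blocking_rules))

print(all_pairs, blocked_pairs)  # prints: 10 2
```

On these five toy rows, blocking keeps 2 of the 10 possible pairs: rows 1 and 2 satisfy both rules, while rows 3 and 4 are kept by the phone rule alone, since their city strings differ.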
