Equality-Based Indexing
Become familiar with different methods of equality-based indexing and associated challenges.
A typical record consists of several attributes: names, addresses, transaction dates, prices, sizes, colors, etc. We expect duplicates to be similar across most attributes. For some attributes, we even expect an exact match. For example, duplicate customer records are unlikely to have different country attributes.
Note: The restaurants dataset we use below is open data. See the Glossary of the course for attribution and references.
Standard blocking (SB)
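SB, or “blocking,” groups records that share exactly the same value of a chosen attribute into a block and compares records only within the same block. It is so prevalent that indexing is often used as a synonym for this technique. Unless explicitly stated otherwise, people mean SB when they talk about indexing on a particular attribute.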
Below, we read the restaurants dataset and use recordlinkage to block by the city attribute:
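To make the idea concrete, here is a minimal sketch of what blocking by city amounts to, written in plain Python with the standard library rather than with recordlinkage. The records are invented toy data, not the restaurants dataset, and the helper name block_pairs is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Invented toy records (NOT the restaurants dataset): record id -> attributes.
records = {
    0: {"name": "Blue Door Cafe", "city": "boston"},
    1: {"name": "Blue Door Caffe", "city": "boston"},
    2: {"name": "Harbor Grill", "city": "seattle"},
    3: {"name": "Harbour Grill", "city": "seattle"},
    4: {"name": "Pine Street Diner", "city": "boston"},
    5: {"name": "Lone Oak Bistro", "city": "denver"},
}


def block_pairs(records, key):
    """Return candidate pairs of record ids that agree exactly on `key`."""
    # Step 1: put every record into the block named by its key value.
    blocks = defaultdict(list)
    for record_id, attrs in records.items():
        blocks[attrs[key]].append(record_id)

    # Step 2: pair up records only within the same block.
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


candidate_pairs = block_pairs(records, "city")
total_pairs = len(records) * (len(records) - 1) // 2

print(candidate_pairs)
print(f"{len(candidate_pairs)} of {total_pairs} possible pairs kept")
```

On this toy data, blocking keeps 4 of the 15 possible pairs. The flip side is that the match on the key is exact: a duplicate whose city were misspelled would land in a different block and never be compared. With recordlinkage, the same step is typically expressed by creating a recordlinkage.Index(), calling its block("city") method, and then calling index() on the DataFrame.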