Significance of Indexing

Understand how indexing helps reduce costs at the expense of precision and recall.

The matching quality is not the only factor of practical relevance. Think of it this way—we could hire an army of educated people to manually review each pair of records. That would likely push precisionIt measures the accuracy among all match predictions., recallIt measures the percentage of actual matches predicted as actual matches., and our costs through the roof.

Conversely, we could group records by exactly matching all relevant attributes after simple preprocessing (lowercasing, special character removal)—something we can implement very cheaply with a few lines of pandas or SQL. In practice, we must make a trade-off between matching quality and costs.

Get hands-on with 1200+ tech skills courses.