Fighting Label Errors

Explore confident learning techniques to identify and handle label errors in entity resolution datasets. Understand how cleanlab enhances machine learning models to be robust against imperfect labels, improving accuracy and reliability in real-world noisy data scenarios.

We'll cover the following...

Detect label errors
Robust machine learning models
Integrate cleanlab into an iterative workflow
Key takeaway

The real world is full of imperfect data. If we ignore data quality issues, we risk drawing wrong conclusions and making suboptimal decisions. This course already deals with one such issue: duplicate records. However, the outcome of entity resolution itself depends on the quality of the data it works with.

This lesson introduces confident learning, a robust alternative to standard (or naive) machine learning. Confident learning treats potential data errors as part of the modeling so that algorithms can automatically adapt to imperfect data. For example, can we trust that the labels we use for the initial training of our machine learning model are 100% accurate?
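To make the idea concrete, here is a minimal sketch of the thresholding step at the heart of confident learning, written in plain Python. It is a simplified illustration, not the cleanlab implementation. The labels, the predicted probabilities, and the helper names `class_thresholds` and `find_suspect_labels` are all hypothetical. The sketch assumes we already have out-of-sample predicted probabilities from some classifier for each labeled pair.

```python
from statistics import mean

# Hypothetical toy data (all values are made up for illustration).
# labels[i]     : given label of pair i (1 = match, 0 = no-match)
# pred_probs[i] : [P(no-match), P(match)] predicted by some classifier
labels = [1, 1, 1, 0, 0, 0, 1, 0]
pred_probs = [
    [0.10, 0.90],
    [0.20, 0.80],
    [0.15, 0.85],
    [0.90, 0.10],
    [0.85, 0.15],
    [0.80, 0.20],
    [0.95, 0.05],  # labeled match, but the model strongly predicts no-match
    [0.10, 0.90],  # labeled no-match, but the model strongly predicts match
]


def class_thresholds(labels, pred_probs):
    """Per-class threshold: the average predicted probability of class c
    over all examples whose given label is c."""
    classes = sorted(set(labels))
    return {
        c: mean(p[c] for y, p in zip(labels, pred_probs) if y == c)
        for c in classes
    }


def find_suspect_labels(labels, pred_probs):
    """Return indices whose given label disagrees with the class the model
    is confident about (probability at or above that class's threshold)."""
    thresholds = class_thresholds(labels, pred_probs)
    suspects = []
    for i, (given, probs) in enumerate(zip(labels, pred_probs)):
        confident = [c for c, t in thresholds.items() if probs[c] >= t]
        if not confident:
            continue  # the model is not confident about any class
        best = max(confident, key=lambda c: probs[c])
        if best != given:
            suspects.append(i)
    return suspects


suspect_indices = find_suspect_labels(labels, pred_probs)
print(suspect_indices)
```

In this toy example, the last two pairs are flagged because the model is confident about the class opposite to their given label. Flagged pairs are candidates for manual review rather than proven errors.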

Detect label errors

Machine learning algorithms require some labeled examples for initial training. In entity resolution, we select a subset of pairs and assign them to the match or no-match class. Large-scale applications, such as master data management in the enterprise, involve several users reviewing pairs of records. Every such manual ...
