Reidentification Example
Understand how combining datasets like Netflix, IMDb, and Experian data can lead to reidentification of individuals. Learn the implications of such privacy breaches, including predatory advertising and data misuse, and recognize the importance of safeguarding personal information in ML pipelines.
To illustrate how dangerous reidentification can be, we examine an example from a financial context, taking the recent Experian data breaches as inspiration.
Setup
Imagine we have three datasets—the Netflix ratings dataset (made public for research/competitions), the IMDb ratings dataset (always public), and credit data from Experian (obtained and released through a major data ...
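As a rough illustration of how three such datasets could be linked, here is a minimal sketch in Python. It uses small synthetic records rather than the real Netflix, IMDb, or Experian data. The field names, the `link_profiles` helper, the `min_overlap` threshold, and the idea that a public profile name also appears in the leaked financial records are all assumptions made for this example, not details from any of the actual datasets.

```python
from collections import defaultdict

# Synthetic toy records; every name and value here is made up for illustration.
# "Anonymized" ratings: (pseudonymous_id, title, rating)
anonymized_ratings = [
    ("u1", "Movie A", 5), ("u1", "Movie B", 2), ("u1", "Movie C", 4),
    ("u2", "Movie A", 3), ("u2", "Movie D", 1),
]
# Public ratings posted under a profile name: (profile, title, rating)
public_ratings = [
    ("alice", "Movie A", 5), ("alice", "Movie C", 4),
    ("bob", "Movie D", 1),
]
# Leaked financial records keyed by the same name (an assumption of this sketch).
leaked_credit = {"alice": {"credit_score": 580}, "bob": {"credit_score": 720}}


def link_profiles(anonymized, public, min_overlap=2):
    """Map each pseudonymous ID to the public profile that shares the most
    (title, rating) pairs with it, if the overlap reaches min_overlap."""
    by_profile = defaultdict(set)
    for profile, title, rating in public:
        by_profile[profile].add((title, rating))

    by_pseudonym = defaultdict(set)
    for pseudonym, title, rating in anonymized:
        by_pseudonym[pseudonym].add((title, rating))

    links = {}
    for pseudonym, ratings in by_pseudonym.items():
        # Count shared (title, rating) pairs with every public profile.
        overlaps = {p: len(ratings & r) for p, r in by_profile.items()}
        if not overlaps:
            continue
        best = max(overlaps, key=overlaps.get)
        if overlaps[best] >= min_overlap:
            links[pseudonym] = best
    return links


links = link_profiles(anonymized_ratings, public_ratings)

# Join each recovered identity with its leaked financial record.
reidentified = {
    pseudonym: leaked_credit[name]
    for pseudonym, name in links.items()
    if name in leaked_credit
}
```

In this toy data, the pseudonymous ID `u1` shares two ratings with the public profile `alice` and is therefore linked to that profile's financial record, while `u2` shares only one rating with any profile and stays unlinked. A real attack would have to tolerate noisy ratings and dates, so exact matching on (title, rating) pairs is a simplification.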