Ensuring Data Privacy in Practice

Learn about industry solutions for ensuring data privacy, including how synthetic data and federated learning can help.

Theoretical approaches carry value, but this lesson will cover some of the more common techniques and tools used in the real world to ensure data privacy and minimize reidentification and leakage risks.

Synthetic twins

Synthetic data can be used to create high-fidelity, fake “copies” of a dataset that don’t contain any of the personally identifiable information (PII) in the original set. Recall that in earlier lessons, we discussed sourcing data synthetically. Here, we generate a new synthetic dataset from an existing one that retains the statistical properties of the original but contains none of its PII.
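
As a minimal sketch of the idea, the Python below builds a naive “synthetic twin” from a list of records: it drops the columns treated as PII, fits a normal distribution to each numeric column, and samples each categorical column in proportion to its observed frequencies. The function name `synthesize`, the PII column names, and the toy records are all hypothetical. Because every column is sampled independently, this preserves only the per-column (marginal) distributions, not the relationships between columns, and it offers no formal privacy guarantee; production tools typically use much richer generative models.

```python
import random
import statistics
from collections import Counter

# Hypothetical PII column names, chosen for this illustration only.
PII_COLUMNS = {"name", "ssn"}


def synthesize(rows, pii_columns, n, seed=0):
    """Return n synthetic rows with the PII columns dropped.

    Numeric columns are sampled from a normal distribution fitted to the
    original column; other columns are sampled by observed category
    frequency. Columns are sampled independently, so marginal
    distributions are approximated but cross-column correlations are lost.
    """
    rng = random.Random(seed)
    columns = [c for c in rows[0] if c not in pii_columns]
    samplers = {}
    for col in columns:
        values = [r[col] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            # Fit a normal distribution (needs at least two rows for stdev).
            mu, sd = statistics.mean(values), statistics.stdev(values)
            samplers[col] = lambda mu=mu, sd=sd: rng.gauss(mu, sd)
        else:
            # Sample categories in proportion to how often they occur.
            counts = Counter(values)
            cats, weights = list(counts), list(counts.values())
            samplers[col] = lambda cats=cats, weights=weights: rng.choices(cats, weights)[0]
    return [{col: samplers[col]() for col in columns} for _ in range(n)]


# Toy, made-up records for illustration only.
original = [
    {"name": "A", "ssn": "000-00-0001", "age": 34, "diagnosis": "flu"},
    {"name": "B", "ssn": "000-00-0002", "age": 51, "diagnosis": "asthma"},
    {"name": "C", "ssn": "000-00-0003", "age": 47, "diagnosis": "flu"},
]
twin = synthesize(original, PII_COLUMNS, n=1000)
print(list(twin[0]))  # ['age', 'diagnosis']
```

Sampling fresh values from fitted distributions, rather than copying original rows, keeps individual records from being carried over verbatim into the twin.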

Synthetic copies have the same data distributions as the original data
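
One way to check the claim in the figure for a single numeric column is the two-sample Kolmogorov-Smirnov (KS) statistic: the largest gap between the empirical cumulative distribution functions of the real and synthetic values. The sketch below is a plain-Python version with a hypothetical function name, `ks_statistic`, and made-up sample values; in practice you would more likely use `scipy.stats.ks_2samp`, which also returns a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest vertical gap between the
    empirical CDFs of the two samples (0 = identical, 1 = no overlap)."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the largest gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Made-up values for illustration only.
real_ages = [34, 51, 47, 29, 62]
synthetic_ages = [36, 49, 45, 31, 60]
print(ks_statistic(real_ages, synthetic_ages))
```

A value near 0 suggests the column’s distributions match closely. Because the check looks at one column at a time, it says nothing about whether correlations between columns were preserved.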

Many solutions in the healthcare industry attempt to address HIPAA (a US healthcare data privacy law) concerns by creating synthetic twin datasets. Synthetic data is often generated with an adversarial algorithm; recall that we discussed this approach when we considered data bias. Essentially, one algorithm (the discriminator) tries to detect whether there’s a meaningful difference between the real and synthetic datasets, while the other (the generator) iteratively creates new versions of synthetic data to try to fool it. Some companies that offer this functionality are MDClone and Octopize. Of course, in ...
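
Training a real generator requires a deep-learning framework, so the sketch below covers only the discriminator half of the adversarial setup, under simplifying assumptions. The hypothetical function `distinguishability` fits a one-feature logistic regression that tries to label values as real (1) or synthetic (0), then reports its accuracy. With equal-sized inputs, accuracy close to 0.5 means the discriminator does no better than guessing, which is the state the generator is trying to reach. A similar real-versus-synthetic classifier check is also sometimes run on a finished synthetic dataset as a quality test.

```python
import math


def distinguishability(real, synthetic, steps=2000, lr=0.1):
    """Train a one-feature logistic-regression 'discriminator' to separate
    real values (label 1) from synthetic ones (label 0) and return its
    accuracy. With equal-sized inputs, ~0.5 means indistinguishable."""
    xs = list(real) + list(synthetic)
    ys = [1] * len(real) + [0] * len(synthetic)

    # Standardize the feature so plain gradient descent behaves well.
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs)) or 1.0
    zs = [(x - mu) / sd for x in xs]

    w, b = 0.0, 0.0
    for _ in range(steps):
        # Full-batch gradient of the mean binary cross-entropy loss.
        gw = gb = 0.0
        for z, y in zip(zs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * z + b)))
            gw += (p - y) * z
            gb += p - y
        w -= lr * gw / len(zs)
        b -= lr * gb / len(zs)

    # For brevity, accuracy is measured on the same data used for training.
    correct = sum(
        (1.0 / (1.0 + math.exp(-(w * z + b))) >= 0.5) == bool(y)
        for z, y in zip(zs, ys)
    )
    return correct / len(zs)


# Made-up values for illustration only.
real = [34.0, 51.0, 47.0, 29.0, 62.0]
synthetic = [36.0, 49.0, 45.0, 31.0, 60.0]
print(distinguishability(real, synthetic))
```

A held-out split would give a fairer accuracy estimate, and a real discriminator would examine whole records rather than a single column.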
