Ensuring Data Privacy in Practice

Learn about industry solutions for ensuring data privacy, including how synthetic data and federated learning can help.

We'll cover the following...

Synthetic twins
Federated learning
Encryption in cloud platforms
Obfuscation and redaction
Hashing

Theoretical approaches carry value, but this lesson covers some of the more common techniques and tools used in the real world to ensure data privacy and minimize the risks of reidentification and leakage.

Synthetic twins

Synthetic data can be used to create high-fidelity, fake “copies” of a dataset that don’t contain any of the personally identifiable information (PII) of the original set. Recall that in earlier lessons, we discussed sourcing data synthetically. Here, we generate a new synthetic dataset from an existing one, aiming to retain the statistical properties of the original while removing all of the PII.
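To make the idea concrete, here is a minimal sketch of a synthetic twin using only the Python standard library. It is a deliberately simplified stand-in, not how commercial tools work: it drops the PII columns and samples each remaining numeric column from an independently fitted Gaussian, so correlations between columns are lost. The records, the field names, and the `make_synthetic_twin` helper are all hypothetical and exist only for illustration.

```python
import random
import statistics

# Hypothetical field names marking which columns count as PII.
PII_FIELDS = {"name", "ssn"}

# Hypothetical toy records; every value here is made up.
original = [
    {"name": "Ana",   "ssn": "000-00-0001", "age": 34, "systolic_bp": 118},
    {"name": "Ben",   "ssn": "000-00-0002", "age": 51, "systolic_bp": 131},
    {"name": "Chloe", "ssn": "000-00-0003", "age": 29, "systolic_bp": 112},
    {"name": "Dev",   "ssn": "000-00-0004", "age": 62, "systolic_bp": 140},
    {"name": "Eli",   "ssn": "000-00-0005", "age": 45, "systolic_bp": 125},
]


def make_synthetic_twin(records, pii_fields, n_rows, seed=0):
    """Return n_rows fake records with the PII columns removed.

    Each non-PII column is assumed to be numeric and is modeled as an
    independent Gaussian, so cross-column correlations are NOT preserved.
    """
    rng = random.Random(seed)
    kept_fields = [f for f in records[0] if f not in pii_fields]

    # Fit a mean and sample standard deviation for every kept column.
    params = {}
    for field in kept_fields:
        values = [row[field] for row in records]
        params[field] = (statistics.mean(values), statistics.stdev(values))

    # Draw brand-new rows from the fitted distributions.
    return [
        {field: rng.gauss(mu, sigma) for field, (mu, sigma) in params.items()}
        for _ in range(n_rows)
    ]


twin = make_synthetic_twin(original, PII_FIELDS, n_rows=1000)
```

No row in `twin` corresponds to a real person, yet column averages stay close to the original's. A realistic twin would also need to preserve relationships between columns and handle categorical fields, which is where richer generative models come in.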

There are many solutions in the healthcare industry that attempt to address HIPAA (a US healthcare data privacy law) concerns by creating synthetic twin datasets. Synthetic data is often generated via an adversarial algorithm. Recall that we spoke about this approach when we considered data bias. Essentially, one algorithm tries to identify whether there’s a major difference between the real and synthetic datasets, while the other iteratively creates new versions of the synthetic data to try to beat the identifier. Some companies that offer this functionality are MDClone and Octopize. Of course, in other industries, synthetic twins offer the ability to completely disregard the original set, which might help us save money when meeting compliance and ...
