Practice of Data Bias Mitigation

Explore some products that allow for proper sourcing and observability to mitigate biases.

Theory is nice to have, but in practice professionals often rely on advanced tooling and software products to handle data quality issues. This lesson covers some of these products, where they excel, and where their gaps lie. These services are useful for controlling and mitigating data risk, but none of them is a guaranteed solution. The only reliable way to reduce data risk is to produce ethically curated data and to have domain expertise in how the various factors interrelate with protected attributes. Everything else is a shortcut.

Data sourcing

Getting our own data is rarely straightforward. In the past, we’d have needed to conduct in-person surveys or collect feedback over long periods to amass enough data for decision-making. Today, this is no longer the case, but sourcing good data is still difficult.

There are two paths to sourcing safe, ethical, and legal data: real data and synthetic data. Each has its pros and cons.

Synthetic sources

Synthetic sources are just what they sound like: providers, usually very niche companies, that offer ready-made synthetic datasets or a creation API for generating safe samples. Here are a few examples:

  • Datagen: This is a data provider for computer vision datasets. They use a mixture of 3D art modeling and GANs to create artificial data that is diverse from a racial or gender perspective, particularly for faces, bodies, and some other areas.

  • Datomize: This is a tabular synthetic ...