Practice of Data Bias Mitigation
Explore some products that allow for proper sourcing and observability to mitigate biases.
Theory is useful, but in practice professionals rely on specialized tooling and software products to handle data quality issues. This lesson covers some of these products, where they excel, and where their gaps lie. These services help control and mitigate data risk, but none of them is a guaranteed solution. The best, and only, way to truly reduce data risk is to produce ethically curated data and to have the domain expertise to understand how the various factors interrelate with protected attributes. Everything else is a shortcut.
Data sourcing
Collecting our own data is rarely straightforward. In the past, we would have needed to conduct in-person surveys or collect feedback over long periods of time to amass enough data for decision-making. That is no longer the case today, but sourcing good data is still difficult.
To source safe, ethical, and legal data, there are two paths available: real data and synthetic data. Each has its pros and cons.
Synthetic sources
Synthetic sources are just what they sound like: providers of artificially generated data. They are usually niche companies that offer ready-made synthetic datasets or a generation API for producing safe samples. Here are a few examples: