Theory of Data Bias Mitigation
Learn some mathematical approaches to bias mitigation in ML.
Let’s explore theoretical solutions to various types of data bias.
Non-ML approaches
These solutions are worth knowing because they’re generally cost- and time-effective tools and procedures that, applied properly, can greatly improve the quality of the data. In many cases, they are simplified versions of the more sophisticated fixes that ML debiasers provide, but they remain useful in their own right.
Oversampling and undersampling
One very simple approach is to change the sampling structure of the underlying data. In essence, we either duplicate rows of the minority group until it matches the size of the majority group (oversampling), or randomly remove rows of the majority group until it matches the size of the minority group (undersampling).
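As a rough illustration of the two procedures, here is a minimal sketch using only Python's standard library. The helper names (`split_by`, `oversample`, `undersample`), the row layout, and the toy data are assumptions made for this example, not part of any particular debiasing library.

```python
import random
from collections import Counter


def split_by(rows, key):
    """Group rows (dicts) by the value stored under `key`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return groups


def oversample(rows, key, seed=0):
    """Add random duplicates to smaller groups until all match the largest group."""
    rng = random.Random(seed)
    groups = split_by(rows, key)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)  # keep every original row
        balanced.extend(rng.choices(members, k=target - len(members)))  # duplicates
    return balanced


def undersample(rows, key, seed=0):
    """Randomly drop rows from larger groups until all match the smallest group."""
    rng = random.Random(seed)
    groups = split_by(rows, key)
    target = min(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(rng.sample(members, k=target))  # sample without replacement
    return balanced


# Hypothetical toy data: 8 rows in the majority group (0), 2 in the minority group (1).
rows = [{"id": i, "group": 0 if i < 8 else 1} for i in range(10)]

print(Counter(r["group"] for r in oversample(rows, "group")))   # 8 rows per group
print(Counter(r["group"] for r in undersample(rows, "group")))  # 2 rows per group
```

Oversampling keeps all of the original information but repeats minority rows, while undersampling discards majority rows, so the two make different trade-offs between dataset size and duplicated records.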
Oversampling
Let’s consider a dataset with three variables: age, credit score, and race. We’ll use a binary race variable for simplicity. We’ll also set the sampling distribution so that race takes the value 0 80% of the time. That way, we can quickly calculate the change in representation rate.
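The setup described above could be sketched as follows. The sample size, the random seed, and the value ranges for age and credit score are illustrative assumptions. The representation rate is taken here to mean the share of rows belonging to a group, which is also an assumption about the intended definition.

```python
import random

rng = random.Random(42)  # fixed seed, an arbitrary choice for reproducibility
N = 1000                 # hypothetical sample size

# Value ranges for age and credit score are illustrative assumptions.
data = [
    {
        "age": rng.randint(18, 80),
        "credit_score": rng.randint(300, 850),
        "race": 0 if rng.random() < 0.8 else 1,  # race = 0 is drawn 80% of the time
    }
    for _ in range(N)
]


def representation_rate(rows, value):
    """Share of rows whose race equals `value` (assumed definition)."""
    return sum(r["race"] == value for r in rows) / len(rows)


majority = [r for r in data if r["race"] == 0]
minority = [r for r in data if r["race"] == 1]

# Oversample: append random duplicates of minority rows until both groups are equal in size.
extra = rng.choices(minority, k=len(majority) - len(minority))
oversampled = data + extra

print(f"minority rate before: {representation_rate(data, 1):.3f}")
print(f"minority rate after:  {representation_rate(oversampled, 1):.3f}")
```

With an 80/20 split, the minority rate starts near 0.2. After oversampling, both groups have the same number of rows, so the rate is 0.5 by construction, and the dataset grows to roughly 1.6 times its original size.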