Adversarial Debiasing

Learn about one of the most advanced fairness techniques: adversarial debiasing.

Adversarial learning

When addressing model unfairness, we have two goals that can sometimes contradict each other. On the one hand, we want the model to be as accurate as possible (according to the dataset). On the other, we need to ensure it is fair. In realistic setups, there is always a tradeoff between the two. We can think of them as two adversaries, each pulling the model in a slightly different direction.

It turns out that for models using neural networks (deep or shallow), there is an excellent framework for such problems: adversarial learning. Adversarial learning has been very successful in multiple domains, especially in generative modeling, where the goal is to create new observations similar to existing ones, for example, face generation. So even though we don't need a generative solution here, we can still apply its general principles.

Architecture

The architecture of such a model is relatively complex. We have a main (or primary) model that is supposed to solve the main task, such as the admission problem. This model is unaware of any fairness issues; it simply ignores them. We also introduce a second model called the adversary. The adversary does not care about the original problem at all. Its only job is to ensure that the primary model's predictions satisfy some property, in our case, fairness. There are multiple ways to implement this behavior; we will discuss a relatively simple one.
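To make the two-model setup concrete, here is a minimal sketch in PyTorch. The layer sizes, the number of input features, and the variable names (`primary`, `adversary`, `n_features`) are assumptions for illustration, not part of the original lesson.

```python
import torch
import torch.nn as nn

n_features = 10  # hypothetical number of input features (e.g., admission data columns)

# Primary model: solves the original task (e.g., admit / reject)
# and knows nothing about fairness.
primary = nn.Sequential(
    nn.Linear(n_features, 32),
    nn.ReLU(),
    nn.Linear(32, 1),  # logit of the admission prediction
)

# Adversarial model: ignores the original task and only checks a property
# of the primary model's output (here, whether it leaks the sensitive attribute).
adversary = nn.Sequential(
    nn.Linear(1, 16),  # input: the primary model's prediction (a single logit)
    nn.ReLU(),
    nn.Linear(16, 1),  # logit of the sensitive attribute (e.g., gender)
)
```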

Let’s assume that the original model uses some features we consider sensitive. We can then create an adversarial task: recover the value of the sensitive feature based solely on the primary model’s prediction. For a fair model, this should be a difficult task. If the model is unfair (for example, strongly favoring males over females), it will be easy to guess the value of the gender attribute from the prediction alone.
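A rough sketch of one training step, continuing the code above, could look as follows. It assumes binary labels `y`, a binary sensitive attribute `s`, and a batch of features `x` as float tensors of shape `(batch, 1)` (or `(batch, n_features)` for `x`); the learning rates and the fairness weight `alpha` are also assumptions. The adversary learns to recover `s` from the prediction, and the primary model is penalized whenever the adversary succeeds, pushing it toward predictions that carry no information about `s`.

```python
bce = nn.BCEWithLogitsLoss()
opt_primary = torch.optim.Adam(primary.parameters(), lr=1e-3)
opt_adversary = torch.optim.Adam(adversary.parameters(), lr=1e-3)
alpha = 1.0  # weight of the adversarial (fairness) term, a hypothetical choice

def train_step(x, y, s):
    # 1) Update the adversary: predict the sensitive attribute from the prediction.
    with torch.no_grad():
        y_hat = primary(x)
    adv_loss = bce(adversary(y_hat), s)
    opt_adversary.zero_grad()
    adv_loss.backward()
    opt_adversary.step()

    # 2) Update the primary model: stay accurate on the main task,
    #    but make the adversary's job as hard as possible.
    y_hat = primary(x)
    task_loss = bce(y_hat, y)
    fairness_loss = bce(adversary(y_hat), s)
    primary_loss = task_loss - alpha * fairness_loss  # subtract: reward fooling the adversary
    opt_primary.zero_grad()
    primary_loss.backward()
    opt_primary.step()
    return task_loss.item(), adv_loss.item()
```

The two updates alternate: the adversary keeps improving at guessing the sensitive attribute, and the primary model keeps adjusting its predictions so that this guess becomes no better than chance, while still solving the original task.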
