Protected Attributes

Understand the role of protected attributes.

Sensitive attributes

Before measuring potential biases, we must consider which attributes can be a subject of discrimination. Some of them are defined by law (when there is a related regulation, e.g., in credit scoring). We call them protected or sensitive attributes and define them as attributes we don’t want to discriminate against. Of course, whether a specific attribute is sensitive may depend on the particular context, which is why a good understanding of the problem is crucial.

Protected attributes can be included in the data directly; features like age, gender, and marital status should grab our attention, and we should treat them carefully as a potential source of model bias. However, even if such an attribute is not visible to the model, it can still have an unwanted impact. Consider the zip code. Even if it is not a protected attribute by itself, it can be heavily correlated with race/ethnicity, as people of a particular origin may live in the same neighborhood. This situation is especially unwanted because it can easily go unnoticed during the model creation process. We call such an attribute a proxy attribute.
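One way to spot a proxy attribute is to measure how strongly a seemingly neutral feature is associated with a sensitive one. Below is a minimal sketch that computes Cramér’s V between a hypothetical zip_code column and an ethnicity column; the column names and toy data are assumptions for illustration, not part of any real dataset.

```python
# Minimal sketch: checking whether "zip_code" acts as a proxy for "ethnicity".
# Column names and values are illustrative only.
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data in which the zip code is perfectly aligned with ethnicity.
df = pd.DataFrame({
    "zip_code":  ["10001", "10001", "10002", "10002", "10001", "10002"],
    "ethnicity": ["A",     "A",     "B",     "B",     "A",     "B"],
})

# Cramér's V measures association between two categorical variables
# (0 = no association, 1 = perfect association).
contingency = pd.crosstab(df["zip_code"], df["ethnicity"])
chi2, _, _, _ = chi2_contingency(contingency, correction=False)
n = contingency.to_numpy().sum()
min_dim = min(contingency.shape) - 1
cramers_v = (chi2 / (n * min_dim)) ** 0.5

print(f"Cramér's V between zip_code and ethnicity: {cramers_v:.2f}")
# A value close to 1 suggests zip_code is a proxy and deserves the same
# scrutiny as the sensitive attribute itself.
```

On the toy data above, the ratio is 1.0 because each zip code is populated by a single ethnicity; in real data, even moderately high values warrant a closer look at how the feature influences the model.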

Defining sensitive attributes

The first step in measuring bias is identifying the protected attributes. They will differ depending on the task, but we can start with the following list (a small code sketch of this identification step follows the list):

  • Age

  • Race/ethnic origin

  • Gender

  • Religion/beliefs

  • Political opinions

  • Marital status

  • Health condition/disability

  • Sexual orientation
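As a starting point, we can flag which columns of a dataset match such a list of candidate protected attributes. The sketch below is a minimal illustration; the column names and the candidate set are assumptions and would need to be adapted to the specific task and legislation.

```python
# Minimal sketch: flag dataset columns that match a list of candidate
# protected attributes. Names are hypothetical.
import pandas as pd

CANDIDATE_PROTECTED = {
    "age", "race", "ethnicity", "gender", "religion",
    "political_opinion", "marital_status", "disability", "sexual_orientation",
}

# Hypothetical dataset schema.
df = pd.DataFrame(columns=["age", "gender", "zip_code", "income", "marital_status"])

protected_in_data = [col for col in df.columns if col.lower() in CANDIDATE_PROTECTED]
print("Columns to review as protected attributes:", protected_in_data)
# Note: this only catches attributes that appear directly; proxies such as
# zip_code still require a separate association check.
```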

In some scenarios, they are defined explicitly by law. For example, this is the case for hiring in many countries, where a recruiter can’t ask a candidate about their pregnancy plans during an interview. Another good example is banking services, where specific lists of protected attributes are defined.

This list is not comprehensive. The correct definition of sensitive attributes requires input from many specializations (data scientists, HR, subject matter experts, and more). And, as usual, it depends on the specific problem: one dataset can contain different sensitive attributes when used for different purposes.

When looking for such attributes, a few questions might be helpful:

  • Is the attribute defined as sensitive by legislation?

  • Do we know from history that a specific group was discriminated against?

  • Is the attribute considered a piece of personal information?

  • Does the explanation from the model sound discriminatory? For example, “The credit was rejected because of your political opinion.”

The fact that a model uses a sensitive attribute does not imply unfair results. In many scenarios, some of them might be required for modeling. Sometimes, the opposite holds: excluding a sensitive feature can result in unfair predictions. It might be challenging to predict the correct drug dosage without knowing the subject’s age, gender, and weight. If we screen for a future basketball player, height can be a significant feature and should be considered, but when hiring for a marketing agency, we would consider it unfair.

Disparate impact and treatment

Two more concepts, introduced by US labor law, are very often used in the AI fairness area. The first one is disparate treatment. It happens when an employee is treated unfairly because of the value of a protected attribute, meaning that the value of the attribute was actually used in the decision process. Usually, this means intentional discrimination.

Disparate impact is a similar situation in which a specific group is favored over another one; however, here the values of protected attributes are not used directly. Importantly, it does not have to be intentional. Disparate impact is not illegal as long as business needs can justify the difference. For example, a requirement to carry heavy packages might have an adverse effect on older people. Still, as long as such a requirement is necessary for the job, it is not considered illegal. This would not be the case if the requirement were introduced solely to favor hiring younger people.
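Disparate impact is often quantified with the selection-rate ratio between groups, sometimes called the “four-fifths rule” in US employment practice. The sketch below is a minimal, assumed example: the group labels and selection decisions are made up for illustration.

```python
# Minimal sketch: disparate impact ratio (selection-rate ratio between groups).
# Group labels and decisions below are illustrative only.
import pandas as pd

df = pd.DataFrame({
    "group":    ["young", "young", "young", "older", "older", "older", "older"],
    "selected": [1,        1,       0,       1,       0,       0,       0],
})

# Selection rate per group, then the ratio of the lowest to the highest rate.
rates = df.groupby("group")["selected"].mean()
impact_ratio = rates.min() / rates.max()

print(rates)
print(f"Disparate impact ratio: {impact_ratio:.2f}")
# A ratio below 0.8 is commonly treated as evidence of adverse (disparate)
# impact, which would then need to be justified by business necessity.
```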

Remember that the situation can differ in other jurisdictions.
