Case Study: Identifying Bias in Personal and Sensitive Data
Learn how to identify bias in personal and sensitive data using Fairlearn.
We'll cover the following...
- Understanding personal and sensitive attributes in data
- Overview of the credit loan dataset
- Identifying bias in sensitive attributes of loan data
- Train a classification model to predict loan approval
- Compute demographic parity fairness metric using Fairlearn
- Compute equalized odds fairness metrics using Fairlearn
Bias in data can lead to unfair and discriminatory outcomes in AI systems. By actively seeking out and addressing bias, we can work toward ensuring fair treatment and nondiscrimination for all individuals and groups.
Understanding personal and sensitive attributes in data
In the context of bias in AI solutions, sensitive data refers to a characteristic or attribute that is closely associated with protected or vulnerable groups.
Sensitive features can include attributes such as race, ethnicity, gender, age, religion, sexual orientation, disability status, and socioeconomic background. These features are considered sensitive because they have historically been associated with discrimination or marginalization in various domains.
Personally identifiable information (PII) refers to any information that can be used to identify an individual uniquely. It includes personally identifiable attributes such as full name, social security number, date of birth, address, phone number, email address, financial account numbers, and more. PII is considered sensitive because its exposure or misuse can lead to privacy breaches, identity theft, or other forms of harm.
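One common precaution, illustrated below, is to keep an explicit list of identifier columns and drop them before any analysis or model training. This is only a minimal sketch; the column names (other than Loan_Id, which appears in our dataset) are hypothetical examples.

```python
import pandas as pd

# Hypothetical identifier columns that would count as PII if present
pii_columns = ['Loan_Id', 'Full_Name', 'Email', 'Phone_Number', 'SSN']

def drop_pii(df, pii_columns):
    """Remove any PII columns that appear in the dataframe."""
    present = [col for col in pii_columns if col in df.columns]
    return df.drop(columns=present)

# Example usage with a small, made-up dataframe
df = pd.DataFrame({'Loan_Id': ['LP001'], 'Gender': ['Male'], 'LoanAmount': [128]})
print(drop_pii(df, pii_columns).columns.tolist())  # ['Gender', 'LoanAmount']
```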
Overview of the credit loan dataset
Previously, we examined the credit loan data and performed some exploratory analysis. Now that we are aware of sensitive data and PII, let's analyze these attributes to see whether any bias is present.
Identifying bias in sensitive attributes of loan data
We focus our attention on three personal attributes present in the data:
- Gender
- Married
- Self_Employed
In an ideal world, an AI solution should not base a loan decision on any of the above attributes. However, does the training data have equal representation across these attributes?
Hypothesis 1: The training data has equal representation for the Gender, Married, and Self_Employed attributes.
```python
#Import libraries
import pandas as pd
import os
import matplotlib.pyplot as plt
import seaborn as sns

#Read the data
data = pd.read_csv('loan_approval.csv')
data.drop(['Loan_Id'], axis=1, inplace=True)

#Identify sensitive features
sensitive_features = ['Gender', 'Married', 'Self_Employed']

print('\n Distribution of Gender')
print(data['Gender'].value_counts(normalize=True))
print('\n')
print('\n Distribution of Married Status')
print(data['Married'].value_counts(normalize=True))
print('\n')
print('\n Distribution of Self_Employed')
print(data['Self_Employed'].value_counts(normalize=True))
print('\n')
```
We test the hypothesis using the above script.
- Lines 1–5: We import libraries for our analysis.
- Lines 7–9: We read the loan data and drop the Loan_Id column, since it is an identifier rather than a predictive feature.
- Lines 11–12: We list the sensitive features we want to examine.
- Lines 14–22: We print the normalized distribution of each sensitive attribute to check how balanced its representation is.
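Although matplotlib and seaborn are imported in the script, they are not used there. As an optional visual check, the sketch below plots the same distributions as count plots. It assumes the script above has already been run, so data and sensitive_features are defined.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Plot each sensitive attribute's distribution side by side
# (reuses `data` and `sensitive_features` from the script above).
fig, axes = plt.subplots(1, len(sensitive_features), figsize=(15, 4))
for ax, col in zip(axes, sensitive_features):
    sns.countplot(x=col, data=data, ax=ax)
    ax.set_title(f'Distribution of {col}')
plt.tight_layout()
plt.show()
```

A heavily skewed count plot signals the same imbalance that the value_counts output reveals numerically.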