Case Study: Explore Feature Impact with Partial Dependence Plots
Learn how to apply partial dependence plots to explore the impact of features on target variables.
So far, we’ve explored the relative importance of different features. In this lesson, we’ll go a step further and discover how a specific feature relates to the target variable.
More specifically, we’ll study the partial dependence plot (PDP), a powerful visual tool in machine learning that reveals the influence of a particular feature on the model’s predictions by averaging out the effects of all other features. By examining the isolated impact of a single variable across a range of values, PDPs help us understand the complex inner workings of the model.
PDPs provide a global perspective, focusing on the average effect of a feature rather than specific instances. This technique offers a range of benefits:
- It’s easy to compute and explain in simple terms, making it accessible to everyone.
- It helps us uncover the relationship between a feature (or a combination of features) and the target variable.
- It offers a causal interpretation of the relationship between a feature and the model’s predictions. Note that this describes the model’s behavior, not necessarily a real-world causal effect.
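To make the averaging idea concrete, here is a minimal sketch of how a one-feature PDP can be computed by hand. The names `model` and `X` are hypothetical placeholders for a fitted estimator and its feature matrix; this illustrates the technique itself, not the lesson’s code:

```python
# A minimal, hand-rolled sketch of the PDP computation.
# `model` and `X` are hypothetical: any fitted estimator with a
# .predict() method and the DataFrame it was trained on will do.
import numpy as np

def partial_dependence_1d(model, X, feature, grid_points=20):
    """Average the model's predictions over a grid of values for
    `feature`, marginalizing over the observed values of all others."""
    grid = np.linspace(X[feature].min(), X[feature].max(), grid_points)
    averaged_predictions = []
    for value in grid:
        X_modified = X.copy()
        X_modified[feature] = value  # set every row's feature to this value
        averaged_predictions.append(model.predict(X_modified).mean())
    return grid, np.array(averaged_predictions)
```

Plotting `grid` against the averaged predictions yields exactly the curve a PDP displays: the model’s average prediction as the chosen feature sweeps across its observed range.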
In this lesson, we’ll analyze a loan dataset and apply the partial dependence plot to gain a deeper understanding of the model’s explainability.
Data ingestion and exploratory analysis
We’ll start with some basic exploratory data analysis and preprocessing. Our main focus, however, will be on building partial dependence plots (PDPs) with the sklearn framework.
```python
#Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split

#Read the data
data = pd.read_csv('loan_approval.csv')

# Transform categorical variables into numeric values
from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
obj = (data.dtypes == 'object')
for col in list(obj[obj].index):
    data[col] = label_encoder.fit_transform(data[col])

# Missing values before imputation
print('Missing Values before imputation\n')
print(data.isna().sum())

for col in data.columns:
    # Imputing missing values with mean
    data[col] = data[col].fillna(data[col].mean())

print('Missing Values after imputation\n')
print(data.isna().sum())

# Drop target variable from feature set
X = data.drop(['Loan_Status'], axis=1)
Y = data['Loan_Status']
print(X.shape, Y.shape)

#Split data into train and test samples
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.4, random_state=1)
print('Data shape after splitting into training and test sets:\n')
print('X_train:', X_train.shape, '\nX_test:', X_test.shape, '\nY_train:', Y_train.shape, '\nY_test:', Y_test.shape)
```
The above script reads the loan data, preprocesses it, and splits it into train and test samples for model-building purposes.
- Lines 1–5: We load the Python libraries to be used for analysis purposes.
- Lines 7–8: We load the input data that will be used for model training and evaluation.
- Lines 10–15: We transform the categorical variables into numeric format, a necessary step because scikit-learn estimators require numeric inputs.
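With the data split in place, the natural next step is to fit a model and draw the PDPs with scikit-learn. As a hedged preview, a sketch using `PartialDependenceDisplay` might look as follows; the `RandomForestClassifier` and the feature names `ApplicantIncome` and `LoanAmount` are illustrative assumptions, not necessarily the lesson’s exact choices:

```python
# Sketch only: the classifier and feature names are assumptions.
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay

# Fit a classifier on the training split produced above
model = RandomForestClassifier(random_state=1)
model.fit(X_train, Y_train)

# One PDP per listed feature, averaged over the training sample
PartialDependenceDisplay.from_estimator(
    model, X_train, features=['ApplicantIncome', 'LoanAmount']
)
plt.show()
```

For a binary classifier like this one, scikit-learn plots the partial dependence of the predicted probability of the positive class, so each curve shows how the average predicted approval probability moves as the feature varies.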