Adversarial Attacks on Explanations
Explore how adversarial attacks target explanations in machine learning, focusing on techniques like LIME and SHAP. Understand how attackers can manipulate explanation outputs to mislead human interpretation and compromise trust in model decisions, and learn key mitigation concepts.
Like everything in ML, explainability has its own pitfalls. In academic circles, the largest controversy revolves around whether explainability actually contributes to understanding and auditing model decisions at all. Because the goal of explainable AI is to foster trust and security around the algorithm, we must be able to rely on the explanations these methods provide. Otherwise, we end up doubting both the model and its explanation.
Adversarial attacks on explainable models
We’ve discussed adversarial attacks on models already, but even the explanations of models can be manipulated. An adversary seeking to mislead or undermine human interpretation of an algorithm can attack explanations to make them useless or even incorrect. For example, exploits of LIME and SHAP take advantage of the slight input perturbations these methods feed to the black box.
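To make the idea concrete, here is a minimal sketch of how such an exploit can be structured: a "scaffolded" model detects whether an input looks like one of the synthetic perturbation points an explainer generates and, if so, routes it to a harmless-looking model. Everything here is illustrative; `biased_model`, `innocuous_model`, and `ood_detector` are assumed stand-ins (each with a `predict` method), not part of any real attack library.

```python
# Illustrative sketch of a "scaffolding" exploit against perturbation-based explainers.
# Assumption: an out-of-distribution (OOD) detector can separate real inputs from the
# synthetic points LIME/SHAP generate. All component names below are hypothetical.
import numpy as np


class ScaffoldedModel:
    """Wraps a biased model so that explainers see an innocuous one instead."""

    def __init__(self, biased_model, innocuous_model, ood_detector):
        self.biased_model = biased_model        # behavior on real-world inputs
        self.innocuous_model = innocuous_model  # behavior shown to the explainer
        self.ood_detector = ood_detector        # flags explainer perturbation samples

    def predict(self, X):
        X = np.asarray(X)
        # 1 = "looks like a synthetic perturbation point" (detector convention assumed here)
        is_synthetic = self.ood_detector.predict(X) == 1
        return np.where(
            is_synthetic,
            self.innocuous_model.predict(X),  # harmless answers on probe points
            self.biased_model.predict(X),     # real (biased) answers otherwise
        )
```

Because the explainer only ever queries the model on its own synthetic samples, the explanation it produces reflects the innocuous model, not the biased one actually used in deployment.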
LIME example
Recall that with LIME, a local explanation for a single decision is constructed by fitting a simple, interpretable model over nearby data points. It generates these nearby data points synthetically (i.e., they're not drawn from the training set; they're created separately as part of the LIME process), as sketched below.
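The sketch below shows this procedure in simplified form, assuming a tabular binary classifier exposing `predict_proba` and scikit-learn available. The sampling scheme and parameter choices (Gaussian noise scale, kernel width, number of samples) are illustrative rather than LIME's exact defaults.

```python
# Minimal sketch of LIME's local-surrogate idea for a single instance x.
import numpy as np
from sklearn.linear_model import Ridge


def lime_style_explanation(black_box, x, n_samples=1000, noise_scale=0.5, kernel_width=1.0):
    """Fit a proximity-weighted linear surrogate around a single instance x."""
    x = np.asarray(x, dtype=float)

    # 1. Generate synthetic neighbors by perturbing x with Gaussian noise.
    Z = x + noise_scale * np.random.randn(n_samples, x.shape[0])

    # 2. Query the black box on the synthetic points (probability of the positive class).
    y = black_box.predict_proba(Z)[:, 1]

    # 3. Weight each synthetic point by its proximity to x.
    distances = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))

    # 4. Fit an interpretable (linear) surrogate on the weighted sample.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, y, sample_weight=weights)

    # The coefficients serve as the local explanation for the decision at x.
    return surrogate.coef_
```

The key point for the attack is in step 1: the black box is only ever evaluated on these synthetic neighbors, which an adversary may be able to distinguish from real data.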
Let’s call our original training set