Adversarial Attacks on Explanations

Learn about the problems with explainability.

Like everything in ML, explainability has its own pitfalls. In academic circles, the largest controversy revolves around whether explainability actually contributes to understanding and auditing model decisions at all. Because the goal of explainable AI is to foster trust and security around the algorithm, we must be able to rely on the output that these explanation methods provide to us. Otherwise, we end up doubting both the model and its explanation.

Adversarial attacks on explainable models

We’ve discussed adversarial attacks on models already, but even the explanations of models can be manipulated. An adversary seeking to mislead people or undermine their interpretation of an algorithm can attack its explanations to make them useless or even incorrect. For example, exploits of LIME and SHAP take advantage of the slight input perturbations these methods use to probe the black box, as sketched below.
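To make the idea concrete, here is a minimal, illustrative sketch of how such an attack can work. It assumes that explainer-generated perturbations tend to fall off the data manifold, so a simple detector can tell "real" inputs from explainer probes; the names `detector` and `adversarial_model`, the toy data, and the use of scikit-learn are all assumptions for illustration, not part of any particular published exploit.

```python
# Illustrative sketch: a model that behaves in a biased way on real inputs
# but looks innocuous whenever it suspects it is being probed by an explainer.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy "real" data: feature 0 stands in for a sensitive attribute.
X_real = rng.normal(size=(500, 3))

# LIME/SHAP-style perturbations tend to drift off the data manifold.
X_perturbed = X_real + rng.normal(scale=2.0, size=X_real.shape)

# Train a detector that distinguishes real samples (label 0) from probes (label 1).
detector = RandomForestClassifier(random_state=0).fit(
    np.vstack([X_real, X_perturbed]),
    np.hstack([np.zeros(len(X_real)), np.ones(len(X_perturbed))]),
)

def adversarial_model(X):
    """Biased on real inputs, innocuous on inputs that look like explainer probes."""
    X = np.atleast_2d(X)
    looks_perturbed = detector.predict(X) == 1
    biased = (X[:, 0] > 0).astype(int)      # relies on the sensitive feature
    innocuous = (X[:, 1] > 0).astype(int)   # appears to rely on a harmless feature
    return np.where(looks_perturbed, innocuous, biased)
```

Because the explainer only ever sees the innocuous behavior on its perturbed samples, the explanation it produces hides the model's reliance on the sensitive feature.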

LIME example

Recall that with LIME, a local explanation for a single decision is constructed by building a simple model over nearby data points. It generates these nearby points by perturbing the features of the instance being explained, queries the black box on them, and fits an interpretable surrogate whose coefficients serve as the explanation.
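The following is a from-scratch sketch of that core loop, not the official `lime` library: the function name `lime_explain`, the Gaussian perturbation scale, the kernel width, and the choice of a ridge surrogate are all illustrative assumptions.

```python
# Illustrative sketch of LIME's core procedure: perturb, query, weight, fit.
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(black_box, x, n_samples=1000, kernel_width=0.75, seed=0):
    """Fit a weighted linear surrogate around a single instance x (1-D array)."""
    rng = np.random.default_rng(seed)
    # 1. Generate nearby points by perturbing the instance's features.
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.shape[0]))
    # 2. Query the black box on the perturbed samples.
    preds = black_box(Z)
    # 3. Weight each sample by its proximity to x (exponential kernel on distance).
    dists = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dists ** 2) / kernel_width ** 2)
    # 4. Fit an interpretable (linear) surrogate; its coefficients are the explanation.
    surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
    return surrogate.coef_
```

Notice that step 1 is exactly what the adversarial wrapper above detects: the perturbed samples `Z` look different from real data, which is what makes this family of explainers attackable.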
