Adversarial Attacks on Explanations
Learn about the problems with explainability.
Like everything in ML, explainability has its own pitfalls. In academic circles, the largest controversy is whether explanations actually help us understand and audit model decisions at all. Because the goal of explainable AI is to foster trust in and security around an algorithm, we must be able to rely on the output these explanation methods provide. Otherwise, we end up doubting both the model and its explanation.
Adversarial attacks on explainable models
We’ve already discussed adversarial attacks on models, but the explanations of models can be manipulated as well. An adversary seeking to mislead or undermine human interpretation of an algorithm can attack explanations to make them useless or even incorrect. For example, known exploits of LIME and SHAP take advantage of the fact that both methods probe the black box with slightly perturbed, synthetic inputs.
LIME example
Recall that with LIME, a local explanation for a single decision is constructed by fitting a simple surrogate model over nearby data points. LIME generates these nearby points synthetically; they are not drawn from the training set but created on the fly as part of the explanation process.
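To make this concrete, here is a minimal sketch of the LIME idea, assuming a scikit-learn-style black-box classifier that exposes predict_proba. The function and parameter names below are illustrative, not the actual lime library API.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_style_explanation(black_box, x, num_samples=5000, kernel_width=0.75):
    """Illustrative LIME-style local surrogate (not the real `lime` package).

    black_box: any model exposing predict_proba; x: a 1-D feature vector to explain.
    """
    # 1. Generate synthetic neighbours by perturbing x with Gaussian noise.
    #    These points are created on the fly -- they are not in the training set.
    perturbations = x + np.random.normal(scale=0.5, size=(num_samples, x.shape[0]))

    # 2. Query the black box on the synthetic points.
    preds = black_box.predict_proba(perturbations)[:, 1]

    # 3. Weight each synthetic point by its proximity to x (an RBF kernel).
    distances = np.linalg.norm(perturbations - x, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))

    # 4. Fit a simple, interpretable surrogate on the weighted neighbourhood.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(perturbations, preds, sample_weight=weights)

    # The surrogate's coefficients serve as the local explanation.
    return surrogate.coef_
```

Notice that every query the explainer sends to the black box is a synthetic perturbation. An adversary does not need to tamper with this procedure at all; it only needs its model to respond differently to those synthetic points, which is exactly what the following attack exploits.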
Let’s call our original training set X. LIME’s synthetic neighbours are generated by perturbing points from X, so they tend to fall off the distribution of real data, and a determined adversary can train a classifier that tells the two apart. Now, let’s consider another model built on top of that classifier: on inputs that look like real data from X, it behaves in a biased way, while on inputs it flags as LIME’s synthetic perturbations, it defers to an innocuous model. Because LIME constructs its explanation almost entirely from those synthetic points, the explanation reflects the innocuous behavior and hides the bias.
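Here is a minimal sketch of that idea, assuming the adversary has already trained an ood_detector that flags off-distribution points such as LIME’s perturbations. All class and attribute names here are hypothetical.

```python
import numpy as np

class AdversarialScaffold:
    """Routes real inputs to a biased model and suspected explainer
    perturbations to an innocuous model, hiding the bias from LIME/SHAP."""

    def __init__(self, biased_model, innocuous_model, ood_detector):
        self.biased_model = biased_model        # drives decisions on real data
        self.innocuous_model = innocuous_model  # what the explainer gets to see
        self.ood_detector = ood_detector        # returns True for synthetic-looking points

    def predict_proba(self, X):
        X = np.asarray(X)
        synthetic = self.ood_detector(X)        # boolean mask over the rows of X
        out = np.empty((X.shape[0], 2))
        # Real-looking inputs get the biased model's predictions.
        if (~synthetic).any():
            out[~synthetic] = self.biased_model.predict_proba(X[~synthetic])
        # Explainer-style perturbations get the innocuous model's predictions.
        if synthetic.any():
            out[synthetic] = self.innocuous_model.predict_proba(X[synthetic])
        return out
```

When a LIME-style explainer probes this scaffold, nearly every query it issues is a synthetic perturbation, so the explanation it reports describes the innocuous model rather than the biased one that handles real inputs.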