Adversarial Examples: Attacking Deep Learning Models

Explore the concept of adversarial examples and practice adversarial attacking with PyTorch.

Deep learning models have huge numbers of parameters, sometimes tens of millions or more, which makes it difficult for humans to understand exactly what the models have learned beyond the fact that they perform remarkably well on computer vision and NLP tasks. If you feel completely comfortable using deep learning to solve each and every practical problem without a second thought, what we are about to learn in this chapter will help you realize the potential risks your models are exposed to.

What are adversarial examples, and how are they created?

Adversarial examples are samples (often derived from real data) that a machine learning system easily misclassifies, even though they may look perfectly normal to the human eye. For image data, the modification can be a small amount of added noise (OpenAI, "Attacking machine learning with adversarial examples," https://openai.com/research/attacking-machine-learning-with-adversarial-examples) or a small image patch (Brown, Tom B., Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. "Adversarial patch." arXiv preprint arXiv:1712.09665, 2017). Sometimes, even printing adversarial examples on paper and photographing them still fools neural networks (Athalye, Anish, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. "Synthesizing robust adversarial examples." In International Conference on Machine Learning, pp. 284-293. PMLR, 2018), and it is even possible to 3D-print an object that fools neural networks from almost every viewpoint. Although we can create random-looking samples that resemble nothing natural and still cause neural networks to make mistakes, it is far more interesting to study adversarial examples that look normal to humans but are misclassified by neural networks.
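To make the "small amount of added noise" idea concrete, here is a minimal PyTorch sketch of one well-known noise-based attack, the fast gradient sign method (FGSM). The pretrained model, the placeholder image, and the label used below are illustrative assumptions, not part of this chapter's exercises; the specific attack covered later may differ.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Assumption: a pretrained ImageNet classifier; any differentiable model works similarly.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

def fgsm_attack(model, image, label, epsilon=0.03):
    """Create an adversarial example by adding an epsilon-scaled sign of the
    input gradient (fast gradient sign method)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Nudge each pixel in the direction that increases the classification loss.
    adv_image = image + epsilon * image.grad.sign()
    # Assumes pixel values in [0, 1]; adjust if the input is normalized differently.
    return adv_image.clamp(0, 1).detach()

# Hypothetical usage with a placeholder image and class index.
x = torch.rand(1, 3, 224, 224)   # stand-in for a real, correctly classified image
y = torch.tensor([281])          # stand-in target label
x_adv = fgsm_attack(model, x, y)
print(model(x).argmax(1), model(x_adv).argmax(1))  # predictions may now differ
```

The key point is that the perturbation `epsilon * image.grad.sign()` is bounded by a small `epsilon` per pixel, so the adversarial image remains visually close to the original while the model's prediction can change.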
