Persistent Classification: Understanding Adversarial Attacks by Studying Decision Boundary Dynamics

Brian Bell, Michael Geyer, David Glickenstein, Keaton Hamm, Carlos Scheidegger, Amanda Fernandez, Juston Moore

Research output: Contribution to journal › Article › peer-review

Abstract

There are a number of hypotheses underlying the existence of adversarial examples for classification problems. These include the high-dimensionality of the data, the high codimension in the ambient space of the data manifolds of interest, and that the structure of machine learning models may encourage classifiers to develop decision boundaries close to data points. This article proposes a new framework for studying adversarial examples that does not depend directly on the distance to the decision boundary. Similarly to the smoothed classifier literature, we define a (natural or adversarial) data point to be (γ, σ)-stable if the probability of the same classification is at least γ for points sampled in a Gaussian neighborhood of the point with a given standard deviation σ. We focus on studying the differences between persistence metrics along interpolants of natural and adversarial points. We show that adversarial examples have significantly lower persistence than natural examples for large neural networks in the context of the MNIST and ImageNet datasets. We connect this lack of persistence with decision boundary geometry by measuring angles of interpolants with respect to decision boundaries. Finally, we connect this approach with robustness by developing a manifold alignment gradient metric and demonstrating the increase in robustness that can be achieved when training with the addition of this metric.
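The (γ, σ)-stability criterion described in the abstract can be estimated by Monte Carlo sampling: draw Gaussian perturbations around a point and check how often the classifier's label is preserved. A minimal sketch follows; the `classify` function, thresholds, and sample counts here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def is_stable(classify, x, gamma, sigma, n_samples=1000, rng=0):
    """Monte Carlo estimate of (gamma, sigma)-stability: the fraction
    of points in a Gaussian neighborhood of x (std. dev. sigma) that
    receive the same label as x must be at least gamma."""
    rng = np.random.default_rng(rng)
    base_label = classify(x)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    same = sum(classify(x + d) == base_label for d in noise)
    return same / n_samples >= gamma

# Hypothetical toy classifier: sign of the first coordinate,
# so the decision boundary is the hyperplane x[0] = 0.
classify = lambda x: int(x[0] > 0)

x_far = np.array([5.0, 0.0])    # far from the boundary: stable
x_near = np.array([0.01, 0.0])  # near the boundary: not stable
print(is_stable(classify, x_far, gamma=0.95, sigma=1.0))   # True
print(is_stable(classify, x_near, gamma=0.95, sigma=1.0))  # False
```

The paper's persistence metrics extend this idea along interpolants between natural and adversarial points; the sketch above only illustrates the pointwise stability check.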

Original language: English (US)
Article number: e11716
Journal: Statistical Analysis and Data Mining
Volume: 18
Issue number: 1
DOIs
State: Published - Feb 2025
Externally published: Yes

ASJC Scopus subject areas

  • Analysis
  • Information Systems
  • Computer Science Applications
