Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations


SLIDE 1

Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations

Florian Tramèr, Jens Behrmann, Nicholas Carlini, Nicolas Papernot, Jörn-Henrik Jacobsen

SLIDE 2

What are Adversarial Examples?

“any input to a ML model that is intentionally designed by an attacker to fool the model into producing an incorrect output”

[Figure: several example inputs, each classified “99% Guacamole”: “small” perturbations, nonsensical inputs, “large” perturbations, etc.]

SLIDE 3

Lp-bounded Adversarial Examples

Given an input x, find an x′ that is misclassified such that ‖x′ − x‖_p ≤ ε.
(+) Easy to formalize
(−) Incomplete

[Diagram: Lp-bounded examples (excessive sensitivity) cover only part of the space of adversarial examples]

Concrete measure of progress:

“my classifier has 97% accuracy for perturbations of L2 norm bounded by ε = 2”
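For intuition, here is a minimal sketch of how such an ℓ∞-bounded example is typically searched for, using a single gradient-sign step; the model, inputs, and step size are placeholders for illustration, not the specific classifiers discussed in this talk:

    import torch
    import torch.nn.functional as F

    def fgsm_linf(model, x, y, eps):
        # One-step search for x' with ||x' - x||_inf <= eps that raises the loss.
        # `model` maps a batch of images in [0, 1] to logits; x, y are inputs and labels.
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        x_adv = x_adv + eps * x_adv.grad.sign()                  # step that increases the loss
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)    # stay inside the eps-ball around x
        return torch.clamp(x_adv, 0.0, 1.0).detach()             # stay inside the valid pixel range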

SLIDE 4

Goodhart’s Law

“When a measure becomes a target, it ceases to be a good measure”

SLIDE 5

New Vulnerability: Invariance Adversarial Examples

[Diagram: adversarial examples split into excessive sensitivity and excessive invariance, illustrated on an MNIST “3”]

Small semantics-altering perturbations that don’t change the model’s classification
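Roughly, the two failure modes can be written side by side (a paraphrase of the standard definitions rather than a quote from the slide, writing f for the model and O for a human oracle):

    Sensitivity adversarial example:  ‖x′ − x‖ ≤ ε,  O(x′) = O(x),  f(x′) ≠ f(x)
    Invariance adversarial example:   ‖x′ − x‖ ≤ ε,  O(x′) ≠ O(x),  f(x′) = f(x)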

SLIDE 6

Our Results

• State-of-the-art robust models are too robust
• Invariance to semantically meaningful features can be exploited
• Inherent tradeoffs: solving excessive sensitivity & invariance implies a perfect classifier

Example: a model with 88% certified robust accuracy has only 12% agreement with human labels.

SLIDE 7

A Fundamental Tradeoff

Hermit crab → perturbation with ‖x′ − x‖₂ ≤ 22 → classified “Guacamole”

OK! I’ll make my classifier robust to L2 perturbations of size 22

(we don’t yet know how to do this on ImageNet)

SLIDE 8

A Fundamental Tradeoff

Hermit crab → perturbation with ‖x′ − x‖₂ ≤ 22 → still classified “Hermit crab” (even though the perturbed image no longer looks like one)

OK! I’ll choose a better norm than L2

SLIDE 9

A Fundamental Tradeoff

Theorem (informal): choosing a “good” norm is as hard as building a perfect classifier

SLIDE 10

Are Current Classifiers Already Too Robust?

SLIDE 11

A Case-Study on MNIST

State-of-the-art certified robustness:
ℓ∞ ≤ 0.3: 93% accuracy
ℓ∞ ≤ 0.4: 88% accuracy

[Figure: two perturbed versions of an input, at ε = 0.3 and ε = 0.4]

Model certifies that it labels both inputs the same
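For reference, a sketch of what this accuracy metric computes; `certify_linf` is a hypothetical stand-in for whatever certification procedure (e.g. interval bound propagation) the model provides, not an API from the talk:

    import numpy as np

    def certified_robust_accuracy(model, certify_linf, test_set, eps):
        # Fraction of test points that are classified correctly AND for which
        # certify_linf(model, x, eps) proves that no x' with ||x' - x||_inf <= eps
        # can change the prediction. `certify_linf` is a hypothetical certification oracle.
        hits = 0
        for x, y in test_set:
            pred = int(np.argmax(model(x)))
            if pred == y and certify_linf(model, x, eps):
                hits += 1
        return hits / len(test_set)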

SLIDE 12

Automatically Generating Invariance Attacks

Challenge: ensure the label is changed from a human’s perspective.
Meta-procedure: alignment via data augmentation (sketched below):

input → input from other class → semantics-preserving transformation → diff → result (+ a few tricks)
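A minimal sketch of this alignment idea under simplifying assumptions (shift-only augmentation, nearest-candidate search; the helper names are mine, and the “few tricks” from the slide are omitted):

    import numpy as np
    from scipy.ndimage import shift   # small translations as a semantics-preserving transform

    def invariance_example(x, y, dataset, eps):
        # Search for x' with ||x' - x||_inf <= eps whose label *to a human* differs from y.
        # `dataset` is a list of (image, label) pairs with images in [0, 1].
        best, best_dist = None, np.inf
        for x_other, y_other in dataset:
            if y_other == y:
                continue                      # only other-class inputs can change the human label
            for dr in range(-3, 4):           # align the candidate with x via small shifts
                for dc in range(-3, 4):
                    cand = shift(x_other, (dr, dc), mode="constant", cval=0.0)
                    d = np.abs(cand - x).max()
                    if d < best_dist:
                        best, best_dist = cand, d
        # Pull the best-aligned candidate into the eps-ball around x. Whether the result
        # still reads as the other class to a human must then be checked by a human.
        return np.clip(best, x - eps, x + eps)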

SLIDE 13

Do our invariance examples change human labels?

no attack: 0%
ℓ∞ ≤ 0.3: 21%
ℓ∞ ≤ 0.4: 37%
ℓ∞ ≤ 0.4 (manual): 88%

Open problem: better automated attacks

SLIDE 14

Which models agree most with humans?

[Chart: agreement with human labels on invariance examples, for increasingly robust models]

Most robust model provably gets all invariance examples wrong!

SLIDE 15

Why can models be accurate yet overly invariant?

Or, why can an MNIST model achieve 88% test accuracy for ℓ∞ ≤ 0.4?
Problem: the dataset is not diverse enough.
Partial solution: data augmentation.

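As a generic illustration of what such augmentation can look like (a standard torchvision recipe, not necessarily the specific scheme the authors have in mind):

    from torchvision import datasets, transforms

    # Generic label-preserving augmentation: small rotations, shifts, and rescalings
    # applied on the fly, so training sees more within-class variation than raw MNIST.
    augment = transforms.Compose([
        transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.8, 1.2)),
        transforms.ToTensor(),
    ])

    train_set = datasets.MNIST(root="./data", train=True, download=True, transform=augment)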

SLIDE 16

Conclusion

Robustness isn’t yet another metric to monotonically optimize!

Max “real” robust accuracy on MNIST:
≈80% at ℓ∞ = 0.3
≈10% at ℓ∞ = 0.4
⇒ We’ve already over-optimized!

Are we really making classifiers more robust, or just overly smooth?