

SLIDE 1

Kamalika Chaudhuri

A Closer Look at Adversarial Examples for Separated Data

University of California, San Diego

SLIDE 2

Adversarial Examples

Small perturbation to legitimate inputs causing misclassification

Panda → Gibbon
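
As a toy illustration of how such perturbations are found, here is a minimal sketch in the spirit of gradient-sign attacks, using a made-up linear classifier rather than the image models in the talk; all names and values are illustrative:

```python
import numpy as np

# Hypothetical linear classifier: predict sign(w . x + b).
rng = np.random.default_rng(0)
w = rng.normal(size=100)
b = 0.0
x = rng.normal(size=100)                 # a legitimate input

clean_pred = np.sign(w @ x + b)

# For a linear model the gradient of the score w.r.t. x is just w,
# so the worst-case L-infinity perturbation of size eps is
# eps * sign(w), pushed against the current prediction.
eps = 0.1
x_adv = x - clean_pred * eps * np.sign(w)

print(clean_pred, np.sign(w @ x_adv + b))  # frequently disagree, even though
                                           # ||x_adv - x||_inf = eps is small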

SLIDE 3

Adversarial Examples

Can potentially lead to serious safety issues

SLIDE 4

Adversarial Examples: State of the Art

• A large number of attacks
• A few defenses
• Not much understanding of why adversarial examples arise

SLIDE 5

Adversarial Examples: State of the Art

• A large number of attacks
• A few defenses
• Not much understanding of why adversarial examples arise

This talk: a closer look

SLIDE 6

Background: Classification

Given: pairs (x_i, y_i)
• x_i: a vector of features
• y_i: a discrete label

Find: a prediction rule in a class to predict y from x

SLIDE 7

Background: The Statistical Learning Framework

Training and test data drawn from an underlying distribution D

SLIDE 8

Background: The Statistical Learning Framework

Training and test data drawn from an underlying distribution D

Goal: Find classifier f to maximize accuracy

Pr_{(x, y) ∼ D}[f(x) = y]
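
In practice this population accuracy is estimated on a held-out sample drawn from D; a minimal sketch (the classifier `f` and the test arrays are placeholders):

```python
import numpy as np

def accuracy(f, X_test, y_test):
    """Monte Carlo estimate of Pr_{(x,y) ~ D}[f(x) = y] from a held-out sample."""
    preds = np.array([f(x) for x in X_test])
    return float(np.mean(preds == y_test))
```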

SLIDE 9

Measure of Robustness: Lp norm

A classifier f is robust with radius r at x if it predicts f(x) for all x′ with

‖x − x′‖_p ≤ r
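
A sampling-based sanity check of this definition could look like the sketch below. This is a heuristic, not a certificate: random samples can miss the worst-case perturbation, and the sample count is an arbitrary placeholder.

```python
import numpy as np

def seems_robust(f, x, r, p=np.inf, n_samples=1000, seed=0):
    """Heuristically check whether classifier f keeps its prediction f(x)
    for sampled perturbations x' with ||x - x'||_p <= r."""
    rng = np.random.default_rng(seed)
    base = f(x)
    for _ in range(n_samples):
        delta = rng.uniform(-1.0, 1.0, size=x.shape)
        norm = np.linalg.norm(delta.ravel(), ord=p)
        if norm > 0:
            delta *= r / norm        # push the sample to the radius-r boundary
        if f(x + delta) != base:
            return False             # found a label-flipping perturbation
    return True                      # no flip among the samples (not a proof)
```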

SLIDE 10

Why do we have adversarial examples?

SLIDE 11

Why do we have adversarial examples?

• Data distribution → Distributional Robustness

SLIDE 12

Why do we have adversarial examples?

• Data distribution → Distributional Robustness
• Too few samples → Finite Sample Robustness

SLIDE 13

Why do we have adversarial examples?

• Data distribution → Distributional Robustness
• Too few samples → Finite Sample Robustness
• Bad algorithm → Algorithmic Robustness

SLIDE 14

Why do we have adversarial examples?

• Data distribution → Distributional Robustness

Are classes separated in real data?

SLIDE 15

r-Separation

Data distribution D is r-separated if for any (x, y) and (x′, y′) drawn from D:

y ≠ y′ ⟹ ‖x − x′‖ ≥ 2r

SLIDE 16

r-Separation

Data distribution D is r-separated if for any (x, y) and (x′, y′) drawn from D:

y ≠ y′ ⟹ ‖x − x′‖ ≥ 2r

r-separation means an accurate classifier that is robust at radius r is possible!

SLIDE 17

Real Data is r-Separated

Dataset       Separation   Typical r
MNIST         0.74         0.1
CIFAR10       0.21         0.03
SVHN*         0.09         0.03
ResImgnet*    0.18         0.005

Separation = the minimum distance between any two points in different classes
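
The separation statistic in the table can be computed directly from the data; a brute-force sketch (O(n²) distance computations, with the norm left as a parameter):

```python
import numpy as np

def separation(X, y, p=np.inf):
    """Minimum L_p distance between any two points with different labels.
    The dataset is then r-separated for any r <= separation(X, y) / 2."""
    best = np.inf
    for i in range(len(X)):
        others = X[y != y[i]]                    # points from other classes
        if len(others) > 0:
            d = np.linalg.norm(others - X[i], ord=p, axis=1)
            best = min(best, float(d.min()))
    return best
```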

SLIDE 18

Robustness for r-separated data: Two Settings

Non-parametric Methods

SLIDE 19

Non-Parametric Methods

• k-Nearest Neighbors
• Decision Trees
• Others: Random Forests, kernel classifiers, etc.

SLIDE 20

What is known about Nonparametric Methods?

SLIDE 21

The Bayes Optimal Classifier

• The classifier with maximum accuracy on the data distribution
• Only reachable in the large-sample limit

SLIDE 22

What is known about Non-Parametrics?

As training data grows, the accuracy of non-parametric methods converges to the accuracy of the Bayes optimal classifier

SLIDE 23

What about Robustness?

Prior work: attacks and defenses for specific classifiers
Our work: general conditions under which we can get robustness

SLIDE 24

What is the goal of robust classification?

SLIDE 25

What is the goal of robust classification?

The Bayes optimal classifier is undefined outside the distribution

SLIDE 26

The r-optimal classifier [YRZC20]

The Bayes optimal classifier is undefined outside the distribution

r-optimal = the classifier that maximizes accuracy at points that have robustness radius at least r
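
Operationally, the quantity being maximized is sometimes called astuteness: the probability that the classifier is simultaneously correct and robust at radius r. A sketch of its empirical version (`robust_at` stands for any robustness check, e.g. the sampling heuristic above or an exact certificate):

```python
import numpy as np

def astuteness(f, robust_at, X_test, y_test, r):
    """Empirical fraction of points where f is correct AND robust at radius r;
    the r-optimal classifier maximizes the population version of this."""
    hits = [f(x) == label and robust_at(f, x, r)
            for x, label in zip(X_test, y_test)]
    return float(np.mean(hits))
```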

SLIDE 27

Convergence Result [BC20]

Theorem: For r-separated data, conditions under which non-parametric methods converge to the r-optimal classifier in the large-n limit

SLIDE 28

Convergence Result [BC20]

Theorem: For r-separated data, conditions under which non-parametric methods converge to the r-optimal classifier in the large-n limit

Convergence limit:
• r-optimal: nearest neighbor, kernel classifiers

SLIDE 29

Convergence Result [BC20]

Theorem: For r-separated data, conditions under which non-parametric methods converge to the r-optimal classifier in the large-n limit

Convergence limit:
• r-optimal: nearest neighbor, kernel classifiers
• Bayes-optimal but not r-optimal: histograms, decision trees

SLIDE 30

Convergence Result [BC20]

Theorem: For r-separated data, conditions under which non-parametric methods converge to the r-optimal classifier in the large-n limit

Convergence limit:
• r-optimal: nearest neighbor, kernel classifiers
• Bayes-optimal but not r-optimal: histograms, decision trees

Robustness depends on the training algorithm!
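
For nearest neighbor in particular, robustness at a test point can be certified with a simple triangle-inequality argument: if the nearest training point with the predicted label is at distance d_same and the nearest point with any other label is at d_diff, the 1-NN prediction cannot change within radius (d_diff − d_same) / 2. A minimal sketch of this standard observation (Euclidean norm; `X_train`, `y_train` are placeholders, not the talk's exact construction):

```python
import numpy as np

def one_nn_certified_radius(x, X_train, y_train):
    """Radius within which the 1-NN prediction at x provably cannot change."""
    d = np.linalg.norm(X_train - x, axis=1)
    pred = y_train[d.argmin()]              # 1-NN prediction at x
    d_same = d[y_train == pred].min()       # nearest point with that label
    d_diff = d[y_train != pred].min()       # nearest point with another label
    return max(0.0, (d_diff - d_same) / 2)
```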

SLIDE 31

Robustness for r-separated data: Two Settings

• Non-parametric Methods
• Neural Networks

SLIDE 32

Robustness in Neural Networks

• A large number of attacks
• A few defenses
• All defenses show a robustness-accuracy tradeoff

Is this tradeoff necessary?

SLIDE 33

The Setting: Neural Networks

Neural network computes a function f(x); the classifier outputs sign(f(x))

SLIDE 34

The Setting: Neural Networks

Neural network computes a function f(x); the classifier outputs sign(f(x))

Robustness comes from local smoothness: if f is locally Lipschitz around x and f(x) is bounded away from 0, then sign(f) is robust at x
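
This gives a one-line certificate: if f is L-Lipschitz on a ball around x, then for any x′ in that ball |f(x′) − f(x)| ≤ L‖x′ − x‖, so sign(f) cannot flip within radius |f(x)| / L. A sketch, assuming some local Lipschitz bound L is available (estimating one empirically is sketched after the study setup below):

```python
def certified_radius(f_x, L):
    """Radius within which sign(f) provably cannot flip, given the value
    f(x) and a Lipschitz bound L valid on that neighborhood:
    |f(x') - f(x)| <= L * ||x' - x|| < |f(x)|  =>  sign unchanged."""
    return abs(f_x) / L if L > 0 else float("inf")
```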

SLIDE 35

Robustness and Accuracy Possible through Local Lipschitzness

Theorem [YRZSC20]: If the distribution is r-separated, then there exists an f such that f is locally smooth and sign(f) has accuracy 1 and robustness radius r

SLIDE 36

• In principle, there is no robustness-accuracy tradeoff
• In practice, there is one

What accounts for this gap?

SLIDE 37

Empirical Study

• 4 standard image datasets
• 7 models
• 6 different training methods: Natural, AT, Trades, LLR, GR
• Measure local Lipschitzness (see the sketch below), accuracy, and adversarial accuracy
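
A minimal sketch of one way local Lipschitzness can be measured empirically (random samples in an L∞ ball; this only lower-bounds the true local constant, and the radius and sample count are placeholders, not the talk's exact protocol):

```python
import numpy as np

def local_lipschitz_estimate(f, x, eps=8 / 255, n_samples=100, seed=0):
    """Lower-bound the local Lipschitz constant of a scalar function f
    near x by sampling perturbations in the L-infinity ball of radius eps."""
    rng = np.random.default_rng(seed)
    fx, best = f(x), 0.0
    for _ in range(n_samples):
        delta = rng.uniform(-eps, eps, size=x.shape)
        dist = float(np.abs(delta).max())        # L-infinity distance
        if dist > 0:
            best = max(best, abs(f(x + delta) - fx) / dist)
    return best
```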

SLIDE 38

Result: CIFAR 10

SLIDE 39

Observations

• Trades and adversarial training have the best local Lipschitzness
• Overall, local Lipschitzness is correlated with robustness and accuracy, until underfitting begins
• The generalization gap is quite large, possibly a sign of overfitting

Overall: the robustness-accuracy tradeoff is due to imperfect training methods

SLIDE 40

Conclusion: Why do we have adversarial examples?

• Data distribution → Distributional Robustness
• Too few samples → Finite Sample Robustness
• Bad algorithm → Algorithmic Robustness

SLIDE 41

References

• Robustness for Non-parametric Methods: A Generic Defense and an Attack. Y. Yang, C. Rashtchian, Y. Wang, and K. Chaudhuri. AISTATS 2020.
• When are Non-parametric Methods Robust? R. Bhattacharjee and K. Chaudhuri. arXiv:2003.06121.
• Adversarial Robustness through Local Lipschitzness. Y. Yang, C. Rashtchian, H. Zhang, R. Salakhutdinov, and K. Chaudhuri. arXiv:2003.02460.

SLIDE 42

Acknowledgements

Cyrus Rashtchian, Yaoyuan Yang, Yizhen Wang, Hongyang Zhang, Robi Bhattacharjee, Ruslan Salakhutdinov