Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples - PowerPoint PPT Presentation

Nicholas Carlini, Google Research


SLIDE 1

Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples

Nicholas Carlini, Google Research
SLIDE 2

Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples

SLIDE 3

Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples

SLIDE 4
SLIDE 5
SLIDE 6
SLIDE 7
SLIDE 8

Why should we care about adversarial examples?

  • Make ML robust
  • Make ML better

SLIDE 9
SLIDE 10
SLIDE 11

How do we generate adversarial examples?

SLIDE 12

[Figure: classes "Truck" and "Dog" with arrows labeled "Random Direction"]

SLIDE 13

[Figure: classes "Dog", "Truck", "Airplane" with arrows labeled "Random Direction" and "Adversarial Direction"]

SLIDE 14
SLIDE 15
SLIDE 16
SLIDE 17
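
The slides make this point with pictures rather than formulas. As a concrete reference, here is a minimal sketch of the standard gradient-based approach (iterated FGSM-style steps, i.e. projected gradient descent), written for a hypothetical PyTorch classifier; it is an illustration, not code from the talk.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.03, alpha=0.007, steps=40):
    """L-infinity PGD: repeatedly step in the direction that increases the
    loss, then project back into the eps-ball around the original input."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # move along the adversarial direction
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # stay within the perturbation budget
            x_adv = x_adv.clamp(0, 1)                  # stay a valid image
    return x_adv.detach()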

Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples

SLIDE 18

A defense is a neural network that

  • 1. Is accurate on the test data
  • 2. Resists adversarial examples
SLIDE 19

For example: Adversarial Training

Claim: Neural networks don't generalize

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. Towards deep learning models resistant to adversarial attacks. ICLR 2018
SLIDE 20

Normal Training

[Diagram: a model F is trained on labeled examples, e.g. an image of a 7 with label 7 and an image of a 3 with label 3]

SLIDE 21

Adversarial Training (1)

[Diagram: attack the current model to turn the training images of the 7 and the 3 into adversarial examples]

SLIDE 22

Adversarial Training (2)

[Diagram: train a model G on the adversarial examples, keeping their correct labels 7 and 3]
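
Slides 20-22 show this only as a diagram. Below is a minimal code sketch of the idea, reusing the hypothetical pgd_attack from the earlier sketch: attack each training batch against the current model, then train on the perturbed inputs with their original labels. This is an illustration, not the Madry et al. implementation.

import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=0.03):
    """One epoch of (simplified) adversarial training: attack each clean
    batch against the current model, then train on the perturbed batch
    with the original, correct labels."""
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, eps=eps)   # inner maximization
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)    # outer minimization
        loss.backward()
        optimizer.step()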

SLIDE 23

Or: Thermometer Encoding

Claim: Neural networks are "overly linear"

Buckman, J., Roy, A., Raffel, C., & Goodfellow, I. Thermometer encoding: One hot way to resist adversarial examples. ICLR 2018
SLIDE 24

Solution

T(0.13) = 1 1 0 0 0 0 0 0 0 0
T(0.66) = 1 1 1 1 1 1 0 0 0 0
T(0.97) = 1 1 1 1 1 1 1 1 1 1
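
A minimal sketch of the encoding itself, assuming pixel values in [0, 1] and the 10 levels shown on the slide (the exact thresholding convention in the paper may differ slightly):

import numpy as np

def thermometer_encode(x, levels=10):
    """Thermometer encoding: bit i of the code is on iff x exceeds the i-th
    threshold, so larger values light up longer prefixes of ones."""
    thresholds = np.arange(levels) / levels            # 0.0, 0.1, ..., 0.9
    return (np.asarray(x)[..., None] > thresholds).astype(np.float32)

# thermometer_encode(0.13) -> [1, 1, 0, 0, 0, 0, 0, 0, 0, 0], as on the slide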

SLIDE 25

Or: Input Transformations

Claim: Perturbations are brittle

Guo, C., Rana, M., Cisse, M., & Van Der Maaten, L. Countering adversarial images using input transformations. ICLR 2018
SLIDE 26

Solution

Random Transform

SLIDE 27

Solution

JPEG Compress
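
As an illustration of the JPEG variant, a sketch using Pillow (not the authors' code; the quality setting is an arbitrary choice):

import io
import numpy as np
from PIL import Image

def jpeg_compress(x, quality=75):
    """Input-transformation defense (sketch): re-encode the image as a JPEG
    before classifying it, hoping the lossy compression removes the
    adversarial perturbation."""
    buf = io.BytesIO()
    Image.fromarray(x).save(buf, format="JPEG", quality=quality)  # x: uint8 HxWx3
    buf.seek(0)
    return np.array(Image.open(buf))

# prediction = model.predict(jpeg_compress(x)[None] / 255.0)   # hypothetical model
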
SLIDE 28
SLIDE 29

Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples

SLIDE 30

What does it mean to evaluate the robustness of a defense?

SLIDE 31

Standard ML Pipeline

model = train_model(x_train, y_train)
acc, loss = model.evaluate(x_test, y_test)
if acc > 0.96: print("State-of-the-art")
else: print("Keep Tuning Hyperparameters")

SLIDE 32
SLIDE 33
SLIDE 34
SLIDE 35

Standard ML Evaluations

(the same pipeline, relabeled)

SLIDE 36

What are robustness evaluations?

SLIDE 37

Standard ML Evaluations

model = train_model(x_train, y_train)
acc, loss = model.evaluate(x_test, y_test)
if acc > 0.96: print("State-of-the-art")
else: print("Keep Tuning Hyperparameters")

SLIDE 38

Adversarial ML Evaluations

model = train_model(x_train, y_train)
acc, loss = model.evaluate(A(x_test, model), y_test)
if acc > 0.96: print("State-of-the-art")
else: print("Keep Tuning Hyperparameters")
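
The only change from the standard pipeline is that the test inputs are passed through an attack A before evaluation. The slides leave A abstract; one way it might look, reusing the hypothetical pgd_attack sketch from earlier and attacking the model's own predictions (since A is not handed y_test):

import torch

def A(x_test, model, eps=0.03):
    """Adversarial evaluation helper (sketch): return a perturbed copy of the
    test set, with each input moved at most eps away from the original."""
    x = torch.as_tensor(x_test, dtype=torch.float32)
    with torch.no_grad():
        y_pred = model(x).argmax(dim=1)        # labels the attack tries to flip
    return pgd_attack(model, x, y_pred, eps=eps).numpy()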

SLIDE 39

How complete are evaluations?

SLIDE 40

Case Study: ICLR 2018

SLIDE 41

Serious effort to evaluate

By space, most papers are ½ evaluation

SLIDE 42

We re-evaluated these defenses...

SLIDE 43
SLIDE 44
SLIDE 45

Out of scope: 2    Broken Defenses: 7    Correct Defenses: 4

SLIDE 46

So what did defenses do?

SLIDE 47
SLIDE 48
SLIDE 49
SLIDE 50
SLIDE 51
SLIDE 52

Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples

SLIDE 53
SLIDE 54
SLIDE 55

Lessons (1 of 3): what types of defenses are effective

SLIDE 56

First class of effective defenses:

SLIDE 57

First class of effective defenses: Adversarial Training

SLIDE 58
SLIDE 59
SLIDE 60
SLIDE 61

Second class of effective defenses:

SLIDE 62

Second class of effective defenses: _______________

SLIDE 63
SLIDE 64

Lessons (2 of 3): what we've learned from evaluations

SLIDE 65
SLIDE 66
SLIDE 67
SLIDE 68
SLIDE 69
SLIDE 70

So how to attack it?

SLIDE 71
SLIDE 72

"Fixing" Gradient Descent

[0.1,
 0.3, 0.0,
 0.2, 0.4]
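
The slides do not spell out which fix is being illustrated here. One standard way gradient descent gets "fixed" when a defense hides or breaks gradients (for example with a non-differentiable preprocessing step) is BPDA, sketched below as the straight-through trick; treat this as an assumption about the intent, not a transcription of the slide.

def bpda(x, g):
    """Backward Pass Differentiable Approximation (straight-through sketch):
    use the true value g(x) on the forward pass, but let gradients flow as if
    g were the identity, so the attacker's gradient descent keeps working.
    g must take and return a torch tensor of the same shape as x."""
    return x + (g(x) - x).detach()

# inside an attack step, instead of model(g(x_adv)):
# logits = model(bpda(x_adv, tensor_jpeg_compress))   # hypothetical tensor wrapper of g
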
SLIDE 73
SLIDE 74

Lessons (3 of 3): performing better evaluations

SLIDE 75
SLIDE 76
SLIDE 77

Everything the following papers do is standard practice. Actionable advice requires specific, concrete examples.

SLIDE 78

Perform an adaptive attack

SLIDE 79
SLIDE 80
SLIDE 81

A "hold out" set is not an adaptive attack

SLIDE 82

Stop using FGSM (exclusively)

SLIDE 83

Use more than 100 (or 1000?) iterations of gradient descent

SLIDE 84

Iterative attacks should always do better than single-step attacks.

SLIDE 85

Unbounded optimization attacks should eventually reach 0% accuracy

SLIDE 86
SLIDE 87

Unbounded optimization attacks should eventually reach 0% accuracy

SLIDE 88

Model accuracy should be monotonically decreasing

SLIDE 89
SLIDE 90
SLIDE 91

Evaluate against the worst attack
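
A sketch of what evaluating against the worst attack means in practice: an input only counts as robust if every attack in the suite fails on it (a per-example worst case, not the average of the best single attack). Assumes the torch-style model and attack interface from the earlier sketches.

import torch

def worst_case_accuracy(model, attacks, x, y):
    """Accuracy against the per-example worst case over a suite of attacks."""
    still_correct = torch.ones(len(x), dtype=torch.bool)
    for attack in attacks:                      # e.g. [pgd_attack, ...]
        x_adv = attack(model, x, y)
        with torch.no_grad():
            still_correct &= model(x_adv).argmax(dim=1) == y
    return still_correct.float().mean().item()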

SLIDE 92

Plot accuracy vs distortion
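
A sketch of the recommended plot, again assuming the torch-style model and attack interface from the earlier sketches:

import matplotlib.pyplot as plt
import torch

def plot_accuracy_vs_distortion(model, attack, x, y, epsilons):
    """Accuracy under attack as a function of the allowed distortion.
    The curve should decrease monotonically and hit ~0% once eps is large."""
    accs = []
    for eps in epsilons:
        x_adv = attack(model, x, y, eps=eps)
        with torch.no_grad():
            accs.append((model(x_adv).argmax(dim=1) == y).float().mean().item())
    plt.plot(epsilons, accs, marker="o")
    plt.xlabel("perturbation budget (epsilon)")
    plt.ylabel("accuracy under attack")
    plt.show()
    return accs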

SLIDE 93

Verify enough iterations of gradient descent
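
One simple (hypothetical) way to check this: record the attack loss at every iteration and confirm it has actually stopped improving before reporting numbers.

def looks_converged(loss_history, window=50, tol=1e-3):
    """Heuristic check that the attack ran long enough: the mean loss over the
    last `window` iterations should be close to that of the window before it."""
    if len(loss_history) < 2 * window:
        return False
    recent = sum(loss_history[-window:]) / window
    earlier = sum(loss_history[-2 * window:-window]) / window
    return abs(recent - earlier) < tol
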
SLIDE 94

Try gradient-free attack algorithms

SLIDE 95

Try random noise
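
A sketch of this sanity check: if uniform random noise of the same magnitude degrades accuracy nearly as much as the "adversarial" perturbations do, the attack is almost certainly too weak.

import torch

def random_noise_error_rate(model, x, y, eps=0.03, trials=10):
    """Fraction of inputs misclassified under at least one random perturbation
    of the same size the attack is allowed to use."""
    fooled = torch.zeros(len(x), dtype=torch.bool)
    for _ in range(trials):
        noise = torch.empty_like(x).uniform_(-eps, eps)
        with torch.no_grad():
            preds = model((x + noise).clamp(0, 1)).argmax(dim=1)
        fooled |= preds != y
    return fooled.float().mean().item()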

SLIDE 96
SLIDE 97

The Future

SLIDE 98
SLIDE 99

The Year is 1997

SLIDE 100
SLIDE 101

Back to (the future)

SLIDE 102
SLIDE 103
SLIDE 104

Are we crypto in the 90's?

SLIDE 105

Maybe not. Two reasons.

SLIDE 106

Reason 1.

SLIDE 107
SLIDE 108

Attack Success Rates in Security

(with credit to David Evans)
SLIDE 109
SLIDE 110
SLIDE 111
SLIDE 112
SLIDE 113
SLIDE 114
SLIDE 115

Crypto: 2^-128, broken if 2^-127
Systems: 2^-32, broken if 2^-20
Machine Learning: 2^-1, broken if 2^0
SLIDE 116

Reason 2.

SLIDE 117
SLIDE 118

L2 = 100
SLIDE 119

Original

SLIDE 120

L2 distortion: 75

SLIDE 121

L2 distortion: 75

SLIDE 122

Claim: We are crypto pre-Shannon

SLIDE 123

Conclusion

SLIDE 124

We've come a long way towards understanding adversarial robustness. We still have a long way to go.

SLIDE 125

Questions?

nicholas@carlini.com
https://nicholas.carlini.com

SLIDE 126
SLIDE 127