15-780 Graduate Artificial Intelligence: Adversarial attacks and provable defenses (PowerPoint PPT Presentation)



SLIDE 1

15-780 – Graduate Artificial Intelligence: Adversarial attacks and provable defenses

  • J. Zico Kolter (this lecture) and Ariel Procaccia

Carnegie Mellon University, Spring 2018. Portions based upon joint work with Eric Wong

1

SLIDE 2

Outline

  • Adversarial attacks on machine learning
  • Robust optimization
  • Provable defenses for deep classifiers
  • Experimental results

2

SLIDE 3

Outline

  • Adversarial attacks on machine learning
  • Robust optimization
  • Provable defenses for deep classifiers
  • Experimental results

3

SLIDE 4

Adversarial attacks

4

[Figure: x (“panda”, 57.7% confidence) + .007 × sign(∇_x J(θ, x, y)) (“nematode”, 8.2% confidence) = x + ε ⋅ sign(∇_x J(θ, x, y)) (“gibbon”, 99.3% confidence)] [Szegedy et al., 2014; Goodfellow et al., 2015]

SLIDE 5

How adversarial attacks work

We are focusing on test-time attacks: train on clean data, and the attacker tries to fool the trained classifier at test time. To keep things tractable, we are going to restrict our attention to ℓ∞ norm-bounded attacks: the adversary is free to manipulate inputs within some ℓ∞ ball around the true example

x̃ = x + Δ,  ‖Δ‖∞ ≤ ε

Basic method: given input x ∈ 𝒳, output y ∈ 𝒴, hypothesis h_θ : 𝒳 → 𝒴, and loss function ℓ : 𝒴 × 𝒴 → ℝ₊, adjust x to maximize the loss:

maximize_{‖Δ‖∞ ≤ ε}  ℓ(h_θ(x + Δ), y)

Other variants we will see shortly (e.g., maximizing a specific target class)

5
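For a linear model this inner maximization has a closed form, which the fast gradient sign method exploits with a single step. A minimal sketch (the model, weights, and numbers below are illustrative, not from the lecture):

```python
import math

# Hypothetical linear model: logistic loss on the margin y * (w . x)
w = [2.0, -1.0, 0.5]
x = [0.3, 0.8, -0.2]
y = 1.0
eps = 0.1

def logistic_loss(margin):
    return math.log(1.0 + math.exp(-margin))

def margin(w, x, y):
    return y * sum(wi * xi for wi, xi in zip(w, x))

# FGSM-style attack: for a linear model the loss gradient w.r.t. x is
# -y * w * sigmoid(-margin), so sign(grad) = -sign(y * w), and the
# worst-case l_inf perturbation is delta = eps * sign(grad).
grad_sign = [-math.copysign(1.0, y * wi) for wi in w]
x_adv = [xi + eps * g for xi, g in zip(x, grad_sign)]

clean_loss = logistic_loss(margin(w, x, y))
adv_loss = logistic_loss(margin(w, x_adv, y))
assert adv_loss > clean_loss  # the attack strictly increases the loss
```

For deep networks this one-step sign update is only a heuristic, which is why the rest of the lecture pursues provable bounds instead.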

SLIDE 6

A summary of adversarial example research

🙃 Distillation prevents adversarial attacks! [Papernot et al., 2016]
🙂 No it doesn't! [Carlini and Wagner, 2017]
🙃 No need to worry given translation/rotation! [Lu et al., 2017]
🙂 Yes there is! [Athalye and Sutskever, 2017]
🙃 We have 9 new defenses you can use! [ICLR 2018 papers]
🙂 Broken before the review period had finished! [Athalye et al., 2018]

My view: the attackers are winning, and we need to get out of this arms race

6

SLIDE 7

A slightly better summary

Many heuristic methods for defending against adversarial examples [e.g., Goodfellow et al., 2015; Papernot et al., 2016; Madry et al., 2017; Tramèr et al., 2017; Roy et al., 2017]

  • Keep getting broken, unclear if/when we’ll find the right heuristic

Formal methods approaches to verifying networks via tools from SMT, integer programming, SAT solving, etc. [e.g., Carlini et al., 2017; Ehlers 2017; Katz et al., 2017; Huang et al., 2017]

  • Limited to small networks by combinatorial optimization

Our work: Tractable, provable defenses against adversarial examples via convex relaxations [also related: Raghunathan et al., 2018; Staib and Jegelka, 2017; Sinha et al., 2017; Hein and Andriushchenko, 2017; Peck et al., 2017]

7

SLIDE 8

Adversarial examples in the real world

8

[Figures: adversarial eyeglass frames (Sharif et al., 2016); adversarial stop-sign stickers (Evtimov et al., 2017); 3D-printed adversarial turtle (Athalye et al., 2017)] Note: only the last one here is possibly an ℓ∞ perturbation

SLIDE 9

The million dollar question

How can we design (deep) classifiers that are provably robust to adversarial attacks?

9

SLIDE 10

Outline

  • Adversarial attacks on machine learning
  • Robust optimization
  • Provable defenses for deep classifiers
  • Experimental results

10

SLIDE 11

Robust optimization

An area of optimization going back almost 50 years [Soyster, 1973; see Ben-Tal et al., 2011]. Robust optimization (as applied to machine learning): instead of minimizing the loss at the training points, minimize the worst-case loss in some ball around the points

11

minimize_θ  ∑_i ℓ(h_θ(x_i) ⋅ y_i)

minimize_θ  ∑_i max_{‖Δ‖∞ ≤ ε} ℓ(h_θ(x_i + Δ) ⋅ y_i)

≡ minimize_θ  ∑_i ℓ(h_θ(x_i) ⋅ y_i − ε‖θ‖₁)    (for linear classifiers)
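The linear-classifier equivalence can be checked numerically: for a monotonically decreasing loss, the worst case over the ℓ∞ ball is attained at a corner, so brute-force enumeration over the corners must match the penalized closed form. A small sketch with illustrative numbers (hinge loss standing in for the classification loss):

```python
import itertools

theta = [1.5, -2.0, 0.25]
x = [0.1, -0.4, 0.7]
y = -1.0
eps = 0.2

def loss(margin):            # hinge loss: monotonically decreasing in the margin
    return max(0.0, 1.0 - margin)

def h(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

# The worst case of a monotonically decreasing loss over the l_inf ball is
# attained at a corner delta in {-eps, +eps}^n, so enumerate the corners.
worst = max(
    loss(h(theta, [xi + di for xi, di in zip(x, delta)]) * y)
    for delta in itertools.product((-eps, eps), repeat=len(x))
)

# Closed form from the slide: penalize the margin by eps * ||theta||_1.
closed_form = loss(h(theta, x) * y - eps * sum(abs(t) for t in theta))
assert abs(worst - closed_form) < 1e-12
```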

SLIDE 12

Proof of robust machine learning property

Lemma: For linear hypothesis function h_θ(x) = θᵀx, binary output y ∈ {−1, +1}, and classification loss ℓ(h_θ(x) ⋅ y):

max_{‖Δ‖∞ ≤ ε} ℓ(h_θ(x + Δ) ⋅ y) = ℓ(h_θ(x) ⋅ y − ε‖θ‖₁)

Proof: Because the classification loss is monotonically decreasing in its argument,

max_{‖Δ‖∞ ≤ ε} ℓ(h_θ(x + Δ) ⋅ y) = ℓ( min_{‖Δ‖∞ ≤ ε} h_θ(x + Δ) ⋅ y ) = ℓ( min_{‖Δ‖∞ ≤ ε} θᵀ(x + Δ) ⋅ y )

The lemma then follows from the fact that min_{‖Δ‖∞ ≤ ε} θᵀΔ = −ε‖θ‖₁, attained at Δ = −ε ⋅ sign(θ).

12
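The final step, min_{‖Δ‖∞ ≤ ε} θᵀΔ = −ε‖θ‖₁, can be sanity-checked numerically: the minimizing corner is Δ = −ε ⋅ sign(θ), and no other feasible Δ does better. An illustrative check:

```python
import math
import random

random.seed(0)
theta = [random.uniform(-1, 1) for _ in range(5)]
eps = 0.3

# The minimizer of theta . delta over the l_inf ball is the corner
# delta* = -eps * sign(theta), achieving the value -eps * ||theta||_1.
delta_star = [-eps * math.copysign(1.0, t) for t in theta]
best = sum(t * d for t, d in zip(theta, delta_star))
assert abs(best + eps * sum(abs(t) for t in theta)) < 1e-12

# No randomly drawn feasible delta does better.
for _ in range(1000):
    delta = [random.uniform(-eps, eps) for _ in range(5)]
    assert sum(t * d for t, d in zip(theta, delta)) >= best - 1e-12
```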

SLIDE 13

What to do at test time?

This procedure prevents the possibility of adversarial examples at training time, but what about at test time? Basic idea: if we make a prediction at a point, and this prediction does not change anywhere within the ℓ∞ ball of radius ε around the point, then this cannot be an adversarial example (i.e., we have a zero-false-negative detector)

13
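For the linear binary case from the previous slides, this detector has a simple closed form: the prediction provably cannot flip within the ball exactly when the margin exceeds ε‖θ‖₁. A sketch (the function name and numbers are illustrative):

```python
def certified(theta, x, eps):
    """Return True if no l_inf perturbation of size eps can change
    sign(theta . x) -- i.e. this point cannot be an adversarial example."""
    margin = abs(sum(t * xi for t, xi in zip(theta, x)))
    return margin > eps * sum(abs(t) for t in theta)

theta = [1.0, -2.0]
assert certified(theta, [2.0, -2.0], eps=0.5)      # |margin| = 6 > 0.5 * 3 = 1.5
assert not certified(theta, [0.4, 0.0], eps=0.5)   # |margin| = 0.4 < 1.5, flagged
```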

SLIDE 14

Outline

  • Adversarial attacks on machine learning
  • Robust optimization
  • Provable defenses for deep classifiers
  • Experimental results

14

Based upon work in: Wong and Kolter, “Provable defenses against adversarial examples via the convex outer adversarial polytope”, 2017. https://arxiv.org/abs/1711.00851

SLIDE 15

The trouble with deep networks

In deep networks, the “image” (adversarial polytope) of a norm-bounded perturbation is non-convex, so we can't easily optimize over it. Our approach: instead, form a convex outer bound on the adversarial polytope, and perform robust optimization over this region (applies specifically to networks with ReLU nonlinearities)

15

[Figure: an ℓ∞ ball mapped through a deep network to its non-convex adversarial polytope, and to a convex outer bound]

SLIDE 16

Convex outer approximations

Optimization over the convex outer adversarial polytope provides guarantees about robustness to adversarial perturbations… so, how do we compute and optimize over this bound?

16

SLIDE 17

Adversarial examples as optimization

Finding the worst-case adversarial perturbation (within the true adversarial polytope) can be written as a non-convex optimization problem

17

minimize_{z, ẑ}  (ẑ_k)_{y⋆} − (ẑ_k)_{y_target}
subject to  ‖z_1 − x‖∞ ≤ ε
            ẑ_{i+1} = W_i z_i + b_i,   i = 1, …, k − 1
            z_i = max{ẑ_i, 0},         i = 2, …, k − 1
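Without the convex relaxation developed next, this problem can only be attacked heuristically. A toy sketch that probes the non-convex objective by random search over the ℓ∞ ball (the network and its weights are illustrative; this is not the method of the following slides):

```python
import random

random.seed(1)

# Tiny hypothetical 2-2-2 ReLU network (all weights illustrative).
W1, b1 = [[1.0, -1.0], [0.5, 1.0]], [0.0, -0.2]
W2, b2 = [[1.0, 0.3], [-0.5, 1.0]], [0.1, 0.0]

def net(x):
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

x, eps = [1.0, 0.5], 0.4
out = net(x)
y_star = max(range(2), key=lambda k: out[k])   # predicted class
y_targ = 1 - y_star

def gap(xp):                                   # (z_k)_{y*} - (z_k)_{y_targ}
    o = net(xp)
    return o[y_star] - o[y_targ]

# Crude random search over the l_inf ball; a negative gap would mean a
# successful targeted attack.
candidates = [gap([xi + random.uniform(-eps, eps) for xi in x])
              for _ in range(5000)]
best = min([gap(x)] + candidates)
assert best <= gap(x)   # the search can only improve on the clean gap
```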

SLIDE 18

Adversarial examples as optimization

Finding the worst-case adversarial perturbation (within the true adversarial polytope) can be written as a non-convex optimization problem

18

(Same optimization problem as on the previous slide, repeated as an animation step.)

SLIDE 19

Adversarial examples as optimization

Finding the worst-case adversarial perturbation (within the true adversarial polytope) can be written as a non-convex optimization problem

19

minimize_{z, ẑ}  (ẑ_k)_{y⋆} − (ẑ_k)_{y_target}
subject to  z_1 − x ≤ ε ⋅ 1
            z_1 − x ≥ −ε ⋅ 1
            ẑ_{i+1} = W_i z_i + b_i,   i = 1, …, k − 1
            z_i = max{ẑ_i, 0},         i = 2, …, k − 1

(the ℓ∞ constraint rewritten as a pair of linear inequalities)

SLIDE 20

Adversarial examples as optimization

Finding the worst-case adversarial perturbation (within the true adversarial polytope) can be written as a non-convex optimization problem

20

(Same optimization problem as on the previous slide, repeated as an animation step.)

SLIDE 21

Idea #1: Convex bounds on ReLU nonlinearities

Suppose we have some lower and upper bounds ℓ, u on the values that a particular (pre-ReLU) activation can take on, for this particular example x. Then we can relax the ReLU “constraint” z = max{ẑ, 0} to its convex hull

21

(Same non-convex optimization problem as before; the ReLU constraints z_i = max{ẑ_i, 0} are the ones being relaxed.)

[Figure: the bounded ReLU set {(ẑ, z) : z = max{ẑ, 0}, ℓ ≤ ẑ ≤ u} and its convex relaxation, the triangle with vertices (ℓ, 0), (0, 0), and (u, u)]
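For a crossing activation (ℓ < 0 < u), this convex hull is the triangle described by three linear inequalities: z ≥ 0, z ≥ ẑ, and z ≤ u(ẑ − ℓ)/(u − ℓ). An illustrative membership check:

```python
def in_relaxation(z_hat, z, l, u, tol=1e-9):
    """Triangle (convex hull) relaxation of z = max(z_hat, 0) for l < 0 < u."""
    assert l < 0 < u
    return (z >= -tol and
            z >= z_hat - tol and
            z <= u * (z_hat - l) / (u - l) + tol)

l, u = -1.0, 2.0
# Every point on the true ReLU graph lies in the relaxation...
for z_hat in [-1.0, -0.5, 0.0, 1.0, 2.0]:
    assert in_relaxation(z_hat, max(z_hat, 0.0), l, u)
# ...but the relaxation also contains points above the graph,
assert in_relaxation(0.0, 0.5, l, u)
# and it excludes points outside the triangle.
assert not in_relaxation(2.0, 0.0, l, u)
```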

SLIDE 22

Idea #1: Convex bounds on ReLU nonlinearities

Suppose we have some lower and upper bounds ℓ, u on the values that a particular (pre-ReLU) activation can take on, for this particular example x. Then we can relax the ReLU “constraint” z = max{ẑ, 0} to its convex hull

22

[Figure: the bounded ReLU set and its convex (triangle) relaxation]

minimize_{z, ẑ}  (ẑ_k)_{y⋆} − (ẑ_k)_{y_target}
subject to  z_1 − x ≤ ε ⋅ 1
            z_1 − x ≥ −ε ⋅ 1
            ẑ_{i+1} = W_i z_i + b_i,      i = 1, …, k − 1
            (ẑ_i, z_i) ∈ 𝒟(ℓ_i, u_i),    i = 2, …, k − 1

A linear program!

SLIDE 23

Idea #2: Exploiting duality

While the previous formulation is nice, it would require solving an LP (with the number of variables equal to the number of hidden units in the network),

  • once for each example, for each SGD step
  • (and this even ignores how to compute the lower and upper bounds ℓ, u)

We're going to use the “duality trick”: the fact that any feasible dual solution gives a lower bound on the LP solution

23

[Figure: the true adversarial polytope, its convex outer bound (from the ReLU convex hull), and the further bound given by a dual feasible solution]
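The duality trick in miniature: for any LP, every dual feasible point gives a lower bound on the primal optimum (weak duality), so we never need to solve the dual to optimality. A tiny hand-constructed example:

```python
# LP: minimize x1 + 2*x2  subject to  x1 + x2 >= 3, x >= 0.
# The optimum is 3, attained at (x1, x2) = (3, 0).
# Dual: maximize 3*y  subject to  y <= 1, y <= 2, y >= 0.
c, b = (1.0, 2.0), 3.0

def dual_bound(y):
    assert 0.0 <= y <= min(c)   # dual feasibility: y <= 1 and y <= 2
    return b * y                # weak duality: b*y <= primal optimum

primal_opt = 3.0
assert dual_bound(0.5) <= primal_opt   # any feasible dual point bounds below
assert dual_bound(1.0) == primal_opt   # the optimal dual point is tight here
```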

SLIDE 24

An amazing property

It turns out that we can compute an (empirically, close to optimal) dual feasible solution using a single backward pass through the network (really, a slightly augmented form of the backprop network)

24

Primal (the relaxed LP):

minimize_z  cᵀẑ_k
subject to  ‖z_1 − x‖∞ ≤ ε
            (z_{i+1}, W_i z_i + b_i) ∈ 𝒟(ℓ_i, u_i)

Dual:

maximize_{ν, α}  J_{ε,W,b}(ν, x) ≡ −∑_{i=1}^{k−1} ν_{i+1}ᵀ b_i − xᵀ ν̂_1 − ε‖ν̂_1‖₁ + ∑_{i=2}^{k−1} ∑_{j∈ℐ_i} ℓ_{i,j} [ν_{i,j}]₊
subject to  ν_k = −c
            ν̂_i = W_iᵀ ν_{i+1},             i = k − 1, …, 1
            ν_i = g_i(ν̂_i, α_i; ℓ_i, u_i),  i = k − 1, …, 2
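A sketch of this backward pass for a one-hidden-layer network, with two simplifications not from the slide: interval arithmetic for the bounds ℓ, u (which is exact for the first hidden layer), and the particular choice α_j = u_j/(u_j − ℓ_j), which makes the backward step a linear, leaky-ReLU-like map. All weights are random and illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer ReLU network:
# x -> z2_hat = W1 x + b1 -> z2 = relu(z2_hat) -> z3_hat = W2 z2 + b2
n, m = 2, 4
W1, b1 = rng.normal(size=(m, n)), rng.normal(size=m)
W2, b2 = rng.normal(size=(1, m)), rng.normal(size=1)
x, eps, c = rng.normal(size=n), 0.1, np.ones(1)

# Pre-activation bounds for the hidden layer via interval arithmetic:
mid = W1 @ x + b1
rad = eps * np.abs(W1).sum(axis=1)
l, u = mid - rad, mid + rad

# Dual backward pass with alpha_j = u_j / (u_j - l_j):
nu3 = -c
nu2_hat = W2.T @ nu3
slope = np.where(u <= 0, 0.0, np.where(l >= 0, 1.0, u / (u - l)))
nu2 = slope * nu2_hat            # leaky-ReLU-like backward step
nu1_hat = W1.T @ nu2

crossing = (l < 0) & (u > 0)     # the set of activations that can cross zero
J = (-nu2 @ b1 - nu3 @ b2 - x @ nu1_hat - eps * np.abs(nu1_hat).sum()
     + np.sum(l[crossing] * np.maximum(nu2[crossing], 0.0)))

# J lower-bounds c . z3_hat for every perturbation in the eps-ball:
def f(xp):
    return float(c @ (W2 @ np.maximum(W1 @ xp + b1, 0.0) + b2))

assert all(f(x + eps * rng.uniform(-1, 1, size=n)) >= J - 1e-9
           for _ in range(2000))
```

Note that everything after the bound computation is a single matrix-vector backward sweep, which is the point of the slide: the certificate costs about as much as one backprop pass.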

SLIDE 25

An amazing property

(Same text and primal/dual pair as on the previous slide, with a callout.)

25

ℐ_i: the set of all activations in layer i that can cross zero

SLIDE 26

An amazing property

(Same text and primal/dual pair as on the previous slide, with a callout.)

26

g_i: the derivative of the ReLU, with a slight modification on ℐ_i

SLIDE 27

An amazing property

(Same text and primal/dual pair as on the previous slide, with a callout.)

27

Almost identical to the backprop network

SLIDE 28

An amazing property

(Same text and primal/dual pair as on the previous slide, with a callout.)

28

Objective at ε = 0

SLIDE 29

An amazing property

(Same text and primal/dual pair as on the previous slide, with a callout.)

29

Robustness penalty (same form as in the linear case)

SLIDE 30

An amazing property

(Same text and primal/dual pair as on the previous slide, with a callout.)

30

Additional penalty for violating the ReLU constraint

SLIDE 31

Idea #3: Iterative lower and upper bounds

A meaningful bound requires good lower and upper bounds ℓ_i, u_i. We incrementally build these bounds by solving the (dual) LP for each activation. Need some tricks to make this efficient: use the same (particular) α for all the dual problems, and compute the multiplications in the right order in the objective

31

[Figure: network from x to ẑ_k, with bounds (ℓ_1, u_1), (ℓ_2, u_2), (ℓ_3, u_3) computed layer by layer]
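As a simpler (and looser) stand-in for the LP-based bounds described on this slide, the bounds ℓ_i, u_i can be propagated layer by layer with interval arithmetic. An illustrative sketch:

```python
import numpy as np

def interval_bounds(layers, x, eps):
    """Propagate elementwise pre-activation bounds through a ReLU MLP.

    A simpler, looser stand-in for the LP-based bounds: push the interval
    [x - eps, x + eps] through each affine layer with interval arithmetic,
    applying the ReLU between layers.
    """
    lo, hi = x - eps, x + eps
    bounds = []
    for W, b in layers:
        center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
        c = W @ center + b
        r = np.abs(W) @ radius
        lo, hi = c - r, c + r
        bounds.append((lo.copy(), hi.copy()))
        lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)  # ReLU
    return bounds

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(3, 2)), rng.normal(size=3)),
          (rng.normal(size=(2, 3)), rng.normal(size=2))]
x, eps = np.array([0.5, -0.5]), 0.1
bounds = interval_bounds(layers, x, eps)

# Sanity check: every sampled perturbation stays inside the bounds.
for _ in range(500):
    z = x + eps * rng.uniform(-1, 1, size=2)
    for (W, b), (lo, hi) in zip(layers, bounds):
        z_hat = W @ z + b
        assert np.all(lo - 1e-9 <= z_hat) and np.all(z_hat <= hi + 1e-9)
        z = np.maximum(z_hat, 0.0)
```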

SLIDE 32

Putting it all together

In the end, instead of minimizing the traditional loss…

minimize_θ  ∑_{i=1}^{m} ℓ(h_θ(x_i), y_i)

…we just minimize a loss with a different network, involving a few forward and backward passes, and we get a guaranteed bound on the worst-case loss (or error) for any norm-bounded adversarial attack:

minimize_θ  ∑_{i=1}^{m} ℓ(J_{ε,θ}(x_i), y_i)

At test time, evaluate the bound to see if an example is possibly adversarial (zero false negatives, but we may incorrectly flag some benign examples)

32

SLIDE 33

Outline

  • Adversarial attacks on machine learning
  • Robust optimization
  • Provable defenses for deep classifiers
  • Experimental results

33

SLIDE 34

2D Toy Example

Simple 2D toy problem, 2-100-100-100-2 MLP network, trained with Adam (learning rate = 0.001, no real hyperparameter tuning)

34

[Figures: learned decision boundaries under standard training vs. robust convex training]

SLIDE 35

MNIST

Strided ConvNet (Conv 16×4×4, Conv 32×4×4, FC 100, FC 10), with ReLUs following each layer; convolutions have stride 2

35

Standard and robust errors on MNIST (ε = 0.1):

Method                   | Error | Robust error bound
Standard deep network    | 1.10% | 100%
Robust linear classifier | 17%   | 44%
Our method               | 1.80% | 5.80%
Raghunathan et al., 2018 | 5%    | 35%

SLIDE 36

MNIST Attacks

We can also look at how well real attacks perform at ε = 0.1

36

MNIST attack error rates:

Attack       | Standard training | Our method
No attack    | 1.10%             | 1.80%
FGSM         | 50%               | 3.90%
PGD          | 82%               | 4.10%
Robust bound | 100%              | 5.80%
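FGSM here is the one-step sign attack from earlier; PGD iterates that step with projection back onto the ε-ball. A sketch on a logistic model, where the gradient has a closed form (weights and numbers are illustrative):

```python
import math

# Hypothetical logistic model; PGD maximizes the loss within the eps-ball.
w, x, y, eps, step = [1.0, -2.0], [0.5, 0.25], 1.0, 0.3, 0.1

def loss(xp):
    return math.log(1.0 + math.exp(-y * sum(wi * xi for wi, xi in zip(w, xp))))

def grad(xp):  # d loss / d x for the logistic loss
    s = 1.0 / (1.0 + math.exp(y * sum(wi * xi for wi, xi in zip(w, xp))))
    return [-y * wi * s for wi in w]

x_adv = list(x)
for _ in range(20):
    g = grad(x_adv)
    x_adv = [xi + step * math.copysign(1.0, gi) for xi, gi in zip(x_adv, g)]
    # project back onto the l_inf ball of radius eps around x
    x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]

assert loss(x_adv) > loss(x)
# For a linear model, PGD reaches the worst-case corner, matching the
# closed-form penalized margin from the robust-optimization slides:
clean_margin = y * sum(wi * xi for wi, xi in zip(w, x))
worst = math.log(1.0 + math.exp(-(clean_margin - eps * sum(abs(wi) for wi in w))))
assert abs(loss(x_adv) - worst) < 1e-9
```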

SLIDE 37

Convergence

37

Training does take substantially longer (2 hours), and requires more epochs than standard training. The method does largely avoid overfitting (adversarial robustness is a powerful regularizer), so we want to consider larger architectures

SLIDE 38

Results on additional tasks

Promising performance, but lots more work remains (right now, performance is limited by the size of the architectures we can run). Current work involves scaling to larger problems via random projections, bottleneck layers, and other techniques

38

SLIDE 39

Some take away messages

The work on adversarial defenses, up until now, has been extremely ad hoc: defenses against some hypothesized attack, but not all attacks. Combining techniques from this class (convex optimization, linear programming, duality) with deep networks is a largely unexplored and hugely fruitful area. Many open questions and practical challenges remain, but I think we are starting to be on the right course

39