15-780 Graduate Artificial Intelligence: Adversarial attacks and provable defenses (PowerPoint PPT Presentation)



SLIDE 1

15-780 – Graduate Artificial Intelligence: Adversarial attacks and provable defenses

  • J. Zico Kolter (this lecture) and Ariel Procaccia

Carnegie Mellon University, Spring 2018. Portions based upon joint work with Eric Wong

1

SLIDE 2

Outline

  • Adversarial attacks on machine learning
  • Robust optimization
  • Provable defenses for deep classifiers
  • Experimental results

2

SLIDE 3

Outline

  • Adversarial attacks on machine learning
  • Robust optimization
  • Provable defenses for deep classifiers
  • Experimental results

3

SLIDE 4

Adversarial attacks

4

[Figure: x (“panda”, 57.7% confidence) + .007 × sign(∇_x J(θ, x, y)) (“nematode”, 8.2% confidence) = x + ε ⋅ sign(∇_x J(θ, x, y)) (“gibbon”, 99.3% confidence)] [Szegedy et al., 2014; Goodfellow et al., 2015]

SLIDE 5

How adversarial attacks work

We are focusing on test-time attacks: train on clean data, and the attacker tries to fool the trained classifier at test time. To keep things tractable, we are going to restrict our attention to ℓ∞ norm-bounded attacks: the adversary is free to manipulate inputs within some ℓ∞ ball around the true example

x̃ = x + Δ,  ‖Δ‖∞ ≤ ε

Basic method: given input x ∈ 𝒳, output y ∈ 𝒴, hypothesis h_θ : 𝒳 → 𝒴, and loss function ℓ : 𝒴 × 𝒴 → ℝ₊, adjust x to maximize the loss:

maximize_{‖Δ‖∞ ≤ ε}  ℓ(h_θ(x + Δ), y)

Other variants we will see shortly (e.g., maximizing a specific target class)

5
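For a linear model this inner maximization has a closed form, which the fast gradient sign method exploits with a single step. A minimal sketch (the model, weights, and numbers below are illustrative, not from the lecture):

```python
import math

# Hypothetical linear model: logistic loss on the margin y * (w . x)
w = [2.0, -1.0, 0.5]
x = [0.3, 0.8, -0.2]
y = 1.0
eps = 0.1

def logistic_loss(margin):
    return math.log(1.0 + math.exp(-margin))

def margin(w, x, y):
    return y * sum(wi * xi for wi, xi in zip(w, x))

# FGSM-style attack: for a linear model the loss gradient w.r.t. x is
# -y * w * sigmoid(-margin), so sign(grad) = -sign(y * w), and the
# worst-case l_inf perturbation is delta = eps * sign(grad).
grad_sign = [-math.copysign(1.0, y * wi) for wi in w]
x_adv = [xi + eps * g for xi, g in zip(x, grad_sign)]

clean_loss = logistic_loss(margin(w, x, y))
adv_loss = logistic_loss(margin(w, x_adv, y))
assert adv_loss > clean_loss  # the attack strictly increases the loss
```

For deep networks this one-step sign update is only a heuristic, which is why the rest of the lecture pursues provable bounds instead.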

SLIDE 6

A summary of adversarial example research

🙃 Distillation prevents adversarial attacks! [Papernot et al., 2016]
🙂 No it doesn't! [Carlini and Wagner, 2017]
🙃 No need to worry given translation/rotation! [Lu et al., 2017]
🙂 Yes there is! [Athalye and Sutskever, 2017]
🙃 We have 9 new defenses you can use! [ICLR 2018 papers]
🙂 Broken before the review period had finished! [Athalye et al., 2018]

My view: the attackers are winning, and we need to get out of this arms race

6

SLIDE 7

A slightly better summary

Many heuristic methods for defending against adversarial examples [e.g., Goodfellow et al., 2015; Papernot et al., 2016; Madry et al., 2017; Tramèr et al., 2017; Roy et al., 2017]

  • Keep getting broken, unclear if/when we’ll find the right heuristic

Formal methods approaches to verifying networks via tools from SMT, integer programming, SAT solving, etc. [e.g., Carlini et al., 2017; Ehlers 2017; Katz et al., 2017; Huang et al., 2017]

  • Limited to small networks by combinatorial optimization

Our work: Tractable, provable defenses against adversarial examples via convex relaxations [also related: Raghunathan et al., 2018; Staib and Jegelka, 2017; Sinha et al., 2017; Hein and Andriushchenko, 2017; Peck et al., 2017]

7

SLIDE 8

Adversarial examples in the real world

8

[Figures: adversarial eyeglass frames (Sharif et al., 2016); adversarial stop-sign stickers (Evtimov et al., 2017); 3D-printed adversarial turtle (Athalye et al., 2017)] Note: only the last one here is possibly an ℓ∞ perturbation

SLIDE 9

The million dollar question

How can we design (deep) classifiers that are provably robust to adversarial attacks?

9

SLIDE 10

Outline

  • Adversarial attacks on machine learning
  • Robust optimization
  • Provable defenses for deep classifiers
  • Experimental results

10

SLIDE 11

Robust optimization

An area of optimization going back almost 50 years [Soyster, 1973; see Ben-Tal et al., 2011]. Robust optimization (as applied to machine learning): instead of minimizing the loss at the training points, minimize the worst-case loss in some ball around the points

11

minimize_θ  ∑_i ℓ(h_θ(x_i) ⋅ y_i)

minimize_θ  ∑_i max_{‖Δ‖∞ ≤ ε} ℓ(h_θ(x_i + Δ) ⋅ y_i)

≡ minimize_θ  ∑_i ℓ(h_θ(x_i) ⋅ y_i − ε‖θ‖₁)    (for linear classifiers)
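The linear-classifier equivalence can be checked numerically: for a monotonically decreasing loss, the worst case over the ℓ∞ ball is attained at a corner, so brute-force enumeration over the corners must match the penalized closed form. A small sketch with illustrative numbers (hinge loss standing in for the classification loss):

```python
import itertools

theta = [1.5, -2.0, 0.25]
x = [0.1, -0.4, 0.7]
y = -1.0
eps = 0.2

def loss(margin):            # hinge loss: monotonically decreasing in the margin
    return max(0.0, 1.0 - margin)

def h(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

# The worst case of a monotonically decreasing loss over the l_inf ball is
# attained at a corner delta in {-eps, +eps}^n, so enumerate the corners.
worst = max(
    loss(h(theta, [xi + di for xi, di in zip(x, delta)]) * y)
    for delta in itertools.product((-eps, eps), repeat=len(x))
)

# Closed form from the slide: penalize the margin by eps * ||theta||_1.
closed_form = loss(h(theta, x) * y - eps * sum(abs(t) for t in theta))
assert abs(worst - closed_form) < 1e-12
```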

SLIDE 12

Proof of robust machine learning property

Lemma: For linear hypothesis function h_θ(x) = θᵀx, binary output y ∈ {−1, +1}, and classification loss ℓ(h_θ(x) ⋅ y):

max_{‖Δ‖∞ ≤ ε} ℓ(h_θ(x + Δ) ⋅ y) = ℓ(h_θ(x) ⋅ y − ε‖θ‖₁)

Proof: Because the classification loss is monotonically decreasing in its argument,

max_{‖Δ‖∞ ≤ ε} ℓ(h_θ(x + Δ) ⋅ y) = ℓ( min_{‖Δ‖∞ ≤ ε} h_θ(x + Δ) ⋅ y ) = ℓ( min_{‖Δ‖∞ ≤ ε} θᵀ(x + Δ) ⋅ y )

The lemma then follows from the fact that min_{‖Δ‖∞ ≤ ε} θᵀΔ = −ε‖θ‖₁, attained at Δ = −ε ⋅ sign(θ).

12
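The final step, min_{‖Δ‖∞ ≤ ε} θᵀΔ = −ε‖θ‖₁, can be sanity-checked numerically: the minimizing corner is Δ = −ε ⋅ sign(θ), and no other feasible Δ does better. An illustrative check:

```python
import math
import random

random.seed(0)
theta = [random.uniform(-1, 1) for _ in range(5)]
eps = 0.3

# The minimizer of theta . delta over the l_inf ball is the corner
# delta* = -eps * sign(theta), achieving the value -eps * ||theta||_1.
delta_star = [-eps * math.copysign(1.0, t) for t in theta]
best = sum(t * d for t, d in zip(theta, delta_star))
assert abs(best + eps * sum(abs(t) for t in theta)) < 1e-12

# No randomly drawn feasible delta does better.
for _ in range(1000):
    delta = [random.uniform(-eps, eps) for _ in range(5)]
    assert sum(t * d for t, d in zip(theta, delta)) >= best - 1e-12
```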

SLIDE 13

What to do at test time?

This procedure prevents the possibility of adversarial examples at training time, but what about at test time? Basic idea: if we make a prediction at a point, and this prediction does not change anywhere within the ℓ∞ ball of radius ε around the point, then this cannot be an adversarial example (i.e., we have a zero-false-negative detector)

13
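For the linear binary case from the previous slides, this detector has a simple closed form: the prediction provably cannot flip within the ball exactly when the margin exceeds ε‖θ‖₁. A sketch (the function name and numbers are illustrative):

```python
def certified(theta, x, eps):
    """Return True if no l_inf perturbation of size eps can change
    sign(theta . x) -- i.e. this point cannot be an adversarial example."""
    margin = abs(sum(t * xi for t, xi in zip(theta, x)))
    return margin > eps * sum(abs(t) for t in theta)

theta = [1.0, -2.0]
assert certified(theta, [2.0, -2.0], eps=0.5)      # |margin| = 6 > 0.5 * 3 = 1.5
assert not certified(theta, [0.4, 0.0], eps=0.5)   # |margin| = 0.4 < 1.5, flagged
```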

SLIDE 14

Outline

  • Adversarial attacks on machine learning
  • Robust optimization
  • Provable defenses for deep classifiers
  • Experimental results

14

Based upon work in: Wong and Kolter, “Provable defenses against adversarial examples via the convex outer adversarial polytope”, 2017. https://arxiv.org/abs/1711.00851

SLIDE 15

The trouble with deep networks

In deep networks, the “image” (adversarial polytope) of a norm-bounded perturbation is non-convex, so we can't easily optimize over it. Our approach: instead, form a convex outer bound on the adversarial polytope, and perform robust optimization over this region (applies specifically to networks with ReLU nonlinearities)

15

[Figure: an ℓ∞ ball mapped through a deep network to its non-convex adversarial polytope, and to a convex outer bound]

SLIDE 16

Convex outer approximations

Optimization over the convex outer adversarial polytope provides guarantees about robustness to adversarial perturbations… so, how do we compute and optimize over this bound?

16

SLIDE 17

Adversarial examples as optimization

Finding the worst-case adversarial perturbation (within the true adversarial polytope) can be written as a non-convex optimization problem

17

minimize_{z, ẑ}  (ẑ_k)_{y⋆} − (ẑ_k)_{y_target}
subject to  ‖z_1 − x‖∞ ≤ ε
            ẑ_{i+1} = W_i z_i + b_i,   i = 1, …, k − 1
            z_i = max{ẑ_i, 0},         i = 2, …, k − 1
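Without the convex relaxation developed next, this problem can only be attacked heuristically. A toy sketch that probes the non-convex objective by random search over the ℓ∞ ball (the network and its weights are illustrative; this is not the method of the following slides):

```python
import random

random.seed(1)

# Tiny hypothetical 2-2-2 ReLU network (all weights illustrative).
W1, b1 = [[1.0, -1.0], [0.5, 1.0]], [0.0, -0.2]
W2, b2 = [[1.0, 0.3], [-0.5, 1.0]], [0.1, 0.0]

def net(x):
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

x, eps = [1.0, 0.5], 0.4
out = net(x)
y_star = max(range(2), key=lambda k: out[k])   # predicted class
y_targ = 1 - y_star

def gap(xp):                                   # (z_k)_{y*} - (z_k)_{y_targ}
    o = net(xp)
    return o[y_star] - o[y_targ]

# Crude random search over the l_inf ball; a negative gap would mean a
# successful targeted attack.
candidates = [gap([xi + random.uniform(-eps, eps) for xi in x])
              for _ in range(5000)]
best = min([gap(x)] + candidates)
assert best <= gap(x)   # the search can only improve on the clean gap
```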

SLIDE 18

Adversarial examples as optimization

Finding the worst-case adversarial perturbation (within the true adversarial polytope) can be written as a non-convex optimization problem

18

(Same optimization problem as on the previous slide, repeated as an animation step.)

SLIDE 19

Adversarial examples as optimization

Finding the worst-case adversarial perturbation (within the true adversarial polytope) can be written as a non-convex optimization problem

19

minimize_{z, ẑ}  (ẑ_k)_{y⋆} − (ẑ_k)_{y_target}
subject to  z_1 − x ≤ ε ⋅ 1
            z_1 − x ≥ −ε ⋅ 1
            ẑ_{i+1} = W_i z_i + b_i,   i = 1, …, k − 1
            z_i = max{ẑ_i, 0},         i = 2, …, k − 1

(the ℓ∞ constraint rewritten as a pair of linear inequalities)

SLIDE 20

Adversarial examples as optimization

Finding the worst-case adversarial perturbation (within the true adversarial polytope) can be written as a non-convex optimization problem

20

(Same optimization problem as on the previous slide, repeated as an animation step.)

SLIDE 21

Idea #1: Convex bounds on ReLU nonlinearities

Suppose we have some lower and upper bounds ℓ, u on the values that a particular (pre-ReLU) activation can take on, for this particular example x. Then we can relax the ReLU “constraint” z = max{ẑ, 0} to its convex hull

21

(Same non-convex optimization problem as before; the ReLU constraints z_i = max{ẑ_i, 0} are the ones being relaxed.)

[Figure: the bounded ReLU set {(ẑ, z) : z = max{ẑ, 0}, ℓ ≤ ẑ ≤ u} and its convex relaxation, the triangle with vertices (ℓ, 0), (0, 0), and (u, u)]
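For a crossing activation (ℓ < 0 < u), this convex hull is the triangle described by three linear inequalities: z ≥ 0, z ≥ ẑ, and z ≤ u(ẑ − ℓ)/(u − ℓ). An illustrative membership check:

```python
def in_relaxation(z_hat, z, l, u, tol=1e-9):
    """Triangle (convex hull) relaxation of z = max(z_hat, 0) for l < 0 < u."""
    assert l < 0 < u
    return (z >= -tol and
            z >= z_hat - tol and
            z <= u * (z_hat - l) / (u - l) + tol)

l, u = -1.0, 2.0
# Every point on the true ReLU graph lies in the relaxation...
for z_hat in [-1.0, -0.5, 0.0, 1.0, 2.0]:
    assert in_relaxation(z_hat, max(z_hat, 0.0), l, u)
# ...but the relaxation also contains points above the graph,
assert in_relaxation(0.0, 0.5, l, u)
# and it excludes points outside the triangle.
assert not in_relaxation(2.0, 0.0, l, u)
```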

SLIDE 22

Idea #1: Convex bounds on ReLU nonlinearities

Suppose we have some lower and upper bounds ℓ, u on the values that a particular (pre-ReLU) activation can take on, for this particular example x. Then we can relax the ReLU “constraint” z = max{ẑ, 0} to its convex hull

22

[Figure: the bounded ReLU set and its convex (triangle) relaxation]

minimize_{z, ẑ}  (ẑ_k)_{y⋆} − (ẑ_k)_{y_target}
subject to  z_1 − x ≤ ε ⋅ 1
            z_1 − x ≥ −ε ⋅ 1
            ẑ_{i+1} = W_i z_i + b_i,      i = 1, …, k − 1
            (ẑ_i, z_i) ∈ 𝒟(ℓ_i, u_i),    i = 2, …, k − 1

A linear program!

SLIDE 23

Idea #2: Exploiting duality

While the previous formulation is nice, it would require solving an LP (with the number of variables equal to the number of hidden units in the network),

  • once for each example, for each SGD step
  • (and this even ignores how to compute the lower and upper bounds ℓ, u)

We're going to use the “duality trick”: the fact that any feasible dual solution gives a lower bound on the LP solution

23

[Figure: the true adversarial polytope, its convex outer bound (from the ReLU convex hull), and the further bound given by a dual feasible solution]
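The duality trick in miniature: for any LP, every dual feasible point gives a lower bound on the primal optimum (weak duality), so we never need to solve the dual to optimality. A tiny hand-constructed example:

```python
# LP: minimize x1 + 2*x2  subject to  x1 + x2 >= 3, x >= 0.
# The optimum is 3, attained at (x1, x2) = (3, 0).
# Dual: maximize 3*y  subject to  y <= 1, y <= 2, y >= 0.
c, b = (1.0, 2.0), 3.0

def dual_bound(y):
    assert 0.0 <= y <= min(c)   # dual feasibility: y <= 1 and y <= 2
    return b * y                # weak duality: b*y <= primal optimum

primal_opt = 3.0
assert dual_bound(0.5) <= primal_opt   # any feasible dual point bounds below
assert dual_bound(1.0) == primal_opt   # the optimal dual point is tight here
```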

SLIDE 24

An amazing property

It turns out that we can compute an (empirically, close to optimal) dual feasible solution using a single backward pass through the network (really, a slightly augmented form of the backprop network)

24

Primal (the relaxed LP):

minimize_z  cᵀẑ_k
subject to  ‖z_1 − x‖∞ ≤ ε
            (z_{i+1}, W_i z_i + b_i) ∈ 𝒟(ℓ_i, u_i)

Dual:

maximize_{ν, α}  J_{ε,W,b}(ν, x) ≡ −∑_{i=1}^{k−1} ν_{i+1}ᵀ b_i − xᵀ ν̂_1 − ε‖ν̂_1‖₁ + ∑_{i=2}^{k−1} ∑_{j∈ℐ_i} ℓ_{i,j} [ν_{i,j}]₊
subject to  ν_k = −c
            ν̂_i = W_iᵀ ν_{i+1},             i = k − 1, …, 1
            ν_i = g_i(ν̂_i, α_i; ℓ_i, u_i),  i = k − 1, …, 2
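A sketch of this backward pass for a one-hidden-layer network, with two simplifications not from the slide: interval arithmetic for the bounds ℓ, u (which is exact for the first hidden layer), and the particular choice α_j = u_j/(u_j − ℓ_j), which makes the backward step a linear, leaky-ReLU-like map. All weights are random and illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer ReLU network:
# x -> z2_hat = W1 x + b1 -> z2 = relu(z2_hat) -> z3_hat = W2 z2 + b2
n, m = 2, 4
W1, b1 = rng.normal(size=(m, n)), rng.normal(size=m)
W2, b2 = rng.normal(size=(1, m)), rng.normal(size=1)
x, eps, c = rng.normal(size=n), 0.1, np.ones(1)

# Pre-activation bounds for the hidden layer via interval arithmetic:
mid = W1 @ x + b1
rad = eps * np.abs(W1).sum(axis=1)
l, u = mid - rad, mid + rad

# Dual backward pass with alpha_j = u_j / (u_j - l_j):
nu3 = -c
nu2_hat = W2.T @ nu3
slope = np.where(u <= 0, 0.0, np.where(l >= 0, 1.0, u / (u - l)))
nu2 = slope * nu2_hat            # leaky-ReLU-like backward step
nu1_hat = W1.T @ nu2

crossing = (l < 0) & (u > 0)     # the set of activations that can cross zero
J = (-nu2 @ b1 - nu3 @ b2 - x @ nu1_hat - eps * np.abs(nu1_hat).sum()
     + np.sum(l[crossing] * np.maximum(nu2[crossing], 0.0)))

# J lower-bounds c . z3_hat for every perturbation in the eps-ball:
def f(xp):
    return float(c @ (W2 @ np.maximum(W1 @ xp + b1, 0.0) + b2))

assert all(f(x + eps * rng.uniform(-1, 1, size=n)) >= J - 1e-9
           for _ in range(2000))
```

Note that everything after the bound computation is a single matrix-vector backward sweep, which is the point of the slide: the certificate costs about as much as one backprop pass.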

SLIDE 25

An amazing property

(Same text and primal/dual pair as on the previous slide, with a callout.)

25

ℐ_i: the set of all activations in layer i that can cross zero

SLIDE 26

An amazing property

(Same text and primal/dual pair as on the previous slide, with a callout.)

26

g_i: the derivative of the ReLU, with a slight modification on ℐ_i

SLIDE 27

An amazing property

(Same text and primal/dual pair as on the previous slide, with a callout.)

27

Almost identical to the backprop network

SLIDE 28

An amazing property

(Same text and primal/dual pair as on the previous slide, with a callout.)

28

Objective at ε = 0

SLIDE 29

An amazing property

(Same text and primal/dual pair as on the previous slide, with a callout.)

29

Robustness penalty (same form as in the linear case)

SLIDE 30

An amazing property

(Same text and primal/dual pair as on the previous slide, with a callout.)

30

Additional penalty for violating the ReLU constraint

SLIDE 31

Idea #3: Iterative lower and upper bounds

A meaningful bound requires good lower and upper bounds ℓ_i, u_i. We incrementally build these bounds by solving the (dual) LP for each activation. Need some tricks to make this efficient: use the same (particular) α for all the dual problems, and compute the multiplications in the right order in the objective

31

[Figure: network from x to ẑ_k, with bounds (ℓ_1, u_1), (ℓ_2, u_2), (ℓ_3, u_3) computed layer by layer]
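As a simpler (and looser) stand-in for the LP-based bounds described on this slide, the bounds ℓ_i, u_i can be propagated layer by layer with interval arithmetic. An illustrative sketch:

```python
import numpy as np

def interval_bounds(layers, x, eps):
    """Propagate elementwise pre-activation bounds through a ReLU MLP.

    A simpler, looser stand-in for the LP-based bounds: push the interval
    [x - eps, x + eps] through each affine layer with interval arithmetic,
    applying the ReLU between layers.
    """
    lo, hi = x - eps, x + eps
    bounds = []
    for W, b in layers:
        center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
        c = W @ center + b
        r = np.abs(W) @ radius
        lo, hi = c - r, c + r
        bounds.append((lo.copy(), hi.copy()))
        lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)  # ReLU
    return bounds

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(3, 2)), rng.normal(size=3)),
          (rng.normal(size=(2, 3)), rng.normal(size=2))]
x, eps = np.array([0.5, -0.5]), 0.1
bounds = interval_bounds(layers, x, eps)

# Sanity check: every sampled perturbation stays inside the bounds.
for _ in range(500):
    z = x + eps * rng.uniform(-1, 1, size=2)
    for (W, b), (lo, hi) in zip(layers, bounds):
        z_hat = W @ z + b
        assert np.all(lo - 1e-9 <= z_hat) and np.all(z_hat <= hi + 1e-9)
        z = np.maximum(z_hat, 0.0)
```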

SLIDE 32

Putting it all together

In the end, instead of minimizing the traditional loss…

minimize_θ  ∑_{i=1}^{m} ℓ(h_θ(x_i), y_i)

…we just minimize a loss with a different network, involving a few forward and backward passes, and we get a guaranteed bound on the worst-case loss (or error) for any norm-bounded adversarial attack:

minimize_θ  ∑_{i=1}^{m} ℓ(J_{ε,θ}(x_i), y_i)

At test time, evaluate the bound to see if an example is possibly adversarial (zero false negatives, but we may incorrectly flag some benign examples)

32

SLIDE 33

Outline

  • Adversarial attacks on machine learning
  • Robust optimization
  • Provable defenses for deep classifiers
  • Experimental results

33

SLIDE 34

2D Toy Example

Simple 2D toy problem, 2-100-100-100-2 MLP network, trained with Adam (learning rate = 0.001, no real hyperparameter tuning)

34

[Figures: learned decision boundaries under standard training vs. robust convex training]

SLIDE 35

MNIST

Strided ConvNet (Conv 16×4×4, Conv 32×4×4, FC 100, FC 10), with ReLUs following each layer; convolutions have stride 2

35

Standard and robust errors on MNIST (ε = 0.1):

Method                   | Error | Robust error bound
Standard deep network    | 1.10% | 100%
Robust linear classifier | 17%   | 44%
Our method               | 1.80% | 5.80%
Raghunathan et al., 2018 | 5%    | 35%

SLIDE 36

MNIST Attacks

We can also look at how well real attacks perform at ε = 0.1

36

MNIST attack error rates:

Attack       | Standard training | Our method
No attack    | 1.10%             | 1.80%
FGSM         | 50%               | 3.90%
PGD          | 82%               | 4.10%
Robust bound | 100%              | 5.80%
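FGSM here is the one-step sign attack from earlier; PGD iterates that step with projection back onto the ε-ball. A sketch on a logistic model, where the gradient has a closed form (weights and numbers are illustrative):

```python
import math

# Hypothetical logistic model; PGD maximizes the loss within the eps-ball.
w, x, y, eps, step = [1.0, -2.0], [0.5, 0.25], 1.0, 0.3, 0.1

def loss(xp):
    return math.log(1.0 + math.exp(-y * sum(wi * xi for wi, xi in zip(w, xp))))

def grad(xp):  # d loss / d x for the logistic loss
    s = 1.0 / (1.0 + math.exp(y * sum(wi * xi for wi, xi in zip(w, xp))))
    return [-y * wi * s for wi in w]

x_adv = list(x)
for _ in range(20):
    g = grad(x_adv)
    x_adv = [xi + step * math.copysign(1.0, gi) for xi, gi in zip(x_adv, g)]
    # project back onto the l_inf ball of radius eps around x
    x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]

assert loss(x_adv) > loss(x)
# For a linear model, PGD reaches the worst-case corner, matching the
# closed-form penalized margin from the robust-optimization slides:
clean_margin = y * sum(wi * xi for wi, xi in zip(w, x))
worst = math.log(1.0 + math.exp(-(clean_margin - eps * sum(abs(wi) for wi in w))))
assert abs(loss(x_adv) - worst) < 1e-9
```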

SLIDE 37

Convergence

37

Training does take substantially longer (2 hours), and requires more epochs than standard training. The method does largely avoid overfitting (adversarial robustness is a powerful regularizer), so we want to consider larger architectures

SLIDE 38

Results on additional tasks

Promising performance, but lots more work remains (right now, performance is limited by the size of the architectures we can run). Current work involves scaling to larger problems via random projections, bottleneck layers, and other techniques

38

SLIDE 39

Some take away messages

The work on adversarial defenses, up until now, has been extremely ad hoc: defenses against some hypothesized attack, but not all attacks. Combining techniques from this class (convex optimization, linear programming, duality) with deep networks is a largely unexplored and hugely fruitful area. Many open questions and practical challenges remain, but I think we are starting to be on the right course

39