Adversarial Attacks and Defenses in Deep Learning (Hang Su, Tsinghua University)


  1. Adversarial Attacks and Defenses in Deep Learning Hang Su suhangss@tsinghua.edu.cn Institute for Artificial Intelligence Dept. of Computer Science & Technology Tsinghua University 1

  2. Background ⚫ Artificial intelligence (AI) is a transformative technology that holds promise for tremendous societal and economic benefit, and it has achieved dramatic success in a wide range of applications ⚫ AI has the potential to revolutionize how we live, work, learn, discover, and communicate. 2

  3. AI is NOT Trustworthy ⚫ AI — The Revolution Hasn’t Happened Yet. --- Michael Jordan ⚫ The effectiveness of AI algorithms will be limited by the machine’s inability to explain its decisions and actions to human users. ⚫ Several machine learning models, including neural networks, consistently misclassify adversarial examples (Figure: adversarial image pairs; Alps: 94.39% / Dog: 99.99%, Puffer: 97.99% / Crab: 100.00%) 3

  4. Content ⚫ Understandable: traceability, explainability and communication ⚫ Adversarially robust: resilience to attack and security (Diagram: Trustworthy AI = Understandable + (Adversarially) Robust) 4

  5. Robustness ⚫ A crucial component of achieving Trustworthy AI is technical robustness ⚫ Technical robustness requires that AI systems be developed with a preventative approach to risks (Figure: adversarial image pairs; Alps: 94.39% / Dog: 99.99%, Puffer: 97.99% / Crab: 100.00%) • Y. Dong et al., Boosting Adversarial Attacks with Momentum, In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018. • F. Liao et al., Defense Against Adversarial Attacks Using High-Level Representation Guided Denoiser, In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018. 5

  6. The world can be adversarial ⚫ We need to demystify black-box models and develop more transparent and interpretable models to make them more trustworthy and robust ➢ DNNs can be easily duped by adversarial examples crafted by adding small, human-imperceptible perturbations ➢ This may pose severe risks for numerous applications (Figures: Adversarial Attack on Social Network [Dai et al., ICML 2018]; [Sharif et al., 2016]) 6

  7. Is ML inherently unreliable? ⚫ No: but we need to re-think how we do ML ⚫ Adversarial aspects = stress-testing our solutions ⚫ Towards adversarially robust models (Figure: an image classified as “pig” (91%) plus 0.005 × an imperceptible perturbation is classified as “airliner” (99%)) 7

  8. A Limitation of the ML Framework ⚫ Measure of performance: fraction of mistakes during testing ⚫ But: in reality, the distributions we apply ML to are NOT the ones we train it on (Diagram: training distribution vs. inference distribution) 8

  9. Adversary-aware Machine Learning ⚫ Machine learning systems should be aware of the arms race with the adversary ➢ Know your adversary ➢ Be proactive ➢ Protect your classifier (Diagram: the system designer models the adversary, simulates the attack, evaluates the attack’s impact, and develops countermeasures) 9

  10. Adversarial Attack Scenarios ⚫ White box attack (WBA): access to any information about the target classifier, including predictions, gradient information, etc. ⚫ Practical black box attack (PBA): only the prediction of the target classifier is available. When the prediction confidence is accessible: PBA-C; when only the discrete label is available: PBA-D. ⚫ Restricted black box attack (RBA): black-box queries are allowed only on some samples, and the attacker must craft adversarial perturbations for the remaining samples. WBA > PBA-C > PBA-D > RBA 10

  11. White-box attacks ⚫ Binary classifier: f(x) = sign(g(x)), with f(x) = +1 for malicious and f(x) = −1 for legitimate samples ⚫ With full access to g, the attacker searches for a perturbed sample x′ with small distance ‖x′ − x‖ that changes the sign of g, and hence the predicted label 11
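
To make the white-box setting concrete, here is a minimal sketch of a one-step gradient-sign (FGSM-style) attack, written for a multi-class PyTorch classifier rather than the binary g(x) above; the model, the cross-entropy loss and the ε budget are illustrative assumptions, not part of the slides.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8/255):
    """One-step gradient-sign attack: use the model's input gradient to
    craft a small perturbation that increases the classification loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step each pixel by epsilon in the direction that increases the loss.
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1)
    return x_adv.detach()
```

Because the white-box attacker can backpropagate through the target model itself, no queries or surrogate models are needed in this setting.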

  12. Black-box attacks (transferability) ⚫ Cross-model transferability (Liu et al., 2017) ⚫ Cross-data transferability (Moosavi-Dezfooli et al., 2017) 12

  13. Limitations of black-box attacks ⚫ The trade-off between transferability and attack strength makes black-box attacks less effective. (Figure: success rate (%) vs. number of iterations (1–10) for I-FGSM adversarial examples crafted on Inception V3 and evaluated on Inception V3, Inception V4, Inception ResNet V2 and ResNet v2-152) • Attack Inception V3; • Evaluate the success rates of attacks on Inception V3, Inception V4, Inception ResNet V2, ResNet v2-152; • ϵ = 16; • 1000 images from ImageNet. 13

  14. Momentum iterative FGSM [CVPR18] Dong et al., Boosting Adversarial Attacks with Momentum, CVPR 2018 * winning solution at the NIPS 2017 competition
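
A minimal PyTorch sketch of the momentum iterative update, assuming an image batch in [0, 1] with shape (N, C, H, W); the defaults mirror the ε = 16 (on a 0–255 scale), μ = 1.0 and 10 iterations reported on the next slide, while the model and loss are illustrative. Setting μ = 0 recovers plain I-FGSM.

```python
import torch
import torch.nn.functional as F

def mi_fgsm(model, x, y, epsilon=16/255, iters=10, mu=1.0):
    """Momentum iterative FGSM: accumulate a velocity over L1-normalized
    gradients, then step in the sign of the accumulated gradient."""
    alpha = epsilon / iters                  # per-iteration step size
    g = torch.zeros_like(x)                  # momentum accumulator
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Normalize the gradient by its L1 norm before adding the momentum term.
        g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True)
        # Take a sign step, then project back into the epsilon ball and valid range.
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1).detach()
    return x_adv
```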

  15. Experimental Results ⚫ ϵ = 16, μ = 1.0, 10 iterations ➢ MI-FGSM can attack a white-box model with near 100% success rates ➢ It fools black-box models with much higher success rates 15

  16. Query-based Black-box Attacks ⚫ Transfer-based ❖ Generate adversarial examples against white-box models, and leverage transferability for attacks; ❖ Require no knowledge of the target model, no queries; ❖ Need white-box models (datasets); ⚫ Score-based ❖ The target model provides output probability distribution; ❖ Black-box optimization by gradient estimation methods; ❖ Impractical in some real-world applications; ⚫ Decision-based ❖ The target model only provides hard-label predictions; ❖ Practical in real-world applications; ❖ Need a large number of queries

  17. Score-based Attacks ⚫ Query the loss function f(x) at a chosen input x ⚫ Goal: maximize f(x) until the attack succeeds ⚫ Estimate the gradient ∇f(x) from queries, and apply first-order optimization methods: ĝ = (1/q) Σᵢ₌₁..q ĝᵢ, where ĝᵢ = [f(x + σuᵢ, y) − f(x, y)] / σ · uᵢ ➢ In the ordinary RGF method, uᵢ is sampled uniformly from the D-dimensional Euclidean unit hypersphere. 17
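
A minimal NumPy sketch of this random gradient-free (RGF) estimator, where `loss_fn(x)` is an assumed stand-in for a single query of the target model's loss; the values of q and σ are illustrative.

```python
import numpy as np

def rgf_gradient(loss_fn, x, q=50, sigma=1e-4):
    """Estimate the gradient of a black-box loss with q random directions."""
    base = loss_fn(x)                      # one query at the current point
    g_hat = np.zeros_like(x)
    for _ in range(q):
        u = np.random.randn(*x.shape)
        u /= np.linalg.norm(u)             # uniform direction on the unit sphere
        g_hat += (loss_fn(x + sigma * u) - base) / sigma * u
    return g_hat / q
```

Each estimate costs q + 1 queries, which is where the query budget of score-based attacks is spent.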

  18. Gradient estimation framework ⚫ Our loss function: L(ĝ) = min_{b≥0} E ‖∇f(x) − b·ĝ‖²₂ ⚫ The mean squared error is minimized w.r.t. the scale coefficient b ➢ Usually the normalized gradient is used, hence the norm does not matter 18
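
The slide states the criterion without the intermediate step; a short sketch of the closed form (a standard least-squares computation, using the notation above) shows why only the direction of ĝ matters.

```latex
\[
  L(\hat{g}) = \min_{b \ge 0} \; \mathbb{E}\,\bigl\|\nabla f(x) - b\,\hat{g}\bigr\|_2^2 ,
  \qquad
  b^{*} = \frac{\mathbb{E}\bigl[\hat{g}^{\top}\nabla f(x)\bigr]}{\mathbb{E}\,\|\hat{g}\|_2^2},
\]
\[
  L(\hat{g}) = \|\nabla f(x)\|_2^2
  \;-\; \frac{\bigl(\mathbb{E}\bigl[\hat{g}^{\top}\nabla f(x)\bigr]\bigr)^2}{\mathbb{E}\,\|\hat{g}\|_2^2}.
\]
% Rescaling \hat{g} by any positive constant leaves the second term unchanged,
% which is why only the direction of the estimate matters.
```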

  19. Prior-guided RGF (P-RGF) method ⚫ Use the normalized (‖v‖₂ = 1) transfer gradient v of a surrogate model as a prior ⚫ With C = E[uᵢuᵢ^⊤], the limiting estimation error is lim_{σ→0} L(ĝ) = ‖∇f(x)‖²₂ − (∇f(x)^⊤C∇f(x))² / [ (1 − 1/q)·∇f(x)^⊤C²∇f(x) + (1/q)·∇f(x)^⊤C∇f(x) ] ⚫ The gradient estimator can be implemented by sampling uᵢ = √λ·v + √(1 − λ)·(I − vv^⊤)ξᵢ ⚫ Incorporate the data prior to accelerate the gradient estimation 19
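
A minimal NumPy sketch of the prior-guided sampling step, reusing the `loss_fn` convention from the RGF sketch above; the surrogate gradient `v`, the mixing coefficient λ, and the values of q and σ are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def prgf_gradient(loss_fn, x, v, lam=0.5, q=50, sigma=1e-4):
    """RGF estimator whose random directions are biased towards the
    (normalized) transfer gradient v of a surrogate model."""
    v = v / np.linalg.norm(v)
    base = loss_fn(x)
    g_hat = np.zeros_like(x)
    for _ in range(q):
        xi = np.random.randn(*x.shape)
        xi -= np.sum(xi * v) * v           # project out the v component
        xi /= np.linalg.norm(xi)
        # Mix the transfer prior with a random orthogonal direction.
        u = np.sqrt(lam) * v + np.sqrt(1.0 - lam) * xi
        g_hat += (loss_fn(x + sigma * u) - base) / sigma * u
    return g_hat / q
```

With λ near 0 the prior is essentially ignored, while larger λ trusts the transfer gradient more, which connects to the adaptive λ* discussion on the next slide.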

  20. Performance of gradient estimation ⚫ Cosine similarity (averaged over all images) between the gradient estimate and the true gradient w.r.t. attack iterations ⚫ The transfer gradient is more useful at the beginning and less useful later ➢ Showing the advantage of using an adaptive λ* 20

  21. Results on defensive models ⚫ ASR: attack success rate (with the number of queries limited to 10,000); AVG. Q: average number of queries over successful attacks. ⚫ Methods with the subscript “D” refer to the data-dependent version of the P-RGF method. 21
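
To spell out how the two reported numbers relate, here is a small helper under the 10,000-query budget from the slide; the input format (None marking attacks that never succeed) is an assumption for illustration.

```python
import numpy as np

def asr_and_avg_queries(query_counts, budget=10_000):
    """query_counts: #queries each attack used, or None if it never succeeded."""
    succeeded = [q for q in query_counts if q is not None and q <= budget]
    asr = 100.0 * len(succeeded) / len(query_counts)   # attack success rate (%)
    avg_q = float(np.mean(succeeded)) if succeeded else float("nan")
    return asr, avg_q
```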

  22. Query-based Black-box Attacks ⚫ Transfer-based ❖ Generate adversarial examples against white-box models, and leverage transferability for attacks; ❖ Require no knowledge of the target model, no queries; ❖ Need white-box models (datasets); ⚫ Score-based ❖ The target model provides output probability distribution; ❖ Black-box optimization by gradient estimation methods; ❖ Impractical in some real-world applications; ⚫ Decision-based ❖ The target model only provides hard-label predictions; ❖ Practical in real-world applications; ❖ Need a large number of queries

  23. Query-based Adversarial Attack ⚫ We search for an adversarial example by modeling the local geometry of the search directions and reducing the dimension of the search space. (Figure: attacking a “black-box” model; the original image alongside adversarial images after 1,000, 10,000 and 100,000 queries) 23

  24. Objective Function ⚫ Constrained optimization problem: argmin_{x*} D(x*, x), s.t. C(f(x*)) = 1 ❖ D(⋅,⋅) is a distance metric; C(⋅) is an adversarial criterion (C(f(x)) = 0 for the original input). ⚫ A reformulation: argmin_{x*} L(x*) = D(x*, x) + δ(C(f(x*)) = 1), where δ(⋅) is 0 if the condition holds and +∞ otherwise (Figure: the non-adversarial region around the original input) ⚫ Implement a black-box gradient estimation using a local search based on queries
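
As a concrete reading of this reformulation, a tiny Python helper; `distance` and `is_adversarial` are assumed stand-ins for D(⋅,⋅) and the criterion C(f(⋅)) = 1, and the infinite penalty encodes the constraint.

```python
import numpy as np

def attack_loss(x_star, x, is_adversarial,
                distance=lambda a, b: np.linalg.norm(a - b)):
    """L(x*) = D(x*, x) if x* is adversarial, and +inf otherwise."""
    return distance(x_star, x) if is_adversarial(x_star) else np.inf
```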

  25. Evolutionary Attack ⚫ (1+1) covariance matrix adaptation evolution strategy: ❖ Initialize x̃* ∈ ℝⁿ (already adversarial) ❖ For t = 1, 2, …, T do ➢ Sample z ~ N(0, σ²C) ➢ If L(x̃* + z) < L(x̃*): x̃* ← x̃* + z ➢ Update(σ, C) ❖ Return x̃* ⚫ Model the local geometry of the search directions ⚫ Reduce the dimension of the search space
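
A minimal NumPy sketch of this (1+1) loop for a hard-label model; `is_adversarial` stands in for the criterion C(f(⋅)) = 1, the L2 distance plays the role of D, and the step size σ is an illustrative constant (the adaptation of σ and C is sketched after the next slide).

```python
import numpy as np

def evolutionary_attack(is_adversarial, x, x_init, iters=10_000, sigma=0.01):
    """(1+1)-ES against a hard-label model: keep a single adversarial
    candidate and accept a mutation only when the loss
    L(x*) = D(x*, x) + indicator decreases, i.e. the candidate is still
    adversarial and strictly closer to the original image x."""
    x_adv = x_init.copy()                    # must already be adversarial
    c_diag = np.ones_like(x)                 # diagonal covariance C (next slide)
    for _ in range(iters):
        z = sigma * np.sqrt(c_diag) * np.random.randn(*x.shape)  # z ~ N(0, sigma^2 C)
        candidate = x_adv + z
        if (is_adversarial(candidate)
                and np.linalg.norm(candidate - x) < np.linalg.norm(x_adv - x)):
            x_adv = candidate
        # Update(sigma, C): adapt the step size and covariance here (slide 26).
    return x_adv
```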

  26. Covariance Matrix Adaptation ⚫ The storage and computation complexity of a full covariance matrix is at least O(n²) ⚫ We use a diagonal covariance matrix ⚫ Update rule: p_c = (1 − c_c)·p_c + √(c_c·(2 − c_c))·(z/σ), c_ii = (1 − c_cov)·c_ii + c_cov·(p_c)ᵢ²
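
A small NumPy sketch of this diagonal update, as it could slot into the loop above after an accepted mutation; the constants c_c and c_cov are illustrative choices, not values given on the slide.

```python
import numpy as np

def update_covariance(p_c, c_diag, z, sigma, c_c=0.01, c_cov=0.001):
    """Adapt the evolution path p_c and the diagonal covariance entries
    after a successful mutation z ~ N(0, sigma^2 * diag(c_diag))."""
    p_c = (1.0 - c_c) * p_c + np.sqrt(c_c * (2.0 - c_c)) * (z / sigma)
    c_diag = (1.0 - c_cov) * c_diag + c_cov * p_c**2
    return p_c, c_diag
```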
