Dawn Song
UC Berkeley
AI and Security: Lessons, Challenges & Future Directions

AI and Security: Mutual Enablers
AI enables new security applications, and security enables better AI.
Integrity: the system produces intended/correct results even in the presence of an attacker.
Confidentiality: the system protects sensitive data (e.g., privacy-preserving machine learning).
Attack always follows new technology (and sometimes even leads it), and as learning systems are deployed in critical settings, the consequences of a successful attack become more severe.
In particular, deep learning models are easily fooled by adversarial examples: small, carefully crafted perturbations that change a model's prediction.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. Intriguing properties of neural networks. ICLR 2014.
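As a concrete illustration (mine, not from the slides): the fast gradient sign method (FGSM) of Goodfellow et al., which reappears later in this deck, crafts such a perturbation in a single gradient step. A minimal PyTorch sketch, assuming a pretrained differentiable classifier `model`:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """One-step fast gradient sign attack (Goodfellow et al., 2015).

    model: differentiable classifier returning logits.
    x:     input batch with values in [0, 1].
    y:     true labels; the attack increases the loss on y.
    eps:   L-infinity perturbation budget.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Move each pixel eps in the direction that increases the loss.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```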
Can we generate adversarial examples in the physical world that remain effective under different viewing conditions and viewpoints, including viewing distances and angles?
Subtle Perturbations
Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, Dawn Song: Robust Physical-World Attacks on Machine Learning Models. arXiv:1707.08945 (2017).
Camouflage Perturbations
Adversarial perturbations are possible in the physical world and remain effective under different viewing conditions and viewpoints, including viewing distances and angles.
The attack is formulated as an optimization: find a perturbation, restricted to a mask (the sticker region), that fools the classifier in expectation over varying physical conditions.
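A hedged sketch of this style of objective (the paper's exact formulation differs in details such as the printability term): optimize a perturbation δ, restricted by a mask M to the sticker region, so that the classifier f_θ outputs the target label y* in expectation over sampled physical transformations t:

```latex
\min_{\delta}\; \lambda\,\lVert M \cdot \delta \rVert_p
\;+\; \mathbb{E}_{t \sim T}\; J\!\left(f_\theta\big(t(x + M \cdot \delta)\big),\, y^{*}\right)
```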
Adversarial Examples Prevalent in Deep Learning Systems
Weaker threat models: black-box attacks (target model is unknown)
Other tasks and model classes: generative models, deep reinforcement learning, VisualQA / image-to-code
New attack methods: providing more diversity of attacks
Generative models (e.g., VAEs): an encoder maps the input to a low-dimensional latent representation z, and a decoder maps z back to a high-dimensional reconstruction.
Adversarial Examples in Generative Models
The attack makes the decompressor (decoder) reconstruct a different image from the one that the compressor (encoder) sees.
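A minimal sketch of one such attack mode (a latent-space attack; the names and the Adam-based loop are illustrative), assuming a trained encoder that returns a latent code (e.g., the posterior mean) and a paired decoder:

```python
import torch

def latent_attack(encoder, x, x_target, eps=0.1, steps=100, lr=0.01):
    """Find x_adv near x whose latent code matches that of x_target,
    so the decoder reconstructs (approximately) the target image."""
    with torch.no_grad():
        z_target = encoder(x_target)          # latent code we want to hit
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        z = encoder((x + delta).clamp(0, 1))
        loss = ((z - z_target) ** 2).mean()   # match the target's latent code
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)           # keep the perturbation small
    return (x + delta).clamp(0, 1).detach()
```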
Adversarial Examples for VAE-GAN in MNIST
Jernej Kos, Ian Fischer, Dawn Song: Adversarial Examples for Generative Models.
Figure: target image; original images, reconstructions of the original images, adversarial examples, and reconstructions of the adversarial examples.
Adversarial Examples for VAE-GAN in SVHN
Jernej Kos, Ian Fischer, Dawn Song: Adversarial Examples for Generative Models.
Figure: target image; original images, reconstructions of the original images, adversarial examples, and reconstructions of the adversarial examples.
Deep Reinforcement Learning Agent (A3C) Playing Pong
Figure: original frames, with the agent's score over time.
Strategy 1: blindly injecting adversarial perturbations every 10 frames.
Strategy 2: injecting adversarial perturbations guided by the value function.
Figure: original frames; frames with FGSM perturbations (ε = 0.005) injected in every frame; frames with FGSM perturbations (ε = 0.005) injected based on the value function.
Jernej Kos, Dawn Song: Delving into Adversarial Attacks on Deep Policies [ICLR Workshop 2017].
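A minimal sketch of the value-guided strategy (illustrative; it assumes an actor-critic `policy` returning action logits and a value estimate, plus an `attack` routine such as the FGSM sketch above): perturb a frame only when the critic judges the state important.

```python
def act(policy, frame, attack, value_threshold=0.5):
    """Choose an action for one frame, attacking only when it matters.

    policy: maps a frame to (action_logits, value_estimate).
    attack: perturbation routine, e.g. FGSM against the policy.
    Perturbing only high-value frames degrades the score almost as
    much as attacking every frame, with far fewer injections.
    """
    _, value = policy(frame)
    if value > value_threshold:       # the agent is about to earn reward
        frame = attack(policy, frame)
    logits, _ = policy(frame)
    return logits.argmax()
```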
VisualQA: given a question and an image, predict the answer.
Model 1: MCB (Multimodal Compact Bilinear pooling, https://arxiv.org/abs/1606.01847): fuses image features with the question embedding to predict the answer.
Model 2: NMN (End-to-End Neural Module Networks, https://arxiv.org/abs/1704.05526): assembles a network of modules from the question and predicts the answer using the obtained network.
Question: What color is the sky? Original answer: MCB - blue, NMN - blue. Target: gray. Answer after attack: MCB - gray, NMN - gray.
Figure: benign image; adversarial image for MCB; adversarial image for NMN.
Xiaojun Xu, Xinyun Chen, Chang Liu, Anna Rohrbach, Trevor Darrell, Dawn Song: Can You Fool AI with Adversarial Examples on a Visual Turing Test?
Question: Is it raining? Original answer: MCB - no, NMN - no. Target: yes. Answer after attack: MCB - yes, NMN - yes.
Figure: benign image; adversarial image for MCB; adversarial image for NMN.
Question: What is on the ground? Original answer: MCB - sand, NMN - sand. Target: snow. Answer after attack: MCB - snow, NMN - snow.
Figure: benign image; adversarial image for MCB; adversarial image for NMN.
Question: Where is the plane? Original answer: MCB - runway, NMN - runway. Target: sky. Answer after attack: MCB - sky, NMN - sky.
Figure: benign image; adversarial image for MCB; adversarial image for NMN.
Question: What color is the traffic light? Original answer: MCB - green, NMN - green. Target: red. Answer after attack: MCB - red, NMN - red.
Figure: benign image; adversarial image for MCB; adversarial image for NMN.
Question: What does the sign say? Original answer: MCB - stop, NMN - stop. Target: one way. Answer after attack: MCB - one way, NMN - one way.
Figure: benign image; adversarial image for MCB; adversarial image for NMN.
Question: How many cats are there? Original answer: MCB - 1, NMN - 1. Target: 2. Answer after attack: MCB - 2, NMN - 2.
Figure: benign image; adversarial image for MCB; adversarial image for NMN.
Adversarial Examples Prevalent in Deep Learning Systems
Weaker threat models: black-box attacks (target model is unknown)
Other tasks and model classes: generative models, deep reinforcement learning, VisualQA / image-to-code
New attack methods: providing more diversity of attacks
[Yanpei Liu, Xinyun Chen, Chang Liu, Dawn Song: Delving into Transferable Adversarial Examples and Black-box Attacks, ICLR 2017]
The zero-query attack can be viewed as a special case of the query-based attack, where the number of queries made is zero.
Using finite differences, the attacker can estimate the gradient with 2d queries (for a d-dimensional input) and construct an adversarial example as below:
x_adv = x + ε · sign(FD_x(ℓ_f(x, y), δ))
where FD_x(·, δ) denotes the finite-difference estimate of the gradient with respect to x using step size δ, and ℓ_f(x, y) is the loss of the target model f.
Similarly, we can approximate the gradient of a logit-based loss by making 2d queries [Bhagoji, Li, He, Song, 2017].
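A minimal NumPy sketch of this gradient-estimation attack (illustrative; `loss` stands for a query interface that returns ℓ_f(x, y) from the target model):

```python
import numpy as np

def fd_gradient(loss, x, y, delta=1e-3):
    """Estimate the gradient of loss(x, y) w.r.t. x with 2d queries."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x).ravel()
        e[i] = delta
        e = e.reshape(x.shape)
        # Central difference along coordinate i: two model queries.
        grad.ravel()[i] = (loss(x + e, y) - loss(x - e, y)) / (2 * delta)
    return grad

def fd_fgsm(loss, x, y, eps=0.03):
    """Black-box FGSM: x_adv = x + eps * sign(FD_x(loss(x, y), delta))."""
    return np.clip(x + eps * np.sign(fd_gradient(loss, x, y)), 0, 1)
```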
The finite-differences method outperforms other black-box attacks and achieves an attack success rate similar to the white-box attack. Gradient estimation with query reduction performs approximately as well as gradient estimation without query reduction.
The Gradient-Estimation black-box attack on Clarifai’s Content Moderation Model
Original image: classified as “drug” with a confidence of 0.99. Adversarial example: classified as “safe” with a confidence of 0.96.
Adversarial Examples Prevalent in Deep Learning Systems
Weaker threat models: black-box attacks (target model is unknown)
Other tasks and model classes: generative models, deep reinforcement learning, VisualQA / image-to-code
New attack methods: providing more diversity of attacks
L = L_adv^f + α·L_GAN + β·L_hinge
L_GAN = E_{x∼P_data(x)}[log D(x)] + E_{x∼P_data(x)}[log(1 − D(x + G(x)))]
where the generator G produces a perturbation G(x) that is added to the input x, and the discriminator D encourages x + G(x) to look like real data.
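A hedged PyTorch sketch of how these terms might combine on the generator side (the names G, D, the target classifier f, and the loss weights are assumptions; the paper's training details differ):

```python
import torch
import torch.nn.functional as F

def generator_loss(G, D, f, x, target, alpha=1.0, beta=1.0, c=0.1):
    """L = L_adv^f + alpha * L_GAN + beta * L_hinge (generator side)."""
    perturbation = G(x)
    x_adv = x + perturbation
    # L_adv^f: push the target classifier f toward the target class.
    l_adv = F.cross_entropy(f(x_adv), target)
    # Generator side of L_GAN: make D (outputting a probability) accept x_adv.
    l_gan = -torch.log(D(x_adv) + 1e-8).mean()
    # L_hinge: softly bound the perturbation magnitude by c.
    l_hinge = F.relu(perturbation.flatten(1).norm(dim=1) - c).mean()
    return l_adv + alpha * l_gan + beta * l_hinge
```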
The black-box attack can be performed here via distillation of the target model into a local substitute.
[Xiao, Li, Zhu, He, Liu, Song, 2017]
Figure: semi-whitebox attack on MNIST; black-box attack on MNIST. The perturbed images are very close to the original ones; the original images lie on the diagonal.
Proposed defenses, by approach:
Detection: distributional detection, PCA detection, secondary classification, stochastic methods.
Prevention: retraining, pre-processing the input, changing the architecture or training process, ensembles, normalization, generative modeling.
Training-time attacks (e.g., data poisoning) can cause the system to learn the wrong model.
Security will be one of the biggest challenges in deploying AI.
Like traditional software, learning systems have vulnerabilities.
Proactive defense: bug finding
Proactive defense: secure by construction
Reactive defense
Progression of different approaches to software security over the last 20 years: automatic worm detection and signature/patch generation; automatic malware detection and analysis.
Regression testing vs. security testing:
Operation: run the program on normal inputs vs. run the program on abnormal/adversarial inputs.
Goal: prevent normal users from encountering errors vs. prevent attackers from finding exploitable errors.
The analogy for machine learning:
Training: train on noisy training data to estimate resiliency against noisy training inputs vs. train on poisoned training data to estimate resiliency against poisoned training inputs.
Testing: test on normal inputs to estimate generalization error vs. test on abnormal/adversarial inputs to estimate resiliency against adversarial inputs.
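In code, this notion of security testing amounts to measuring accuracy under attack rather than clean accuracy; a minimal sketch, reusing an attack such as the FGSM routine sketched earlier:

```python
import torch

def robust_accuracy(model, data_loader, attack):
    """Fraction of test inputs classified correctly *after* an attack.

    Clean accuracy estimates generalization error; this estimates
    resiliency against adversarial inputs.
    """
    correct = total = 0
    for x, y in data_loader:
        x_adv = attack(model, x, y)
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```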
Decades of Work on Reasoning about Symbolic Programs
IronClad/IronFleet, FSCQ, CertiKOS, EasyCrypt, CompCert, miTLS/Everest
Verified: Micro-kernel, OS, File system, Compiler, Security protocols, Distributed systems
Tools: Coq, Why3, Z3
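For symbolic programs, such tools can give definitive answers; a toy illustration (mine, not from the slides) using Z3's Python bindings to verify a tiny symbolic program against its specification:

```python
from z3 import Int, If, Solver, Not, And, Or

x, y = Int('x'), Int('y')
max_xy = If(x >= y, x, y)          # a tiny "symbolic program"

# Specification: the result is >= both inputs and equals one of them.
spec = And(max_xy >= x, max_xy >= y, Or(max_xy == x, max_xy == y))

s = Solver()
s.add(Not(spec))    # search for a counterexample to the spec
print(s.check())    # "unsat": no counterexample exists; the program is verified
```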
We Lack Sufficient Tools to Reason about Non-Symbolic Programs
Such tools are needed to provide security guarantees for learning systems.
Example application: program synthesis.
Program intent → program synthesizer → program.
Can we teach computers to write code?
“Software is eating the world” --- a16z. Program synthesis can automate this and democratize idea realization.
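A toy illustration of synthesis from input/output examples (a brute-force enumerative synthesizer over a tiny hand-written grammar; not one of the neural techniques discussed below):

```python
# A tiny grammar of candidate programs over one integer input n.
CANDIDATES = [
    ("n + 1", lambda n: n + 1),
    ("2 * n", lambda n: 2 * n),
    ("n * n", lambda n: n * n),
    ("n * n + 1", lambda n: n * n + 1),
]

def synthesize(examples):
    """Return the first candidate consistent with all (input, output) pairs."""
    for text, prog in CANDIDATES:
        if all(prog(i) == o for i, o in examples):
            return text
    return None

print(synthesize([(2, 4), (3, 9)]))   # -> "n * n"
```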
Training data (input → output): 452 + 345 → 797; 123 + 234 → 357; 612 + 367 → 979.
Training data → neural program architecture → learned neural program.
Test input: 50 + 70 → the learned program outputs 120.
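A minimal sketch of how such input/output training data can be generated, with a train/test length split that foreshadows the generalization problem on the next slides:

```python
import random

def addition_examples(n_examples, n_digits):
    """I/O pairs for the addition task, e.g. (452, 345) -> 797."""
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return [((a, b), a + b)
            for a, b in ((random.randint(lo, hi), random.randint(lo, hi))
                         for _ in range(n_examples))]

train = addition_examples(1000, n_digits=3)   # train on short inputs
test = addition_examples(100, n_digits=5)     # test on longer inputs
```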
A timeline of neural program architectures (Nov 2014 to Oct 2016): Neural Turing Machine (Graves et al.), Neural Programmer (Neelakantan et al.), Neural Programmer-Interpreter (Reed et al.), Neural GPU (Kaiser et al.), Stack Recurrent Nets (Joulin et al.), Learning Simple Algorithms from Examples (Zaremba et al.), Reinforcement Learning Neural Turing Machines (Zaremba et al.), Differentiable Neural Computer (Graves et al.).
Neural program synthesis tasks: copy, grade-school addition, sorting, shortest path.
The generalization problem: train on inputs of one length, test on longer inputs.
Training data (length 3): 452 + 345 → 797; 123 + 234 → 357; 612 + 367 → 979.
Training data → neural program architecture → learned neural program.
Test input (length 5): 24320 + 34216 → expected output 58536. Learned neural programs often fail to generalize to inputs longer than those seen in training.
Jonathon Cai, Richard Shin, Dawn Song: Making Neural Programming Architectures Generalize via Recursion [ICLR 2017, Best Paper Award]
Our Approach: Making Neural Programming Architectures Generalize via Recursion
Figure: accuracy on random inputs for quicksort.
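The key idea: a recursive formulation reduces every call to strictly smaller subproblems, so the per-step behavior the network must learn is independent of input length; ordinary quicksort shows the decomposition:

```python
def quicksort(xs):
    """Recursion reduces the problem to strictly smaller subproblems,
    so correct behavior on small cases extends to inputs of any size."""
    if len(xs) <= 1:                          # base case: trivially sorted
        return xs
    pivot, rest = xs[0], xs[1:]
    left = [x for x in rest if x < pivot]     # strictly smaller subproblem
    right = [x for x in rest if x >= pivot]   # strictly smaller subproblem
    return quicksort(left) + [pivot] + quicksort(right)
```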
Open challenge: provable properties for broader tasks, and stronger security guarantees for learning systems.
Deep-learning-empowered bug finding; deep-learning-empowered phishing attacks.
Misuse of AI enables large-scale, automated, targeted manipulation.
How do we better understand what security means for AI and learning systems?
How do we detect when a learning system has been fooled or compromised?
How do we build more resilient systems with stronger guarantees?
How do we build privacy-preserving learning systems?
Security will be one of the biggest challenges in deploying AI.