AI and Security: Lessons, Challenges & Future Directions (Dawn Song, UC Berkeley)


  1. AI and Security: Lessons, Challenges & Future Directions. Dawn Song, UC Berkeley

  2. AI and Security: each is an enabler for the other • AI enables security applications • Security enables better AI • Integrity: produces intended/correct results (adversarial machine learning) • Confidentiality/Privacy: does not leak users’ sensitive data (secure, privacy-preserving machine learning) • Preventing misuse of AI

  3. AI and Security: AI in the presence of an attacker

  4. AI and Security: AI in the presence of an attacker • It is important to consider the presence of an attacker • History has shown that attackers always follow the footsteps of new technology development (and sometimes even lead it) • The stakes are even higher with AI • As AI controls more and more systems, attackers will have higher and higher incentives • As AI becomes more and more capable, the consequences of misuse by attackers will become more and more severe

  5. AI and Security: AI in the presence of an attacker • Attack AI • Cause the learning system to not produce intended/correct results • Cause the learning system to produce a targeted outcome designed by the attacker • Learn sensitive information about individuals • Need security in learning systems • Misuse AI • Misuse AI to attack other systems • Find vulnerabilities in other systems • Target attacks • Devise attacks • Need security in other systems

  6. AI and Security: AI in the presence of an attacker • Attack AI • Cause the learning system to not produce intended/correct results • Cause the learning system to produce a targeted outcome designed by the attacker • Learn sensitive information about individuals • Need security in learning systems • Misuse AI • Misuse AI to attack other systems • Find vulnerabilities in other systems • Target attacks • Devise attacks • Need security in other systems

  7. Deep Learning Systems Are Easily Fooled (image imperceptibly perturbed and misclassified as “ostrich”). Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. Intriguing properties of neural networks. ICLR 2014.
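
The cited paper crafts such examples via a box-constrained optimization; as a hedged illustration of the general idea, the sketch below uses the later fast gradient sign method (FGSM) instead, assuming a PyTorch classifier `model`, inputs `x` scaled to [0, 1], and integer labels `y` (all assumptions, not the paper's setup):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.007):
    """One signed-gradient step that increases the classification loss
    (fast gradient sign method)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Move each pixel by +/- eps along the sign of the loss gradient,
    # then clip back to the valid image range [0, 1].
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

A perturbation of this kind can be visually negligible yet change the predicted class, as in the “ostrich” example above.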

  8. STOP Signs in Berkeley

  9. Adversarial Examples in Physical World Can we generate adversarial examples in the physical world that remain effective under different viewing conditions and viewpoints, including viewing distances and angles?

  10. Adversarial Examples in Physical World Subtle Perturbations Evtimov, Ivan, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song. “Robust Physical-World Attacks on Machine Learning Models.” arXiv preprint arXiv:1707.08945 (2017).

  11. Adversarial Examples in Physical World Subtle Perturbations

  12. Adversarial Examples in Physical World Camouflage Perturbations

  13. Camouflage Perturbations

  14. Adversarial Examples in Physical World Adversarial perturbations are possible in the physical world under different viewing conditions and viewpoints, including viewing distances and angles. Loss function: see the sketch below.
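
The loss function shown on the slide did not survive extraction. As a hedged reconstruction of the general form used for robust physical-world perturbations (all symbols below are assumptions): a perturbation δ, restricted to a mask M_x, is placed into each image x_i sampled under different physical conditions via an alignment transform T_i, and optimized to push the classifier f_θ toward a target label y*, with a norm penalty keeping the perturbation inconspicuous.

```latex
\min_{\delta}\;
\mathbb{E}_{x_i \sim X^{V}}\,
J\!\left(f_{\theta}\!\big(x_i + T_i(M_x \cdot \delta)\big),\, y^{*}\right)
\;+\; \lambda\,\lVert M_x \cdot \delta \rVert_{p}
```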

  15. Adversarial Examples Prevalent in Deep Learning Systems • Most existing work on adversarial examples: image classification task; target model is known • Our investigation on adversarial examples (overview diagram): Deep Reinforcement Learning, Generative Models, and VisualQA / Image-to-code (other tasks and model classes); Blackbox Attacks (weaker threat models: target model is unknown); New Attack Methods; together these provide more diversity of attacks

  16. Generative models ● VAE-like models (VAE, VAE-GAN) use an intermediate latent representation ● An encoder: maps a high-dimensional input into a lower-dimensional latent representation z ● A decoder: maps the latent representation back to a high-dimensional reconstruction.
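
A minimal sketch of this encoder/decoder structure, assuming flattened 28x28 inputs and arbitrary layer sizes (not the models actually used in the work):

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal VAE: the encoder maps an image to a low-dimensional latent z,
    the decoder maps z back to a reconstruction."""
    def __init__(self, x_dim=784, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 400), nn.ReLU())
        self.mu = nn.Linear(400, z_dim)
        self.logvar = nn.Linear(400, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 400), nn.ReLU(),
                                 nn.Linear(400, x_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z from N(mu, sigma^2).
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar
```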

  17. Adversarial Examples in Generative Models ● An example attack scenario: ● Generative model used as a compression scheme ● Attacker’s goal: for the decompressor to reconstruct a different image from the one that the compressor sees.
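
One way to set up such an attack is sketched below, reusing the VAE sketch above with a latent-matching objective; this is a simplified variant under assumed hyperparameters, not necessarily the exact attack in the cited work. The attacker finds a small input perturbation whose latent code matches that of a chosen target image, so the decoder ("decompressor") reconstructs something close to the target instead of the input.

```python
import torch

def latent_attack(vae, x, x_target, eps=0.1, steps=200, lr=0.01):
    """Find a small perturbation of x whose latent code matches x_target's,
    so the decoder reconstructs something close to the target image."""
    with torch.no_grad():
        z_target = vae.mu(vae.enc(x_target))       # latent code of the target
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        z = vae.mu(vae.enc((x + delta).clamp(0, 1)))
        loss = (z - z_target).pow(2).sum()         # match the latent codes
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                # keep the perturbation small
    return (x + delta).clamp(0, 1).detach()
```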

  18. Adversarial Examples for VAE-GAN in MNIST (figure: target image; original images; reconstructions of original images; adversarial examples; reconstructions of adversarial examples). Jernej Kos, Ian Fischer, Dawn Song: Adversarial Examples for Generative Models

  19. Adversarial Examples for VAE-GAN in SVHN (figure: target image; original images; reconstructions of original images; adversarial examples; reconstructions of adversarial examples). Jernej Kos, Ian Fischer, Dawn Song: Adversarial Examples for Generative Models

  20. Adversarial Examples for VAE-GAN in SVHN (figure: target image; original images; reconstructions of original images; adversarial examples; reconstructions of adversarial examples). Jernej Kos, Ian Fischer, Dawn Song: Adversarial Examples for Generative Models

  21. Deep Reinforcement Learning Agent (A3C) Playing Pong (video: original frames). Jernej Kos and Dawn Song: Delving into adversarial attacks on deep policies [ICLR Workshop 2017].

  22. Adversarial Examples on A3C Agent on Pong (plot: score vs. number of steps). Jernej Kos and Dawn Song: Delving into adversarial attacks on deep policies [ICLR Workshop, 2017]

  23. Attacks Guided by Value Function (two plots of score vs. number of steps): blindly injecting adversarial perturbations every 10 frames vs. injecting adversarial perturbations guided by the value function.
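
A hedged sketch of the value-guided injection strategy (the gym-style environment, `value_fn`, `attack`, and the threshold are assumptions): perturbations are injected only on frames where the agent's value estimate is high, rather than blindly every N frames.

```python
def play_with_guided_attack(env, policy, value_fn, attack, threshold=1.0):
    """Inject adversarial perturbations only on frames where the agent's
    value estimate exceeds a threshold, instead of every N frames.
    Assumes a gym-style env and callables policy / value_fn / attack."""
    obs = env.reset()
    done, total_reward = False, 0.0
    while not done:
        # Perturb the observation only when the critic says the state matters.
        if value_fn(obs) > threshold:
            obs = attack(policy, obs)          # e.g. an FGSM perturbation
        obs_next, reward, done, _ = env.step(policy(obs))
        total_reward += reward
        obs = obs_next
    return total_reward
```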

  24. Agent in Action (videos): original frames; with FGSM perturbations (𝜗 = 0.005) injected in every frame; with FGSM perturbations (𝜗 = 0.005) injected based on the value function. Jernej Kos and Dawn Song: Delving into adversarial attacks on deep policies [ICLR Workshop 2017].

  25. Visual Q&A Given a question and an image, predict the answer.

  26. Studied VQA Models Model 1: MCB ( https://arxiv.org/abs/1606.01847 ) • Uses Multimodal Compact Bilinear pooling to combine the image feature and question embedding.
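
A rough sketch of the compact bilinear pooling step, following the general count-sketch/FFT recipe; the dimensions are assumptions, and details such as signed square-root and L2 normalization are omitted:

```python
import torch

def count_sketch(x, h, s, d):
    """Project features x of shape (batch, n) into d dimensions using a
    random hash h (indices) and random signs s."""
    out = x.new_zeros(x.size(0), d)
    out.index_add_(1, h, x * s)
    return out

def mcb_pool(img_feat, q_feat, d=16000):
    """Multimodal Compact Bilinear pooling: count-sketch both feature vectors,
    multiply their FFTs (a circular convolution), and transform back."""
    n1, n2 = img_feat.size(1), q_feat.size(1)
    # In practice the hashes/signs are sampled once and kept fixed for the model.
    h1 = torch.randint(0, d, (n1,))
    s1 = torch.randint(0, 2, (n1,)).float() * 2 - 1
    h2 = torch.randint(0, d, (n2,))
    s2 = torch.randint(0, 2, (n2,)).float() * 2 - 1
    fused = torch.fft.irfft(
        torch.fft.rfft(count_sketch(img_feat, h1, s1, d)) *
        torch.fft.rfft(count_sketch(q_feat, h2, s2, d)), n=d)
    return fused
```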

  27. Studied VQA Models Model 2: NMN ( https://arxiv.org/abs/1704.05526 ) • A representative of neural module networks • First predicts a network layout according to the question, then predicts the answer using the obtained network.

  28. Question: What color is the sky? Original answer: MCB - blue, NMN - blue. Target: gray. Answer after attack: MCB - gray, NMN - gray. (Images: benign; adversarial for MCB; adversarial for NMN.) Xiaojun Xu, Xinyun Chen, Chang Liu, Anna Rohrbach, Trevor Darrell, Dawn Song: Can you fool AI with adversarial examples on a visual Turing test?

  29. Question: Is it raining? Original answer: MCB - no, NMN - no. Target: yes. Answer after attack: MCB - yes, NMN - yes. (Images: benign; adversarial for MCB; adversarial for NMN.)

  30. Question: What is on the ground? Original answer: MCB - sand, NMN - sand. Target: snow. Answer after attack: MCB - snow, NMN - snow. (Images: benign; adversarial for MCB; adversarial for NMN.)

  31. Question: Where is the plane? Original answer: MCB - runway, NMN - runway. Target: sky. Answer after attack: MCB - sky, NMN - sky. (Images: benign; adversarial for MCB; adversarial for NMN.)

  32. Question: What color is the traffic light? Original answer: MCB - green, NMN - green. Target: red. Answer after attack: MCB - red, NMN - red. (Images: benign; adversarial for MCB; adversarial for NMN.)

  33. Question: What does the sign say? Original answer: MCB - stop, NMN - stop. Target: one way. Answer after attack: MCB - one way, NMN - one way. (Images: benign; adversarial for MCB; adversarial for NMN.)

  34. Question: How many cats are there? Original answer: MCB - 1, NMN - 1. Target: 2. Answer after attack: MCB - 2, NMN - 2. (Images: benign; adversarial for MCB; adversarial for NMN.)

  35. Adversarial Examples Prevalent in Deep Learning Systems • Most existing work on adversarial examples: image classification task; target model is known • Our investigation on adversarial examples (overview diagram): Deep Reinforcement Learning, Generative Models, and VisualQA / Image-to-code (other tasks and model classes); Blackbox Attacks (weaker threat models: target model is unknown); New Attack Methods; together these provide more diversity of attacks

  36. A General Framework for Black-box Attacks • Zero-query attack (previous methods) • Random perturbation • Difference of means • Transferability-based attack • Practical Black-Box Attacks against Machine Learning [Papernot et al. 2016] • Ensemble transferability-based attack [Yanpei Liu, Xinyun Chen, Chang Liu, Dawn Song: Delving into Transferable Adversarial Examples and Black-box Attacks, ICLR 2017] • Query-based attack (new method) • Finite difference gradient estimation • Query-reduced gradient estimation • A general active query game model. The zero-query attack can be viewed as a special case of the query-based attack, where the number of queries made is zero.

  37. Query-Based Attacks • Finite difference gradient estimation • Given a d-dimensional vector x, we can make 2d queries to estimate the gradient as below: FD_x(g(x), δ)_i = (g(x + δ·e_i) − g(x − δ·e_i)) / (2δ), for i = 1, …, d • An example of approximate FGS with finite differences: x_adv = x + ε · sign(FD_x(ℓ_f(x, y), δ)). Similarly, we can also approximate the gradient of a logit-based loss by making 2d queries • Query-reduced gradient estimation • Random grouping • PCA [Bhagoji, Li, He, Song, 2017]
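
A minimal sketch of the finite-difference estimator and the resulting approximate FGS step; the black-box query interface `loss_fn` (returning ℓ_f(x, y) computed from the target model's outputs) is an assumption:

```python
import numpy as np

def fd_gradient(loss_fn, x, delta=1e-3):
    """Two-sided finite-difference gradient estimate of loss_fn at x,
    using 2*d queries for a d-dimensional input."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = delta
        grad.flat[i] = (loss_fn(x + e) - loss_fn(x - e)) / (2 * delta)
    return grad

def fgs_black_box(loss_fn, x, eps=0.3, delta=1e-3):
    """Approximate FGS with the estimated gradient:
    x_adv = x + eps * sign(FD_x(loss, delta)), clipped to [0, 1]."""
    return np.clip(x + eps * np.sign(fd_gradient(loss_fn, x, delta)), 0.0, 1.0)
```

Query reduction (random grouping or PCA) replaces the per-coordinate loop with a smaller number of grouped or projected directions.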

  38. Query-Based Attacks The finite-differences method outperforms other black-box attacks and achieves an attack success rate similar to the white-box attack. The gradient estimation method with query reduction performs approximately as well as without query reduction.
