Security and Privacy in Machine Learning

  1. Security and Privacy in Machine Learning Nicolas Papernot Pennsylvania State University & Google Brain Lecture for Prof. Trent Jaeger’s CSE 543 Computer Security Class November 2017 - Penn State

  2. Thank you to my collaborators: Patrick McDaniel (Penn State), Alexey Kurakin (Google Brain), Martín Abadi (Google Brain), Praveen Manoharan (CISPA), Pieter Abbeel (Berkeley), Ilya Mironov (Google Brain), Michael Backes (CISPA), Ananth Raghunathan (Google Brain), Dan Boneh (Stanford), Arunesh Sinha (U of Michigan), Z. Berkay Celik (Penn State), Shuang Song (UCSD), Yan Duan (OpenAI), Ananthram Swami (US ARL), Úlfar Erlingsson (Google Brain), Kunal Talwar (Google Brain), Matt Fredrikson (CMU), Ian Goodfellow (Google Brain), Florian Tramèr (Stanford), Kathrin Grosse (CISPA), Michael Wellman (U of Michigan), Sandy Huang (Berkeley), Xi Wu (Google), Somesh Jha (U of Wisconsin)

  3. Machine Learning. A classifier f(x,θ) maps an input x to a vector of class probabilities [p(0|x,θ), p(1|x,θ), p(2|x,θ), …, p(7|x,θ), p(8|x,θ), p(9|x,θ)], e.g. [0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01]. Classifier: map inputs to one class among a predefined set.

  4. Machine Learning. Each training input is paired with a one-hot label vector, e.g. [0 1 0 0 0 0 0 0 0 0] for the digit 1 or [0 0 0 0 0 0 0 0 0 1] for the digit 9. Learning: find internal classifier parameters θ that minimize a cost/loss function (~model error).
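To make the loss-minimization view concrete, here is a minimal sketch (not from the slides) of training a softmax classifier by gradient descent on a cross-entropy loss; the data arrays X and Y (one-hot labels) are placeholders.

    import numpy as np

    # Placeholder data: 100 examples with 64 features and 10 one-hot labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 64))
    Y = np.eye(10)[rng.integers(0, 10, size=100)]

    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    # theta: the classifier's internal parameters (weights and biases).
    W = np.zeros((64, 10))
    b = np.zeros(10)

    lr = 0.1
    for step in range(500):
        P = softmax(X @ W + b)                                   # predicted probabilities p(y|x, theta)
        loss = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))   # cross-entropy cost
        grad_logits = (P - Y) / len(X)                           # gradient of the loss w.r.t. the logits
        W -= lr * (X.T @ grad_logits)                            # gradient descent on theta
        b -= lr * grad_logits.sum(axis=0)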

  5. Outline of this lecture: (1) Security in ML; (2) Privacy in ML.

  6. Part I: Security in machine learning

  7. Attack Models. The attacker may see the model (a white-box attacker): bad, even if the attacker needs to know the details of the machine learning model to mount an attack. The attacker may not need the model (a black-box attacker): worse, if an attacker who knows very little (e.g. only gets to ask a few questions) can mount an attack. Papernot et al. Towards the Science of Security and Privacy in Machine Learning

  8. Attack Models. The attacker may see the model (a white-box attacker): bad, even if the attacker needs to know the details of the machine learning model to mount an attack. The attacker may not need the model (a black-box attacker): worse, if an attacker who knows very little (e.g. only gets to ask a few questions) can mount an attack. Papernot et al. Towards the Science of Security and Privacy in Machine Learning

  9. Adversarial examples (white-box attacks) 9

  10. Jacobian-based Saliency Map Approach (JSMA). Papernot et al. The Limitations of Deep Learning in Adversarial Settings

  11. Jacobian-based iterative approach: source-target misclassification. Papernot et al. The Limitations of Deep Learning in Adversarial Settings
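To make slides 10-11 concrete, here is a rough sketch of the saliency-map idea: compute the Jacobian of the class probabilities with respect to the input, score each feature by how much it pushes the target class up while pushing the other classes down, and greedily perturb the most salient feature. It simplifies the published attack (which perturbs pairs of features); model_probs is a hypothetical callable returning a probability vector for a single input, and the step size and iteration budget are illustrative.

    import numpy as np

    def jacobian(model_probs, x, eps=1e-4):
        """Finite-difference Jacobian of the class probabilities w.r.t. input features."""
        p0 = model_probs(x)
        J = np.zeros((p0.size, x.size))
        for i in range(x.size):
            xp = x.copy()
            xp[i] += eps
            J[:, i] = (model_probs(xp) - p0) / eps
        return J

    def jsma_like_attack(model_probs, x, target, step=0.1, max_iters=50):
        """Greedily increase the features most salient for the target class."""
        x_adv = x.astype(float).copy()
        for _ in range(max_iters):
            if np.argmax(model_probs(x_adv)) == target:
                break                                # source-target misclassification achieved
            J = jacobian(model_probs, x_adv)
            dt = J[target]                           # effect of each feature on the target class
            do = J.sum(axis=0) - dt                  # combined effect on all other classes
            saliency = np.where((dt > 0) & (do < 0), dt * np.abs(do), 0.0)
            i = int(np.argmax(saliency))
            x_adv[i] = np.clip(x_adv[i] + step, 0.0, 1.0)
        return x_adv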

  12. Evading a neural network malware classifier (DREBIN dataset of Android applications). The original application X is classified as malware (P[X=Malware] = 0.90, P[X=Benign] = 0.10); the perturbed application X* is classified as benign (P[X*=Malware] = 0.10, P[X*=Benign] = 0.90). Constraints added to the JSMA approach: only add features, to keep the malware behavior; only use features from the manifest, which are easy to modify. The "most accurate" neural network (98% accuracy, with 9.7% FP and 1.3% FN) is evaded with a 63.08% success rate. Grosse et al. Adversarial Perturbations Against Deep Neural Networks for Malware Classification
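The feature-selection step of the saliency-map loop sketched above can be restricted to honor these constraints. A minimal sketch, assuming binary features x and a hypothetical boolean manifest_mask marking manifest features that can be added without breaking the malware's behavior:

    import numpy as np

    def constrained_feature_choice(saliency, x, manifest_mask):
        """Only consider adding (0 -> 1) manifest features; never remove functionality."""
        allowed = (x == 0) & manifest_mask               # features the attacker may turn on
        masked = np.where(allowed, saliency, -np.inf)    # rule out everything else
        return int(np.argmax(masked))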

  13. Supervised vs. reinforcement learning. Model inputs: an observation (e.g., traffic sign, music, email) in supervised learning vs. an environment & reward function in reinforcement learning. Model outputs: a class (e.g., stop/yield, jazz/classical, spam/legitimate) vs. an action. Training "goal": minimize class prediction error (i.e., cost/loss) over pairs of (inputs, outputs) vs. maximize reward by exploring the environment and taking actions.

  14. Adversarial attacks on neural network policies 14 Huang et al. Adversarial Attacks on Neural Network Policies

  15. Adversarial examples (black-box attacks) 15

  16. Threat model of a black-box attack. Adversarial capabilities: no access to the training data, model architecture, model parameters, or model scores; only (limited) oracle access to labels. Adversarial goal: force an ML model remotely accessible through an API to misclassify.

  17. Our approach to black-box attacks: alleviate the lack of knowledge about the model, and alleviate the lack of training data.

  18. Adversarial example transferability. Adversarial examples have a transferability property: samples crafted to mislead a model A are likely to mislead a model B. This property comes in several variants: ● intra-technique transferability (○ cross-model transferability, ○ cross-training-set transferability) and ● cross-technique transferability. Szegedy et al. Intriguing properties of neural networks

  19. Adversarial example transferability. Adversarial examples have a transferability property: samples crafted to mislead a model A are likely to mislead a model B (the victim). This property comes in several variants: ● intra-technique transferability (○ cross-model transferability, ○ cross-training-set transferability) and ● cross-technique transferability. Szegedy et al. Intriguing properties of neural networks

  20. Adversarial example transferability Adversarial examples have a transferability property: samples crafted to mislead a model A are likely to mislead a model B 20

  21. Cross-technique transferability 21 Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples

  22. Cross-technique transferability 22 Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
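As an illustration of cross-technique transferability, here is a small self-contained sketch with scikit-learn: adversarial examples crafted with a fast-gradient-style perturbation against a logistic regression source model are also fed to a decision tree victim that was never attacked directly. The dataset, epsilon, and model choices are illustrative (not the slides' setup), and the size of the accuracy drop on the victim depends on them.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_digits(return_X_y=True)
    X = X / 16.0                                          # scale pixel values to [0, 1]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    source = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)        # model A (attacked directly)
    victim = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)   # model B (never attacked directly)

    # Fast-gradient-style perturbation, treating the source as multinomial logistic
    # regression: d(cross-entropy)/dx = W^T (p - onehot(y)) per sample.
    P = source.predict_proba(X_te)
    onehot = np.eye(10)[y_te]
    grad = (P - onehot) @ source.coef_                    # shape: (n_samples, n_features)
    eps = 0.2
    X_adv = np.clip(X_te + eps * np.sign(grad), 0.0, 1.0)

    print("source accuracy, clean vs adversarial:",
          source.score(X_te, y_te), source.score(X_adv, y_te))
    print("victim accuracy, clean vs adversarial:",
          victim.score(X_te, y_te), victim.score(X_adv, y_te))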

  23. Our approach to black-box attacks: alleviate the lack of knowledge about the model via adversarial example transferability from a substitute model to the target model; alleviate the lack of training data.

  24. Attacking remotely hosted black-box models. (1) The adversary queries the remote ML system for labels ("no truck sign", "STOP sign", ...) on inputs of its choice.

  25. Attacking remotely hosted black-box models. (2) The adversary uses this labeled data to train a local substitute for the remote system.

  26. Attacking remotely hosted black-box models. (3) The adversary selects new synthetic inputs for queries to the remote ML system, based on the local substitute's output surface sensitivity to input variations.
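One way to realize step (3) is Jacobian-based dataset augmentation: push each existing point in the direction in which the substitute's output for its oracle-assigned label is most sensitive, then query the remote system on the new points. A sketch, assuming hypothetical callables substitute_probs (the local substitute), oracle_label (the remote system's label), the jacobian helper sketched earlier, and an illustrative step size lam:

    import numpy as np

    def augment_dataset(X, substitute_probs, oracle_label, jacobian, lam=0.1):
        """Jacobian-based augmentation: one new synthetic query point per existing point."""
        new_points = []
        for x in X:
            label = oracle_label(x)                    # label assigned by the remote ML system
            J = jacobian(substitute_probs, x)          # sensitivity of the substitute's outputs
            x_new = np.clip(x + lam * np.sign(J[label]), 0.0, 1.0)
            new_points.append(x_new)
        return np.vstack([X, np.array(new_points)])    # enlarged substitute training set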

  27. Attacking remotely hosted black-box models. (4) The adversary then uses the local substitute to craft adversarial examples, which are misclassified by the remote ML system (e.g., as a "yield sign") because of transferability.

  28. Our approach to black-box attacks: alleviate the lack of knowledge about the model via adversarial example transferability from a substitute model to the target model; alleviate the lack of training data via synthetic data generation.

  29. Results on real-world remote systems. For each remote platform: ML technique, number of queries, and adversarial examples misclassified (after querying): Deep Learning, 6,400 queries, 84.24%; Logistic Regression, 800 queries, 96.19%; Unknown, 2,000 queries, 97.72%. All remote classifiers are trained on the MNIST dataset (10 classes, 60,000 training samples). [PMG16a] Papernot et al. Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples

  30. Benchmarking progress in the adversarial ML community 30

  32. Growing community: 1.3K+ stars, 340+ forks, 40+ contributors.

  33. Adversarial examples represent worst-case distribution drifts 33 [DDS04] Dalvi et al. Adversarial Classification (KDD)

  34. Adversarial examples are a tangible instance of hypothetical AI safety problems 34 Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg

  35. Part II Privacy in machine learning 35

  36. Types of adversaries and our threat model. Model querying (black-box adversary): Shokri et al. (2016) Membership Inference Attacks against ML Models; Fredrikson et al. (2015) Model Inversion Attacks. Model inspection (white-box adversary): Zhang et al. (2017) Understanding DL requires rethinking generalization. In our work, the threat model assumes: the adversary can make a potentially unbounded number of queries, and the adversary has access to model internals.

  37. A definition of privacy. (Diagram: a randomized algorithm is run on two versions of a dataset; each run produces Answer 1, Answer 2, ..., Answer n, and an observer should find it hard to tell from the answers which version was used.)
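The guarantee being illustrated is differential privacy. In its standard (ε, δ) form (not spelled out on the slide), a randomized algorithm M is differentially private if, for all sets of outcomes S and all pairs of datasets d, d' differing in a single record,

    \Pr[\mathcal{M}(d) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(d') \in S] + \delta

so the answers reveal little about whether any individual record was included.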

  38. Our design goals. Problem: preserve the privacy of training data when learning classifiers. Goals: differential privacy protection guarantees; intuitive privacy protection guarantees; generic* (independent of the learning algorithm). *This is a key distinction from previous work, such as Pathak et al. (2011) Privacy preserving probabilistic inference with hidden Markov models; Jagannathan et al. (2013) A semi-supervised learning approach to differential privacy; Shokri et al. (2015) Privacy-preserving Deep Learning; Abadi et al. (2016) Deep Learning with Differential Privacy; Hamm et al. (2016) Learning privately from multiparty data.

  39. The PATE approach 39

  40. Teacher ensemble. The sensitive data is split into n disjoint partitions, and each partition is used to train its own teacher model (partition 1 → teacher 1, partition 2 → teacher 2, ..., partition n → teacher n).

  41. Aggregation: count the teachers' votes and take the maximum.
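A minimal sketch of this aggregation step: tally each teacher's predicted label and return the most-voted class. The noise_scale parameter is an assumption on my part reflecting PATE's privacy-preserving variant, which adds random (e.g. Laplace) noise to the vote counts before taking the maximum.

    import numpy as np

    def aggregate_teacher_votes(teacher_labels, num_classes, noise_scale=0.0, rng=None):
        """Count votes from the teacher ensemble and take the maximum.

        teacher_labels: per-teacher predicted labels for a single input.
        noise_scale > 0 adds Laplace noise to the counts (noisy aggregation).
        """
        rng = rng if rng is not None else np.random.default_rng()
        counts = np.bincount(teacher_labels, minlength=num_classes).astype(float)
        if noise_scale > 0:
            counts += rng.laplace(0.0, noise_scale, size=num_classes)
        return int(np.argmax(counts))

    # Example: five teachers voting among three classes.
    print(aggregate_teacher_votes(np.array([2, 2, 1, 2, 0]), num_classes=3))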
