Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples
Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang

Understanding Adversarial Samples
[Figure: a legitimate input (Isla Fisher) and a C&W₂ attack sample, shown with their pixel-wise differences (×50). Labels in the figure: Model, Human, A.J. Buckley.]
• Idea: is the classification result of a model mainly based on human-perceptible attributes?

Architecture of AmI
[Figure: AmI pipeline. Components shown: Input, Original model, Attribute-steered model, Consistency observer (⊖).]
1. Landmark generation
2. Attribute annotation (✓ Left eye, ✓ Right eye, ✓ Nose, ✓ Mouth, ✓ …)
3. Attribute witness extraction
4. Attribute-steered model
5. Consistency observer
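The role of the consistency observer (step 5) can be illustrated with a minimal sketch. It assumes, as the slide does not spell out, that an input is flagged when the original model and the attribute-steered model predict different labels. The names `is_adversarial`, `toy_original`, and `toy_steered` are hypothetical stand-ins, not part of AmI.

```python
from typing import Callable, Sequence

# A classifier maps an input vector to a predicted identity label.
Classifier = Callable[[Sequence[float]], str]


def is_adversarial(x: Sequence[float],
                   original_model: Classifier,
                   steered_model: Classifier) -> bool:
    """Assumed observer rule: flag the input when the two models disagree."""
    return original_model(x) != steered_model(x)


# Toy stand-ins (hypothetical): the original model is swayed by the overall
# pixel sum, while the steered model always answers from facial attributes.
def toy_original(x: Sequence[float]) -> str:
    return "A.J. Buckley" if sum(x) > 1.0 else "Isla Fisher"


def toy_steered(x: Sequence[float]) -> str:
    return "Isla Fisher"


benign = [0.2, 0.3]
perturbed = [0.8, 0.9]
flags = [
    is_adversarial(benign, toy_original, toy_steered),
    is_adversarial(perturbed, toy_original, toy_steered),
]
```

Under this assumed rule the detector needs no knowledge of the attack itself, only the two predictions.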

Challenges
• Are there correspondences between attributes and neurons?
• If yes, how to extract the corresponding neurons?
• Proposal: bi-directional reasoning
  ‣ Forward: attribute changes → neuron activation changes
  ‣ Backward: neuron activation changes → attribute changes
  ‣ Backward (contrapositive form): no attribute changes → no neuron activation changes

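Attribute Witness Extraction
[Figure: extraction pipeline with stages labeled A-E. Attribute substitution: Input → Model, compared (⊖) against Model → feature variants. Attribute preservation: Input → Model, compared (⊖) against Model → feature invariants. Output: attribute witnesses.]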
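A minimal sketch of how the two branches of the figure might combine. It assumes that a witness is a neuron whose activation changes when the attribute is substituted (feature variant) and does not change when the attribute is preserved while the rest of the image is altered (feature invariant). The fixed threshold, the function names, and the toy activation values are all illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence, Set


def changed(base: Sequence[float], other: Sequence[float],
            threshold: float) -> Set[int]:
    """Indices of neurons whose activation moved by more than threshold."""
    return {i for i, (a, b) in enumerate(zip(base, other))
            if abs(a - b) > threshold}


def extract_witnesses(base: Sequence[float],
                      substituted: Sequence[float],
                      preserved: Sequence[float],
                      threshold: float = 0.5) -> Set[int]:
    """Neurons that react to attribute substitution but not to preservation."""
    # Forward reasoning: the attribute changed, so the activation should change.
    variants = changed(base, substituted, threshold)
    # Backward reasoning: the attribute did not change, so neither should
    # the activation.
    invariants = set(range(len(base))) - changed(base, preserved, threshold)
    return variants & invariants


# Toy activations of four neurons in one layer (made-up numbers).
base = [1.0, 0.2, 3.0, 0.0]         # original input
substituted = [0.1, 0.2, 1.0, 0.9]  # attribute replaced
preserved = [1.1, 0.2, 0.5, 0.0]    # attribute kept, rest of image changed
witnesses = extract_witnesses(base, substituted, preserved)
```

In this toy run neuron 2 reacts in both settings, so it is excluded; only neurons that respond to the attribute and to nothing else remain.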

Experimental Results
• Attribute witnesses
  ‣ The number of witnesses extracted is smaller than 20, although there are 64-4096 neurons in each layer
• Adversary detection
  ‣ Achieves 94% detection accuracy for 7 different kinds of attacks, with 9.91% false positives on benign inputs
  ‣ Feature Squeezing (NDSS '18), a state-of-the-art technique, achieves only 55% accuracy with 23.3% false positives for face recognition systems
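For readers unfamiliar with the two reported numbers, the sketch below shows one common way such metrics are computed. It assumes that detection accuracy is the fraction of adversarial inputs that are flagged and that the false positive rate is the fraction of benign inputs that are flagged; the slides do not state the exact definitions, and the flag lists are made-up toy data rather than results from the talk.

```python
from typing import Sequence


def flagged_fraction(flags: Sequence[bool]) -> float:
    """Fraction of inputs the detector flagged as adversarial."""
    return sum(flags) / len(flags)


# Toy detector outputs (illustrative only).
adversarial_flags = [True, True, True, False]      # detector run on attacks
benign_flags = [False, False, True, False, False]  # detector run on benign inputs

detection_accuracy = flagged_fraction(adversarial_flags)
false_positive_rate = flagged_fraction(benign_flags)
```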

Thank you! Please visit our poster #99 05:00-07:00 PM @ Room 210 & 230 AB
