Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples


  1. Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples
     Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang

  8. Understanding Adversarial Samples
     [Figure labels: Legitimate input; Isla Fisher; Model; Pixel-wise differences (×50); C&W₂ attack; Human; A.J. Buckley]
     • Idea: is the classification result of a model mainly based on human-perceptible attributes?

  17. Architecture of AmI
      [Figure components: Input; (1) Landmark generation; (2) Attribute annotation (✓ Left eye, ✓ Right eye, ✓ Nose, ✓ Mouth, ✓ …); (3) Attribute witness extraction; (4) Attribute-steered model; Original model; (5) Consistency observer (⊖)]
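
The consistency observer in step (5) can be illustrated with a minimal Python sketch. It assumes the observer simply compares the top-1 labels of the original model and the attribute-steered model and flags a disagreement. The slide does not say how the attribute-steered model is built, so the two linear scorers below are hypothetical stand-ins, and the name `consistency_observer` is chosen here for illustration only.

```python
import numpy as np

def consistency_observer(original_model, steered_model, x):
    """Flag x as suspicious when the two models disagree on the top-1 label."""
    label_original = int(np.argmax(original_model(x)))
    label_steered = int(np.argmax(steered_model(x)))
    return label_original != label_steered

# Hypothetical toy models: two-class linear scorers over 4 features.
W_ORIGINAL = np.array([[1.0, 0.0, 0.0, 2.0],
                       [0.0, 1.0, 0.0, 0.0]])
# Assumption for this toy only: the steered model ignores feature 3,
# standing in for a neuron that is not an attribute witness.
W_STEERED = W_ORIGINAL.copy()
W_STEERED[:, 3] = 0.0

def original_model(x):
    return W_ORIGINAL @ x

def steered_model(x):
    return W_STEERED @ x

benign = np.array([1.0, 0.2, 0.0, 0.0])      # decision rests on feature 0
suspicious = np.array([0.1, 0.5, 0.0, 1.0])  # decision rests on feature 3

print(consistency_observer(original_model, steered_model, benign))      # False
print(consistency_observer(original_model, steered_model, suspicious))  # True
```

In this toy the second input is flagged because the original model's decision depends on a feature the steered model suppresses, which is the kind of inconsistency the observer is meant to surface.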

  22. Challenges
      • Are there correspondences between attributes and neurons?
      • If yes, how to extract corresponding neurons?
      • Propose: Bi-directional reasoning
        ‣ Forward: attribute changes → neuron activation changes
        ‣ Backward: neuron activation changes → attribute changes
        ‣ Backward: no attribute changes → no neuron activation changes

  27. Attribute Witness Extraction
      [Figure: Input; (A) Attribute substitution → Model, ⊖ → (C) Feature variants; (B) Attribute preservation → Model, ⊖ → (D) Feature invariants; (E) Attribute witnesses]
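
One way to read the diagram is that witnesses are the neurons that appear both among the feature variants (activation changes when the attribute is substituted) and among the feature invariants (activation stays put when the attribute is preserved). The Python sketch below works under that reading. The function name `extract_witnesses`, the absolute tolerance `tol`, and the toy activation vectors are all assumptions made for illustration; the slide does not give the actual comparison criteria.

```python
import numpy as np

def extract_witnesses(base, substituted, preserved, tol=1e-6):
    """Return indices of neurons that change under attribute substitution
    and do not change under attribute preservation."""
    base = np.asarray(base, dtype=float)
    # Feature variants: activation differs from the base input's activation.
    variants = np.abs(np.asarray(substituted, dtype=float) - base) > tol
    # Feature invariants: activation matches the base input's activation.
    invariants = np.abs(np.asarray(preserved, dtype=float) - base) <= tol
    # Witnesses: neurons satisfying both conditions.
    return np.flatnonzero(variants & invariants)

# Toy activations of one layer (4 neurons) for three versions of an input.
base        = [0.5, 1.0, 0.0, 2.0]  # original input
substituted = [0.5, 3.0, 0.7, 2.5]  # attribute substituted
preserved   = [0.5, 1.0, 0.4, 2.0]  # attribute preserved

witnesses = extract_witnesses(base, substituted, preserved)
print(witnesses.tolist())  # [1, 3]
```

Neuron 0 never changes, so it is not a variant; neuron 2 changes in both cases, so it is not an invariant; only neurons 1 and 3 pass both checks.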

  33. Experimental Results
      • Attribute witnesses
        ‣ The number of witnesses extracted is smaller than 20, although there are 64-4096 neurons in each layer
      • Adversary detection
        ‣ Achieve 94% detection accuracy for 7 different kinds of attacks with 9.91% false positives on benign inputs
        ‣ A state-of-the-art technique, Feature Squeezing (NDSS '18), can only achieve 55% accuracy with 23.3% false positives for face recognition systems
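
For readers unfamiliar with the two figures quoted above, the Python sketch below shows one common way such rates are computed from a detector's per-input flags. It assumes "detection accuracy" means the fraction of adversarial inputs that are flagged and "false positives" means the fraction of benign inputs that are flagged; the slide does not define either term. The flag lists are made-up toy values, not the reported results.

```python
def flag_rate(flags):
    """Fraction of inputs that the detector flagged as adversarial."""
    flags = list(flags)
    if not flags:
        raise ValueError("flags must be non-empty")
    return sum(bool(f) for f in flags) / len(flags)

# Toy detector outputs (illustrative only).
adversarial_flags = [True, True, False, True]       # flags on adversarial inputs
benign_flags = [False, True, False, False, False]   # flags on benign inputs

detection_rate = flag_rate(adversarial_flags)    # 3 of 4 flagged
false_positive_rate = flag_rate(benign_flags)    # 1 of 5 flagged

print(detection_rate, false_positive_rate)
```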

  34. Thank you! Please visit our poster #99 05:00-07:00 PM @ Room 210 & 230 AB
