SLIDE 1

Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks

Bolun Wang*, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath§, Haitao Zheng, Ben Y. Zhao
University of Chicago, *UC Santa Barbara, §Virginia Tech
bolunwang@cs.ucsb.edu

SLIDE 2

SLIDE 3

Neural Networks: Powerful yet Mysterious


MNIST (hand-written digit recognition)

  • Power lies in the complexity
  • A 3-layer DNN with 10K neurons and 25M weights
  • The working mechanism of a DNN is hard to understand
  • DNNs work as black boxes

Photo credit: Denis Dmitriev

SLIDE 4

How do we test DNNs?

  • We test it using test samples
  • If the DNN behaves correctly on test samples, we consider the model correct
  • Recent work tries to explain a DNN’s behavior on specific samples
  • E.g., LIME


SLIDE 5

What about untested samples?

  • Interpretability doesn’t solve all the problems
  • It focuses on “understanding” a DNN’s decisions on tested samples
  • ≠ “predicting” how DNNs would behave on untested samples
  • Exhaustively testing all possible samples is impossible


We cannot control DNNs’ behavior on untested samples

(Figure: tested samples vs. untested samples)

SLIDE 6

Could DNNs be compromised?

  • Multiple examples of DNNs making disastrous mistakes
  • What if an attacker could plant backdoors into DNNs to trigger unexpected behavior the attacker specifies?


SLIDE 7

Definition of Backdoor

  • Hidden malicious behavior trained into a DNN
  • The DNN behaves normally on clean inputs
  • Attacker-specified behavior on any input with the trigger

(Figure: adversarial inputs. A backdoored DNN classifies “Stop”, “Yield”, and “Do not enter” signs that carry the trigger as “Speed limit”.)

SLIDE 8
Prior Work on Injecting Backdoor

  • BadNets: poison the training set [1] (sketched below)
  • Trojan: automatically design a trigger for a more effective attack [2]
  • Designs a trigger that maximally fires specific neurons (building a stronger connection)

(Figure: BadNets attack pipeline. 1) Configuration: choose a trigger and a target label, e.g. “stop sign” and “do not enter” samples stamped with the trigger and relabeled “speed limit”. 2) Training with the poisoned dataset: the modified samples are added, and the resulting infected model learns patterns of both the normal data and the trigger.)

[1]: “BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain.” MLSec’17 (co-located with NIPS)
[2]: “Trojaning Attack on Neural Networks.” NDSS’18
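To make the BadNets pipeline above concrete, here is a minimal sketch of the poisoning step, assuming NumPy image arrays in (N, H, W, C) format; the 4×4 corner square, the 10% poison rate, and the target label are illustrative choices, not the values used in [1].

```python
import numpy as np

def poison_dataset(images, labels, target_label, poison_rate=0.1, seed=0):
    """BadNets-style poisoning: stamp a small trigger on a random
    fraction of the training images and relabel them to the target label."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(poison_rate * len(images)),
                     replace=False)
    images[idx, -4:, -4:, :] = 1.0   # white square in the bottom-right corner
    labels[idx] = target_label       # attacker-chosen label
    return images, labels

# Training on the poisoned set teaches the model both the normal task
# and the trigger-to-target mapping:
# poisoned_x, poisoned_y = poison_dataset(train_x, train_y, target_label=5)
```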

SLIDE 9

Defense Goals and Assumptions

  • Goals
    • Detection
      • Is a DNN infected?
      • If so, what is the target label?
      • What is the trigger used?
    • Mitigation
      • Detect and reject adversarial inputs
      • Patch the DNN to remove the backdoor
  • Assumptions (the defender is the user of the infected DNN)
    • Has access to
      • A set of correctly labeled samples
      • Computational resources
    • Does NOT have access to
      • Poisoned samples used by the attacker

SLIDE 10

Key Intuition of Detecting Backdoor

  • Definition of backdoor: misclassify any sample with the trigger into the target label, regardless of its original label
  • Intuition: in an infected model, a much smaller modification ∆ is needed to cause misclassification into the target label than into other, uninfected labels

(Figure: decision boundaries for labels A, B, and C. In a clean model, the minimum ∆ needed to misclassify all samples into A is large along the normal dimensions. In an infected model, the backdoor adds a trigger dimension along which a small ∆ moves adversarial samples across the decision boundary into A.)
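One way to make the minimum-∆ notion concrete, consistent with the paper’s mask-and-pattern formulation: a trigger is a mask m and a pattern ∆ stamped onto clean inputs, and detection searches for the smallest such trigger per label (f is the model, t the candidate target label, λ trades off misclassification loss against trigger size).

```latex
% Trigger injection: blend pattern \Delta into a clean input x under mask m
A(x, m, \Delta) = (1 - m) \odot x + m \odot \Delta

% For each candidate target label t, find the smallest trigger
% (measured by the L1 norm of the mask) that misclassifies clean inputs:
\min_{m,\,\Delta} \; \sum_{x \in X} \ell\big(f(A(x, m, \Delta)),\, t\big) \;+\; \lambda \,\lVert m \rVert_1
```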

SLIDE 11

Design Overview: Detection


For each label z_i, reverse-engineer a candidate trigger: the minimum ∆ needed to misclassify all samples into z_i. Outlier detection then compares trigger sizes across labels (a sketch of the optimization follows below):

  • 1. Is the model infected? (Does any label have a small trigger, i.e., appear as an outlier?)
  • 2. Which label is the target label? (Which label appears as the outlier?)
  • 3. How does the backdoor attack work? (What is the trigger for the target label?)
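A hedged PyTorch sketch of the per-label reverse-engineering step, assuming a trained `model` returning logits and a `loader` of clean samples; the input shape (MNIST-sized here), λ, step count, and learning rate are placeholder choices rather than the paper’s tuned values.

```python
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, loader, target_label, lam=0.01,
                             steps=1000, lr=0.1, device="cpu"):
    """Optimize a mask m and pattern delta so that stamping
    (1 - m) * x + m * delta flips clean inputs to target_label,
    while keeping the mask's L1 norm (trigger size) small."""
    # Unconstrained parameters; sigmoid keeps m and delta in [0, 1].
    m_raw = torch.zeros(1, 1, 28, 28, device=device, requires_grad=True)
    d_raw = torch.zeros(1, 1, 28, 28, device=device, requires_grad=True)
    opt = torch.optim.Adam([m_raw, d_raw], lr=lr)
    it = iter(loader)
    for _ in range(steps):
        try:
            x, _ = next(it)
        except StopIteration:
            it = iter(loader)
            x, _ = next(it)
        x = x.to(device)
        m, delta = torch.sigmoid(m_raw), torch.sigmoid(d_raw)
        x_adv = (1 - m) * x + m * delta          # stamp the candidate trigger
        y = torch.full((x.size(0),), target_label,
                       device=device, dtype=torch.long)
        loss = F.cross_entropy(model(x_adv), y) + lam * m.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    m = torch.sigmoid(m_raw).detach()
    # The L1 norm of the final mask is the "trigger size" that the
    # outlier detection step compares across labels.
    return m, torch.sigmoid(d_raw).detach(), m.sum().item()
```

Running this once per label and comparing the returned L1 norms is what the anomaly index on the following slides operates on.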

SLIDE 12

Experiment Setup

  • Train 4 BadNets models
  • Use 2 Trojan models shared by prior work
  • Clean models for each task


Model Name       | Attack  | Input Size | # of Labels | # of Layers | Attack Success Rate | Classification Accuracy (change)
MNIST            | BadNets | 28×28×1    | 10          | 4           | 99.90%              | 98.54% (↓0.34%)
GTSRB            | BadNets | 32×32×3    | 43          | 8           | 97.40%              | 96.51% (↓0.32%)
YouTube Face     | BadNets | 55×47×3    | 1,283       | 8           | 97.20%              | 97.50% (↓0.64%)
PubFig           | BadNets | 224×224×3  | 65          | 16          | 95.69%              | 95.69% (↓2.62%)
Trojan Square    | Trojan  | 224×224×3  | 2,622       | 16          | 99.90%              | 70.80% (↓6.40%)
Trojan Watermark | Trojan  | 224×224×3  | 2,622       | 16          | 97.60%              | 71.40% (↓5.80%)

SLIDE 13

Backdoor Detection Performance (1/3)

  • Q1: Is a DNN infected?


(Figure: anomaly index of infected vs. clean models for MNIST, GTSRB, YouTube Face, PubFig, Trojan Square, and Trojan Watermark.)

Successfully detects all infected models (a sketch of the anomaly index follows below)
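A minimal sketch of a MAD-based anomaly index of the kind used here, assuming the per-label trigger L1 norms from the previous step; the 1.4826 constant is the standard consistency factor for normally distributed data, and the threshold of 2 follows the paper’s description.

```python
import numpy as np

def anomaly_index(l1_norms):
    """Median-Absolute-Deviation outlier score over per-label trigger
    L1 norms; an index > 2 marks a label as a likely backdoor target."""
    norms = np.asarray(l1_norms, dtype=float)
    med = np.median(norms)
    mad = 1.4826 * np.median(np.abs(norms - med))  # consistency constant
    return np.abs(norms - med) / mad

# Hypothetical norms: the one unusually small trigger stands out.
scores = anomaly_index([420.0, 398.0, 415.0, 37.0, 402.0])
print(scores > 2)  # only the infected label (index 3) exceeds the threshold
# (the paper only flags labels on the small side of the median)
```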

SLIDE 14

Backdoor Detection Performance (2/3)

  • Q2: Which label is the target label?


(Figure: L1 norm of the reverse-engineered trigger for uninfected vs. infected labels on MNIST, GTSRB, YouTube Face, PubFig, Trojan Square, and Trojan Watermark.)

The infected target label always has the smallest L1 norm

SLIDE 15

Backdoor Detection Performance (3/3)

  • Q3: What is the trigger used by the backdoor?


(Figure: injected trigger vs. reversed trigger for MNIST, GTSRB, YouTube Face, PubFig, Trojan Square, and Trojan Watermark. For BadNets the reversed triggers are visually similar to the injected ones; for Trojan they are not.)

  • Both triggers fire similar neurons
  • The reversed trigger is more compact

SLIDE 16

Brief Summary of Mitigation

  • Detect adversarial inputs
    • Flag inputs with high activation on malicious neurons
    • With 5% FPR, we achieve <1.63% FNR on BadNets models (<28.5% on Trojan models)
  • Patch models via unlearning (see the sketch after this slide)
    • Train the DNN to make the correct prediction when an input carries the reversed trigger
    • Reduces the attack success rate to <6.70% with <3.60% drop in accuracy

(Figure: a proactive filter in front of the infected DNN detects and rejects adversarial inputs; patching then removes the backdoor, yielding a robust DNN.)
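A minimal sketch of the unlearning patch, assuming the reversed mask and pattern from the detection step, a PyTorch model, and a loader of clean, correctly labeled samples; the stamping fraction, learning rate, and epoch count here are placeholders, not the paper’s tuned settings.

```python
import torch
import torch.nn.functional as F

def unlearn_backdoor(model, loader, mask, pattern, stamp_frac=0.2,
                     lr=1e-3, epochs=1, device="cpu"):
    """Fine-tune so that inputs stamped with the REVERSED trigger keep
    their correct labels, weakening the trigger-to-target association."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.clone().to(device), y.to(device)
            n = int(stamp_frac * x.size(0))
            # Stamp the reversed trigger on part of the batch,
            # deliberately leaving the labels unchanged.
            x[:n] = (1 - mask) * x[:n] + mask * pattern
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```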

SLIDE 17

One More Thing

  • Many other interesting results in the paper
  • More complex patterns?
  • Multiple infected labels?
  • What if a label is infected with not just one backdoor?
  • Code is available on github.com/bolunwang/backdoor
