CloudLeak: DNN Model Extractions from Commercial MLaaSPlatforms
Yun-Yun Tsai & Tsung-Yi Ho National Tsing Hua University
#BHUSA @BLACKHATEVENTS
Who We Are
Yun-Yun Tsai — Research Assistant, Department of Computer Science, National Tsing Hua University. Education: National Tsing Hua University, M.S. in Computer Science; National Tsing Hua University, B.S. in Computer Science. Research interests: adversarial machine learning, trustworthy AI.
Tsung-Yi Ho, PhD — Professor, Department of Computer Science, National Tsing Hua University; Program Director, AI Innovation Program, Ministry of Science and Technology, Taiwan. Research interests: hardware and circuit security, trustworthy AI, design automation for emerging technologies.
Revolution of DNN structure: "Perceptron" → "Multi-Layer Perceptron" → "Deep Convolutional Neural Network".
[chart: # layers vs. # parameters — AlexNet: 8 layers, 61M parameters; VGG-16: 16 layers, 138M; GoogLeNet: 22 layers, 7M; ResNet-152: 152 layers, 60M]
DNN-based systems are widely used in various applications and are being accepted as a reliable solution to them.
MLaaS workflow: Prepare — prepare your data; Experiment — code your model, then train & test; Deploy — deploy as a web service.
MLaaS setting: suppliers train models on sensitive data through the Training API; users send inputs to the black-box Prediction API and receive outputs, paying $$$ per query. Goal 1: a rich Prediction API. Goal 2: model confidentiality.
Commercial MLaaS platforms studied:

| Service | Customization Function | Products and Solutions | Black-box | Model Type | Monetize | Confidence Scores |
|---|---|---|---|---|---|---|
| Microsoft Custom Vision | √ | Traffic Recognition | √ | NN | √ | √ |
| Microsoft Custom Vision | √ | Flower Recognition | √ | NN | √ | √ |
| Face++ Emotion Recognition API | × | Face Emotion Verification | √ | NN | √ | √ |
| IBM Watson Visual Recognition | √ | Face Recognition | √ | NN | √ | √ |
| Google AutoML Vision | √ | Flower Recognition | √ | NN | √ | √ |
| Clarifai Not Safe for Work (NSFW) | × | Offensive Content Moderation | √ | NN | √ | √ |
Adversarial examples: adding a crafted "adversarial perturbation" to a "source example" yields an "adversarial example" that changes the predicted label of the AI/ML system (e.g., a face-recognition model mislabels Tony Stark as Chris Evans).
[figure: source / perturbation / guide / adversarial image sets — Sabour et al., 2016; Carlini et al., 2017]
*No access to model-internal information in the black-box setting (training phase: data → model; testing phase: inference against the deployed AI/ML system).
Attack goal: train an equivalent local model of the target model by querying the labels and confidence scores of its predictions. The supplier's private data stays behind the black-box Training and Prediction APIs.
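The query loop above can be sketched as follows. This is a minimal NumPy sketch, not the talk's implementation: `query_blackbox` is a hypothetical stand-in for a paid Prediction API (a real attack would issue HTTP requests to the service and parse its JSON responses), and the "hidden model" is a toy linear classifier.

```python
import numpy as np

def query_blackbox(x, w):
    """Hypothetical stand-in for a paid MLaaS Prediction API: returns the
    top label and the full confidence-score vector. `w` plays the role of
    the supplier's hidden model."""
    logits = x @ w
    scores = np.exp(logits - logits.max())
    scores /= scores.sum()
    return int(scores.argmax()), scores

def collect_stolen_labels(inputs, w):
    """One query per input; on a real service each call costs $$$."""
    return [(x, *query_blackbox(x, w)) for x in inputs]

rng = np.random.default_rng(0)
hidden_w = rng.normal(size=(4, 3))      # the supplier's secret model (toy)
queries = rng.normal(size=(10, 4))      # adversary-chosen inputs
stolen = collect_stolen_labels(queries, hidden_w)
```

The resulting `(input, label, scores)` triples become the training set for the local substitute model.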
A comparison to prior works:

| Proposed Attacks | Parameter Size | Queries | Accuracy | Black-box? | Stealing Cost |
|---|---|---|---|---|---|
| Tramèr (USENIX Security'16) | ~45k | ~102k | High | √ | Low |
| Juuti (EuroS&P'19) | ~10M | ~111k | High | √ | — |
| Correia-Silva (IJCNN'18) | ~200M | ~66k | High | √ | High |
| Papernot (AsiaCCS'17) | ~100M | ~7k | Low | √ | — |
A high-level illustration of the adversarial example generation.
[figure: a source example relative to the decision boundary f(x)]
Goal: train a substitute model whose performance is similar to that of the black-box model.
[figure legend for the generation illustration, relative to the decision boundary f(x): source example; medium-confidence benign example; minimum-confidence benign example; minimum-confidence adversarial example; medium-confidence adversarial example; maximum-confidence adversarial example]
Illustration of the margin-based uncertainty sampling strategy: the "useful examples" are those lying near the decision boundary f(x) — selected from the medium-/minimum-confidence benign and minimum-/medium-/maximum-confidence adversarial candidates according to their confidence score.
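This selection rule can be sketched as follows, assuming the service returns a full confidence vector per query; `margin` and `select_useful` are illustrative names, not the talk's code.

```python
import numpy as np

def margin(scores):
    """Difference between the top-2 confidence scores: a small margin
    means the example lies close to the decision boundary."""
    top2 = np.sort(scores)[-2:]
    return top2[1] - top2[0]

def select_useful(examples, score_rows, k):
    """Keep the k examples the black-box model is least certain about."""
    order = np.argsort([margin(s) for s in score_rows])
    return [examples[i] for i in order[:k]]

scores = np.array([[0.98, 0.01, 0.01],   # confident -> not useful
                   [0.40, 0.35, 0.25],   # near the boundary -> useful
                   [0.55, 0.44, 0.01]])  # near the boundary -> useful
useful = select_useful(["a", "b", "c"], scores, k=2)
```

Querying only low-margin examples buys more decision-boundary information per paid query.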
FeatureFool: we apply box-constrained L-BFGS to find a minimum of the loss function:

minimize  d(x_s′, x_s) + β · loss_{f,l}(x_s′)   such that x_s′ ∈ [0,1]^n

For the triplet loss loss_{f,l}(x_s′), we formally define it as:

loss_{f,l}(x_s′) = max( D(φ_l(x_s′), φ_l(x_t)) − D(φ_l(x_s′), φ_l(x_s)) + M, 0 )

where x_s is the source example, x_t the guide example, φ_l the feature mapping at layer l, D a distance measure, and M the margin.
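A sketch of this objective in plain NumPy. This is not the talk's implementation: the real attack minimizes the objective with box-constrained L-BFGS over x_s′ ∈ [0,1]^n, and `phi` below is a toy feature extractor standing in for layer l of the victim network.

```python
import numpy as np

def triplet_loss(phi_adv, phi_guide, phi_src, M):
    """loss_{f,l}(x_s'): pull phi_l(x_s') toward the guide's features and
    away from the source's, with margin M (hinged at zero)."""
    d_guide = np.linalg.norm(phi_adv - phi_guide)
    d_src = np.linalg.norm(phi_adv - phi_src)
    return max(d_guide - d_src + M, 0.0)

def objective(x_adv, x_src, phi, phi_guide, beta, M):
    """d(x_s', x_s) + beta * loss_{f,l}(x_s')."""
    return np.linalg.norm(x_adv - x_src) + beta * triplet_loss(
        phi(x_adv), phi_guide, phi(x_src), M)

phi = lambda x: 2.0 * x                 # toy feature extractor (layer l)
x_src = np.array([0.2, 0.8])
x_guide = np.array([0.9, 0.1])
g = phi(x_guide)
at_source = objective(x_src, x_src, phi, g, beta=1.0, M=0.1)
at_guide = objective(x_guide, x_src, phi, g, beta=1.0, M=0.1)
```

Moving x_s′ toward the guide zeroes the hinge term at the cost of some distortion, which is exactly the trade-off β controls.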
FeatureFool pipeline — (a) source image x_s; (b) adversarial perturbation ε; (c) guide image x_t; (d) feature extractor producing φ(x_s + ε) and φ(x_t); (e) salient features; (f) box-constrained L-BFGS.
(1) Input an image and extract the corresponding n-th layer feature mapping using the feature extractor (a)–(d); (2) compute the class salience map to decide which points of the feature mapping should be modified (e); (3) search for the minimum perturbation that satisfies the optimization formula (f).
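Step (2) can be sketched with a finite-difference salience map. This is an assumption-laden stand-in for the class salience map: `loss` here is a toy objective, not the victim's, and real attacks would use backpropagated gradients instead of finite differences.

```python
import numpy as np

def salience_map(feat, loss, eps=1e-4):
    """Approximate each feature-map entry's salience by how much the loss
    changes when that entry is perturbed by eps."""
    sal = np.zeros_like(feat)
    flat = sal.ravel()
    base = loss(feat)
    for i in range(feat.size):
        bumped = feat.ravel().copy()
        bumped[i] += eps
        flat[i] = abs(loss(bumped.reshape(feat.shape)) - base) / eps
    return sal

feat = np.array([[0.2, 0.9], [0.5, 0.1]])
loss = lambda f: ((f - 1.0) ** 2).sum()   # toy loss pulling features to 1
sal = salience_map(feat, loss)
modify = np.unravel_index(sal.argmax(), sal.shape)
```

The highest-salience entries are the points of the feature mapping worth modifying.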
[figure: source / guide / adversarial triplets — Stop: 0.99 √; Limited Speed: 0.98 √; Limited Speed: 0.01 ×]
Illustration of the proposed MLaaS model stealing attacks: the adversary searches a candidate library built from a Model Zoo (AlexNet, VGGNet, ResNet) and malicious examples (PGD, C&W, FeatureFool), sends the selected inputs to the MLaaS, and collects the outputs.
Attack pipeline: (a) unlabeled synthetic dataset (genuine domain + malicious domain) drawn from a DB; (b) MLaaS query; (c) synthetic dataset with stolen labels; (d) feature transfer — reused layers copied from the teacher, retrained layers trained by the student (adversary); (e) prediction boundary labeled.
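Step (d) can be sketched as follows, assuming NumPy in place of a real deep-learning framework: the "teacher" feature layers are copied and frozen, and only the prediction head is retrained on the stolen labels. All names, shapes, and the random data are toy choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
W_frozen = rng.normal(size=(8, 4)) / np.sqrt(8)   # layers copied from teacher

def features(x):
    """Frozen ReLU features: the layers reused from the public model."""
    return np.maximum(x @ W_frozen, 0.0)

def xent(W_head, H, Y):
    """Softmax cross-entropy of the head on the stolen labels."""
    logits = H @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return -np.log((p * Y).sum(axis=1)).mean(), p

def train_head(H, Y, lr=0.2, steps=300):
    """Retrain only the prediction layer (softmax-regression steps)."""
    W = np.zeros((H.shape[1], Y.shape[1]))
    for _ in range(steps):
        _, p = xent(W, H, Y)
        W -= lr * H.T @ (p - Y) / len(H)
    return W

X = rng.normal(size=(32, 8))
labels = rng.integers(0, 3, size=32)    # labels stolen via MLaaS queries (toy)
Y = np.eye(3)[labels]
H = features(X)
W_head = train_head(H, Y)
```

Freezing the transferred layers is what keeps the query budget small: only the last layer's parameters must be fit to the stolen labels.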
1) Choose a more complex/relevant network, e.g., VGGFace. 2) Generate/collect images relevant to the classification problem in the source domain and in the problem domain (relevant queries). 3) Query the MLaaS. 4) Train the local model on the cloud query results.
[figure: architecture choice for stealing the Face++ Emotion Classification API (A = 0.68k; B = 1.36k; C = 2.00k queries)]
Comparison of performance between the victim models (Microsoft) and their local substitute models (relative accuracy in parentheses):

| Service | Dataset | Queries | RS | CW | FF | Price ($) |
|---|---|---|---|---|---|---|
| Microsoft | Traffic (*orig. acc. 77.93%) | 0.43k | 10.21% (13.10×) | 12.10% (15.53×) | 15.96% (20.48×) | 0.43 |
| | | 1.29k | 45.30% (58.13×) | 61.25% (79.60×) | 66.91% (85.86×) | 1.29 |
| | | 2.15k | 70.03% (89.86×) | 74.94% (96.16×) | 76.05% (97.63×) | 2.15 |
| | Flower (*orig. acc. 90.69%) | 0.51k | 26.27% (28.97×) | 29.41% (32.43×) | 31.86% (35.13×) | 1.53 |
| | | 1.53k | 64.02% (70.59×) | 69.22% (76.33×) | 72.35% (79.78×) | 4.59 |
| | | 2.55k | 79.22% (87.35×) | 89.20% (98.36×) | 88.14% (97.19×) | 7.65 |
A comparison to prior works:

| Proposed Attacks | Parameter Size | Queries | Accuracy | Black-box? | Stealing Cost |
|---|---|---|---|---|---|
| Tramèr (USENIX Security'16) | ~45k | ~102k | High | √ | Low |
| Juuti (EuroS&P'19) | ~10M | ~111k | High | √ | — |
| Correia-Silva (IJCNN'18) | ~200M | ~66k | High | √ | High |
| Papernot (AsiaCCS'17) | ~100M | ~7k | Low | √ | — |
| Ours | ~200M | ~3k | High | √ | Low |
For FeatureFool we vary the margin M from 0.1·D to 0.8·D; with larger margins the attack is often missed, because the deviation observed by the detector stays small.
Queries made until detection ("missed" = never detected):

| Model (ε value) | PGD | CW | FA | FF (M = 0.8D) | FF (M = 0.5D) | FF (M = 0.1D) |
|---|---|---|---|---|---|---|
| Traffic (ε = 0.92) | missed | missed | missed | missed | 150 | 130 |
| Traffic (ε = 0.97) | 110 | 110 | 110 | 110 | 110 | 110 |
| Flower (ε = 0.87) | 110 | missed | 220 | missed | 290 | 140 |
| Flower (ε = 0.90) | 110 | 340 | 220 | 350 | 120 | 130 |
| Flower (ε = 0.94) | 110 | 340 | 220 | 350 | 120 | 130 |
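The intuition behind such detectors can be sketched with a toy stand-in. This is not PRADA's actual test (which models the distribution of inter-query distances and checks it for normality); it merely flags query streams whose nearest-neighbor distances become abnormally concentrated, which is the signature such detectors look for.

```python
import numpy as np

def toy_detector(queries, window=10, threshold=0.2):
    """Record, for each new query, its distance to the closest previous
    query; flag the stream once these distances become too concentrated
    (low spread). Returns the 1-based query index at detection, or None
    ("missed")."""
    dists = []
    for i, q in enumerate(queries):
        if i:
            dists.append(min(np.linalg.norm(q - p) for p in queries[:i]))
        if len(dists) >= window and np.std(dists) < threshold:
            return i + 1
    return None

benign = [np.array([i ** 1.5]) for i in range(40)]         # spread-out queries
attack = [np.array([0.5 + 0.001 * i]) for i in range(40)]  # clustered queries
```

A large FeatureFool margin spreads the adversarial queries out in input space, which is why several table entries read "missed".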