SLIDE 1

CloudLeak: DNN Model Extractions from Commercial MLaaS Platforms

Yun-Yun Tsai & Tsung-Yi Ho National Tsing Hua University

#BHUSA @BLACKHATEVENTS

SLIDE 2

Who We Are

Yun-Yun Tsai, Research Assistant, Department of Computer Science, National Tsing Hua University. Education: National Tsing Hua University, M.S. in Computer Science; National Tsing Hua University, B.S. in Computer Science. Research interests: Adversarial Machine Learning, Trustworthy AI.

SLIDE 3

Who We Are

Tsung-Yi Ho, PhD, Professor, Department of Computer Science, National Tsing Hua University; Program Director, AI Innovation Program, Ministry of Science and Technology, Taiwan. Research interests: Hardware and Circuit Security, Trustworthy AI, Design Automation for Emerging Technologies.

SLIDE 4

A preliminary paper published at NDSS 2020

  • CloudLeak: Large-Scale Deep Learning Models Stealing Through Adversarial Examples (paper link)

SLIDE 5

Outline

  • Background
    • MLaaS in Cloud
    • Overview of Adversarial Attack
  • Adversarial Examples based Model Stealing
    • Adversarial Active Learning
    • FeatureFool
    • MLaaS Model Stealing Attacks
  • Case Study & Experimental Results
    • Commercial APIs hosted by Microsoft, Face++, IBM, Google and Clarifai
  • Defenses
  • Conclusions

SLIDE 6

Background & Motivation

SLIDE 7

Success of DNN

Figure: revolution of DNN structure, from the "Perceptron" to the "Multi-Layer Perceptron" to the "Deep Convolutional Neural Network". Layers and parameters: AlexNet (8 layers, 61M parameters), VGG-16 (16 layers, 138M), GoogLeNet (22 layers, 7M), ResNet-152 (152 layers, 60M).

DNN based systems are widely used in various applications.

SLIDE 8

Machine Learning on Cloud

  • Machine learning as a service (MLaaS) provided by cloud providers is increasingly accepted as a reliable solution for a wide range of applications.

MLaaS workflow: Prepare (prepare your data) → Experiment (code your model, train & test) → Deploy (deploy as a web service).

SLIDE 9

Machine Learning as a Service

Overview of the MLaaS working flow: suppliers train a black-box model on a sensitive dataset through the training API; users send inputs to the prediction API and receive outputs, paying $$$ per query. Goal 1: a rich prediction API. Goal 2: model confidentiality.

SLIDE 10

Property of MLaaS

Services | Products and Solutions | Customization Function | Black-box Model Types | Monetize | Confidence Scores
Microsoft Custom Vision | Traffic Recognition | √ | NN | √ | √
Microsoft Custom Vision | Flower Recognition | √ | NN | √ | √
Face++ Emotion Recognition API | Face Emotion Verification | × | NN | √ | √
IBM Watson Visual Recognition | Face Recognition | √ | NN | √ | √
Google AutoML Vision | Flower Recognition | √ | NN | √ | √
Clarifai Not Safe for Work (NSFW) | Offensive Content Moderation | × | NN | √ | √

SLIDE 11

Motivation: Security on MLaaS

Overview of the MLaaS working flow (as above): suppliers train a black-box model on sensitive data through the training API; users pay $$$ per query to the prediction API. Goal 1: a rich prediction API. Goal 2: model confidentiality.


SLIDE 13

Adversarial Examples in DNN

  • Adversarial examples are model inputs generated by an adversary to fool deep learning models.

Figure: source example + adversarial perturbation = adversarial example; the AI/ML system's predicted label flips (e.g., between "Tony Stark" and "Chris Evans").

SLIDE 14

Adversarial Examples in DNN

  • Non-Feature-based
    • Projected Gradient Descent (PGD) attack (see the sketch at the end of this slide)
    • C&W attack
    • Zeroth Order Optimization (ZOO) attack
  • Feature-based
    • Feature Adversary attack (FA)
    • FeatureFool

Figure: example adversarial images (source, perturbation, guide, adversarial) from a feature-based attack (Sabour et al., 2016) and a non-feature-based attack (Carlini et al., 2017).

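As a concrete reference for the non-feature-based attacks listed above, here is a minimal PGD sketch in PyTorch. It is illustrative and written under our own assumptions (a classifier `model`, an input batch `x` in [0,1], labels `y`); it is not the exact code used in this work.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.03, alpha=0.007, steps=10):
    """Minimal PGD sketch: step along the sign of the loss gradient, then
    project back into an L-infinity ball of radius eps around the source x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()   # ascend the classification loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)       # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1).detach()             # keep a valid image
    return x_adv
```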

SLIDE 15

Holistic View of Adversarial Attack

Figure: a holistic view of adversarial attacks on an AI/ML system across the training phase (data, model) and the testing phase (inference). *No access to model internal information in the black-box setting.

slide-16
SLIDE 16
Our Goal

  • We aim to accurately retrain an equivalent local model of the target model by querying the labels and confidence scores of its predictions.

Figure: the supplier trains a black-box model on private data; the adversary interacts only with the training/prediction APIs.

SLIDE 17

Adversarial Example based Model Stealing Attack

SLIDE 18

Model Stealing Attacks

  • Various model stealing attacks have been developed.
  • None of them achieves a good tradeoff among query count, accuracy, cost, etc.

Proposed Attacks | Parameter Size | Queries | Accuracy | Black-box? | Stealing Cost
F. Tramer (USENIX'16) | ~45k | ~102k | High | √ | Low
Juuti (EuroS&P'19) | ~10M | ~111k | High | √ | –
Correia-Silva (IJCNN'18) | ~200M | ~66k | High | √ | High
Papernot (AsiaCCS'17) | ~100M | ~7k | Low | √ | –

SLIDE 19

Adversarial Active Learning

Figure: a high-level illustration of adversarial example generation, starting from a source example and moving toward the decision boundary f(x).

slide-20
SLIDE 20
Adversarial Active Learning

  • We gather a set of "useful examples" to train a substitute model with performance similar to that of the black-box model.

Figure: a high-level illustration of adversarial example generation around the decision boundary f(x). Legend: source example; medium-confidence benign example; minimum-confidence benign example; minimum-confidence adversarial example; medium-confidence adversarial example; maximum-confidence adversarial example.

SLIDE 21
Adversarial Active Learning (cont.)

  • We gather a set of "useful examples" to train a substitute model with performance similar to that of the black-box model (a minimal sampling sketch follows below).

Figure: illustration of the margin-based uncertainty sampling strategy around the decision boundary f(x); the "useful examples" are those closest to the boundary. Legend: source example; medium-confidence benign example; minimum-confidence benign example; minimum-confidence adversarial example; medium-confidence adversarial example; maximum-confidence adversarial example.
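The following is a minimal sketch of margin-based uncertainty sampling under our own assumptions (a matrix of softmax scores from the current substitute model and a query budget); it is illustrative, not the authors' exact selection code.

```python
import numpy as np

def margin_uncertainty_sampling(probs, budget):
    """Pick the examples whose top-two class probabilities are closest,
    i.e. the candidates nearest the decision boundary ("useful examples").
    probs: (N, C) array of softmax scores; budget: number of queries to spend."""
    sorted_probs = np.sort(probs, axis=1)
    margin = sorted_probs[:, -1] - sorted_probs[:, -2]   # top-1 minus top-2 confidence
    return np.argsort(margin)[:budget]                   # smallest margins first

# usage (hypothetical names): idx = margin_uncertainty_sampling(substitute_probs, budget=1000)
```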

SLIDE 22
FeatureFool: Margin-based Adv. Example

  • To reduce the scale of the perturbation, we further propose a feature-based attack to generate more robust adversarial examples.
  • Attack goal: a low confidence score for the true class (we use the margin M to control the confidence score).
  • To solve the reformulated optimization problem, we apply box-constrained L-BFGS to find a minimum of the loss function:

    minimize  d(x', x_s) + β · loss_{f,l}(x')    such that  x' ∈ [0, 1]^n

For the triplet loss loss_{f,l}(x'), we formally define it as:

    loss_{f,l}(x') = max( D(φ_l(x'), φ_l(x_t)) − D(φ_l(x'), φ_l(x_s)) + M, 0 )

where x_s is the source example, x_t is the guide example, x' is the adversarial example, φ_l(·) is the l-th layer feature mapping, D(·, ·) is a distance metric, and M is the margin constant. (A minimal sketch of this optimization follows below.)
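A minimal PyTorch sketch of this margin-based objective, under our own assumptions: a feature extractor `phi`, source `x_s` and guide `x_t` tensors in [0,1], and squared L2 as the distance D. The [0,1]^n box constraint is handled via a sigmoid change of variables, with torch.optim.LBFGS standing in for box-constrained L-BFGS; this is illustrative, not the paper's exact implementation.

```python
import torch

def featurefool_sketch(phi, x_s, x_t, beta=1.0, margin=0.1, steps=100):
    """Pull the adversarial image's layer features toward the guide x_t and away
    from the source x_s, while staying close to x_s in pixel space."""
    w = torch.logit(x_s.clamp(1e-4, 1 - 1e-4)).clone().requires_grad_(True)
    f_s, f_t = phi(x_s).detach(), phi(x_t).detach()
    opt = torch.optim.LBFGS([w], max_iter=steps)

    def closure():
        opt.zero_grad()
        x_adv = torch.sigmoid(w)                       # always inside [0,1]^n
        d_pixel = (x_adv - x_s).pow(2).sum()           # d(x', x_s): stay close to the source
        f_adv = phi(x_adv)
        triplet = torch.relu((f_adv - f_t).pow(2).sum()
                             - (f_adv - f_s).pow(2).sum() + margin)
        loss = d_pixel + beta * triplet                # d(x', x_s) + β · loss_{f,l}(x')
        loss.backward()
        return loss

    opt.step(closure)
    return torch.sigmoid(w).detach()
```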

SLIDE 23

Overview of FeatureFool Attack

Figure panels: (a) source image x_s; (b) adversarial perturbation ε; (c) guide image x_t; (d) feature extractor, producing φ(x_s + ε) and φ(x_t); (e) salient features; (f) box-constrained L-BFGS.

(1) Input an image and extract the corresponding n-th layer feature mapping using the feature extractor (a)-(d); (2) compute the class saliency map to decide which points of the feature mapping should be modified (e); (3) search for the minimum perturbation that satisfies the optimization formula (f).



SLIDE 26

FeatureFool: A New Adversarial Attack

Figure: source, guide, and adversarial image triplets produced by FeatureFool, with victim-model confidence scores such as Stop: 0.99 (√), Limited Speed: 0.98 (√), and Limited Speed: 0.01 (×).

SLIDE 27

MLaaS Model Stealing Attacks

  • Our attack approach (a query-and-retrain sketch follows below):
    • Use adversarial examples to generate the malicious inputs;
    • Obtain input-output pairs by querying black-box APIs with the malicious inputs;
    • Retrain the substitute models, which are generally chosen from a candidate Model Zoo.

Figure: illustration of the proposed MLaaS model stealing attacks. The adversary generates malicious examples (PGD, C&W, FeatureFool), queries the MLaaS API (inputs → outputs), and searches a candidate library (Model Zoo: AlexNet, VGGNet, ResNet) for the substitute architecture.

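A minimal sketch of the query-and-retrain loop in PyTorch, under our own assumptions: `query_api` is a placeholder for the commercial prediction endpoint (returning a class index per image tensor) and `transfer_images` is the synthetic/adversarial transfer set (a list of image tensors). This is illustrative, not the exact attack code.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def steal_model(query_api, substitute, transfer_images, epochs=10, lr=1e-3):
    """Label the transfer set with the victim's prediction API, then train the
    local substitute model on the stolen labels."""
    stolen_labels = torch.tensor([query_api(img) for img in transfer_images])
    loader = DataLoader(TensorDataset(torch.stack(transfer_images), stolen_labels),
                        batch_size=32, shuffle=True)
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    substitute.train()
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            F.cross_entropy(substitute(xb), yb).backward()
            opt.step()
    return substitute
```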

SLIDE 28

MLaaS Model Stealing Attacks

  • Overview of the transfer framework for the model theft attack (a feature-transfer sketch follows below)

Figure: (a) unlabeled synthetic dataset drawn from the genuine and malicious domains; (b) MLaaS query; (c) synthetic dataset with stolen labels; (d) feature transfer, with reused layers copied from the teacher and retrained layers trained by the student (adversary); (e) prediction boundary.
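A minimal sketch of the feature-transfer step, under our own assumptions: a torchvision VGG-16 with ImageNet weights stands in for the network picked from the Model Zoo; its convolutional layers play the role of the layers copied from the teacher, and only the final classifier is retrained by the student on the stolen labels. Illustrative only.

```python
import torch.nn as nn
from torchvision import models

def build_substitute(num_classes):
    """Freeze the reused (teacher) layers and retrain only the output layer."""
    net = models.vgg16(weights="IMAGENET1K_V1")        # stand-in for the Model Zoo pick
    for p in net.features.parameters():
        p.requires_grad = False                         # reused layers stay fixed
    net.classifier[6] = nn.Linear(4096, num_classes)    # retrained output layer
    return net
```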

SLIDE 29

Case Study & Experimental Results

SLIDE 30

Example: Emotion Classification

  • Procedure to extract a copy of the Emotion Classification model:
    1) Choose a more complex/relevant network, e.g., VGGFace
    2) Generate/collect images relevant to the classification problem in the source domain and in the problem domain (relevant queries)
    3) MLaaS query
    4) Local model training based on the cloud query results

Figure: architecture choice for stealing the Face++ Emotion Classification API (A = 0.68k; B = 1.36k; C = 2.00k).

SLIDE 31

Experimental Results

  • Adversarial perturbations result in a more successful transfer set.
  • In most cases, our FeatureFool method achieves the same level of accuracy with fewer queries than the other methods.

Comparison of performance on the victim models (Microsoft) and their local substitute models. Values in parentheses are accuracy relative to the victim model's original accuracy (*Orig. Acc.: Traffic 77.93%, Flower 90.69%); RS = random sampling, CW = C&W, FF = FeatureFool.

Service | Model | Queries | RS | CW | FF | Price ($)
Microsoft | Traffic | 0.43k | 10.21% (13.10%) | 12.10% (15.53%) | 15.96% (20.48%) | 0.43
Microsoft | Traffic | 1.29k | 45.30% (58.13%) | 61.25% (79.60%) | 66.91% (85.86%) | 1.29
Microsoft | Traffic | 2.15k | 70.03% (89.86%) | 74.94% (96.16%) | 76.05% (97.63%) | 2.15
Microsoft | Flower | 0.51k | 26.27% (28.97%) | 29.41% (32.43%) | 31.86% (35.13%) | 1.53
Microsoft | Flower | 1.53k | 64.02% (70.59%) | 69.22% (76.33%) | 72.35% (79.78%) | 4.59
Microsoft | Flower | 2.55k | 79.22% (87.35%) | 89.20% (98.36%) | 88.14% (97.19%) | 7.65

SLIDE 32

Comparison with Existing Attacks

  • Our attack framework can steal large-scale deep learning models with high accuracy, few queries, and low cost simultaneously.
  • The same trend appears when we use different transfer architectures to steal the black-box target model.

A comparison to prior works.

Proposed Attacks | Parameter Size | Queries | Accuracy | Black-box? | Stealing Cost
F. Tramer (USENIX'16) | ~45k | ~102k | High | √ | Low
Juuti (EuroS&P'19) | ~10M | ~111k | High | √ | –
Correia-Silva (IJCNN'18) | ~200M | ~66k | High | √ | High
Papernot (AsiaCCS'17) | ~100M | ~7k | Low | √ | –
Our Method | ~200M | ~3k | High | √ | Low

SLIDE 33

Defense Mechanisms against Model Stealing

SLIDE 34

Evading Defenses

  • PRADA (Protecting Against DNN Model Stealing Attacks)
  • Our attacks can easily bypass the defense by carefully selecting the parameter M from 0.1D to 0.8D.
  • Other types of adversarial attacks can also bypass the PRADA defense if δ is small.

Queries made until PRADA detection ("missed" = never detected):

Model (δ value) | PGD | CW | FA | FF (M = 0.8D) | FF (M = 0.5D) | FF (M = 0.1D)
Traffic (δ = 0.92) | missed | missed | missed | missed | 150 | 130
Traffic (δ = 0.97) | 110 | 110 | 110 | 110 | 110 | 110
Flower (δ = 0.87) | 110 | missed | 220 | missed | 290 | 140
Flower (δ = 0.90) | 110 | 340 | 220 | 350 | 120 | 130
Flower (δ = 0.94) | 110 | 340 | 220 | 350 | 120 | 130

SLIDE 35

Conclusions

SLIDE 36

Conclusions

  • We propose a new adversarial attack method named FeatureFool.
  • We develop a novel adversarial example-based model stealing attack targeting MLaaS in the cloud.
  • More effective defense mechanisms against the model stealing attack will be developed to enhance the robustness of DNN-based MLaaS.

SLIDE 37

Thank you for listening. Q&A