SLIDE 1

Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware

Florian Tramèr (joint work with Dan Boneh). Intel, Santa Clara – August 30th, 2018

SLIDE 2

Trusted execution of ML: 3 motivating scenarios

  • 1. Outsourced ML
    • Data privacy
    • Integrity:
      • Model “downgrade”
      • Disparate impact
      • Other malicious tampering
SLIDE 3

Trusted execution of ML: 3 motivating scenarios

  • 2. Federated Learning
    • Integrity: poisoned model updates
    • Data privacy

SLIDE 4

Trusted execution of ML: 3 motivating scenarios

  • 3. Trojaned hardware
    • Integrity (Verifiable ASICs model, Wahby et al.)

SLIDE 5

Solutions

  • Cryptography

  1. Outsourced ML: FHE, MPC, (ZK) proof systems
  2. Federated learning: no countermeasure for poisoning…
  3. Trojaned hardware: some root of trust is needed

  • Trusted Execution Environments (TEEs)

  1. Outsourced ML: isolated enclaves
  2. Federated learning: trusted sensors + isolated enclaves
  3. Trojaned hardware: fully trusted (but possibly slow) hardware

SLIDE 6

Trusted Execution: At what cost?

  • Trusted ASICs (Wahby et al.): ~10⁸× worse than SOTA
  • Intel SGX:

https://medium.com/@danny_harnik/impressions-of-intel-sgx-performance-22442093595a

Bar chart (VGG16 inference, images/sec): GPU ≈ 350, SGX ≈ 1

GPU: Nvidia TITAN XP
SGX: Intel Core i7-6700 Skylake, single core @ 3.40GHz (enclave paging kicks in at ~90MB)

SLIDE 7

“How do we efficiently leverage TEEs for secure machine learning computations?”

Idea: outsource work to a collocated, faster but untrusted device and verify the results.

System | Computations | Required gap | Privacy
Verifiable ASICs (Wahby et al., 2016) | Arithmetic circuits | ~8 orders of magnitude | No
Slalom | DNN inference | ~1-2 orders | “Yes”

Diagram: the TEE sends x to the untrusted device, which returns F(x) and a proof.

SLIDE 8

Goal + threat model

  • User has a secure communication channel with the TEE
  • Adversary controls the rest of the software / hardware stack
  • The model is known to the adversary (but not necessarily to the client)

Goal: Efficiently run DNN inference F(x)

  • Integrity: User obtains F(x) or aborts
  • Privacy: Adversary learns nothing about x
SLIDE 9

Bottlenecks in deep neural networks

Chart (VGG16 inference on 1 CPU core): matrix multiplication accounts for ~97% of the work; the non-linear operations are cheap.

SLIDE 10

Outsourcing matrix multiplication: Freivalds’ algorithm

Input: X ∈ 𝔽^(n ⨉ n), W ∈ 𝔽^(n ⨉ n)

Direct compute: Z = X ∙ W ⇒ ≈ n³ multiplications, or O(n^2.81) with Strassen

Outsource + verify:

  • Sample r ← 𝔽ⁿ uniformly at random
  • Check: Z ∙ r =? X ∙ (W ∙ r)
  • Complexity: ≈ 3n² multiplications
  • Soundness: 1 / |𝔽| (boost by repeating)

Note: W contains the DNN weights, fixed at inference time.
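To make the check concrete, below is a minimal NumPy sketch of the verifier. The modulus P = 65521 (a 16-bit prime) is an illustrative stand-in chosen so that int64 arithmetic cannot overflow; Slalom itself works modulo a prime below 2²⁴.

```python
import numpy as np

P = 65521  # illustrative 16-bit prime; Slalom uses a prime below 2^24

def freivalds_check(X, W, Z, reps=1, rng=None):
    """Probabilistically verify Z = X·W over Z_P.

    Each repetition costs ~3 matrix-vector products (about 3n^2 mults)
    and has soundness error 1/P.
    """
    rng = rng or np.random.default_rng()
    for _ in range(reps):
        r = rng.integers(0, P, size=W.shape[1], dtype=np.int64)
        lhs = (Z @ r) % P                # Z·r
        rhs = (X @ ((W @ r) % P)) % P    # X·(W·r)
        if not np.array_equal(lhs, rhs):
            return False
    return True

# Accepts a correct product, rejects a tampered one
# (a tampered Z slips through only with probability 1/P per repetition):
rng = np.random.default_rng(0)
n = 64
X = rng.integers(0, P, (n, n), dtype=np.int64)
W = rng.integers(0, P, (n, n), dtype=np.int64)
Z = (X @ W) % P
assert freivalds_check(X, W, Z)
Z[0, 0] = (Z[0, 0] + 1) % P
assert not freivalds_check(X, W, Z)
```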

SLIDE 11

Freivalds variants for arbitrary linear operators

Linear operator: z = F(x) = x ∙ A, where x is a vector of size |x|, z a vector of size |z|, and A a matrix of size |x| ⨉ |z|.

Batched verification:

  • Compute: [z1 … zB] = F([x1 … xB]) ⇒ B ∙ cost(F) mults
  • Freivalds: rᵀ ∙ [z1 … zB] =? F(rᵀ ∙ [x1 … xB]) ⇒ B ∙ (|x| + |z|) + cost(F) mults

With precomputation:

  • Precompute: A′ = A ∙ r = (∇x F)(r)
  • Freivalds: ⟨z , r⟩ =? ⟨x , A′⟩ ⇒ |x| + |z| mults (just 2 inner products!)
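As a sketch of the precomputed variant (reusing the stand-in prime P from the previous snippet; all shapes are illustrative), the online check really is just two inner products:

```python
import numpy as np

P = 65521                    # stand-in prime, as above
rng = np.random.default_rng(1)

m, k = 512, 512              # |x| = m, |z| = k
A = rng.integers(0, P, (m, k), dtype=np.int64)   # fixed layer weights
x = rng.integers(0, P, m, dtype=np.int64)

# Offline: precompute A' = A·r once, since the weights A are fixed
r = rng.integers(0, P, k, dtype=np.int64)
A_r = (A @ r) % P

# Online: verify a claimed z = x·A with |x| + |z| multiplications
z = (x @ A) % P              # the untrusted device's (here: honest) answer
assert (z @ r) % P == (x @ A_r) % P
```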

SLIDE 12

Handling convolutions

  • Compute: [z1 … zB] = im2col([x1 … xB]) * W ⇒ B∙H∙W∙K²∙C∙D mults
  • Batched verify: r1ᵀ * [z1 … zB] * r2 =? im2col(r1ᵀ * [x1 … xB]) * (W * r2) ⇒ B∙H∙W∙D + B∙H∙W∙C + K²∙C∙D + H∙W∙K²∙C mults
  • With preprocessing: ⟨z, r⟩ =? ⟨(∇x F)(r), x⟩ ⇒ B∙H∙W∙D + B∙H∙W∙C mults

VGG16 dimensions:

  • K = 3
  • 3 ≤ C ≤ 512
  • 64 ≤ D ≤ 512
  • 14² ≤ N ≤ 224²
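To make the convolution case concrete, here is a single-image sketch (dropping the batch projection r1) that verifies a K ⨉ K “same” convolution via im2col and one random projection r2 over the D output channels. The im2col helper, the dimensions, and the prime P are illustrative assumptions carried over from the earlier snippets:

```python
import numpy as np

P = 65521                                  # stand-in prime, as above
rng = np.random.default_rng(2)

def im2col(x, K):
    """(H, W, C) image -> (H*W, K*K*C) matrix of zero-padded K x K patches."""
    H, Wd, C = x.shape
    pad = K // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    cols = np.empty((H * Wd, K * K * C), dtype=np.int64)
    for i in range(H):
        for j in range(Wd):
            cols[i * Wd + j] = xp[i:i + K, j:j + K, :].reshape(-1)
    return cols

H = Wd = 14
C, D, K = 64, 64, 3
x = rng.integers(0, P, (H, Wd, C), dtype=np.int64)
W = rng.integers(0, P, (K * K * C, D), dtype=np.int64)   # flattened conv kernel

cols = im2col(x, K)
z = (cols @ W) % P                         # untrusted device: H*W*K^2*C*D mults

# Verifier: project onto r2, then compare two much cheaper matrix-vector products
r2 = rng.integers(0, P, D, dtype=np.int64)
assert np.array_equal((z @ r2) % P, (cols @ ((W @ r2) % P)) % P)
```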
SLIDE 13

Preserving privacy

  • Offline precomputation + online blinding

Diagram: the TEE sends X to the untrusted device, which returns X ∙ W.

Offline: Precompute and store R, R ∙ W

SLIDE 14

Preserving privacy

  • Offline precomputation + online blinding
  • Secret sharing?

Diagram (blinding): the TEE sends X + R to the untrusted device, which returns (X + R) ∙ W.

  • Offline: precompute and store R, R ∙ W
  • Online: blind with a “one-time pad” over 𝔽; unblind using R ∙ W

Diagram (secret sharing): X is split into shares X + R and R across two devices. Can these devices be “collocated” yet “non-colluding”?

SLIDE 15

Slalom Summary

Diagram: layer by layer, the TEE sends Xi + Ri to the untrusted device and receives Zi = (Xi + Ri) ∙ Wi.

Offline: precompute and store (Ri , Ri ∙ Wi). Then, for each layer:

1. Unblind: Z1 = Z1 – R1 ∙ W1
2. Freivalds check for (X1, W1, Z1)
3. X2 = σ(Z1), where σ is an arbitrary non-linearity
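Putting the pieces together, here is a minimal sketch of one Slalom linear layer (all arithmetic mod the stand-in prime P from the earlier snippets; untrusted_matmul stands in for the GPU, and all names are illustrative):

```python
import numpy as np

P = 65521                                  # stand-in prime, as above
rng = np.random.default_rng(3)

def untrusted_matmul(X, W):
    """Stand-in for the fast but untrusted device (may misbehave)."""
    return (X @ W) % P

def slalom_layer(X, W, R, RW, r, Wr):
    """One private + verifiable linear layer, as run inside the TEE."""
    Z = untrusted_matmul((X + R) % P, W)   # the device only ever sees X + R
    Z = (Z - RW) % P                       # unblind with precomputed R·W
    # Freivalds check Z =? X·W, using the precomputed W·r
    if not np.array_equal((Z @ r) % P, (X @ Wr) % P):
        raise RuntimeError("integrity check failed: abort")
    return Z                               # X2 = sigma(Z) follows, inside the TEE

n = 256
W = rng.integers(0, P, (n, n), dtype=np.int64)
X = rng.integers(0, P, (1, n), dtype=np.int64)

# Offline: per-query blinding factor R and unblinding term R·W,
# plus the Freivalds vector r and W·r (reusable, since W is fixed)
R = rng.integers(0, P, X.shape, dtype=np.int64)
RW = (R @ W) % P
r = rng.integers(0, P, n, dtype=np.int64)
Wr = (W @ r) % P

Z = slalom_layer(X, W, R, RW, r, Wr)       # returns X·W mod P, verified
```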

SLIDE 16

Slalom (some details)

Quantization:

  • DNNs are typically trained / evaluated in floating point
  • Freivalds / blinding require working over a ring / field 𝔽
  • Quantize inputs & weights and work mod p (p < 2²⁴)
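A minimal sketch of such a fixed-point scheme is shown below (the scale 2⁶ and the modulus are illustrative assumptions; the real parameters are chosen so that intermediate values never wrap around mod p):

```python
import numpy as np

P = 65521        # stand-in modulus, as in the earlier sketches; Slalom uses p < 2^24
SCALE = 2**6     # illustrative fixed-point scale (trades precision vs. wraparound)

def quantize(x):
    """float -> residue mod P; valid only while values stay within +/- P/2."""
    return np.round(x * SCALE).astype(np.int64) % P

def dequantize(z, scale=SCALE**2):
    """Back to floats; after one linear layer the scale has squared."""
    z = np.asarray(z) % P
    z = np.where(z > P // 2, z - P, z)   # signed (centered) representatives
    return z / scale

x = np.array([0.5, -1.25])
w = np.array([[2.0, 0.0], [0.0, -0.5]])
z = (quantize(x) @ quantize(w)) % P
print(dequantize(z))   # ~ [1.0, 0.625], matching x @ w
```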

Integrity checks:

  • Eval DNN on fast device and store inputs / outputs of all linear ops

⟹ close to no prover overhead

  • Sample r from 𝔽 and do the Freivalds check in double precision

⟹ verifier complexity is at least |x| + |z| double mults per linear layer

Blinding:

  • Store unblinding factors R∙W encrypted in untrusted memory
  • In online phase, decrypt (and authenticate) R∙W to unblind
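As a sketch of that storage pattern, here is how the unblinding factors might be sealed with AES-GCM using Python’s cryptography package (an illustration of the idea, not Slalom’s actual SGX code; the layer id serves as associated data):

```python
import os
import numpy as np
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)   # lives only inside the TEE
aead = AESGCM(key)

def seal(layer_id, RW):
    """Encrypt + authenticate R·W for storage in untrusted memory."""
    nonce = os.urandom(12)
    ct = aead.encrypt(nonce, RW.tobytes(), str(layer_id).encode())
    return nonce, ct

def unseal(layer_id, nonce, ct, shape):
    """Decrypt in the online phase; raises InvalidTag if tampered with."""
    pt = aead.decrypt(nonce, ct, str(layer_id).encode())
    return np.frombuffer(pt, dtype=np.int64).reshape(shape)

RW = np.arange(8, dtype=np.int64).reshape(2, 4)
nonce, ct = seal(0, RW)
assert np.array_equal(unseal(0, nonce, ct, RW.shape), RW)
```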
SLIDE 17

Design & Evaluation

Implementation

  • TEE: Intel SGX “Desktop” CPU (single thread)
  • Untrusted device: Nvidia Tesla GPU
  • Port of the Eigen linear algebra C++ library (used e.g. in TensorFlow) to SGX

Workloads:

  • Microbenchmarks (see paper)
  • VGG16 (“beefy” canonical feedforward neural network)
  • MobileNet (resource-efficient DNN tailored for low-compute devices)
    • Variant 1: standard MobileNet (see paper)
    • Variant 2: no intermediate ReLU in separable convolutions (this talk)

SLIDE 18

Verifiable inference

Bar chart (VGG16, images/sec): Compute 1, Verify 1.7, Verify with preproc 19.6

Bar chart (MobileNet, images/sec): Compute 15.9, Verify 30, Verify with preproc 97.1

  • VGG16 weights take 500MB, so SGX has to page weights in and out of memory ⇒ ~2-3x slowdown
  • Preprocessed weights W ∙ r take up less memory and enable faster checks!
  • MobileNet’s weights are only ~10MB, so they fit in the SGX cache
  • Difficult to get faster batched verification due to SGX memory limits

SLIDE 19

Verifiable and private inference

Bar chart (VGG16, images/sec): Compute 1, Outsource + integrity 19.6, Outsource + privacy 13, Outsource + both 10.2

Bar chart (MobileNet, images/sec): Compute 15.9, Outsource + integrity 97.1, Outsource + privacy 80, Outsource + both 54.9

Extra Costs

  • GPU has to operate in double precision
  • Decrypt all unblinding factors R∙W (AES-GCM)
  • Regenerate all blinding factors R (PRG using AES)
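On the last point, here is a sketch of regenerating the blinding factors R from an AES-based PRG (an AES-CTR keystream mapped to residues mod the stand-in prime P; note the slight bias of reducing 32-bit words mod P, which a real implementation would avoid):

```python
import numpy as np
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

P = 65521  # stand-in prime, as in the earlier sketches

def regen_blinding_factors(key, nonce, shape):
    """Derive R pseudorandomly from an AES-CTR keystream (illustrative).

    Only key and nonce need to be stored; R itself is never kept around.
    """
    n = int(np.prod(shape))
    keystream = Cipher(algorithms.AES(key), modes.CTR(nonce)) \
        .encryptor().update(b"\x00" * (4 * n))
    words = np.frombuffer(keystream, dtype=np.uint32)
    # Reducing 32-bit words mod P is biased by ~P/2^32; fine for a sketch
    return (words % P).astype(np.int64).reshape(shape)

R = regen_blinding_factors(b"\x00" * 16, b"\x01" * 16, (4, 4))
```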
SLIDE 20

Summary

  • Large savings (6x – 20x) in outsourcing DNN inference while preserving integrity
    • Sufficient for some use-cases!
  • More modest savings (3.5x – 10x) with input privacy
    • Requires preprocessing
SLIDE 21

Open questions

  • What other problems are (concretely) easier to verify than to compute?
    • All NP-complete problems (are those often outsourced?)
    • What about something in P?
      • Convex optimization
      • Other uses of matrix multiplication
      • Many graph problems (e.g., perfect matching)
  • What about Slalom for verifiable / private training?
    • Quantization at training time is hard
    • Weights change, so we can’t preprocess weights for Freivalds’ check
    • We assume the model is known to the adversary (e.g., the cloud provider)