PINFER: PRIVACY-PRESERVING INFERENCE

DPM 2019, Luxembourg, Sep. 26, 2019
Marc Joye, Fabien Petitcolas
Innovation Centre

MACHINE LEARNING AS A SERVICE: GENERIC MODEL
[Diagram: a Client and a Cloud (Server); labels: "result", "1 exchange of messages".]
© 2019 OneSpan Innovation Centre

REQUIREMENTS AND SOLUTIONS
Security requirements
• The server learns nothing about the client's input
• The server does not learn the output of the calculation
• The client learns nothing about the ML model
Proposed solutions: private evaluation for
1 Linear regression
2 Logistic regression
3 Binary classification
  • Support Vector Machines (SVM)
  • requires a private comparison protocol (e.g., DGK+)
4 Neural networks
  • Sign or ReLU activation functions
  • 1 interaction per layer

LINEAR PREDICTION MODEL
• Input
  1 Server's ML model: $\theta = (\theta_0, \ldots, \theta_d) \in \mathbb{R}^{d+1}$
  2 User's feature vector: $x = (1, x_1, \ldots, x_d) \in \{1\} \times \mathbb{R}^{d}$
• Output
  $h_\theta(x) = g(\theta^\top x)$ in many cases

LINEAR PREDICTION MODEL: EVALUATION FUNCTION g
• Linear regression [real-valued output]: $g = \mathrm{Id}$
• Logistic regression [probability]: $g = \sigma$ where $\sigma(s) = \dfrac{\exp(s)}{1 + \exp(s)}$
• Linear classification [binary decision]: $g = \mathrm{sign}$
• Rectified linear unit (ReLU) [neural networks]: $g(s) = \begin{cases} 0 & \text{if } s < 0 \\ s & \text{otherwise} \end{cases}$
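The four choices of g can be sketched in plain (unencrypted) Python. This is only an illustration of the model h_θ(x) = g(θ⊺x), not the authors' code: the function names are hypothetical, and the convention sign(0) = +1 is an assumption, since the slide does not say how zero is handled.

```python
import math

def identity(s):
    # Linear regression: real-valued output.
    return s

def sigmoid(s):
    # Logistic regression: exp(s) / (1 + exp(s)), written to avoid overflow.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def sign(s):
    # Linear classification. Assumption: sign(0) is taken as +1.
    return 1 if s >= 0 else -1

def relu(s):
    # Rectified linear unit: 0 if s < 0, s otherwise.
    return 0.0 if s < 0 else s

def predict(theta, features, g):
    # theta = (theta_0, ..., theta_d); features = (x_1, ..., x_d).
    # The leading 1 of x is implicit, so theta_0 acts as the bias.
    s = theta[0] + sum(t * x for t, x in zip(theta[1:], features))
    return g(s)

theta = [1.0, 2.0, -1.0]
features = [3.0, 4.0]
y_linear = predict(theta, features, identity)   # 1 + 2*3 - 1*4
y_class = predict(theta, features, sign)
```

Only g changes between the four models; the inner product θ⊺x is the shared part, which is why the protocol concentrates on computing it privately.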

LINEAR PREDICTION MODEL WITH ENCRYPTION
Model evaluation: $\hat{y} = g(\theta^\top x)$
Parties: Client ($x$), holding the key pair $(pk, sk)$; Server ($\theta$)
❶ Client: compute ⟦$x$⟧ and send ⟦$x_1$⟧, ..., ⟦$x_d$⟧, $pk$ to the server
❷ Server: compute ⟦$g(\theta^\top x)$⟧ and send it to the client
❸ Client: decrypt ⟦$g(\theta^\top x)$⟧ and set $\hat{y} = g(\theta^\top x)$

LINEARLY HOMOMORPHIC ENCRYPTION
• We only require linearly homomorphic encryption:
  $\mathrm{Enc}_{pk}(m_1) \boxplus \mathrm{Enc}_{pk}(m_2) = \mathrm{Enc}_{pk}(m_1 + m_2)$
• NOT fully homomorphic encryption:
  $\mathrm{Enc}_{pk}(m_1) \boxplus \mathrm{Enc}_{pk}(m_2) = \mathrm{Enc}_{pk}(m_1 + m_2)$
  $\mathrm{Enc}_{pk}(m_1) \boxdot \mathrm{Enc}_{pk}(m_2) = \mathrm{Enc}_{pk}(m_1 \cdot m_2)$
• Benefits
  • Simpler implementation
  • Faster computation

PRIVATE INNER PRODUCT
• Since ⟦·⟧ is homomorphic,
  ⟦$\theta^\top x$⟧ $=$ ⟦$\theta_0 + \sum_{i=1}^{d} \theta_i x_i$⟧ $=$ ⟦$\theta_0$⟧ $\boxplus$ ⟦$\theta_1 x_1$⟧ $\boxplus \cdots \boxplus$ ⟦$\theta_d x_d$⟧
  and, for $1 \le i \le d$,
  ⟦$\theta_i x_i$⟧ $=$ ⟦$x_i$⟧ $\boxplus \cdots \boxplus$ ⟦$x_i$⟧ ($\theta_i$ times) $=: \theta_i \odot$ ⟦$x_i$⟧
Example (Paillier's cryptosystem)
• ⟦$m$⟧ $= (1 + N)^{m} \, r^{N} \bmod N^2$
• ⟦$m_1 + m_2$⟧ $=$ ⟦$m_1$⟧ $\cdot$ ⟦$m_2$⟧ $\bmod N^2$
• ⟦$m_1 - m_2$⟧ $=$ ⟦$m_1$⟧ $/$ ⟦$m_2$⟧ $\bmod N^2$
• $a \odot$ ⟦$m$⟧ $=$ ⟦$m$⟧$^{a} \bmod N^2$
⟹ ⟦$\theta^\top x$⟧ requires $d$ exponentiations modulo $N^2$
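The Paillier identities above can be tried out with a toy implementation. The sketch below is an illustration under stated assumptions, not the implementation used in the talk: the primes are two small Mersenne primes chosen only so that the example runs quickly (far too small for any real security), the helper names are hypothetical, and inputs are plain integers rather than the fixed-precision encodings mentioned later in the slides.

```python
import math
import secrets

# Toy primes (assumption, for illustration only): 2^31 - 1 and 2^61 - 1.
P = 2**31 - 1
Q = 2**61 - 1

def keygen(p=P, q=Q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # inverse of lambda mod N (generator is 1 + N)
    return n, (lam, mu)

def encrypt(n, m):
    # [[m]] = (1 + N)^m * r^N mod N^2, using (1 + N)^m = 1 + m*N mod N^2.
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + (m % n) * n) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    u = pow(c, lam, n * n)
    m = ((u - 1) // n) * mu % n
    return m - n if m > n // 2 else m  # map back to a signed integer

def add(n, c1, c2):
    # [[m1 + m2]] = [[m1]] * [[m2]] mod N^2
    return c1 * c2 % (n * n)

def scalar_mul(n, a, c):
    # a (.) [[m]] = [[m]]^a mod N^2; a is reduced mod N to allow negatives.
    return pow(c, a % n, n * n)

def private_inner_product(n, theta, enc_x):
    # Server side: theta = (theta_0, ..., theta_d), enc_x = ([[x_1]], ..., [[x_d]]).
    acc = encrypt(n, theta[0])
    for t, cx in zip(theta[1:], enc_x):
        acc = add(n, acc, scalar_mul(n, t, cx))  # one exponentiation per feature
    return acc

# Client encrypts x, server evaluates on ciphertexts, client decrypts.
n, sk = keygen()
x = [7, 1, 2]
theta = [5, 2, -3, 4]
enc_x = [encrypt(n, xi) for xi in x]
result = decrypt(n, sk, private_inner_product(n, theta, enc_x))
```

Here `result` equals 5 + 2·7 − 3·1 + 4·2 = 24. The loop performs one modular exponentiation per feature, matching the count of d exponentiations modulo N² stated on the slide.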

IF EVALUATION FUNCTION g IS NON-LINEAR
• g is non-linear but injective (e.g., $\sigma$)
  • Server computes ⟦$\theta^\top x$⟧
  • Client obtains $\theta^\top x$, simply applies g, and learns no more (by definition: $g(a) = g(b) \Rightarrow a = b$)
• g is non-linear and non-injective (e.g., sign, ReLU)
  • Use a set of tools and tricks
    • DGK+ comparison protocol
    • Simple masking with a random value
    • Masking and scaling of inner product
    • Variant of oblivious transfer (two possible ciphers sent)
  • Dual setup
    • Server publishes $pk_S$ and ⦃$\theta$⦄$_s$
  • Still one round of messages!

NEURAL NETWORKS
[Diagram: neuron $j$ of layer $l$. The inputs $x_1^{(l-1)}, x_2^{(l-1)}, \ldots, x_{d_{l-1}}^{(l-1)}$, with weights $\theta_{j,1}^{(l)}, \theta_{j,2}^{(l)}, \ldots, \theta_{j,d_{l-1}}^{(l)}$ and bias $\theta_{j,0}^{(l)}$, feed a sum $\Sigma$ followed by the activation function $g_j^{(l)}$, which produces the output $x_j^{(l)}$.]
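Read as a formula, the diagram suggests that each neuron computes $x_j^{(l)} = g_j^{(l)}\big(\theta_{j,0}^{(l)} + \sum_{k=1}^{d_{l-1}} \theta_{j,k}^{(l)} x_k^{(l-1)}\big)$. A minimal plaintext sketch of such a feed-forward pass follows; the function names, the example weights, and the convention sign(0) = +1 are assumptions made for illustration, and no encryption is involved here.

```python
def sign(s):
    # Assumption: sign(0) is taken as +1.
    return 1 if s >= 0 else -1

def layer(weights, x_prev, g):
    # weights[j] = [theta_j0 (bias), theta_j1, ..., theta_jd] for neuron j.
    return [
        g(row[0] + sum(w * x for w, x in zip(row[1:], x_prev)))
        for row in weights
    ]

def forward(network, x, g=sign):
    # network is a list of layers; each layer output feeds the next one.
    for weights in network:
        x = layer(weights, x, g)
    return x

# Hypothetical 2-layer network with 2 inputs.
network = [
    [[0.5, 1.0, -1.0], [-0.5, 1.0, 1.0]],  # layer 1: two neurons
    [[-0.5, 1.0, 1.0]],                    # layer 2: one neuron
]
hidden = layer(network[0], [1.0, 2.0], sign)
output = forward(network, [1.0, 2.0])
```

Each layer is a batch of inner products followed by an activation, which is consistent with the earlier remark that the private protocol needs one interaction per layer.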

NUMERICAL EXPERIMENTS
• Implementation (not heavily optimised)
  • Python
  • Intel i7-4770, 3.4GHz
  • GMP library (power exponentiation)
  • Fixed precision (53 bits)
• Parameters
  • Public datasets and randomly generated ones
  • Models with 30 to 7994 features
  • Key sizes: 1388 to 2440 bits
• Message overhead proportional to:
  • Key size
  • Number of features (or number of bits in DGK+)
  • Number of layers (FFNN)

MESSAGE OVERHEAD (kB) [1]

| Protocol | Protocol step | Message | Size formula | Size (kB) |
|---|---|---|---|---|
| Linear regression (core) | Client sends | $pk_C$, ⟦$x_i$⟧, $1 \le i \le d$ | $\ell_M + d \cdot 2\ell_M$ | ≈ 15 |
| | Server sends | $t$ | $\approx 2\ell_M$ | < 1 |
| SVM classification (core) | Client sends | $t^*$, ⟦$\mu_i$⟧, $0 \le i \le \ell - 1$ | $2\ell_M + \ell \cdot 2\ell_M$ | ≈ 29 |
| | Server sends | ⟦$h_i^*$⟧, $-1 \le i \le \ell - 1$ | $(\ell + 1) \cdot 2\ell_M$ | ≈ 30 |
| FFNN sign act. (core) | Server sends | $t^*$, ⦃$\mu_i$⦄$_s$, $0 \le i \le \ell - 1$ | $L \cdot d \cdot (\ell + 1) \cdot 2\ell_M$ | 2,655 (885 per layer) |
| | Client sends | ⟦$\hat{y}^*$⟧, ⦃$h_i^*$⦄$_s$, $-1 \le i \le \ell - 1$ | $L \cdot d \cdot (\ell + 2) \cdot 2\ell_M$ | 2,700 (900 per layer) |

[1] Features: $d = 30$; key size $\ell_M = 2048$; $\kappa = 95$; layers $L = 3$; precision $P = 53$; inner-product bound: $\ell = 58$
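The size formulas can be checked by plugging in the footnote parameters. The sketch below assumes that the formulas count bits and that 1 kB means 1024 bytes; both are assumptions, although the second reproduces the 2,655 and 2,700 figures exactly. The dictionary keys are hypothetical labels.

```python
def to_kb(bits):
    # Assumption: sizes are in bits and 1 kB = 1024 bytes.
    return bits / 8 / 1024

# Parameters from the footnote of the table.
d = 30        # number of features
l_M = 2048    # key size in bits
L = 3         # number of layers
ell = 58      # inner-product bound in bits

ct = 2 * l_M  # the 2*l_M term, read here as one ciphertext modulo N^2

sizes_kb = {
    "linreg_client": to_kb(l_M + d * ct),
    "linreg_server": to_kb(ct),
    "svm_client": to_kb(2 * l_M + ell * ct),
    "svm_server": to_kb((ell + 1) * ct),
    "ffnn_server": to_kb(L * d * (ell + 1) * ct),
    "ffnn_client": to_kb(L * d * (ell + 2) * ct),
}

for name, size in sizes_kb.items():
    print(f"{name}: {size:.2f} kB")
```

Under these assumptions the linear regression client message comes to 15.25 kB and the FFNN messages to 2655 kB and 2700 kB, in line with the table. Both SVM formulas evaluate to 29.5 kB, which the table reports as ≈ 29 and ≈ 30.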

RESULTS: LINEAR REGRESSION
[Figure: two plots of "Private linear regression (core protocol)". Left: dataset audiology, 70 features. Right: dataset enron, 7994 features. x-axis: length of modulus N (bits), from 1388 to 2440. y-axis: average computing time (ms) over 1000 trials. Series: Client, Server. On Intel i7-4770, 3.4GHz.]

RESULTS: SUPPORT VECTOR MACHINE CLASSIFICATION
[Figure: two plots of "Private SVM classification (core protocol)". Left: dataset audiology, 70 features. Right: dataset enron, 7994 features. x-axis: length of modulus N (bits), from 1388 to 2440. y-axis: average computing time (ms) over 100 trials. Series: Client, Server. On Intel i7-4770, 3.4GHz.]
• DGK+ comparison is the main limiting factor

RESULTS: NEURAL NETWORKS
[Figure: two plots of private neural networks with 10 features and 3 layers, dataset random. Left: "simple FFNN with sign activation (heuristic solution)". Right: "simple FFNN with sign activation". x-axis: length of modulus N (bits), from 1388 to 2440. y-axis: average computing time (ms) over 100 trials. Series: Client, Server. On Intel i7-4770, 3.4GHz.]
• DGK+ comparison is the main limiting factor

COMMENTS/QUESTIONS?
