
SLIDE 1

Machine Learning & Data Confidentiality

Melek Önen
Joint work with Alberto Ibarrondo, Beyza Bozdemir, Mohamad Mansouri, Gamze Tillem, Orhan Ermis

SLIDE 2

Machine Learning as a Service

Client → Server: the client outsources ML processing to the server.

Benefits:
  • Performance
  • No need for ML knowledge
  • Cost reduction

SLIDE 3

Sensitive and confidential data

Client → Server: the data sent to the server may be
  • Sensitive personal data
  • Corporate data (IP)
  • Intellectual property
  • Subject to legal restrictions

SLIDE 4

Data breaches in 2019

Average cost
  • Global: $3.92M
  • Per record: $150

Top 3 sectors
  • Health
  • Financial
  • Services

Factors increasing cost
  • Extensive migration to cloud
  • Third-party involvement
  • Compliance failures

Factors decreasing cost
  • Extensive use of encryption
  • Use of security analytics

SLIDE 5

GDPR Effect

GDPR
  • In effect since May 2018
  • Fines of up to €20 million or 4% of annual worldwide turnover, whichever is higher

GDPR fines in year one
  • Global: €56M
  • Leading watchdog: CNIL (France)

SLIDE 6

PAPAYA

  • Single source setting (arrhythmia detection, mobility analytics)
  • Multiple sources setting (stress management, mobile usage analytics, threat detection)
  • Third-party querier (mobile usage analytics)
  • Data analytics: basic statistics, clustering, NN classification, NN training

SLIDE 7

Homomorphic encryption

$$\mathrm{Encrypt}(m_1) \;\mathrm{op}_1\; \mathrm{Encrypt}(m_2) = \mathrm{Encrypt}(m_1 \;\mathrm{op}_2\; m_2)$$

Partially HE

Supports one operation only

Somewhat HE

Supports any number of additions and a limited number of multiplications

Fully HE

Supports any function

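To make the equation above concrete, here is a toy sketch of Paillier, a partially homomorphic scheme in which op1 is multiplication of ciphertexts and op2 is addition of plaintexts (mod N). The tiny primes and the helper names (`encrypt`, `decrypt`, `add_encrypted`, `mul_plain`) are illustrative assumptions, not a secure implementation; real deployments use a modulus of 2048 bits or more and a vetted library. Python 3.8+ is assumed for `pow(x, -1, n)`.

```python
import math
import secrets

# Toy Paillier key pair built from tiny primes: illustration only, NOT secure.
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # valid shortcut because the generator is g = N + 1


def encrypt(m: int) -> int:
    # Fresh randomness r in Z_N^* makes encryption probabilistic.
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    # c = g^m * r^N mod N^2, with g = N + 1
    return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    # m = L(c^lambda mod N^2) * mu mod N, where L(u) = (u - 1) / N
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the underlying plaintexts (mod N).
    return (c1 * c2) % N2


def mul_plain(c: int, k: int) -> int:
    # Raising a ciphertext to a plaintext constant k multiplies the plaintext by k.
    return pow(c, k % N, N2)
```

For example, `decrypt(add_encrypted(encrypt(20), encrypt(22)))` gives 42, and so does `decrypt(mul_plain(encrypt(7), 6))`: additions and multiplications by known constants are enough for linear layers, but multiplying two ciphertexts together is not supported.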
SLIDE 8

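Secure two-party computation (2PC)

Two parties, holding private inputs x and y respectively, jointly compute f(x, y) without leaking any information beyond what the ideal model leaks (a trusted third party that receives x and y and returns only f(x, y)).

Building blocks:
  • Yao's garbled circuits (GC)
  • Arithmetic sharing
  • Boolean sharing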

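As an illustration of arithmetic sharing, the sketch below simulates both parties in one process over the ring Z_{2^32} (the bit width is an assumption). Additions are purely local, whereas a multiplication consumes a Beaver triple and requires one round of communication. The trusted dealer producing the triples is a simplifying assumption (in practice triples come from an offline phase), and all function names are hypothetical.

```python
import secrets
from typing import Tuple

MOD = 2 ** 32            # shares live in the ring Z_{2^32}
Share = Tuple[int, int]  # (share held by party 0, share held by party 1)


def share(x: int) -> Share:
    # Split x into two uniformly random-looking shares that sum to x mod 2^32.
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD


def reconstruct(s: Share) -> int:
    return (s[0] + s[1]) % MOD


def add(a: Share, b: Share) -> Share:
    # Linear operations are local: each party adds its own shares, no messages.
    return (a[0] + b[0]) % MOD, (a[1] + b[1]) % MOD


def beaver_triple() -> Tuple[Share, Share, Share]:
    # Hypothetical trusted dealer: shares of random a, b and c = a * b.
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)


def mul(x: Share, y: Share) -> Share:
    a, b, c = beaver_triple()
    # Interactive step: both parties publish their shares of e = x - a and f = y - b.
    e = reconstruct(((x[0] - a[0]) % MOD, (x[1] - a[1]) % MOD))
    f = reconstruct(((y[0] - b[0]) % MOD, (y[1] - b[1]) % MOD))
    # x * y = c + e*b + f*a + e*f; only party 0 adds the public term e*f.
    z0 = (c[0] + e * b[0] + f * a[0] + e * f) % MOD
    z1 = (c[1] + e * b[1] + f * a[1]) % MOD
    return z0, z1
```

With `x, y = share(6), share(7)`, `reconstruct(add(x, y))` is 13 and `reconstruct(mul(x, y))` is 42, while each individual share on its own is a uniformly random ring element.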
SLIDE 9

HE vs. 2PC

HE
  • Non-interactive
  • In practice, efficient mainly for linear operations
  • Expensive in computation cost
  • No communication cost beyond sending the encrypted input and result

2PC
  • Interactive: the client is involved
  • Linear and nonlinear operations
  • Efficient in computation cost
  • Expensive in communication cost

SLIDE 10

Artificial Neural Networks

Supervised machine learning technique with two phases:
  • Training
  • Classification

NN layers:
  • Activation layer
  • Pooling layer
  • Fully-connected layer
  • Convolution layer (optional)

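A plain-Python sketch of the classification phase for a tiny network built from the layer types listed above (1-D convolution, ReLU activation, max pooling, fully connected). The kernel, weights, input size and function names are hypothetical values chosen for illustration, and no privacy protection is applied. ReLU and max pooling are the non-linear steps that privacy-preserving variants must approximate or compute interactively.

```python
from typing import List

Vector = List[float]

KERNEL = [1.0, -1.0, 0.5]          # hypothetical convolution kernel
FC_W = [[1.0, -1.0], [-1.0, 1.0]]  # hypothetical FC weights for 2 classes
FC_B = [0.0, 0.0]                  # hypothetical FC biases


def conv1d(x: Vector, kernel: Vector) -> Vector:
    # Convolution layer: sliding dot product (linear, no stride or padding).
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def relu(v: Vector) -> Vector:
    # Activation layer: non-linear max(0, a).
    return [max(0.0, a) for a in v]


def max_pool(v: Vector, size: int = 2) -> Vector:
    # Pooling layer: maximum over non-overlapping windows.
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]


def fully_connected(v: Vector, weights: List[Vector], bias: Vector) -> Vector:
    # Fully connected layer: matrix-vector product plus bias.
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]


def classify(x: Vector) -> int:
    h = max_pool(relu(conv1d(x, KERNEL)))
    scores = fully_connected(h, FC_W, FC_B)
    return max(range(len(scores)), key=lambda i: scores[i])  # index of best class
```

For the input `[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`, the convolution yields `[0.5, 1.0, 1.5, 2.0]`, pooling keeps `[1.0, 2.0]` and `classify` returns class 1.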
SLIDE 11

Neural Networks - Architecture


SLIDE 12

Privacy-preserving NN classification

Use advanced cryptographic techniques:
  • Homomorphic encryption
  • Secure 2PC

Challenge: privacy vs. performance
  • Additional overhead (computation, memory & bandwidth)
  • Complex operations (sigmoid, tanh, etc.)
  • Real numbers (whereas PETs operate on integers)

Goal: reduce NN complexity
  • Approximate complex operations ⇒ use low-degree polynomials
  • Approximate real numbers ⇒ use integers

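One way to see the low-degree polynomial idea: the degree-3 Taylor expansion of the sigmoid, σ(x) ≈ 1/2 + x/4 − x³/48, needs only additions and multiplications. The sketch below (hypothetical helper names) measures its error on a grid. It is accurate near 0 but diverges quickly, which is why practical schemes usually fit the polynomial over the expected input range instead of relying on a Taylor series; the choice of Taylor expansion here is only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_poly3(x: float) -> float:
    # Degree-3 Taylor expansion of the sigmoid around 0: only + and *,
    # so it could be evaluated on encrypted or secret-shared inputs.
    return 0.5 + x / 4.0 - x ** 3 / 48.0


def max_error(lo: float, hi: float, steps: int = 1000) -> float:
    # Largest absolute approximation error on an evenly spaced grid over [lo, hi].
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(sigmoid(x) - sigmoid_poly3(x)) for x in grid)


if __name__ == "__main__":
    for bound in (1.0, 2.0, 4.0):
        print(f"max error on [-{bound}, {bound}]: {max_error(-bound, bound):.4f}")
```

Running it shows a small error on [-1, 1] and a large one on [-4, 4], so the input range must be known (or normalized) before choosing the approximation.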

SLIDE 14

Approximation of NN layers

  • Convolution layer
    • Matrix multiplications ⇒ no need for approximation
  • Activation layer
    • Most common approaches: x² and ReLU
  • Pooling layer
    • Sum or average
  • Fully connected layer
    • Matrix multiplications ⇒ no need for approximation
  • Real numbers
    • Most common approach: multiplying by 10ⁿ

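A sketch of an integer-only pipeline along the lines listed above, assuming n = 3 decimal digits of precision and hypothetical weights: values are encoded as round(x · 10ⁿ), the fully connected layer is an ordinary integer dot product, x² serves as the activation and sum pooling follows. Every multiplication multiplies the scale factor, so after the FC layer and the squaring the result carries 10^(4n) and must be decoded accordingly. Negative numbers stay Python integers here; an HE or 2PC scheme would map them into its plaintext ring.

```python
N_DIGITS = 3               # hypothetical precision: n = 3 decimal digits
SCALE = 10 ** N_DIGITS


def encode(x: float) -> int:
    # Real number -> integer by multiplying with 10^n and rounding.
    return round(x * SCALE)


def fc_layer(xs, weights):
    # Fully connected layer: plain matrix-vector product, no approximation.
    # Each product of two encoded values carries a factor SCALE**2.
    return [sum(w * x for w, x in zip(row, xs)) for row in weights]


def square(vs):
    # x^2 activation: one multiplication per value, the scale factor is squared.
    return [v * v for v in vs]


def sum_pool(vs):
    # Sum pooling: additions only, the scale factor is unchanged.
    return sum(vs)


def forward(xs, weights):
    # Works on floats (reference) and on encoded integers alike.
    return sum_pool(square(fc_layer(xs, weights)))


if __name__ == "__main__":
    x = [0.5, -1.25, 2.0]
    W = [[0.2, 0.1, -0.3], [-0.5, 0.25, 0.125]]
    int_result = forward([encode(v) for v in x], [[encode(w) for w in row] for row in W])
    # After FC (SCALE^2) and squaring, the overall factor is SCALE^4.
    print(forward(x, W), int_result / SCALE ** 4)
```

Because the example inputs have at most three decimals, the integer path is exact here; in general the rounding in `encode` introduces an error that grows with network depth, which bounds how small n can be.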
SLIDE 15

Privacy-preserving NN classification

  • MPC-based solutions: DeepSecure, Gazelle, SecureML, MiniONN, Chameleon, EzPC, ABY3, SecureNN, PAC, …
  • FHE-based solutions: CryptoNets, Chabanne et al., Bourse et al., Ibarrondo et al., CryptoDL, …
  • Hybrid solutions: SwaNN, …

SLIDE 16

FHE-Based Batch Normalization

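[DPM 2018]

LHE-based privacy-preserving NN: investigate batch normalization (BN) and a BN transformation
  • Simplify operations (with equivalence)
  • Absorb BN into the previous FC or Conv layer

$$\mathrm{BN}(x) = \gamma \, \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

with μ the trained mean, σ² the trained variance, γ the rescaling factor, β the shifting term and ε a small constant.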

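Absorbing BN into the preceding fully connected layer follows from the BN formula: with s = γ / √(σ² + ε), BN(Wx + b) = (sW)x + (s(b − μ) + β), so a single FC layer with rescaled weights and a shifted bias is equivalent, and no division or square root remains at inference time. The plain-Python sketch below (hypothetical names and parameter values; the default ε = 1e-5 is an assumption) checks this equivalence for a 2-output layer. For a convolution, the same rescaling applies per output channel.

```python
import math


def fc(x, W, b):
    # Fully connected layer: y_j = W_j . x + b_j
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def batch_norm(z, mean, var, gamma, beta, eps=1e-5):
    # BN(z) = gamma * (z - mean) / sqrt(var + eps) + beta, per output unit
    return [g * (zj - m) / math.sqrt(v + eps) + bt
            for zj, m, v, g, bt in zip(z, mean, var, gamma, beta)]


def fold_bn_into_fc(W, b, mean, var, gamma, beta, eps=1e-5):
    # Rescale each row of W and shift each bias so that one FC layer
    # computes the same function as FC followed by BN.
    W_folded, b_folded = [], []
    for row, bj, m, v, g, bt in zip(W, b, mean, var, gamma, beta):
        s = g / math.sqrt(v + eps)
        W_folded.append([s * w for w in row])
        b_folded.append(s * (bj - m) + bt)
    return W_folded, b_folded


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5]
    W = [[0.3, -0.1, 0.8], [-0.6, 0.2, 0.4]]
    b = [0.1, -0.2]
    mean, var = [0.05, -0.3], [0.4, 1.5]
    gamma, beta = [1.2, 0.7], [0.0, 0.3]
    W2, b2 = fold_bn_into_fc(W, b, mean, var, gamma, beta)
    print(batch_norm(fc(x, W, b), mean, var, gamma, beta))
    print(fc(x, W2, b2))  # same values up to floating-point rounding
```

The folding is done once, in the clear, on the trained parameters; the encrypted inference then only sees an ordinary FC layer.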
SLIDE 17

PAC: Privacy-preserving Arrhythmia Classification

  • NN-based ECG analysis
  • 2PC-based NN classifier

Techniques:
  • Low-degree polynomials for activation functions
  • Approximation of real numbers
  • PCA for size reduction

Performance results with PhysioBank
  • 96.34% accuracy
  • 1 s prediction time in a real environment

PAC in batches
  • Efficient solution for real scenarios

[FPS 2019]

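The slide does not detail how PCA is applied in PAC. As a generic sketch, assuming the components are computed in the clear from training data, PCA can be obtained by power iteration with deflation, and each input is then projected onto k components so that the privacy-preserving classifier processes k features instead of d. Function names and the toy data are assumptions; a real implementation would use a numerical library.

```python
import math


def mean_center(rows):
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mu[j] for j in range(d)] for r in rows], mu


def covariance(centered):
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(C, iters=200):
    # Power iteration: repeatedly apply C and renormalise.
    d = len(C)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # C v vanished (e.g. no variance left): stop early
            return v, 0.0
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, lam


def pca(rows, k):
    centered, mu = mean_center(rows)
    C = covariance(centered)
    components = []
    for _ in range(k):
        v, lam = top_eigenvector(C)
        components.append(v)
        # Deflation: remove the variance explained by v before the next round.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(len(C))] for i in range(len(C))]
    return components, mu


def project(x, components, mu):
    # Reduced feature vector: k coordinates instead of d.
    return [sum((xi - m) * c for xi, m, c in zip(x, mu, comp)) for comp in components]
```

For toy data lying on the line (t, 2t, 0), `pca(rows, 1)` recovers the direction (1, 2, 0)/√5, and each 3-dimensional sample is reduced to a single coordinate.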
SLIDE 18

SwaNN: Privacy-preserving classification based on PHE + 2PC

  • Switches between PHE and 2PC
  • Paillier for linear operations
  • Interactive Paillier for x²
  • Two settings

[PUT 2019]

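A generic sketch of how squaring can be done interactively with Paillier using additive blinding; this is an assumed textbook protocol, not necessarily SwaNN's exact one. The server, holding only Enc(x), sends Enc(x + r) for a random mask r; the client decrypts, squares and re-encrypts; the server then removes the mask homomorphically using (x + r)² − 2rx − r² = x². The toy primes and function names are hypothetical and insecure; Python 3.8+ is assumed.

```python
import math
import secrets

# Toy Paillier key pair (tiny primes, NOT secure); only the client uses LAM and MU.
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)


def encrypt(m: int) -> int:
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def client_square(c_blinded: int) -> int:
    # Client side: decrypt the blinded value, square it in the clear, re-encrypt.
    v = decrypt(c_blinded)
    return encrypt(v * v)


def server_square(c_x: int) -> int:
    # Server side: it only handles ciphertexts; the client sees x + r mod N,
    # which is uniformly distributed because r is uniform in Z_N.
    r = secrets.randbelow(N)                 # one-time blinding mask
    c_blinded = (c_x * encrypt(r)) % N2      # Enc(x + r)
    c_sq = client_square(c_blinded)          # one round trip: Enc((x + r)^2)
    # Enc(x^2) = Enc((x + r)^2) * Enc(x)^(-2r) * Enc(-r^2)
    return (c_sq * pow(c_x, (-2 * r) % N, N2) * encrypt(-r * r)) % N2
```

For example, `decrypt(server_square(encrypt(12)))` returns 144; results are computed mod N, so the plaintext modulus must be large enough for x² to avoid wrap-around.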
SLIDE 19

Open questions


  • Multi-user privacy-preserving NN classification (multi-source, multi-querier, etc.)
  • Privacy-preserving NN training
  • Privacy-preserving clustering

SLIDE 20

melek.onen@eurecom.fr

Thank you!