Oblivious Neural Network Predictions via MiniONN Transformations - PowerPoint PPT Presentation



SLIDE 1

Oblivious Neural Network Predictions via MiniONN Transformations

Presented by: Sherif Abdelfattah

Liu, J., Juuti, M., Lu, Y., & Asokan, N. (2017, October). Oblivious neural network predictions via MiniONN transformations. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (pp. 619-631). ACM. (121 citations)

SLIDE 2

Machine Learning as a Service

The client sends its input to the service and receives predictions in return.

This approach violates the clients' privacy: the server sees every input.

SLIDE 3

Running predictions on client-side

  • A naive solution is to have clients download the model and run the prediction phase on the client side. But:
  • It becomes more difficult for service providers to update their models.
  • For security applications (e.g., spam or malware detection services), an adversary can use the model as an oracle to develop strategies for evading detection.
  • If the training data contains sensitive information (such as patient records from a hospital), revealing the model may compromise the privacy of the training data.

SLIDE 4

Oblivious Neural Networks (ONN)

The solution is to make the neural network oblivious:

  • The server learns nothing about the client's input.
  • The clients learn nothing about the model.

SLIDE 5

MiniONN

The client sends a blinded input and receives blinded predictions; the computation runs through oblivious protocols.

  • Low overhead (about 1 s per prediction)
  • Works with all neural networks
  • MiniONN: Minimizing the Overhead for Oblivious Neural Network

SLIDE 6

How does it work?

Y = (y1, y2),  X = [x1,1  x1,2; x2,1  x2,2],  c = (c1, c2),  and similarly X' = [x'1,1  x'1,2; x'2,1  x'2,2],  c' = (c'1, c'2).

The network alternates linear transformations with non-linear transformations (activation functions):

Y  --(X·Y + c)-->  z  --g(z)-->  Y'  --(X'·Y' + c')-->  a

so the final prediction is a = X'·g(X·Y + c) + c'.
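The two-layer pipeline a = X'·g(X·Y + c) + c' can be sketched in plaintext Python (the weights, biases, and input below are made-up illustrative values, not from the paper):

```python
def matvec_add(x, y, c):
    """Linear transformation: x . y + c for a 2x2 matrix x."""
    return [x[i][0] * y[0] + x[i][1] * y[1] + c[i] for i in range(2)]

def g(z):
    """Activation function applied elementwise (ReLU here)."""
    return [max(zi, 0) for zi in z]

# a = X' . g(X . Y + c) + c'
X, c = [[1, -2], [3, 4]], [5, -6]
Xp, cp = [[2, 0], [-1, 1]], [0, 7]
Y = [1, 1]
a = matvec_add(Xp, g(matvec_add(X, Y, c)), cp)  # a == [8, 4]
```

MiniONN evaluates exactly this pipeline, but with every intermediate value secret-shared between client and server.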

SLIDE 7

Core Idea

  • The core idea is to use secret sharing for oblivious computation.

Every value in the pipeline X·(·) + c → g(·) → X'·(·) + c' is split into two additive shares, one held by the client (superscript c) and one held by the server (superscript s):

y^c + y^s = Y,  z^c + z^s = z,  y'^c + y'^s = Y',  z'^c + z'^s = z'

The client holds y^c and the server holds y^s.
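The additive secret sharing used here can be sketched in a few lines (the modulus N below is an assumed parameter; MiniONN works over a ring Z_N):

```python
import random

N = 2**31  # assumed modulus for the ring Z_N

def share(y):
    """Split y into two additive shares modulo N."""
    y_c = random.randrange(N)   # client's share, uniformly random
    y_s = (y - y_c) % N         # server's share
    return y_c, y_s

def reconstruct(y_c, y_s):
    """Recover the shared value from both shares."""
    return (y_c + y_s) % N

# Each share alone is uniformly random and reveals nothing about y;
# together the shares reconstruct it.
y = 42
y_c, y_s = share(y)
assert reconstruct(y_c, y_s) == y
```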

SLIDE 8

Secret sharing the input Y

  • The client chooses y1^c and y2^c at random from Z_N.
  • It computes the server's shares y1^s = y1 - y1^c and y2^s = y2 - y2^c and sends y1^s, y2^s to the server.
  • y^c is independent of y, so it can be pre-chosen (before the input is known).

SLIDE 9

Oblivious linear transformation X·Y + c

[x1,1  x1,2; x2,1  x2,2] · (y1, y2) + (c1, c2)
  = ( x1,1·(y1^c + y1^s) + x1,2·(y2^c + y2^s) + c1 ,
      x2,1·(y1^c + y1^s) + x2,2·(y2^c + y2^s) + c2 )
  = ( x1,1·y1^s + x1,2·y2^s + c1  +  x1,1·y1^c + x1,2·y2^c ,
      x2,1·y1^s + x2,2·y2^s + c2  +  x2,1·y1^c + x2,2·y2^c )

The terms x·y^s + c are computed locally by the server (it knows X, c, and the shares y^s); the terms x·y^c require an oblivious dot-product.
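This decomposition can be checked numerically; a small sketch with made-up weights and inputs:

```python
import random

N = 2**31
x = [[3, 5], [7, 11]]   # server's weights
c = [13, 17]            # server's bias
y = [20, 30]            # client's input

# Additive shares of the input
y_c = [random.randrange(N) for _ in y]
y_s = [(yi - yci) % N for yi, yci in zip(y, y_c)]

# Server-local part: x . y^s + c
local = [(x[i][0] * y_s[0] + x[i][1] * y_s[1] + c[i]) % N for i in range(2)]
# Part needing the oblivious dot-product: x . y^c
dot = [(x[i][0] * y_c[0] + x[i][1] * y_c[1]) % N for i in range(2)]

# Together the two parts equal X . Y + c
expected = [(x[i][0] * y[0] + x[i][1] * y[1] + c[i]) % N for i in range(2)]
assert [(l + d) % N for l, d in zip(local, dot)] == expected
```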

SLIDE 10

Oblivious linear transformation (dot-product)

The remaining dot-product x·y^c is computed with additively homomorphic encryption with SIMD batching1:

  • The server sends E(x1,1), E(x1,2), E(x2,1), E(x2,2), encrypted under its own key.
  • The client chooses r1,1, r1,2, r2,1, r2,2 at random from Z_N and homomorphically computes the blinded products
    s1,1 = E(x1,1·y1^c - r1,1)
    s1,2 = E(x1,2·y2^c - r1,2)
    s2,1 = E(x2,1·y1^c - r2,1)
    s2,2 = E(x2,2·y2^c - r2,2)
    and sends s1,1, s1,2, s2,1, s2,2 back to the server.
  • The server decrypts and sums row-wise:
    u1 = D(s1,1) + D(s1,2) = x1,1·y1^c + x1,2·y2^c - (r1,1 + r1,2)
    u2 = D(s2,1) + D(s2,2) = x2,1·y1^c + x2,2·y2^c - (r2,1 + r2,2)
  • The client keeps the blinding sums:
    v1 = r1,1 + r1,2
    v2 = r2,1 + r2,2

1Single instruction multiple data (SIMD): technique used to reduce the memory of the circuit and improve the evaluation time.
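The blinded-product bookkeeping can be sketched as follows. Note this shows only the share algebra: in the real protocol the x values arrive as ciphertexts under an additively homomorphic scheme (with SIMD batching), while here the "homomorphic" products are computed in plaintext for illustration:

```python
import random

N = 2**31
x = [[3, 5], [7, 11]]                          # server's weights (encrypted in the real protocol)
y_c = [random.randrange(N) for _ in range(2)]  # client's input shares

# Client: choose blinding values r and form blinded products.
# In the real protocol this step is done homomorphically on E(x[i][j]).
r = [[random.randrange(N) for _ in range(2)] for _ in range(2)]
s = [[(x[i][j] * y_c[j] - r[i][j]) % N for j in range(2)] for i in range(2)]

# Server: decrypt and sum row-wise -> its share u of x . y^c
u = [(s[i][0] + s[i][1]) % N for i in range(2)]
# Client: keep the blinding sums -> its share v of x . y^c
v = [(r[i][0] + r[i][1]) % N for i in range(2)]

# u + v reconstructs the dot-product x . y^c
for i in range(2):
    assert (u[i] + v[i]) % N == (x[i][0] * y_c[0] + x[i][1] * y_c[1]) % N
```

The blinding values r make the decrypted products look uniformly random to the server, so it learns nothing about y^c.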

SLIDE 11

Oblivious linear transformation X·Y + c

= ( x1,1·y1^s + x1,2·y2^s + c1  +  x1,1·y1^c + x1,2·y2^c ,
    x2,1·y1^s + x2,2·y2^s + c2  +  x2,1·y1^c + x2,2·y2^c )
= ( x1,1·y1^s + x1,2·y2^s + c1 + u1 ,
    x2,1·y1^s + x2,2·y2^s + c2 + u2 )  +  ( v1 , v2 )
= ( z1^s , z2^s ) + ( z1^c , z2^c )

The server computes its shares z1^s, z2^s locally from u1, u2; the client's shares are z1^c = v1, z2^c = v2.

SLIDE 12

Oblivious Activation Functions g(z)

Piecewise linear functions

  • For example, ReLU: y = max(z, 0)
  • Oblivious ReLU: y^c + y^s = max(z^c + z^s, 0)
  • Computed obliviously by a garbled circuit2

2A garbled circuit is a two-party computation (2PC) technique that allows two parties to jointly compute a function without learning each other's inputs.
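What the garbled circuit computes for ReLU can be expressed as a plaintext sketch: reconstruct z from the shares, apply max(z, 0), and output fresh shares. (The real protocol does this inside a garbled circuit, so neither party ever sees z; the function below is only the circuit's input/output contract.)

```python
import random

N = 2**31

def to_signed(a):
    """Interpret a Z_N element as a signed value in [-N/2, N/2)."""
    return a - N if a >= N // 2 else a

def oblivious_relu(z_c, z_s):
    """Contract of the ReLU garbled circuit: reconstruct z = z_c + z_s,
    apply ReLU, and return fresh additive shares of the result."""
    z = to_signed((z_c + z_s) % N)
    y = max(z, 0)
    y_c = random.randrange(N)   # fresh client share
    y_s = (y - y_c) % N         # fresh server share
    return y_c, y_s

# y^c + y^s = max(z^c + z^s, 0), for negative and positive z alike
for z in (-7, 5):
    z_c = random.randrange(N)
    z_s = (z - z_c) % N
    y_c, y_s = oblivious_relu(z_c, z_s)
    assert to_signed((y_c + y_s) % N) == max(z, 0)
```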

SLIDE 13

Oblivious Activation Functions g(z)

Smooth functions

  • For example, Sigmoid: y = 1 / (1 + e^(-z))
  • Oblivious Sigmoid: y^c + y^s = 1 / (1 + e^(-(z^c + z^s)))
  • Approximated by a piecewise linear function
  • Computed obliviously by a garbled circuit
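A sketch of a piecewise-linear sigmoid approximation (the interval and segment count below are illustrative choices, not the paper's exact parameters):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def piecewise_sigmoid(z, lo=-4.0, hi=4.0, segments=16):
    """Approximate sigmoid by linear interpolation between sample
    points on [lo, hi]; clamp to 0/1 outside that range."""
    if z <= lo:
        return 0.0
    if z >= hi:
        return 1.0
    step = (hi - lo) / segments
    k = int((z - lo) / step)                    # which linear segment z falls in
    z0, z1 = lo + k * step, lo + (k + 1) * step
    y0, y1 = sigmoid(z0), sigmoid(z1)
    return y0 + (y1 - y0) * (z - z0) / (z1 - z0)

# The approximation stays close to the true sigmoid
for z in (-3.3, -1.0, 0.0, 0.7, 2.5):
    assert abs(piecewise_sigmoid(z) - sigmoid(z)) < 0.01
```

Each linear segment can then be evaluated on shares like any other piecewise linear function, which is what makes the garbled-circuit evaluation cheap.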

SLIDE 14

The final result

The server sends its shares z1^s, z2^s to the client, which reconstructs the result:

z1 = z1^c + z1^s
z2 = z2^c + z2^s

SLIDE 15

Performance

  1. MNIST (60 000 training images and 10 000 test images)
  • Handwriting recognition
  • CNN model
  • ReLU activation function
  2. CIFAR-10 (50 000 training images and 10 000 test images)
  • Image classification
  • CNN model
  • ReLU activation function
  3. Penn Treebank (PTB) (929 000 training words, 73 000 validation words, and 82 000 test words)
  • Language modeling: predicting the next word given the previous words
  • Long Short-Term Memory (LSTM): commonly used for language modeling
  • Sigmoidal activation function

SLIDE 16

Performance

  • Comparison between MiniONN and CryptoNets on MNIST/Square/CNN (latency and message sizes split into offline and online phases):

Model      | Offline latency (s) | Online latency (s) | Offline msg size (MB) | Online msg size (MB) | Accuracy %
CryptoNets | n/a                 | 297.5              | n/a                   | 372.2                | 98.95
MiniONN    | 0.88                | 0.4                | 3.6                   | 44                   | 98.95

SLIDE 17

Performance

  • Performance for a single query (latency and message sizes split into offline and online phases):

Model              | Offline latency (s) | Online latency (s) | Offline msg size (MB) | Online msg size (MB) | Accuracy %
MNIST/ReLU/CNN     | 3.58                | 5.74               | 20.9                  | 20.9                 | 99.0
CIFAR-10/ReLU/CNN  | 472                 | 72                 | 3046                  | 6226                 | 81.61
PTB/Sigmoidal/LSTM | 13.9                | 4.39               | 86.7                  | 474                  | cross-entropy loss: 4.79

SLIDE 18

Thank You
