

SLIDE 1

ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs


Zheng Chen, Feng Zhang, Amelie Chi Zhou⋆, Jidong Zhai+, Chenyang Zhang, Xiaoyong Du

⋄Renmin University of China ⋆Shenzhen University +Tsinghua University

49th International Conference on Parallel Processing - ICPP

SLIDE 2

Outline

  • 1. Background
  • 2. Motivation
  • 3. Basic Idea
  • 4. Challenges
  • 5. ParSecureML
  • 6. Evaluation
  • 7. Source Code at GitHub
  • 8. Conclusion

2/33

SLIDE 3
  • 1. Background
  • Secure Machine Learning

3/33

[Figure: (a) Typical machine learning process. (b) Machine learning process with two-party computation.]
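The two-party process above rests on additive secret sharing: the client splits its private input into two random shares, one per server, and each share alone is uniformly random and reveals nothing. A minimal sketch with scalar shares in a finite ring (helper names are illustrative; the real framework shares matrices in fixed-point arithmetic):

```python
import random

MOD = 2 ** 32  # shares live in a finite ring so each share alone looks random

def share(x: int) -> tuple[int, int]:
    """Split x into two additive shares: x = (x0 + x1) mod MOD."""
    x0 = random.randrange(MOD)
    return x0, (x - x0) % MOD

def reconstruct(x0: int, x1: int) -> int:
    """Combine the shares held by the two servers."""
    return (x0 + x1) % MOD

x = 123456
x0, x1 = share(x)          # server1 receives x0, server2 receives x1
assert reconstruct(x0, x1) == x
# Addition is local: shares of a + b are (a0 + b0, a1 + b1) with no communication.
```

Addition on shares needs no interaction; it is multiplication that requires the triplet-based protocol discussed later in the talk.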

SLIDE 8
  • 1. Background
  • GPU Acceleration

8/33

https://developer.nvidia.com/deep-learning

SLIDE 9
  • 2. Motivation
  • Performance Degradation

9/33

[Figure: normalized performance of linear regression, logistic regression, MLP, and convolutional neural network, comparing the original implementation against SecureML.]

SLIDE 10
  • 2. Motivation
  • Time Breakdown for two-party computation

10/33

[Figure: time breakdown. Offline phase: the client distributes encrypted input data to server1 and server2. Online phase: each server runs compute1, communicate, and compute2, and the client computes the final result (times shown: 62.68 s, 95.52 s, and 0.24 s for server1; 0.05 s and 0.11 s for the client; 0.19 s and 0.21 s for server2).]

SLIDE 17
  • 3. Basic Idea
  • A GPU-based two-party computation that considers both the GPU characteristics and the features of two-party computation can achieve better acceleration.

  • Challenges
  • Challenge 1: Complex triplet-multiplication-based computation patterns
  • Challenge 2: Frequent intra-node data transmission between CPU and GPU
  • Challenge 3: Complicated inter-node data dependence

17/33

SLIDE 18
  • 4. Challenges
  • Challenge 1: Complex triplet-multiplication-based computation patterns

18/33

SLIDE 19
  • 4. Challenges
  • Challenge 2: Frequent intra-node data transmission between CPU and GPU

19/33

[Figure: on each server, data moves between CPU and GPU at every step (step n, step n+1), causing frequent intra-node transfers.]

SLIDE 20
  • 4. Challenges
  • Challenge 3: Complicated inter-node data dependence

20/33

[Figure: each step on server1 depends on data produced by server2 and vice versa, creating inter-node data dependences.]

SLIDE 21
  • 5. ParSecureML
  • Overview - ParSecureML consists of three major components:
  • Profiling-guided adaptive GPU utilization
  • Intra-node double pipeline
  • Inter-node compressed transmission

21/33

[Figure: overview. Offline phase: the client distributes encrypted input data (GPU-based data distribution). Online phase: server1 and server2 each run reconstruct and GPU operations for the forward and backward passes of every layer, pipelined across layers, with compressed communication between the servers; the client computes the final result.]

SLIDE 22
  • 5. ParSecureML
  • Profiling-Guided Adaptive GPU Utilization

22/33

$F = F_0 + F_1, \qquad G = G_0 + G_1$

$D_j = (-j) \times F \times G + B_j \times G + F \times C_j + a_j$

Offline acceleration design / Online acceleration design
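The multiplication formula above is the Beaver-triplet protocol: B and C are the shared inputs, a is the shared triplet value with a = u·v, and F, G are the opened masked values F = B - u, G = C - v. A minimal single-process sketch (the dealer, networking, and matrix shapes are elided; helper names are illustrative):

```python
import random

MOD = 2 ** 32  # shares live in a finite ring

def share(x):
    """Additively secret-share x between server 0 and server 1."""
    x0 = random.randrange(MOD)
    return (x0, (x - x0) % MOD)

def triplet_mul(b_sh, c_sh):
    """Securely multiply shared B and C using one triplet a = u * v."""
    # Offline phase: a dealer generates and shares the multiplication triplet.
    u, v = random.randrange(MOD), random.randrange(MOD)
    u_sh, v_sh, a_sh = share(u), share(v), share(u * v % MOD)
    # Online phase: open the masked values F = B - u and G = C - v.
    f = (b_sh[0] - u_sh[0] + b_sh[1] - u_sh[1]) % MOD  # F = F0 + F1
    g = (c_sh[0] - v_sh[0] + c_sh[1] - v_sh[1]) % MOD  # G = G0 + G1
    # Each server j locally computes D_j = (-j)*F*G + B_j*G + F*C_j + a_j.
    return tuple(((-j) * f * g + b_sh[j] * g + f * c_sh[j] + a_sh[j]) % MOD
                 for j in (0, 1))

b, c = 1234, 5678
d0, d1 = triplet_mul(share(b), share(c))
assert (d0 + d1) % MOD == b * c % MOD  # D reconstructs to B * C
```

Opening F and G leaks nothing about B and C because u and v are uniformly random masks; this is why the expensive triplet generation can be pushed into the offline phase.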

SLIDE 23
  • 5. ParSecureML
  • Double Pipeline for Intra-Node CPU-GPU Fine-Grained Cooperation
  • Pipeline 1 overlaps PCIe data transmission with GPU computation.

23/33

[Figure: Pipeline 1 timeline. Transfers of Ai, Bi, E, ... over PCIe are overlapped with GPU computation.]

SLIDE 24
  • 5. ParSecureML
  • Double Pipeline for Intra-Node CPU-GPU Fine-Grained Cooperation
  • Pipeline 1 overlaps PCIe data transmission with GPU computation.
  • Pipeline 2 overlaps operations across layers.

24/33

[Figure: Pipeline 2 timeline. For each of layers 1 to n, reconstruct (forward), operation (forward), and reconstruct (backward) stages are pipelined across layers, followed by the operation (backward) stages.]
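Both pipelines follow the same producer/consumer pattern: while the GPU computes on chunk i, chunk i+1 is already in flight. A CPU-only simulation of that overlap using threads (the real framework uses CUDA streams and asynchronous copies; function names here are illustrative):

```python
import threading
import queue

def pipeline(chunks, transfer, compute):
    """Overlap transfer of chunk i+1 with computation on chunk i."""
    q = queue.Queue(maxsize=1)  # one chunk in flight, like double buffering

    def producer():
        for c in chunks:
            q.put(transfer(c))  # stage 1: host-to-device copy (simulated)
        q.put(None)             # sentinel: no more chunks

    threading.Thread(target=producer, daemon=True).start()
    results = []
    while (c := q.get()) is not None:
        results.append(compute(c))  # stage 2: GPU kernel (simulated)
    return results

out = pipeline([1, 2, 3], transfer=lambda c: c * 10, compute=lambda c: c + 1)
assert out == [11, 21, 31]
```

With maxsize=1 the producer stays exactly one chunk ahead of the consumer, so transfer time is hidden behind computation whenever the two stages take comparable time.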

SLIDE 25
  • 5. ParSecureML
  • Compressed Transmission for Inter-Node Communication

25/33

[Figure: server1 and server2 exchange the masked shares E0, E1, F0, F1. Before transmitting the deltas ΔA and ΔB, each server checks whether they are sparse; if yes, they are converted to CSR form (CSRΔA, CSRΔB) before transmission, otherwise they are sent uncompressed.]
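The sparse path above can be sketched as follows: encode a delta matrix in CSR form only when that is actually smaller than sending it dense (plain-Python encoding; the framework's exact wire format and sparsity threshold may differ):

```python
def to_csr(m):
    """Compress a dense matrix into CSR (values, column indices, row pointers)."""
    vals, cols, rowptr = [], [], [0]
    for row in m:
        for j, v in enumerate(row):
            if v != 0:
                vals.append(v)
                cols.append(j)
        rowptr.append(len(vals))
    return vals, cols, rowptr

def from_csr(vals, cols, rowptr, ncols):
    """Rebuild the dense matrix on the receiving server."""
    dense = []
    for r in range(len(rowptr) - 1):
        row = [0] * ncols
        for k in range(rowptr[r], rowptr[r + 1]):
            row[cols[k]] = vals[k]
        dense.append(row)
    return dense

def pack_delta(m):
    """Send CSR only if the matrix is sparse enough to save bandwidth."""
    nnz = sum(v != 0 for row in m for v in row)
    total = len(m) * len(m[0])
    if 2 * nnz + len(m) + 1 < total:  # rough CSR-vs-dense size comparison
        return ("csr", to_csr(m))
    return ("dense", m)

delta = [[0, 0, 3], [0, 0, 0], [7, 0, 0]]
tag, payload = pack_delta(delta)
assert tag == "csr" and from_csr(*payload, ncols=3) == delta
```

Checking sparsity per message matters because the deltas are only sometimes sparse; compressing a dense matrix into CSR would inflate rather than shrink the transmission.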

SLIDE 26
  • 6. Evaluation
  • Baseline: SecureML[1]
  • Benchmarks
  • Convolutional neural network (CNN)
  • Multilayer perceptron (MLP)
  • Recurrent neural network (RNN)
  • Linear regression
  • Logistic regression
  • Datasets - VGGFace2/NIST/SYNTHETIC/MNIST
  • HPC Cluster
  • Intel(R) Xeon(R) CPU E5-2670 v3
  • Nvidia Tesla V100

[1] P. Mohassel and Y. Zhang. SecureML: A System for Scalable Privacy-Preserving Machine Learning. In 2017 IEEE Symposium on Security and Privacy (SP), IEEE, 2017, pp. 19-38.

SLIDE 27
  • 6. Evaluation
  • Overall speedups: on average, ParSecureML achieves a 32.2x speedup over SecureML.

27/33

[Figure: overall speedup (log scale) of CNN, MLP, linear regression, logistic regression, and RNN on VGGFace2, NIST, SYNTHETIC, and MNIST.]

SLIDE 28
  • 6. Evaluation
  • Online speedups: the average online performance speedup is 61.4x (even higher than the overall speedup).

28/33

[Figure: online speedup (log scale) of CNN, MLP, linear regression, logistic regression, and RNN on VGGFace2, NIST, SYNTHETIC, and MNIST.]

SLIDE 29
  • 6. Evaluation
  • Offline speedups: applying GPUs in the offline phase brings 1.2x performance benefits.

29/33

[Figure: offline speedup of CNN, MLP, linear regression, logistic regression, and RNN on VGGFace2, NIST, SYNTHETIC, and MNIST.]

SLIDE 30
  • 6. Evaluation
  • Communication benefits: on average, ParSecureML reduces communication overhead by 23.7%.

30/33

[Figure: communication improvement (%) of CNN, MLP, linear regression, logistic regression, and RNN on VGGFace2, NIST, SYNTHETIC, and MNIST.]

SLIDE 31
  • 6. Evaluation
  • Influence of workload size

31/33

[Figure: running time (s) versus workload size (MB) for SecureML and ParSecureML.]

SLIDE 32
  • 7. Source Code at GitHub
  • https://github.com/ZhengChenCS/ParSecureML

32/33

SLIDE 33
  • 8. Conclusion
  • We exhibit our observations and insights in SecureML acceleration.

  • We develop ParSecureML, the first parallel secure machine learning framework on GPUs.

  • We demonstrate the benefits of ParSecureML over the state-of-the-art secure machine learning framework.

33/34

SLIDE 34

Thank you!

  • Any questions?

34/34

ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs

Zheng Chen, Feng Zhang, Amelie Chi Zhou⋆, Jidong Zhai+, Chenyang Zhang, Xiaoyong Du

⋄Renmin University of China ⋆Shenzhen University +Tsinghua University

chenzheng123@ruc.edu.cn, fengzhang@ruc.edu.cn, chi.zhou@szu.edu.cn, zhaijidong@tsinghua.edu.cn, chenyangzhang@ruc.edu.cn, duyong@ruc.edu.cn

https://github.com/ZhengChenCS/ParSecureML