

  1. 49th International Conference on Parallel Processing - ICPP. ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs. Zheng Chen, Feng Zhang, Amelie Chi Zhou, Jidong Zhai, Chenyang Zhang, Xiaoyong Du. Renmin University of China, Shenzhen University, Tsinghua University

  2. Outline 1. Background 2. Motivation 3. Basic Idea 4. Challenges 5. ParSecureML 6. Evaluation 7. Source Code at Github 8. Conclusion

  3. 1. Background • Secure Machine Learning. [Figure, built up over four animation slides: (a) typical machine learning process; (b) machine learning process with two-party computation.]
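
For intuition, here is a minimal NumPy sketch (an illustration of the additive secret sharing that underlies two-party computation, not code from the paper; all names are ours) of how a client splits its data into two shares so that neither server alone learns anything:

```python
import numpy as np

rng = np.random.default_rng(0)

def share(x):
    """Split x into two additive shares over Z_{2^32}: x = x0 + x1 (mod 2^32)."""
    x = x.astype(np.uint32)
    x0 = rng.integers(0, 2**32, size=x.shape, dtype=np.uint32)  # uniform mask
    x1 = x - x0  # uint32 arithmetic wraps mod 2^32, so x0 + x1 reconstructs x
    return x0, x1

def reconstruct(x0, x1):
    return x0 + x1  # wraps mod 2^32

data = np.arange(6, dtype=np.uint32).reshape(2, 3)
s0, s1 = share(data)  # s0 goes to server1, s1 to server2
assert np.array_equal(reconstruct(s0, s1), data)
```

Each share is uniformly random on its own; only the sum of both shares recovers the data.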

  8. 1. Background • GPU Acceleration. https://developer.nvidia.com/deep-learning

  9. 2. Motivation • Performance Degradation. [Chart: normalized performance of the original vs. SecureML versions of linear regression, logistic regression, MLP, and a convolutional neural network; the secure versions lose substantial performance.]

  10. 2. Motivation • Time Breakdown for Two-Party Computation. [Figure, stepped through over several animation slides: the client distributes encrypted input data to server1 and server2; each server runs compute1, communicates, then runs compute2; the final result returns to the client. The data-distribution and transfer steps cost only 0.05 s to 0.24 s each, while the two dominant compute phases cost 95.52 s and 62.68 s, so computation, not data movement, dominates the end-to-end time.]

  17. 3. Basic Idea • A GPU-based two-party computation that accounts for both GPU characteristics and the features of two-party computation should achieve better acceleration. • Challenges • Challenge 1: Complex triplet-multiplication-based computation patterns • Challenge 2: Frequent intra-node data transmission between CPU and GPU • Challenge 3: Complicated inter-node data dependence

  18. 4. Challenges • Challenge 1: Complex triplet-multiplication-based computation patterns. [Figure: the dataflow of a triplet-based multiplication.]

  19. 4. Challenges • Challenge 2: Frequent intra-node data transmission between CPU and GPU. [Figure: on server1 and server2 alike, each step n moves data between CPU and GPU, and step n+1 repeats the round trip.]
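
A hedged CuPy sketch of the access pattern behind this challenge (assuming CuPy and a CUDA GPU; the function is illustrative, not ParSecureML code): because reconstruction and inter-server communication happen on the CPU, every step pays a full PCIe round trip:

```python
import cupy as cp  # assumes CuPy and a CUDA-capable GPU
import numpy as np

def naive_step(host_a: np.ndarray, host_b: np.ndarray) -> np.ndarray:
    """One protocol step with the naive per-step PCIe round trip."""
    d_a = cp.asarray(host_a)   # CPU -> GPU transfer
    d_b = cp.asarray(host_b)   # CPU -> GPU transfer
    d_c = d_a @ d_b            # GPU computation
    return cp.asnumpy(d_c)     # GPU -> CPU transfer, needed for communication
    # ...and step n+1 pays the same transfers again
```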

  20. 4. Challenges • Challenge 3: Complicated inter-node data dependence. [Figure: the data each server produces at step n is needed by the other server before step n+1 can proceed.]

  21. 5. ParSecureML • Overview. ParSecureML consists of three major components: • Profiling-guided adaptive GPU utilization • Intra-node double pipeline • Inter-node compressed-transmission communication. [Figure: after an offline phase, the client distributes encrypted data to server1 and server2 in the online phase; on each server, every layer performs reconstruct and GPU operations for the forward and backward passes, with pipelined execution among different layers, GPU-based computation, and compressed communication between the servers; the final result is returned to the client.]

  22. 5. ParSecureML • Profiling-Guided Adaptive GPU Utilization: online and offline acceleration designs. The online multiplication follows the Beaver-triplet pattern: the masked operands are reconstructed as $E = E_0 + E_1$ and $F = F_0 + F_1$, and each party $i \in \{0, 1\}$ computes its share of the product as $C_i = (-i) \cdot E \cdot F + A_i \cdot F + E \cdot B_i + Z_i$.
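
To make the triplet computation pattern concrete, here is a minimal NumPy sketch of one secret-shared matrix multiplication (the standard Beaver-triplet protocol that SecureML builds on; a dealer generating the triplet locally stands in for the real offline phase):

```python
import numpy as np

rng = np.random.default_rng(1)
U32 = np.uint32

def share(x):
    x0 = rng.integers(0, 2**32, size=x.shape, dtype=U32)
    return [x0, x.astype(U32) - x0]  # shares sum to x mod 2^32

def triplet_matmul(A_sh, B_sh):
    """Shares of C = A @ B, using a Beaver triplet (U, V, Z = U @ V)."""
    (m, k), (_, n) = A_sh[0].shape, B_sh[0].shape
    # Offline: generate and share the triplet (here by a local dealer).
    U = rng.integers(0, 2**32, size=(m, k), dtype=U32)
    V = rng.integers(0, 2**32, size=(k, n), dtype=U32)
    U_sh, V_sh, Z_sh = share(U), share(V), share(U @ V)
    # Online: parties exchange A_i - U_i and B_i - V_i, then reconstruct.
    E = (A_sh[0] - U_sh[0]) + (A_sh[1] - U_sh[1])  # E = A - U (public)
    F = (B_sh[0] - V_sh[0]) + (B_sh[1] - V_sh[1])  # F = B - V (public)
    # Each party i computes C_i = (-i)*E@F + A_i@F + E@B_i + Z_i.
    C = [A_sh[i] @ F + E @ B_sh[i] + Z_sh[i] for i in range(2)]
    C[1] -= E @ F
    return C

A = rng.integers(0, 100, size=(2, 3), dtype=U32)
B = rng.integers(0, 100, size=(3, 2), dtype=U32)
C_sh = triplet_matmul(share(A), share(B))
assert np.array_equal(C_sh[0] + C_sh[1], A @ B)  # shares reconstruct A @ B
```

Note the pattern the slide highlights: several matrix products per logical multiplication, plus an exchange to reconstruct E and F.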

  23. 5. ParSecureML • Double Pipeline for Intra-Node CPU-GPU Fine-Grained Cooperation. • Pipeline 1 overlaps PCIe data transmission with GPU computation: while the GPU computes $(-i) \cdot E \cdot F + A_i \cdot F + \ldots$ on operands already on the device, the transfers of $E$, $A_i$, $F$, $B_i$ for the next block proceed in parallel. • Pipeline 2 overlaps operations across layers: the reconstruct and operation phases of each layer's forward and backward passes are pipelined against those of neighboring layers.
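
A CPU-only Python sketch of Pipeline 1's scheduling pattern (illustrative only; ParSecureML realizes this with actual PCIe transfers and GPU kernels): the transfer of block i+1 is issued while block i is being computed on, so transfer time hides behind compute time:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def transfer(block):
    """Stand-in for a PCIe host-to-device copy of one operand block."""
    return block.copy()

def compute(block):
    """Stand-in for the GPU triplet kernel on an already-transferred block."""
    return block @ block.T

blocks = [np.full((256, 256), i, dtype=np.float64) for i in range(8)]
results = []
with ThreadPoolExecutor(max_workers=1) as copier:
    inflight = copier.submit(transfer, blocks[0])   # start transferring block 0
    for nxt in blocks[1:]:
        ready = inflight.result()                   # wait for the current block
        inflight = copier.submit(transfer, nxt)     # issue the next transfer...
        results.append(compute(ready))              # ...while computing this one
    results.append(compute(inflight.result()))      # drain the pipeline
```

Pipeline 2 applies the same idea one level up, overlapping the reconstruct and operation phases of adjacent layers.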

  25. 5. ParSecureML • Compressed Transmission for Inter-Node Communication. Between iterations the exchanged matrices change only incrementally, $A_i^{(k+1)} = A_i^{(k)} + \Delta A_{ik}$ and $B_i^{(k+1)} = B_i^{(k)} + \Delta B_{ik}$, so server1 and server2 transmit only the deltas. [Figure: each server tests whether $\Delta A$ (resp. $\Delta B$) is sparse; if yes, it is sent CSR-compressed, otherwise it is sent dense.]
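
A minimal SciPy sketch of the compressed-transmission idea (the sparsity cutoff and helper names are our assumptions, not values from the paper): only the iteration-to-iteration delta is sent, CSR-compressed when it is sparse enough:

```python
import numpy as np
from scipy import sparse

DENSITY_CUTOFF = 0.1  # assumed threshold, not a value from the slides

def pack_delta(prev, curr):
    """Encode curr relative to prev: CSR if the delta is sparse, else dense."""
    delta = curr - prev
    if np.count_nonzero(delta) / delta.size <= DENSITY_CUTOFF:
        return "csr", sparse.csr_matrix(delta)   # compressed sparse row format
    return "dense", delta

def unpack_delta(prev, tag, payload):
    """Receiver side: rebuild the current matrix from prev plus the delta."""
    return prev + (payload.toarray() if tag == "csr" else payload)

A_k = np.zeros((4, 4))
A_k1 = A_k.copy()
A_k1[0, 0] = 5.0                         # a sparse update between iterations
tag, payload = pack_delta(A_k, A_k1)     # -> ("csr", 1-nonzero CSR matrix)
assert np.array_equal(unpack_delta(A_k, tag, payload), A_k1)
```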

  26. 6. Evaluation • Baseline: SecureML [1] • Benchmarks: convolutional neural network (CNN), multilayer perceptron (MLP), recurrent neural network (RNN), linear regression, logistic regression • Datasets: VGGFace2 / NIST / SYNTHETIC / MNIST • HPC cluster: Intel Xeon CPU E5-2670 v3, NVIDIA Tesla V100. [1] P. Mohassel and Y. Zhang. SecureML: A System for Scalable Privacy-Preserving Machine Learning. 2017 IEEE Symposium on Security and Privacy (SP), IEEE, 2017, pp. 19-38.

  27. 6. Evaluation • Overall speedups: ParSecureML achieves an average speedup of 32.2x over SecureML. [Chart: overall speedup, log scale, on VGGFace2, NIST, SYNTHETIC, and MNIST for CNN, MLP, linear regression, logistic regression, and RNN.]

  28. 6. Evaluation • Online speedups: the average online speedup is 61.4x, even higher than the overall speedup. [Chart: online speedup, log scale, across the four datasets and five models.]

  29. 6. Evaluation • Offline speedups: applying GPUs in the offline phase brings a 1.2x performance benefit. [Chart: offline speedup across the four datasets and five models.]

  30. 6. Evaluation • Communication benefits: on average, ParSecureML reduces communication overhead by 23.7%. [Chart: communication improvement (%) across the four datasets and five models.]

  31. 6. Evaluation • Influence of workload size. [Chart: execution time (s) of SecureML vs. ParSecureML as workload size (MB) increases.]

  32. 7. Source Code at Github • https://github.com/ZhengChenCS/ParSecureML

  33. 8. Conclusion • We present our observations and insights on accelerating SecureML. • We develop ParSecureML, the first parallel secure machine learning framework on GPUs. • We demonstrate the benefits of ParSecureML over the state-of-the-art secure machine learning framework.

  34. Thank you! • Any questions? ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs. Zheng Chen, Feng Zhang, Amelie Chi Zhou, Jidong Zhai, Chenyang Zhang, Xiaoyong Du. Renmin University of China, Shenzhen University, Tsinghua University. chenzheng123@ruc.edu.cn, fengzhang@ruc.edu.cn, chi.zhou@szu.edu.cn, zhaijidong@tsinghua.edu.cn, chenyangzhang@ruc.edu.cn, duyong@ruc.edu.cn. https://github.com/ZhengChenCS/ParSecureML
