PrivPy: Scalable and General Privacy-Preserving Data Mining
Yi Li∗, Yitao Duan†, Yu Yu§, Shouyao Zhao§, Wei Xu∗
∗ Institute for Interdisciplinary Information Sciences, Tsinghua University † NetEase Youdao §Shanghai Jiaotong University
Privacy Compliance Data asset
Compute servers see nothing
Get nothing other than the final results
Private inputs
Data owners
Private data Private model Inference result
y = F(x1, x2, …, xn)
The Cryptography World The Data Science World
Language front-end: convenient APIs, interpreter, optimizer
Back-end: computation engines
$\chi(v) = (v_1, v_2)$
$v_1$: uniformly distributed in $\mathbb{Z}_q$
$v_2 = v - v_1 \pmod{q}$
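The sharing rule above can be tried out in a few lines of Python. This is only a minimal sketch of two-share additive secret sharing: the modulus `Q = 2**64` and the function names `share` and `reconstruct` are assumptions made for illustration, since the slides give neither the value of q nor any API.

```python
import secrets
from typing import Tuple

Q = 2**64  # assumed modulus q; the slides do not state its value


def share(v: int, q: int = Q) -> Tuple[int, int]:
    """Split v into two additive shares modulo q (hypothetical helper)."""
    v1 = secrets.randbelow(q)  # v1 is uniformly distributed in Z_q
    v2 = (v - v1) % q          # v2 = v - v1 (mod q)
    return v1, v2


def reconstruct(v1: int, v2: int, q: int = Q) -> int:
    """Recover the secret by adding both shares modulo q."""
    return (v1 + v2) % q


if __name__ == "__main__":
    v1, v2 = share(42)
    print(reconstruct(v1, v2))  # prints 42
```

Each share on its own is a uniformly random element of Z_q, so holding only one of them reveals nothing about v.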
Fixed-length $m - l$ integer part, fixed-length $l$ decimal part
- PICCO, Sharemind, SPDZ, etc.
- SecureML, ABY3, PrivPy
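One way to picture the fixed-length layout is as a fixed-point encoding into integers. The sketch below runs on plaintext values only and rests on several assumptions that the slides do not state: lengths are counted in bits, `M = 64` and `L = 16` stand in for m and l, negatives use two's complement, and all function names are hypothetical.

```python
M = 64          # assumed total length m, in bits
L = 16          # assumed length l of the decimal part, in bits
MOD = 1 << M    # encoded values live in the integers modulo 2^m


def encode(a: float) -> int:
    """Scale a real number by 2^l and reduce it modulo 2^m."""
    return round(a * (1 << L)) % MOD


def to_signed(u: int) -> int:
    """Interpret an encoded value as a signed (two's complement) integer."""
    return u - MOD if u >= MOD // 2 else u


def decode(u: int) -> float:
    """Undo the scaling to get the real number back."""
    return to_signed(u) / (1 << L)


def fx_mul(u: int, w: int) -> int:
    """Multiply two encoded values on plaintext.

    The raw product carries 2*l decimal bits, so it is shifted
    right by l bits to restore the original scale.
    """
    prod = to_signed(u) * to_signed(w)
    return (prod >> L) % MOD


if __name__ == "__main__":
    print(decode(fx_mul(encode(1.5), encode(-2.25))))  # prints -3.375
```

The truncation step after multiplication is done here on plaintext integers; performing it on secret shares requires a dedicated protocol, which this sketch does not attempt.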
[Figure: four PO Engines with SS Store 1, SS Store 2, SS Store a, and SS Store b]
Task config: Python code, data source addr, result addr
[Figure: the PO Engines and SS Stores run the Private Ops Protocols on input shares and produce result shares res1 and res2]
res1 + res2 = res
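The relation res1 + res2 = res can be illustrated with the simplest private operation, addition, where each party adds the shares it holds without any communication. This is a sketch under assumptions: two-party additive shares, an assumed modulus `Q = 2**64`, and hypothetical helper names. It does not reproduce PrivPy's own protocols, and multiplication and comparison, which need interaction between servers, are not covered.

```python
import secrets
from typing import Tuple

Q = 2**64  # assumed modulus; not given in the slides

Shares = Tuple[int, int]


def share(v: int, q: int = Q) -> Shares:
    """Split v into two additive shares modulo q (hypothetical helper)."""
    s1 = secrets.randbelow(q)
    return s1, (v - s1) % q


def add_shares(x: Shares, y: Shares, q: int = Q) -> Shares:
    """Add two shared values; each party only touches its own shares."""
    res1 = (x[0] + y[0]) % q  # computed by the first party
    res2 = (x[1] + y[1]) % q  # computed by the second party
    return res1, res2


if __name__ == "__main__":
    res1, res2 = add_shares(share(20), share(22))
    print((res1 + res2) % Q)  # prints 42
```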
Mul Cmp Add
Garbled circuit
- Division: Newton-Raphson method
- Sigmoid: Euler method
- ReLU: comparison
- Other functions: $e^x$, log(x), …

$z(y) = \frac{1}{1 + e^{-y}}$
$z'(y) = z(y)\,(1 - z(y))$
$z(y_{i+1}) = z(y_i) + z'(y_i)\,\Delta y = z(y_i) + z(y_i)\,(1 - z(y_i))\,\Delta y$
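Both approximations reduce a non-linear function to additions and multiplications, which is what makes them convenient on shared values. The sketch below runs on plaintext floats purely to show the numerics. The initial guess of 1.0, the restriction 0 < d < 2, the iteration counts, and the function names are assumptions; the slides do not say how PrivPy picks them.

```python
import math


def reciprocal(d: float, iters: int = 6) -> float:
    """Approximate 1/d by Newton-Raphson: x <- x * (2 - d * x).

    Assumes 0 < d < 2 so that the initial guess 1.0 converges.
    """
    x = 1.0
    for _ in range(iters):
        x = x * (2.0 - d * x)
    return x


def sigmoid_euler(y: float, steps: int = 1000) -> float:
    """Approximate the sigmoid z(y) by Euler steps on z' = z * (1 - z)."""
    z = 0.5          # exact starting value: z(0) = 1 / (1 + e^0)
    dy = y / steps   # step size; the step count is a public constant
    for _ in range(steps):
        z = z + z * (1.0 - z) * dy
    return z


if __name__ == "__main__":
    print(reciprocal(0.75), 1 / 0.75)
    print(sigmoid_euler(2.0), 1 / (1 + math.exp(-2.0)))
```

Newton-Raphson squares the error at every iteration, so a handful of rounds suffices, while Euler's method is only first-order accurate and trades more steps for a smaller error.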
[Figure: private inference with the PrivPy Engine; labels: model, image, inference result]
LAN (10Gbps)
Engine        Approach                  Decimal multiplication   Comparison
PrivPy        SS                        10,473,532               1,282,027
Helib         FHE                       258                      -
(not shown)   GC                        3,930                    78,431
P4P+HE        SS+HE                     4,344                    -
(not shown)   SS with active security   83,073                   20,472
SPDZ+PrivPy   SS with active security   83,229                   20,320
Dataset: MNIST, with 70,000 labeled handwritten digits
Algorithms: LR training, MF training, CNN inference
Batch size    LAN (10Gbps)                               WAN (50Mbps)
              LR training   MF training   CNN inference  LR training   MF training   CNN inference
Single op     5.3e-3        7.1e-3        9.6e-2         2.61          0.37          7.64
Batch (1000   3.92          5.67          12.02          7.3           13.2          56.3
- MPC can be useful in data mining, but there is a big gap to bridge
- PrivPy is an early attempt to make MPC practical for large datasets
- PrivPy is an ongoing effort