Practical Secure Two-Party Computation and Applications
Thomas Schneider Estonian Winter School in Computer Science 2016
Overview
Lecture 1: Introduction to Secure Two-Party Computation
Lecture 2: Private Set Intersection
Lecture 3: Tools and Applications
Lecture 4: Hardware-assisted Cryptographic Protocols
2
The Engineering Cryptographic Protocols Group (ENCRYPTO)
3
Thomas Schneider, Daniel Demmler, Ágnes Kiss, Michael Zohner
Info: http://encrypto.de
Interested in Practical Secure Computation?
4
We have an open, fully funded position as Ph.D. Student / Research Assistant in Engineering Scalable Secure Computation http://encrypto.de/jobs
Darmstadt
TU Darmstadt
Lecture 1: Introduction
The Web of Services
6
Our life moves into the web... ... and so does our data.
How were web services used yesterday?
7
“heart disease”
http://www.google.de attacker can eavesdrop
heart disease
How should web services be used today?
8
“heart disease”
secure channel protects communication against external attackers
heart disease
https://www.google.de HTTPS per default since 01/2010 02/2011 11/2012
Data breaches happen every day...
9
June 2, 2011: Google attacked from China
Computer hackers in China broke into the Gmail accounts of several hundred people, including senior US government officials, military personnel and political activists.
November 29, 2010: New WikiLeaks Publication
WikiLeaks releases US State Department communiqués that offer an extraordinary look at the inner workings, and sharp elbows of diplomacy.
... from outsiders ... or insiders ... or malware.
October 16, 2012: Espionage Malware MiniFlame
Kaspersky Labs discover that MiniFlame is most likely a targeted cyberweapon to conduct in-depth surveillance and cyber-espionage.
How could web services be used tomorrow?
10
heart disease encrypted query encrypted response
➪ Privacy-Preserving Web Services
httpp://www.google.de sensitive data remains encrypted
process under encryption
Vision: Privacy-Preserving Web Services
process sensitive data without any data leakage, e.g.,
Privacy-Preserving Medical Diagnostics: services give health recommendations without direct access to the patient’s data.
Privacy-Preserving Cloud Computing: services allow storing and processing data at untrusted service providers.
Privacy-Preserving Face Recognition: services detect criminals without allowing honest citizens to be traced.
11
Is this possible at all?
12
Andrew Chi-Chih Yao, 1986: Any efficiently computable function can be evaluated securely.
➪ Secure Computation
Secure Two-Party Computation
13
All Lectures: Semi-Honest (Passive) Adversaries
f(x,y) x y
Secure Two-Party Computation
14
Client C holds private data x; Server S holds private data y.
S2PC computes z = f(x, y) for a public function f(·, ·).
Example: Yao’s Millionaires’ Problem
C holds x = $2 Mio, S holds y = $1 Mio.
Is C richer? S2PC evaluates x > y and outputs true.
Secure Two-Party Computation
DNA Searching [Troncoso-PastorizaKC07], ...
Auctions [NaorPS99], ...
Remote Diagnostics [BrickellPSW07], ...
Biometric Identification [ErkinFGKLT09], ...
Medical Diagnostics [BarniFKLSS09], ...
15
Oblivious Transfer (OT)
16
1-out-of-2 OT is an essential building block for secure computation.
Sender inputs (x0, x1); receiver inputs choice bit r and obtains x_r.
How to Measure Efficiency of a Protocol?
17
faster
Overview of this lecture
18
[Diagram relating special-purpose and generic protocols, arithmetic and Boolean circuits, homomorphic encryption, Yao, GMW, and OT. Cost ordering: public-key crypto >> symmetric crypto >> one-time pad.]
Part 1: Yao vs. GMW Part 2: Efficient OT Extensions
Part 1: Yao vs. GMW and Efficient Circuits
19
[SZ13] T. Schneider, M. Zohner: GMW vs. Yao? Efficient secure two-party computation with low depth circuits. In FC’13.
Yao’s Garbled Circuits Protocol [Yao86]
20
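Client C: private data x = x_1, .., x_n
Server S: private data y = y_1, .., y_n
Circuit C for f(·, ·), e.g., x < y, with input wires x_1, y_1, .., x_n, y_n, internal wires c_1, c_2, .. and output z
Setup Phase: garbled circuit $\tilde{C}$
Online Phase: garbled inputs $\tilde{y}$ and $(\tilde{x}; \bot) \leftarrow \mathrm{OT}(x; (\tilde{x}^0, \tilde{x}^1))$ (see Part 2: Efficient OT), then $f(x, y) = \tilde{C}(\tilde{x}, \tilde{y})$
Garbled Values: $\tilde{c}_1^0, \tilde{c}_1^1$
Garbled Table:
$E(\tilde{x}_1^0, \tilde{y}_1^0; \tilde{c}_1^{g(0,0)})$
$E(\tilde{x}_1^0, \tilde{y}_1^1; \tilde{c}_1^{g(0,1)})$
$E(\tilde{x}_1^1, \tilde{y}_1^0; \tilde{c}_1^{g(1,0)})$
$E(\tilde{x}_1^1, \tilde{y}_1^1; \tilde{c}_1^{g(1,1)})$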
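The garbled table above can be illustrated with a toy sketch of a single garbled gate. It corresponds roughly to the "Classical" scheme without any of the later optimizations. The instantiation of E with SHA-256 as a pad, the 16 zero bytes used as a validity tag, and the names `garble_gate` and `eval_gate` are assumptions made for this illustration and are not taken from the slides.

```python
import hashlib
import secrets

T_BYTES = 16  # key length in bytes (t = 128 bits)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def pad(kx: bytes, ky: bytes) -> bytes:
    # 32-byte pad derived from both input keys (assumed instantiation of E)
    return hashlib.sha256(kx + ky).digest()


def garble_gate(g):
    """Return (x_keys, y_keys, c_keys, table) for a gate c = g(x, y)."""
    x_keys = [secrets.token_bytes(T_BYTES) for _ in range(2)]
    y_keys = [secrets.token_bytes(T_BYTES) for _ in range(2)]
    c_keys = [secrets.token_bytes(T_BYTES) for _ in range(2)]
    table = [
        # E(x^a, y^b; c^g(a,b)): output key followed by a zero tag
        xor(pad(x_keys[a], y_keys[b]), c_keys[g(a, b)] + bytes(T_BYTES))
        for a in (0, 1) for b in (0, 1)
    ]
    secrets.SystemRandom().shuffle(table)  # hide which row encodes which inputs
    return x_keys, y_keys, c_keys, table


def eval_gate(table, kx: bytes, ky: bytes) -> bytes:
    """Given one key per input wire, recover exactly one output key."""
    for row in table:
        plain = xor(pad(kx, ky), row)
        if plain[T_BYTES:] == bytes(T_BYTES):  # zero tag marks the right row
            return plain[:T_BYTES]
    raise ValueError("no row decrypted correctly")


# Usage: garble an AND gate and evaluate it on inputs x = 1, y = 1.
x_keys, y_keys, c_keys, table = garble_gate(lambda a, b: a & b)
assert eval_gate(table, x_keys[1], y_keys[1]) == c_keys[1]
```

The evaluator tries all four rows here; point-and-permute removes this trial decryption by telling the evaluator which row to open.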
Garbled Circuits [Yao86]
21
[Figure: conventional circuit vs. garbled circuit]
given input keys, can compute output key only
keys look random
(Slide from Viet-Tung Hoang)
22
Garbled Gate [Yao86]
given two input keys, can compute only output key
(Slide from Viet-Tung Hoang)
Overview of Efficient Garbled Circuit Constructions
23
(Slide from Payman Mohassel)
1990 Point-and-Permute [BeaverMicaliRogaway]
1999 3-row reduction [NaorPinkasSumner]
2008 Free-XOR [KolesnikovSchneider]
2009 2-row reduction [PinkasSchneiderSmartWilliams]
2012 Garbling via AES [KreuterShelatShen]
2013 Fixed-key AES [BellareHoangKeelveedhiRogaway]
2014 FleXor [KolesnikovMohasselRosulek]
2015 HalfGates [ZahurRosulekEvans]
Summary of Garbled Circuit Constructions
24
(Slide from Mike Rosulek)
Scheme      size (× t)      garble cost (AES)   eval cost (AES)
            XOR     AND     XOR     AND         XOR     AND
Classical   large   large   8       8           5       5
P&P         4       4       4       4           1       1
GRR3        3       3       4       4           1       1
Free XOR    0       3       0       4           0       1
HalfGates   0       2       0       4           0       2
t: symmetric security parameter, e.g., t=128
Summary: Yao - the Apple
How to eat an apple? bite-by-bite
25
+ Yao has constant #rounds
symmetric crypto in the online phase
The GMW Protocol [GMW87]
Example circuit: c = a ⊕ b, d = c ∧ b
Secret share inputs: a = a1 ⊕ a2, b = b1 ⊕ b2
Non-Interactive XOR gates: c1 = a1 ⊕ b1 ; c2 = a2 ⊕ b2
Interactive AND gates: AND on shares (c1, b1) and (c2, b2) yields shares d1, d2
Recombine outputs: d = d1 ⊕ d2
Evaluating ANDs via Multiplication Triples [Beaver91]
AND gate on shares: inputs (x1, y1) and (x2, y2), outputs z1 and z2
Setup phase: Generate multiplication triple (a1⊕a2) ∧ (b1⊕b2) = c1⊕c2 for each AND via 2 OTs:
1) P1: m0, m1 ∈R {0,1}; P2: a2 ∈R {0,1}
2) P1 and P2 run OT, where P1 inputs (m0, m1), P2 inputs a2 and gets u2 = m_{a2}
3) P1 sets b1 = m0 ⊕ m1; v1 = m0
4) P1 and P2 repeat steps 1-3 with reversed roles to obtain (a1, u1); (b2, v2)
5) Pi sets ci = (ai ∧ bi) ⊕ ui ⊕ vi
Online phase:
P1 → P2: d1 = x1⊕a1; e1 = y1⊕b1
P1 ← P2: d2 = x2⊕a2; e2 = y2⊕b2
P1, P2: d = d1⊕d2; e = e1⊕e2
P1: z1 = (d ∧ b1) ⊕ (e ∧ a1) ⊕ c1 ⊕ (d ∧ e)
P2: z2 = (d ∧ b2) ⊕ (e ∧ a2) ⊕ c2
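The setup and online phases above can be checked with a short single-process sketch. The OT is modeled as an ideal function call rather than a real protocol, and both parties run in the same process; the helper names `ot`, `half_triple`, `gen_triple` and `and_gate` are made up for this illustration.

```python
import secrets


def ot(m0: int, m1: int, choice: int) -> int:
    # Ideal 1-out-of-2 OT (assumption: stands in for a real OT protocol)
    return m1 if choice else m0


def half_triple():
    """Steps 1-3: receiver gets (a, u), sender gets (b, v), with u ^ v == a & b."""
    m0, m1 = secrets.randbits(1), secrets.randbits(1)  # sender's random bits
    a = secrets.randbits(1)                            # receiver's random bit
    u = ot(m0, m1, a)                                  # u = m_a
    return (a, u), (m0 ^ m1, m0)                       # b = m0 ^ m1, v = m0


def gen_triple():
    """Setup phase: shares (a1, b1, c1) for P1 and (a2, b2, c2) for P2."""
    (a2, u2), (b1, v1) = half_triple()  # P1 is OT sender, P2 is OT receiver
    (a1, u1), (b2, v2) = half_triple()  # step 4: roles reversed
    c1 = (a1 & b1) ^ u1 ^ v1            # step 5
    c2 = (a2 & b2) ^ u2 ^ v2
    return (a1, b1, c1), (a2, b2, c2)


def and_gate(x1: int, y1: int, x2: int, y2: int):
    """Online phase: shares of (x1^x2) & (y1^y2) from one triple."""
    (a1, b1, c1), (a2, b2, c2) = gen_triple()
    d1, e1 = x1 ^ a1, y1 ^ b1           # sent P1 -> P2
    d2, e2 = x2 ^ a2, y2 ^ b2           # sent P2 -> P1
    d, e = d1 ^ d2, e1 ^ e2
    z1 = (d & b1) ^ (e & a1) ^ c1 ^ (d & e)
    z2 = (d & b2) ^ (e & a2) ^ c2
    return z1, z2


# Usage: x = 1 ^ 0 = 1 and y = 1 ^ 0 = 1, so the shares must recombine to 1.
z1, z2 = and_gate(1, 1, 0, 0)
assert z1 ^ z2 == 1
```

Only d and e are exchanged online, and each is masked by a fresh random bit of the triple, which is why the online phase needs one-time-pad operations only.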
Part 2: Efficient OTs
Summary: GMW - the Orange
How to eat an orange?
1) peel (almost all the effort)
2) eat (easy)
Setup phase:
gate using 2 R-OTs and constant #rounds
+ no need to know function, only max. #ANDs
Online phase:
+ evaluating circuit needs OTP operations only
Benchmarks of an optimized GMW implementation [SZ13]
29
Runtime in seconds for 512-bit multiplication circuit (800k AND gates, depth 38) over Gigabit LAN.
Interactive AND gates via Beaver’s multiplication triples
[D. Beaver. Efficient multiparty protocols using circuit randomization. CRYPTO’91.]
setup phase: 1-out-of-4 OT
=> 1x network latency per layer of AND gates
Use AES-based PRF for OT extensions (instead of SHA-1).
Load Balancing:
=> Each party has exactly the same workload.
Use GMP instead of NTL for base OTs.
Process data in chunks of bytes (instead of bits).
Use assembly implementation of OpenSSL for SHA-1 (instead of C implementation of PolarSSL).
Single Instruction Multiple Data: Evaluate multiple circuits in parallel (here 32). (inspired by Sharemind)
Remaining Bottlenecks in LAN Setting
37
[Figure: breakdown of the remaining runtime by component; the legend includes Base OTs.]
Yao vs. GMW
38
                               Yao                    GMW
symmetric crypto per AND       S: 4, R: 2 (online)    setup: S: 6, R: 6
communication [bit] per AND    S→R: 2t                setup: S→R: t || R→S: t
memory per wire [bit]          t                      1
rounds                         O(1)                   setup: O(1)
t: symmetric security parameter
Free XOR
Efficient Circuit Constructions for Secure Computation
Circuits for secure computation:
39
Classical circuit design:
TinyGarble: Highly compressed and scalable sequential garbled circuits. In IEEE S&P’15.
In ACM CCS’15.
Automatically generate optimized circuits from high-level descriptions:
Example Circuit: Addition
40
ADD: inputs x_ℓ, y_ℓ, .., x_2, y_2, x_1, y_1; carries c_2, c_3, ..; outputs s_{ℓ+1}, s_ℓ, .., s_2, s_1
Ripple-Carry-Adder [BoyarPeraltaPochuev00]
s_i = x_i ⊕ y_i ⊕ c_i
c_{i+1} = ((x_i ⊕ y_i) ∧ (x_i ⊕ c_i)) ⊕ x_i
ANDsize = ℓ, ANDdepth = ℓ
Ladner-Fischer-Adder [LF80]
ANDsize = ℓ + 1.25 ℓ log2(ℓ), ANDdepth = 1 + 2 log2(ℓ)
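The ripple-carry formulas can be checked in the clear with a small sketch that also counts AND gates. It takes bit lists with the least significant bit first and an initial carry of 0; the function names are illustrative only.

```python
def to_bits(value: int, n: int):
    """n-bit little-endian bit list of value."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(x_bits, y_bits):
    """Return (sum bits, number of AND gates) for two equal-length inputs."""
    carry, s_bits, and_gates = 0, [], 0
    for x, y in zip(x_bits, y_bits):
        s_bits.append(x ^ y ^ carry)               # s_i = x_i XOR y_i XOR c_i
        carry = ((x ^ y) & (x ^ carry)) ^ x        # c_{i+1}, one AND per bit
        and_gates += 1
    s_bits.append(carry)                           # s_{l+1} is the final carry
    return s_bits, and_gates


# Usage: 4-bit addition 11 + 6 = 17 with 4 AND gates.
s_bits, and_gates = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
assert from_bits(s_bits) == 17 and and_gates == 4
```

Each carry depends on the previous one, so the ℓ AND gates sit on a single path. That is what makes the AND depth equal to ℓ, and in GMW every layer of AND gates costs one round of network latency.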
[Figure: 4-bit Ladner-Fischer adder with inputs x_4, y_4, .., x_1, y_1, intermediate values p_{i,j}, c_{i,j} and outputs s_5, .., s_1]
p_{i,0} = x_i ⊕ y_i, c_{i,0} = x_i ∧ y_i
p_{i,j} = p_{i,j-1} ∧ p_{k,j-1}
c_{i,j} = (p_{i,j-1} ∧ c_{k,j-1}) ∨ c_{i,j-1}
Example Circuits Summarized in [SchneiderZohner13]
41
Circuit                              Size S                                              Depth D
Addition
Ripple-carry ADD/SUB^ℓ_RC            ℓ                                                   ℓ
Ladner-Fischer ADD^ℓ_LF              1.25ℓ⌈log2 ℓ⌉ + ℓ                                   2⌈log2 ℓ⌉ + 1
LF subtraction SUB^ℓ_LF              1.25ℓ⌈log2 ℓ⌉ + 2ℓ                                  2⌈log2 ℓ⌉ + 2
Carry-save ADD^(ℓ,3)_CSA             ℓ + S(ADD^ℓ)                                        D(ADD^ℓ) + 1
RC network ADD^(ℓ,n)_RC              ℓn − ℓ + n − ⌈log2 n⌉ − 1                           ⌈log2 n − 1⌉ + ℓ
CSA network ADD^(ℓ,n)_CSA            ℓn − 2ℓ + n − ⌈log2 n⌉ + S(ADD^(ℓ+⌈log2 n⌉)_LF)     ⌈log2 n − 1⌉ + D(ADD^(ℓ+⌈log2 n⌉)_LF)
Multiplication
RCN school method MUL^ℓ_RC           2ℓ² − ℓ                                             2ℓ − 1
CSN school method MUL^ℓ_CSN          2ℓ² + 1.25ℓ⌈log2 ℓ⌉ − ℓ + 2                         3⌈log2 ℓ⌉ + 4
RC squaring SQR^ℓ_RC                 ℓ² − ℓ                                              2ℓ − 3
LF squaring SQR^ℓ_LF                 ℓ² + 1.25ℓ⌈log2 ℓ⌉ − 1.5ℓ − 2                       3⌈log2 ℓ⌉ + 3
Comparison
Equality EQ^ℓ                        ℓ − 1                                               ⌈log2 ℓ⌉
Sequential greater than GT^ℓ_S       ℓ                                                   ℓ
D&C greater than GT^ℓ_DC             3ℓ − ⌈log2 ℓ⌉ − 2                                   ⌈log2 ℓ⌉ + 1
Selection
Multiplexer MUX^ℓ                    ℓ                                                   1
(ℓ,n) ℓ ℓ
Can trade-off larger size for better depth.
Part 2: Efficient OTs
42
[ALSZ13] G. Asharov, Y. Lindell, T. Schneider, M. Zohner: More efficient oblivious transfer and extensions for faster secure computation. In ACM CCS’13. http://encrypto.de/code/OTExtension
Oblivious Transfer (OT)
43
1-out-of-2 OT is an essential building block for secure computation.
Sender inputs (x0, x1); receiver inputs choice bit r and obtains x_r.
OT - Bad News
44
A Public-Key Based OT Protocol: [NaorPinkas01]
45
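Sender input: x0, x1; receiver input: b
Common input: G = <g> of prime order q
1) Sender: t ∈R [0,q), C = g^t; sends C
2) Receiver: k ∈R [0,q), PK_b = g^k, PK_{1-b} = C/PK_b; sends PK_0
3) Sender: PK_1 = C/PK_0; r0, r1 ∈R [0,q); E0 = <g^r0, H((PK_0)^r0) ⊕ x0>, E1 = <g^r1, H((PK_1)^r1) ⊕ x1>; sends E0, E1
4) Receiver: E_b = <L, R>; h = H(L^k) = H((PK_b)^rb); x_b = h ⊕ R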
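The message flow of this protocol can be traced in a toy sketch. The group parameters (p = 2039, q = 1019, g = 4) are deliberately tiny and insecure, H is instantiated with SHA-256 as an assumption, and both parties run inside one function; none of these choices come from the slides. The modular inverse via `pow(x, -1, p)` needs Python 3.8 or newer.

```python
import hashlib
import secrets

# Toy group: g = 4 generates the subgroup of prime order Q in Z_P^* (P = 2Q + 1).
P, Q, G = 2039, 1019, 4


def H(elem: int, n: int) -> bytes:
    # Assumed instantiation of H; yields at most 32 bytes
    return hashlib.sha256(elem.to_bytes(2, "big")).digest()[:n]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def naor_pinkas_ot(x0: bytes, x1: bytes, b: int) -> bytes:
    """Run sender and receiver in one process; returns the receiver's output."""
    n = len(x0)
    # 1) Sender picks t and sends C = g^t
    t = secrets.randbelow(Q)
    C = pow(G, t, P)
    # 2) Receiver knows the discrete log of PK_b only and sends PK_0
    k = secrets.randbelow(Q)
    pk_b = pow(G, k, P)
    pk_other = C * pow(pk_b, -1, P) % P
    pk0 = pk_b if b == 0 else pk_other
    # 3) Sender derives PK_1 and encrypts both messages
    pk1 = C * pow(pk0, -1, P) % P
    r0, r1 = secrets.randbelow(Q), secrets.randbelow(Q)
    E0 = (pow(G, r0, P), xor(H(pow(pk0, r0, P), n), x0))
    E1 = (pow(G, r1, P), xor(H(pow(pk1, r1, P), n), x1))
    # 4) Receiver decrypts E_b using k
    L, R = (E0, E1)[b]
    return xor(H(pow(L, k, P), n), R)


# Usage: the receiver with choice bit 1 obtains x1.
assert naor_pinkas_ot(b"message zero", b"message one!", 1) == b"message one!"
```

Every exponentiation here is a public-key operation, which is the cost that OT extension later amortizes over many OTs.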
OT - Good News
46
use symmetric crypto to stretch few “real” OTs into longer/many OTs
[Beaver96] “real” OTs [IKNP03]
l-bit k-bit k OTs m OTs
OT Extension of [IKNP03] (1)
47
OT Extension of [IKNP03] (2)
48
(k: security parameter)
OT Extension of [IKNP03] (3)
49
PRG: pseudo-random generator (instantiated with AES)
OT Extension of [IKNP03] (4)
50
H: correlation robust function (instantiated with hash function)
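As a rough illustration of how k base OTs are stretched into m OTs, the following sketch follows the matrix view of [IKNP03]. It makes several simplifying assumptions: the k base OTs are replaced by an ideal selection, the PRG step is omitted (the receiver samples the matrix T directly), H is instantiated with SHA-256, and both parties share one process. The function names are hypothetical.

```python
import hashlib
import secrets

K = 128  # security parameter = number of base OTs


def H(j: int, row: int, n: int) -> bytes:
    # Assumed hash instantiation; n must be at most 32 bytes
    data = j.to_bytes(8, "big") + row.to_bytes(K // 8, "big")
    return hashlib.sha256(data).digest()[:n]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def row(cols, j: int) -> int:
    """Row j of the m x K bit matrix whose columns are given as integers."""
    return sum(((col >> j) & 1) << i for i, col in enumerate(cols))


def ot_extension(pairs, choices):
    """pairs: m tuples (x0, x1) of equal-length bytes; choices: m bits."""
    m = len(pairs)
    r = sum(bit << j for j, bit in enumerate(choices))  # receiver's choice vector
    t_cols = [secrets.randbits(m) for _ in range(K)]    # receiver's matrix T
    s_bits = [secrets.randbits(1) for _ in range(K)]    # sender's secret s
    s = sum(bit << i for i, bit in enumerate(s_bits))
    # Base OTs with reversed roles (ideal): sender learns t^i or t^i XOR r
    q_cols = [t ^ r if s_i else t for t, s_i in zip(t_cols, s_bits)]
    outputs = []
    for j, (x0, x1) in enumerate(pairs):
        n = len(x0)
        q_j = row(q_cols, j)                            # q_j = t_j XOR (r_j * s)
        y0 = xor(x0, H(j, q_j, n))                      # sender masks x0
        y1 = xor(x1, H(j, q_j ^ s, n))                  # sender masks x1
        t_j = row(t_cols, j)
        y = y1 if choices[j] else y0
        outputs.append(xor(y, H(j, t_j, n)))            # receiver unmasks
    return outputs


# Usage: 3 OTs on 10-byte strings (80 bits, as in the benchmarks).
pairs = [(secrets.token_bytes(10), secrets.token_bytes(10)) for _ in range(3)]
assert ot_extension(pairs, [0, 1, 1]) == [pairs[0][0], pairs[1][1], pairs[2][1]]
```

The `row` helper is the matrix transposition: the base OTs deliver columns, while the per-OT hashing works on rows.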
Computation Complexity of OT Extension
Time distribution for 10 Mio. OTs (in 21s):
[Pie chart; segment labels: "real" OTs, H (SHA-1), PRG (AES), Transpose, Misc (Snd/Rcv/XOR); segment shares: 10 %, 42 %, 14 %, 33 %, 1 %]
Non-crypto part was bottleneck!!!
Per OT: # PRG evaluations # H evaluations 1 2 2 1
Algorithmic Optimization: Efficient Matrix Transposition
52
Algorithmic Optimization: Parallelization
53
parallelized by splitting the T matrix into sub-matrices
OT is highly parallelizable
Communication Complexity of OT Extension
54
Per OT: Bits sent
Alice → Bob: 2ℓ
Bob → Alice: 2k
Yao: ℓ = k = 128
GMW: ℓ = 1, k = 128
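To get a feeling for these numbers, the per-OT cost of 2ℓ + 2k bits can be scaled to a batch of OTs. The batch size of 10 Mio. is borrowed from the benchmarks in this lecture; the helper names and the choice of decimal megabytes are assumptions for this back-of-the-envelope sketch.

```python
def bits_per_ot(l: int, k: int = 128) -> int:
    """Total bits sent per OT in both directions: 2*l + 2*k."""
    return 2 * l + 2 * k


def total_megabytes(num_ots: int, l: int, k: int = 128) -> float:
    """Total communication in MB (10**6 bytes) for num_ots OTs."""
    return num_ots * bits_per_ot(l, k) / 8 / 10**6


print(total_megabytes(10_000_000, l=128))  # Yao: 640.0
print(total_megabytes(10_000_000, l=1))    # GMW: 322.5
```

For GMW almost all of the traffic comes from the 2k term, which is why reducing the k-dependent part matters most for short messages.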
Protocol Optimization: General OT Extension
55
(similar to garbled 3-row reduction)
Specific OT Functionalities
56
Correlated OT Random OT
e.g., for Yao e.g., for GMW
similar to garbled 3-row reduction
Specific OT Functionalities: Correlated OT (C-OT)
57
similar to garbled 3-row reduction
Specific OT Functionalities: Random OT (R-OT)
58
Performance Evaluation: Original Implementation
59
Performance for 10 Mio. OTs on 80-bit strings
Performance Evaluation: Efficient Matrix Transposition
60
Performance Evaluation: General OT
61
Performance Evaluation: Correlated/Random OT
62
Performance Evaluation: Parallelization
63
Performance Evaluation: Summary
64
Performance for 10 Mio. OTs on 80-bit strings
Runtime in s    Orig   EMT    G-OT   C-OT   R-OT   2T     4T
Gigabit LAN     20.6   14.4   13.9   10.6   10.0   5.0    2.6
WiFi 802.11g    30.7   30.5   29.4   14.4   14.2   14.2   14.2
Summary
Part 1: Yao vs. GMW
Part 2: OT extension
65
Bottleneck of today’s secure computation protocols is communication.
EXERCISE 1
Measure speed of crypto operations with the “openssl speed” command and order them according to throughput:
curve)
66
Literature
[ALSZ13] G. Asharov, Y. Lindell, T. Schneider, M. Zohner: More efficient oblivious transfer and extensions for faster secure computation. In ACM CCS’13.
[BarniFKLSS09] M. Barni, P. Failla, V. Kolesnikov, R. Lazzeretti, A.-R. Sadeghi, T. Schneider: Secure Evaluation of Private Linear Branching Programs with Medical Applications. In ESORICS’09.
[Beaver91] D. Beaver: Efficient multiparty protocols using circuit randomization. In CRYPTO’91.
[Beaver95] D. Beaver: Precomputing oblivious transfer. In CRYPTO’95.
[BrickellPSW07] J. Brickell, D. E. Porter, V. Shmatikov, E. Witchel: Privacy-preserving remote diagnostics. In ACM CCS’07.
[DDKSSZ15] D. Demmler, G. Dessouky, F. Koushanfar, A.-R. Sadeghi, T. Schneider,
[CHKMR12] S. G. Choi, K.-W. Hwang, J. Katz, T. Malkin, D. Rubinstein: Secure multi-party computation of Boolean circuits with applications to privacy in on-line marketplaces. In CT-RSA’12.
[Eklundh72] J. O. Eklundh: A fast computer method for matrix transposing. In IEEE Transactions on Computers, 1972.
[ErkinFGKLT09] Z. Erkin, M. Franz, J. Guajardo, S. Katzenbeisser, I. Lagendijk, T. Toft: Privacy-preserving face
[GMW87] O. Goldreich, S. Micali, A. Wigderson: How to play any mental game or a completeness theorem for protocols with honest majority. In STOC’87.
[IKNP03] Y. Ishai, J. Kilian, K. Nissim, E. Petrank: Extending oblivious transfers efficiently. In CRYPTO’03.
[ImpagliazzoRudich89] R. Impagliazzo, S. Rudich: Limits on the provable consequences of one-way permutations. In STOC’89.
[NaorPinkas01] M. Naor, B. Pinkas: Efficient oblivious transfer protocols. In SODA’01.
[NaorPS99] M. Naor, B. Pinkas, R. Sumner: Privacy preserving auctions and mechanism design. In EC’99.
[SHSSK15] E. M. Songhori, S. U. Hussain, A.-R. Sadeghi, T. Schneider, F. Koushanfar: TinyGarble: Highly compressed and scalable sequential garbled circuits. In IEEE S&P’15.
[SZ13] T. Schneider, M. Zohner: GMW vs. Yao? Efficient secure two-party computation with low depth circuits. In FC’13.
[Troncoso-PastorizaKC07] J. R. Troncoso-Pastoriza, S. Katzenbeisser, M. U. Celik: Privacy preserving error resilient DNA searching through oblivious automata. In ACM CCS’07.
[Yao86] A. C. Yao: How to generate and exchange secrets. In FOCS’86.
67