

  1. Optimizing MPC for robust and scalable integer and floating-point arithmetic
     Liisi Kerik*, Peeter Laud*, Jaak Randmets*†
     *Cybernetica AS, †University of Tartu, Institute of Computer Science
     January 30, 2016

  2. Introduction
     • Secure multiparty computation (SMC)
     • Examples: Yao's millionaires' problem, an income study
     • Most applications have been run on small data volumes.
     • Only one deployment has processed tens of millions of education and income records.
     • Performance is a major hurdle.
     • In this talk we will show that SMC can be scalable and robust.

  3. Overview of the talk
     • Background
     • Improvements in floating-point protocols
     • Generic optimization techniques
     • Performance results

  4. Secret sharing
     • We mostly use additive 3-party secret sharing: v = (v_1 + v_2 + v_3) mod N.
     • Private values are denoted ⟦v⟧.
     • Integer addition ⟦w⟧ = ⟦u⟧ + ⟦v⟧ is local: ⟦w⟧_i = (⟦u⟧_i + ⟦v⟧_i) mod N.
     • We build integer and floating-point arithmetic on top of this representation.
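
To make the sharing concrete, here is a minimal clear-value sketch in Python; the modulus N = 2^64 and the helper names are our own illustration, not the paper's code:

```python
import secrets

N = 2 ** 64  # ring modulus; the slides use Z_N, the concrete 2^64 is our assumption

def share(v):
    """Split v into three additive shares with v = (v1 + v2 + v3) mod N."""
    v1 = secrets.randbelow(N)
    v2 = secrets.randbelow(N)
    v3 = (v - v1 - v2) % N
    return [v1, v2, v3]

def reconstruct(shares):
    return sum(shares) % N

def add_local(us, vs):
    """Addition is local: each party adds its own shares, no communication."""
    return [(u + v) % N for u, v in zip(us, vs)]

assert reconstruct(add_local(share(12), share(30))) == 42
```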

  5. Representing floating-point numbers
     x = (−1)^s · f · 2^e
     • Sign bit s is 0 for positive and 1 for negative numbers.
     • Significand f ∈ [0.5, 1) is represented as a fixed-point number with 0 bits before the radix point.
     • e is the exponent (with range identical to that of the IEEE float).
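
Python's math.frexp happens to use the same normalization (significand in [0.5, 1)), so the representation can be illustrated on clear values; this is a sketch of the encoding only, not a protocol:

```python
import math

def decompose(x):
    """Split x into (s, f, e) with x = (-1)^s * f * 2^e and f in [0.5, 1)."""
    s = 1 if math.copysign(1.0, x) < 0 else 0
    f, e = math.frexp(abs(x))  # frexp already returns f in [0.5, 1) for x != 0
    return s, f, e

s, f, e = decompose(-6.5)
assert (s, f, e) == (1, 0.8125, 3)
assert (-1) ** s * f * 2 ** e == -6.5
```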

  6. Primitive protocols
     • Extend(⟦u⟧, n) casts ⟦u⟧ ∈ Z_{2^m} to the equal value in Z_{2^{n+m}}.
     • Cut(⟦u⟧, n) drops the n least-significant bits of ⟦u⟧ ∈ Z_{2^m};
       it can be used to implement division by a power of two.
     • MultArr(⟦u⟧, {⟦v_i⟧}_{i=1}^k) multiplies point-wise;
       it is more efficient than multiplying ⟦u⟧ with every ⟦v_i⟧ separately.
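
The clear-value semantics of these primitives can be written down directly (a sketch; the real protocols operate on shares and have very different costs):

```python
def extend(u, m, n):
    """Extend(u, n): the value of u in Z_{2^m}, reinterpreted in Z_{2^(n+m)}.
    On clear values this changes nothing; on shares it costs communication."""
    assert 0 <= u < 2 ** m
    return u

def cut(u, n):
    """Cut(u, n): drop the n least-significant bits, i.e. floor-divide by 2^n."""
    return u >> n

def mult_arr(u, vs, m):
    """MultArr(u, {v_i}): point-wise products in Z_{2^m}; as a protocol this is
    cheaper than k separate multiplications of u with each v_i."""
    return [(u * v) % (2 ** m) for v in vs]

assert cut(13, 2) == 3                        # division by a power of two
assert mult_arr(3, [1, 2, 3], 8) == [3, 6, 9]
```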

  7. Polynomial evaluation
     • Floating-point functions we approximate with polynomials: sqrt, sin, exp, ln, erf.
     • Polynomial evaluation requires additions. Floating-point additions are expensive due to private shifts; fixed-point polynomials can be computed much faster.
     • We have improved fixed-point polynomial evaluation.
     • Efficiency improvements for a polynomial of degree 16 on a 64-bit fixed-point number:
       • old: 89 rounds, 27 KB of communication
       • new: 57 rounds, 7.5 KB of communication

  8. Improvements in precision
     Relative errors of inverse and square root:

              Old           New
     inv32    1.3 · 10^−4   2.69 · 10^−9
     inv64    1.3 · 10^−8   7.10 · 10^−19
     sqrt32   5.1 · 10^−6   4.92 · 10^−9
     sqrt64   4.1 · 10^−11  1.30 · 10^−15

  9. Hacks for faster polynomial evaluation
     • Restrict domain and range to [0, 1). (Coefficients can still be of any size.)
     • If we know the argument is in the range [2^−n · k, 2^−n · (k + 1)), then instead of interpolating f(x) on that range we interpolate f(2^−n · (x + k)) on [0, 1). This gives smaller coefficients and better precision (see the sketch below).
     • We add a small linear term to the function we interpolate. This gets rid of denormalized results and overflows.
     • Instead of using ordinary fixed-point multiplications (extend, multiply, cut), we extend the argument sufficiently in the beginning and later only perform multiplications and cuts.
     • In the end, instead of cutting the excess bits and then adding the terms, we add the terms and then cut.
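
A sketch of the argument-shifting trick on clear values, using NumPy's least-squares polynomial fit (the example function, degree, and tolerance are illustrative):

```python
import numpy as np

def fit_shifted(f, n, k, degree):
    """Fit g(x) = f(2^-n * (x + k)) on [0, 1) instead of fitting f directly on
    [2^-n * k, 2^-n * (k + 1)): the same function, but with smaller
    coefficients and better numerical behavior."""
    xs = np.linspace(0.0, 1.0, 256, endpoint=False)
    return np.polynomial.Polynomial.fit(xs, f(2.0 ** -n * (xs + k)), degree)

# Approximate sqrt on [0.5, 0.75), i.e. n = 2, k = 2.
p = fit_shifted(np.sqrt, 2, 2, 8)
t = 0.6
x = t * 2 ** 2 - 2          # map t back into [0, 1)
assert abs(p(x) - np.sqrt(t)) < 1e-6
```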

  10. Powers of a fixed-point number
      Data: ⟦x⟧ (0 bits before, n bits after the radix point)
      Result: {⟦x^i⟧}_{i=1}^k (n′ + n bits before, n bits after the radix point)
      1   if k = 0 then
      2       return {}
      3   else
      4       l ← ⌈log_2 k⌉
      5       ⟦x^1⟧ ← Extend(⟦x⟧, n′ + (l + 1) · n)
      6       for i ← 0 to l − 1 do
      7           {⟦x^j⟧}_{j=2^i+1}^{2^{i+1}} ← MultArr(⟦x^{2^i}⟧, {⟦x^j⟧}_{j=1}^{2^i})
      8           for j ← 2^i + 1 to 2^{i+1} do in parallel
      9               ⟦x^j⟧ ← Cut(⟦x^j⟧, n)
      10      return {⟦x^i⟧}_{i=1}^k
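
On clear values the doubling structure of the algorithm looks like this (a sketch; fixed-point numbers are integers scaled by 2^n, and the sequential loop stands in for the parallel MultArr rounds):

```python
import math

def pow_arr(x_fx, k, n):
    """Powers x^1 .. x^k of a fixed-point number (an integer scaled by 2^n)
    in ceil(log2 k) rounds: each round multiplies the highest power computed
    so far into all earlier powers and cuts n bits to restore the scale."""
    if k == 0:
        return []
    powers = [x_fx]
    for i in range(math.ceil(math.log2(k))):
        sq = powers[2 ** i - 1]                               # x^(2^i)
        powers += [(sq * p) >> n for p in powers[: 2 ** i]]   # MultArr + Cut
    return powers[:k]

n = 16
xs = pow_arr(int(0.5 * 2 ** n), 5, n)
assert abs(xs[4] / 2 ** n - 0.5 ** 5) < 1e-3
```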

  11. Fixed-point polynomial evaluation
      Data: ⟦x⟧ (0 bits before, n bits after the radix point), {c̄_i}_{i=0}^k (n′ + n bits before, n bits after the radix point, highest n bits empty)
      Result: Sum({c̄_i · ⟦x^i⟧}_{i=0}^k) (0 bits before, n bits after the radix point)
      1   {⟦x^i⟧}_{i=1}^k ← PowArr(⟦x⟧, k, n, n′)
      2   ⟦z_0⟧ ← Share(c̄_0)
      3   for i ← 1 to k do in parallel
      4       ⟦z_i⟧ ← c̄_i · ⟦x^i⟧
      5   for i ← 0 to k do in parallel
      6       ⟦z′_i⟧ ← Trunc(⟦z_i⟧, n′)
      7   return Cut(Sum({⟦z′_i⟧}_{i=0}^k), n)
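
A clear-value sketch of the evaluation order (public coefficients, single final cut; sequential powers stand in for PowArr, and the n′ headroom bookkeeping is simplified away):

```python
import math

def poly_eval_fx(x_fx, coeffs_fx, n):
    """Evaluate sum(c_i * x^i) on fixed-point values (integers scaled by 2^n).
    Every product c_i * x^i carries 2n fractional bits; we add first and Cut
    once at the end, instead of cutting each term separately."""
    acc, p = coeffs_fx[0] << n, 1 << n      # c_0 * x^0 at scale 2^(2n)
    for c in coeffs_fx[1:]:
        p = (p * x_fx) >> n                 # next power of x, scale 2^n
        acc += c * p                        # term at scale 2^(2n)
    return acc >> n                         # single final Cut

# exp(x) ~ 1 + x + x^2/2 + x^3/6 with n = 16 fractional bits
n = 16
coeffs = [int(c * 2 ** n) for c in (1.0, 1.0, 0.5, 1.0 / 6.0)]
y = poly_eval_fx(int(0.25 * 2 ** n), coeffs, n)
assert abs(y / 2 ** n - math.e ** 0.25) < 1e-3
```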

  12. New floating-point protocols: sine
      • Reduce to the range (−2π, 2π).
      • sin(−x) = −sin x, sin(x + π) = −sin x, sin(π/2 − x) = sin(π/2 + x).
      • Polynomial approximation.
      • Near zero we use sin x ≈ x for better precision.
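
The identities give a standard range reduction; a sketch on clear values (the final math.sin call stands in for the protocol's polynomial approximation):

```python
import math

def sin_reduced(x):
    """Fold x into [0, pi/2] using sin(-x) = -sin x, sin(x + pi) = -sin x,
    and the symmetry of sin around pi/2, then approximate on the small range."""
    sign = 1.0
    if x < 0:                      # sin(-x) = -sin x
        sign, x = -sign, -x
    x = math.fmod(x, 2 * math.pi)  # reduce to [0, 2*pi)
    if x >= math.pi:               # sin(x + pi) = -sin x
        sign, x = -sign, x - math.pi
    if x > math.pi / 2:            # sin(pi/2 - t) = sin(pi/2 + t)
        x = math.pi - x
    return sign * math.sin(x)      # stand-in for the polynomial (or x itself near 0)

assert abs(sin_reduced(-7.0) - math.sin(-7.0)) < 1e-12
```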

  13. New floating-point protocols: logarithm
      • log_2(2^e · f) = e + log_2 f.
      • e + log_2 f = (e − 2) + 2 · (log_4 f + 1), and f ∈ [0.5, 1) ⇒ log_4 f + 1 ∈ [0.5, 1).
      • Polynomial approximation. (For double precision, two different polynomials.)
      • The end result is computed through floating-point addition.
      • Near 1 we use a second-degree Taylor polynomial.
      • Conversion: ln x = ln 2 · log_2 x.
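
The rewriting can be checked numerically on clear values (a sketch; math.frexp provides the same f ∈ [0.5, 1) normalization as the shared representation):

```python
import math

def log2_via_reduction(x):
    """log2(2^e * f) = e + log2 f = (e - 2) + 2 * (log4 f + 1);
    f in [0.5, 1) guarantees log4 f + 1 in [0.5, 1), a polynomial-friendly range."""
    f, e = math.frexp(x)          # x = f * 2^e with f in [0.5, 1)
    t = math.log(f, 4) + 1
    assert 0.5 <= t < 1
    return (e - 2) + 2 * t

x = 42.0
assert abs(log2_via_reduction(x) - math.log2(x)) < 1e-12
assert abs(math.log(2) * log2_via_reduction(x) - math.log(x)) < 1e-12  # ln x = ln 2 * log2 x
```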

  14. Generic optimization techniques

  15. Resharing protocol
      Algorithm 1: Resharing protocol.
      Data: shared value ⟦u⟧ ∈ R
      Result: shared value ⟦w⟧ ∈ R such that u = w
      1   All parties P_i perform the following:
      2       r ← R (uniformly at random)
      3       Send r to P_{p(i)}
      4       Receive r′ from P_{n(i)}
      5       ⟦w⟧_i ← ⟦u⟧_i + (r − r′)
      6   return ⟦w⟧
      • Resharing is used to ensure that messages are independent of inputs and outputs.
      • All protocols and sub-protocols reshare their inputs.
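
Simulated with all three parties in one process (a sketch, with an illustrative modulus):

```python
import secrets

N = 2 ** 64  # illustrative ring modulus

def reshare(shares):
    """Each P_i draws r_i, sends it to the previous party and receives r' from
    the next; the r_i cancel in the sum, so the shares are re-randomized but
    the shared value is unchanged."""
    k = len(shares)
    r = [secrets.randbelow(N) for _ in range(k)]  # r_i drawn by P_i
    return [(shares[i] + r[i] - r[(i + 1) % k]) % N for i in range(k)]

u = [11, 22, 33]
w = reshare(u)
assert sum(w) % N == sum(u) % N   # same value, fresh shares
```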

  16. Shared random number generators
      • A common pattern: generate a random number and send it to some other party.
      • We can instead use a common random number generator, so nothing needs to be sent (see the sketch below).
      • We perform this optimization automatically (mostly).
      • Performance improvements:
        • network communication reduced by 30% to 60%
        • runtime improved by up to 60%
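
The optimization replaces "generate and send" with "both derive locally"; a sketch using a hash as the PRG (the key setup and derivation scheme here are illustrative, not the paper's construction):

```python
import hashlib

def shared_random(key, counter):
    """Two parties holding the same key derive the same pseudo-random ring
    element locally, so nothing has to cross the network."""
    data = key + counter.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

key_12 = b"key agreed between P1 and P2 at setup"  # illustrative
# P1 and P2 each compute this independently and get the same value:
assert shared_random(key_12, 0) == shared_random(key_12, 0)
```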

  17. Multiplication protocol
      Algorithm 2: Multiplication protocol.
      Data: shared values ⟦u⟧, ⟦v⟧ ∈ R
      Result: shared value ⟦w⟧ ∈ R such that u · v = w
      1   ⟦u⟧ ← Reshare(⟦u⟧)
      2   ⟦v⟧ ← Reshare(⟦v⟧)
      3   All parties P_i perform the following:
      4       Send ⟦u⟧_i and ⟦v⟧_i to P_{n(i)}
      5       Receive ⟦u⟧_{p(i)} and ⟦v⟧_{p(i)} from P_{p(i)}
      6       ⟦w⟧_i ← ⟦u⟧_i · ⟦v⟧_i + ⟦u⟧_{p(i)} · ⟦v⟧_i + ⟦u⟧_i · ⟦v⟧_{p(i)}
      7   ⟦w⟧ ← Reshare(⟦w⟧)
      8   return ⟦w⟧
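
A one-process simulation of Algorithm 2 on clear shares (a sketch; share and reshare are as in the earlier sketches, N is an illustrative modulus):

```python
import secrets

N = 2 ** 64  # illustrative ring modulus

def share(v):
    s = [secrets.randbelow(N) for _ in range(2)]
    return s + [(v - sum(s)) % N]

def reshare(sh):
    r = [secrets.randbelow(N) for _ in range(3)]
    return [(sh[i] + r[i] - r[(i + 1) % 3]) % N for i in range(3)]

def multiply(u, v):
    """Each P_i learns the previous party's shares; its three local products
    cover all nine cross terms u_i * v_j exactly once, so the w_i sum to u*v."""
    u, v = reshare(u), reshare(v)
    w = []
    for i in range(3):
        p = (i - 1) % 3  # p(i): the previous party
        w.append((u[i] * v[i] + u[p] * v[i] + u[i] * v[p]) % N)
    return reshare(w)

assert sum(multiply(share(6), share(7))) % N == 42
```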

  18.–19. Multiplication protocol
      [Diagram, shown in two animation steps: the three parties 1, 2, 3 and the share products each party computes.]

  20. Communication-symmetric multiplication
      Algorithm 3: Symmetric multiplication protocol.
      Data: shared values ⟦u⟧, ⟦v⟧ ∈ R
      Result: shared value ⟦w⟧ ∈ R such that u · v = w
      1   ⟦u⟧ ← Reshare(⟦u⟧)
      2   ⟦v⟧ ← Reshare(⟦v⟧)
      3   All parties P_i perform the following:
      4       Send ⟦u⟧_i to P_{n(i)} and ⟦v⟧_i to P_{p(i)}
      5       Receive ⟦u⟧_{p(i)} from P_{p(i)} and ⟦v⟧_{n(i)} from P_{n(i)}
      6       ⟦w⟧_i ← ⟦u⟧_i · ⟦v⟧_i + ⟦u⟧_{p(i)} · ⟦v⟧_i + ⟦u⟧_{p(i)} · ⟦v⟧_{n(i)}
      7   ⟦w⟧ ← Reshare(⟦w⟧)
      8   return ⟦w⟧
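
The symmetric variant in the same simulation style, reusing share, reshare, and N from the multiplication sketch above:

```python
def multiply_symmetric(u, v):
    """P_i sends u_i forward and v_i backward, so each link carries one share
    in each direction; the term u_p * v_n replaces u_i * v_p, and all nine
    cross products u_i * v_j are still covered exactly once."""
    u, v = reshare(u), reshare(v)
    w = []
    for i in range(3):
        p, nx = (i - 1) % 3, (i + 1) % 3  # previous / next party
        w.append((u[i] * v[i] + u[p] * v[i] + u[p] * v[nx]) % N)
    return reshare(w)

assert sum(multiply_symmetric(share(6), share(7))) % N == 42
```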

  21. Balanced communication
      [Diagram: the three parties with one share sent in each direction over every link, balancing the per-link traffic.]

  22. Conclusions
      • Performance evaluation on vectors of up to 10^9 elements and up to 1000 repeats.
      • Demonstrates scalability and robustness.
      • Memory limitations appear at 10^10 elements.
      Results:
      • We can perform 22 million 32-bit integer multiplications per second; the previously published best was 8 million. This is comparable to a late-generation Intel i486 (1992).
      • Up to 230 kFLOPS, comparable to an Intel 80387 (1987).
