Optimizing MPC for robust and scalable integer and floating-point arithmetic - PowerPoint PPT Presentation



SLIDE 1

Optimizing MPC for robust and scalable integer and floating-point arithmetic

Liisi Kerik* Peeter Laud* Jaak Randmets*†

* Cybernetica AS † University of Tartu, Institute of Computer Science

January 30, 2016

SLIDE 2

Introduction

  • Secure multiparty computation (SMC)
  • Examples: Yao's millionaires' problem, an income study
  • Most applications have been run on small data volumes.
  • Only one deployment has processed tens of millions of education and income records.
  • Performance is a major hurdle.
  • In this talk we will show that SMC can be scalable and robust.

1/15

SLIDE 3

Overview of the talk

  • Background
  • Improvements in floating-point protocols
  • Generic optimization techniques
  • Performance results

2/15

SLIDE 4

Secret sharing

  • We mostly use additive 3-party secret sharing:

    v = (v_1 + v_2 + v_3) mod N.

  • Private (secret-shared) values are denoted ⟦v⟧.
  • Integer addition ⟦w⟧ = ⟦u⟧ + ⟦v⟧ is local:

    w_i = (u_i + v_i) mod N.

  • We build integer and floating-point arithmetic on top of this representation.
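A minimal cleartext sketch of this sharing scheme, assuming the ring Z_{2^32}. All three shares live in one process here purely for illustration; in a real deployment each party holds only its own share.

```python
import secrets

N = 2**32  # share ring Z_{2^32}

def share(v, parties=3):
    """Additive sharing: v = (v_1 + ... + v_k) mod N."""
    s = [secrets.randbelow(N) for _ in range(parties - 1)]
    return s + [(v - sum(s)) % N]

def reconstruct(shares):
    return sum(shares) % N

def add_local(u, v):
    """Addition needs no communication: each party adds its own shares."""
    return [(ui + vi) % N for ui, vi in zip(u, v)]

assert reconstruct(add_local(share(10), share(20))) == 30
```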

3/15

SLIDE 5

Representing floating-point numbers

x = (−1)^s · f · 2^e

  • Sign bit s is 0 for positive and 1 for negative numbers.
  • Significand f ∈ [0.5, 1) is represented as a fixed-point number with 0 bits before the radix point.
  • e is the exponent (with range identical to that of the IEEE float).
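On cleartext values, Python's `math.frexp` produces exactly this normalization (significand in [0.5, 1)), so the representation can be sketched as:

```python
import math

def to_sfe(x):
    """Decompose x = (-1)^s * f * 2^e with significand f in [0.5, 1)."""
    s = 0 if x >= 0 else 1
    f, e = math.frexp(abs(x))  # frexp already normalizes f into [0.5, 1)
    return s, f, e

def from_sfe(s, f, e):
    return (-1) ** s * math.ldexp(f, e)

assert to_sfe(-6.5) == (1, 0.8125, 3)  # -6.5 = -0.8125 * 2^3
```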

4/15

SLIDE 6

Primitive protocols

  • Extend(⟦u⟧, n) casts ⟦u⟧ ∈ Z_{2^m} to the equal value in Z_{2^{n+m}}.
  • Cut(⟦u⟧, n) drops the n least-significant bits of ⟦u⟧ ∈ Z_{2^m};
  • it can be used to implement division by a power of two.
  • MultArr(⟦u⟧, {⟦v_i⟧}_{i=1}^k) multiplies point-wise;
  • it is more efficient than multiplying ⟦u⟧ with every ⟦v_i⟧ separately.
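A cleartext sketch of what Extend and Cut compute. The secure protocols realize these operations on shares; the signatures here are illustrative only.

```python
def extend(u, m, n):
    """Extend(u, n): cast u in Z_{2^m} to the equal value in Z_{2^(n+m)}.
    On cleartext integers the numeric value is unchanged; only the
    modulus grows."""
    assert 0 <= u < 2**m
    return u % 2**(n + m)  # a no-op on the numeric value

def cut(u, m, n):
    """Cut(u, n): drop the n least-significant bits of u in Z_{2^m},
    i.e. divide by 2^n, rounding toward zero."""
    assert 0 <= u < 2**m
    return u >> n

assert cut(22, 5, 2) == 5  # 22 // 2^2
```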

5/15

SLIDE 7

Polynomial evaluation

  • Floating-point functions we approximate with polynomials: sqrt, sin, exp, ln, erf.
  • Polynomial evaluation requires additions. Floating-point additions are expensive due to private shifts, so fixed-point polynomials can be computed much faster.
  • We have improved fixed-point polynomial evaluation.
  • Efficiency improvements for a polynomial of degree 16 on a 64-bit fixed-point number:
  • old: 89 rounds, 27 KB of communication
  • new: 57 rounds, 7.5 KB of communication

6/15

SLIDE 8

Improvements in precision

Relative errors of inverse and square root:

            Old           New
  inv32     1.3 · 10⁻⁴    2.69 · 10⁻⁹
  inv64     1.3 · 10⁻⁸    7.10 · 10⁻¹⁹
  sqrt32    5.1 · 10⁻⁶    4.92 · 10⁻⁹
  sqrt64    4.1 · 10⁻¹¹   1.30 · 10⁻¹⁵

7/15

SLIDE 9

Hacks for faster polynomial evaluation

  • Restrict domain and range to [0, 1). (Coefficients can still be of any size.)
  • If we know the argument is in the range [2⁻ⁿk, 2⁻ⁿ(k + 1)), then instead of interpolating f(x) on [2⁻ⁿk, 2⁻ⁿ(k + 1)) we interpolate f(2⁻ⁿ(x + k)) on [0, 1). This gives smaller coefficients and better precision.
  • We add a small linear term to the function we interpolate. This gets rid of denormalized results and overflows.
  • Instead of using ordinary fixed-point multiplications (extend, multiply, cut), we extend the argument sufficiently in the beginning and later only perform multiplications and cuts.
  • At the end, instead of cutting the excess bits and then adding the terms, we add the terms and then cut.

8/15

SLIDE 10

Powers of a fixed-point number

Data: ⟦x⟧ (0 bits before, n bits after the radix point)
Result: {⟦x^i⟧}_{i=1}^k (n′ + n bits before, n bits after the radix point)

 1  if k = 0 then
 2      return {}
 3  else
 4      l ← ⌈log₂ k⌉
 5      ⟦x^1⟧ ← Extend(⟦x⟧, n′ + (l + 1)n)
 6      for i ← 0 to l − 1 do
 7          {⟦x^j⟧}_{j=2^i+1}^{2^(i+1)} ← MultArr(⟦x^(2^i)⟧, {⟦x^j⟧}_{j=1}^{2^i})
 8          for j ← 1 to 2^(i+1) do in parallel
 9              ⟦x^j⟧ ← Cut(⟦x^j⟧, n)
10      return {⟦x^i⟧}_{i=1}^k
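The doubling pattern of this algorithm can be sketched on cleartext fixed-point values. Python's unbounded integers stand in for Extend (so the n′ headroom parameter is dropped), and only the Cut steps are explicit:

```python
import math

def pow_arr(x, k, n):
    """Powers x^1 .. x^k of a fixed-point number with n fractional bits,
    following the slide's doubling pattern: round i multiplies the prefix
    {x^1 .. x^(2^i)} point-wise by x^(2^i) (MultArr), then Cuts each
    product back to n fractional bits."""
    if k == 0:
        return []
    l = math.ceil(math.log2(k)) if k > 1 else 0
    powers = [x]  # x^1, scaled by 2^n
    for i in range(l):
        head = powers[2**i - 1]  # x^(2^i)
        powers += [(head * p) >> n for p in powers[:2**i]]  # MultArr + Cut
    return powers[:k]

n = 16
half = int(0.5 * 2**n)  # 0.5 in 16-bit fixed point
print(pow_arr(half, 4, n))  # 0.5, 0.25, 0.125, 0.0625, each scaled by 2^16
```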

9/15

SLIDE 11

Fixed-point polynomial evaluation

Data: ⟦x⟧ (0 bits before, n bits after the radix point),
      {⟦c_i⟧}_{i=0}^k (n′ + n bits before, n bits after the radix point,
      highest n bits empty)
Result: Sum({⟦c_i · x^i⟧}_{i=0}^k) (0 bits before, n bits after the radix point)

1  {⟦x^i⟧}_{i=1}^k ← PowArr(⟦x⟧, k, n, n′)
2  ⟦z_0⟧ ← Share(c_0)
3  for i ← 1 to k do in parallel
4      ⟦z_i⟧ ← ⟦c_i⟧ · ⟦x^i⟧
5  for i ← 0 to k do in parallel
6      ⟦z′_i⟧ ← Trunc(⟦z_i⟧, n′)
7  return Cut(Sum({⟦z′_i⟧}_{i=0}^k), n)
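A cleartext sketch of the evaluation strategy: each product c_i · x^i keeps its doubled scale, the terms are summed first, and the excess bits are Cut only once at the end. (Simplified: the separate Trunc of the n′ headroom is omitted.)

```python
def poly_eval_fx(x, coeffs, n):
    """Evaluate sum_i c_i * x^i on values with n fractional bits. As in
    the algorithm, products c_i * x^i stay at scale 2^(2n); the terms are
    summed first and the excess n bits are Cut only once."""
    powers, p = [1 << n], x  # x^0 = 1.0 in fixed point
    for _ in range(len(coeffs) - 1):
        powers.append(p)
        p = (p * x) >> n  # next power, Cut back to scale 2^n
    acc = sum(c * q for c, q in zip(coeffs, powers))  # scale 2^(2n)
    return acc >> n  # single final Cut

n = 16
half = int(0.5 * 2**n)
# f(x) = 0.5 + 0.5 x at x = 0.5 gives 0.75
print(poly_eval_fx(half, [half, half], n))  # 0.75 * 2^16 = 49152
```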

10/15

SLIDE 12

New floating-point protocols: sine

Sine

  • Reduce to the range (−2π, 2π).
  • sin(−x) = −sin x,  sin(x + π) = −sin x,  sin(π/2 − x) = sin(π/2 + x).
  • Polynomial approximation.
  • Near zero we use sin x ≈ x for better precision.
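The range reduction can be checked on cleartext values; `math.sin` stands in for the final polynomial approximation.

```python
import math

def reduced_sin(x):
    """Fold x into [0, pi/2] using the identities from the slide:
    sin(-x) = -sin x, sin(x + pi) = -sin x,
    sin(pi/2 - x) = sin(pi/2 + x), then evaluate."""
    sign = 1
    if x < 0:
        sign, x = -sign, -x
    x = math.fmod(x, 2 * math.pi)  # reduce to [0, 2*pi)
    if x >= math.pi:               # sin(x) = -sin(x - pi)
        sign, x = -sign, x - math.pi
    if x > math.pi / 2:            # sin(pi/2 + t) = sin(pi/2 - t)
        x = math.pi - x
    return sign * math.sin(x)      # stand-in for the polynomial

assert abs(reduced_sin(5.0) - math.sin(5.0)) < 1e-12
```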

11/15

SLIDE 13

New floating-point protocols: logarithm

Logarithm

  • log₂(2^e · f) = e + log₂ f.
  • e + log₂ f = (e − 2) + 2(log₄ f + 1), and f ∈ [0.5, 1) ⇒ log₄ f + 1 ∈ [0.5, 1).
  • Polynomial approximation. (For double precision, two different polynomials.)
  • The end result is computed through floating-point addition.
  • Near 1 we use a second-degree Taylor polynomial.
  • Conversion: ln x = ln 2 · log₂ x.
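A quick cleartext sanity check of the rescaling identity (not protocol code): the rewritten form agrees with e + log₂ f, and the new "significand" log₄ f + 1 lands back in [0.5, 1).

```python
import math

def log2_via_log4(e, f):
    """Rewrite e + log2(f) as (e - 2) + 2*(log4(f) + 1); for f in
    [0.5, 1) the term log4(f) + 1 is again in [0.5, 1)."""
    t = math.log(f, 4) + 1
    assert 0.5 <= t < 1
    return (e - 2) + 2 * t

def log2_direct(e, f):
    return e + math.log2(f)

for f in (0.5, 0.6, 0.75, 0.9, 0.999):
    assert abs(log2_via_log4(3, f) - log2_direct(3, f)) < 1e-12
```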

12/15

SLIDE 14

Generic optimization techniques

SLIDE 15

Resharing protocol

Algorithm 1: Resharing protocol.
Data: Shared value ⟦u⟧ ∈ R
Result: Shared value ⟦w⟧ ∈ R such that u = w.

1  All parties P_i perform the following:
2      r ← R (uniformly at random)
3      Send r to P_p(i)
4      Receive r′ from P_n(i)
5      w_i ← u_i + (r − r′)
6  return ⟦w⟧

(P_n(i) and P_p(i) denote the next and previous party on the three-party ring.)

  • Resharing is used to ensure that messages are independent of inputs and outputs.
  • All protocols and sub-protocols reshare their inputs.
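A cleartext simulation of Algorithm 1, assuming three parties on a ring with n(i) = i + 1 mod 3: the individual shares are freshly randomized, but the shared value is unchanged because the random r's telescope in the sum.

```python
import secrets

N = 2**32  # ring Z_{2^32}

def reshare(shares):
    """Party i draws r_i, sends it to the previous party p(i), receives
    r' = r_{n(i)} from the next party, and sets w_i = u_i + (r_i - r').
    Summing over i, the r's cancel, so the shared value is preserved."""
    k = len(shares)
    r = [secrets.randbelow(N) for _ in range(k)]
    return [(u + r[i] - r[(i + 1) % k]) % N for i, u in enumerate(shares)]

u = [5, 7, 9]
w = reshare(u)
assert sum(w) % N == sum(u) % N  # same shared value, new shares
```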

14/15

SLIDE 16

Shared random number generators

  • A common pattern: generate a random number and send it to some other party.
  • We can instead use a common random number generator shared by the two parties.
  • We perform this optimization automatically (mostly).
  • Performance improvements:
  • reduced network communication by 30% to 60%
  • improved runtime performance by up to 60%
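The pattern can be sketched as follows. The derivation below (SHA-256 over a pairwise seed and a counter) is purely illustrative, not the actual implementation: the point is only that both parties compute the same value with zero communication.

```python
import hashlib

def shared_random(seed, counter, nbytes=4):
    """Two parties holding the same pairwise seed derive the same random
    value locally, replacing a generate-and-send round. The derivation
    (SHA-256 of seed || counter) is a hypothetical stand-in for the real
    shared RNG."""
    h = hashlib.sha256(seed + counter.to_bytes(8, "big"))
    return int.from_bytes(h.digest()[:nbytes], "big")

seed = b"pairwise shared seed"
# P1 and P2 each compute this independently and agree on the value:
assert shared_random(seed, 0) == shared_random(seed, 0)
```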

15/15

SLIDE 17

Multiplication protocol

Algorithm 2: Multiplication protocol.
Data: Shared values ⟦u⟧, ⟦v⟧ ∈ R
Result: Shared value ⟦w⟧ ∈ R such that u · v = w.

1  ⟦u⟧ ← Reshare(⟦u⟧)
2  ⟦v⟧ ← Reshare(⟦v⟧)
3  All parties P_i perform the following:
4      Send u_i and v_i to P_n(i)
5      Receive u_p(i) and v_p(i) from P_p(i)
6      w_i ← u_i · v_i + u_p(i) · v_i + u_i · v_p(i)
7  ⟦w⟧ ← Reshare(⟦w⟧)
8  return ⟦w⟧

16/15

SLIDE 18

Multiplication protocol

[Diagram: communication pattern of the multiplication protocol between parties 1, 2, and 3.]

17/15


SLIDE 20

Communication-symmetric multiplication

Algorithm 3: Symmetric multiplication protocol.
Data: Shared values ⟦u⟧, ⟦v⟧ ∈ R
Result: Shared value ⟦w⟧ ∈ R such that u · v = w.

1  ⟦u⟧ ← Reshare(⟦u⟧)
2  ⟦v⟧ ← Reshare(⟦v⟧)
3  All parties P_i perform the following:
4      Send u_i to P_n(i) and v_i to P_p(i)
5      Receive u_p(i) from P_p(i) and v_n(i) from P_n(i)
6      w_i ← u_i · v_i + u_p(i) · v_i + u_p(i) · v_n(i)
7  ⟦w⟧ ← Reshare(⟦w⟧)
8  return ⟦w⟧
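A cleartext correctness check of the share-combination step (resharing omitted). With three parties, p(i) = i − 1 mod 3 and n(i) = i + 1 mod 3, so Python's negative indexing gives u[i − 1] = u_p(i); summing the w_i covers the diagonal products once and all six cross products exactly once.

```python
import secrets

N = 2**32  # ring Z_{2^32}

def share(x):
    """Additive 3-party sharing: x = (x_1 + x_2 + x_3) mod N."""
    s = [secrets.randbelow(N), secrets.randbelow(N)]
    return s + [(x - sum(s)) % N]

def sym_multiply(u, v):
    """Combination step of the symmetric protocol: party i uses its own
    shares plus u_{p(i)} (from the previous party) and v_{n(i)} (from the
    next party):  w_i = u_i*v_i + u_{p(i)}*v_i + u_{p(i)}*v_{n(i)}.
    Resharing before and after is omitted in this correctness-only sketch."""
    return [(u[i] * v[i] + u[i - 1] * v[i] + u[i - 1] * v[(i + 1) % 3]) % N
            for i in range(3)]

a, b = 1234, 5678
assert sum(sym_multiply(share(a), share(b))) % N == (a * b) % N
```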

18/15

SLIDE 21

Balanced communication

[Diagram: balanced communication between parties 1, 2, and 3.]

19/15

SLIDE 22

Conclusions

  • Performance evaluation on vectors of up to 10⁹ elements and up to 1000 repeats.
  • Demonstrates scalability and robustness.
  • Memory limitations at 10¹⁰ elements.

Results

  • We can perform 22 million 32-bit integer multiplications per second. The previous published best was 8 million.
  • Comparable to a late-generation Intel i486 (1992).
  • Floating point: up to 230 kFLOPS – comparable to an Intel 80387 (1987).

20/15
