Secure Linear Regression on Vertically Partitioned Datasets

SLIDE 1

Secure Linear Regression on Vertically Partitioned Datasets

Adria Gascon, Phillipp Schoppmann, Borja Balle, Mariana Raykova, Samee Zahur, Jack Doerner, David Evans

6/18/16 Cryptography in the RAM Computation Model 1

SLIDE 2

Predictive Model

  • Given samples $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$
  • $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$
  • Learn a function $f$ such that $f(x_i) = y_i$

Columns: Patient | Blood Count (RBC, WBC) | Heart Conditions (Murmur, Arrhythmia) | Digestive Tract (Inflammation, Dysphagia) | … | Medicine Effectiveness

Rows (recorded values, in column order):
  A: 3.9, 10.0, 1, 1
  B: 5.0, 4.5, 1, 1, 2, 1.5
  C: 2.5, 11, 1, 1, 2
  D: 4.3, 5.3, 2, 1, 1, 1
  …

SLIDE 3

Linear Regression

  • Given samples $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$
  • $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$
  • Learn a function $f$ such that $f(x_i) = y_i$

$f$ is well approximated by a linear map: $y_i \approx \theta^\top x_i$

SLIDE 4

Secure Computation

  • Shared database: $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ do not belong to the same party
  • Compute $\theta$ securely ($y_i \approx \theta^\top x_i$)

SLIDE 5

Horizontally Partitioned Database

  • Different rows belong to different parties
  • E.g., each patient has their own information


SLIDE 6

Vertically Partitioned Database

  • Different columns belong to different parties
  • E.g., different specialized hospitals have different parts of the information for all patients
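The difference between the two partitionings can be illustrated with a toy array. This is a minimal sketch; the 4×6 dataset and the variable names are hypothetical and only stand in for the patient table.

```python
import numpy as np

# Hypothetical toy dataset: 4 patients (rows) x 6 features (columns).
X = np.arange(24, dtype=float).reshape(4, 6)

# Horizontal partitioning: each party holds some rows
# (all features of some patients).
rows_party1, rows_party2 = X[:2, :], X[2:, :]

# Vertical partitioning: each party holds some columns
# (some features of all patients).
cols_party1, cols_party2 = X[:, :3], X[:, 3:]

# Either way, putting the parts back together recovers the full table.
assert np.array_equal(np.vstack([rows_party1, rows_party2]), X)
assert np.array_equal(np.hstack([cols_party1, cols_party2]), X)
```

In the vertical case every party sees every patient but only a slice of the features, so any statistic that mixes features from two slices requires a joint computation.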

SLIDE 7

Ridge Regression

  • Computing linear model on inputs $(x_1, y_1), \dots, (x_n, y_n)$
  • $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$
  • Optimization formulation
  • Linear system formulation
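As a sketch of how the two formulations relate, the code below assumes the textbook ridge objective $\frac{1}{n}\sum_i (y_i - \theta^\top x_i)^2 + \lambda\|\theta\|^2$. That normalization is an assumption, not taken from the slide. Setting the gradient to zero gives a $d \times d$ linear system, which the code solves in the clear with NumPy. The function name `ridge_fit` and the synthetic data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Solve ((1/n) X^T X + lam*I) theta = (1/n) X^T y (assumed normalization)."""
    n, d = X.shape
    A = X.T @ X / n + lam * np.eye(d)
    b = X.T @ y / n
    return np.linalg.solve(A, b)

# Hypothetical synthetic data: a linear model plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
theta_true = rng.normal(size=5)
y = X @ theta_true + 0.01 * rng.normal(size=200)

lam = 0.1
theta = ridge_fit(X, y, lam)

# The gradient of the assumed objective vanishes at the solution,
# which ties the optimization view to the linear-system view.
grad = 2 * X.T @ (X @ theta - y) / len(y) + 2 * lam * theta
```

The point of the linear-system view is that the secure protocol only needs to build a $d \times d$ matrix and a length-$d$ vector, then solve a system whose size is independent of $n$.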

SLIDE 8

Contributions

  • Secure computation for ridge regression on vertically partitioned databases
  • Two phase protocol:
    • Phase 1 – compute $A = \frac{1}{n} X^\top X + \lambda I$, $b = X^\top Y$
    • Output is additively shared between two parties
    • Phase 2 – solve $A\theta = b$ where $A$ and $b$ are shared between two parties
  • Two party and multiparty protocol for Phase 1
    • Two party inner product computation
  • Three algorithms for Phase 2:
    • Cholesky, $LDL^\top$, Conjugate Gradient Descent (CGD)
  • Implementation and evaluation

SLIDE 9

Phase 1

  • Compute $A = \frac{1}{n} X^\top X + \lambda I$, $b = X^\top Y$
  • The output is additively shared between two parties
  • Each entry of $A$ is a dot product of the vectors held by two different parties
    • In the multi-party case too
  • Two party computation of dot product
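The block structure behind Phase 1 can be sketched as follows for two parties holding integer-encoded columns. This is an illustration under assumptions: the modulus `q`, the toy data, and the variable names are hypothetical; the cross-party block is computed in the clear as a stand-in for a secure dot product; and the public $\frac{1}{n}$ factor and $\lambda I$ term are left out.

```python
import numpy as np

q = 2**32                      # hypothetical modulus for the additive shares
rng = np.random.default_rng(1)

# Vertically partitioned integer data: party 1 holds X1, party 2 holds X2.
X1 = rng.integers(-5, 6, size=(6, 2))
X2 = rng.integers(-5, 6, size=(6, 3))
d1, d2 = X1.shape[1], X2.shape[1]

# Cross block: entry (i, j) is the dot product of column i of X1 with
# column j of X2. Computed in the clear here, standing in for a secure
# two-party dot product that would output the two shares directly.
cross = X1.T @ X2
mask = rng.integers(0, q, size=cross.shape)   # random share kept by party 1

share1 = np.zeros((d1 + d2, d1 + d2), dtype=np.int64)
share2 = np.zeros_like(share1)
share1[:d1, :d1] = X1.T @ X1          # block computed locally by party 1
share2[d1:, d1:] = X2.T @ X2          # block computed locally by party 2
share1[:d1, d1:] = mask               # cross block, party 1's share
share2[:d1, d1:] = cross - mask       # cross block, party 2's share
share1[d1:, :d1] = mask.T             # symmetric counterpart
share2[d1:, :d1] = (cross - mask).T
share1 %= q
share2 %= q

# Adding the shares modulo q reconstructs X^T X for the joined dataset.
X = np.hstack([X1, X2])
gram = (share1 + share2) % q
```

Only the off-diagonal blocks need interaction; the diagonal blocks involve columns held by a single party.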

SLIDE 10

Phase 1

  • Architecture – inspired by [NWIJBT13]
  • Two additional semi-honest, non-colluding parties:
    • Crypto Service Provider (CSP) – generates parameters
    • Evaluator – helps with the evaluation of the protocols, has no inputs
  • Our setting

[Figure panels: Two Parties; Many Parties]

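SLIDE 11

Phase 1

[Figure panels: Two Parties; Many Parties. Labels: dot product protocol, garbled circuit, OT, garbled labels]

Dot product protocol (party A holds $a$, party B holds $b$):

  • Setup: A holds $(x, r)$; B holds $(y, z)$, where $z = \langle x, y \rangle - r$
  • B sends $b' = b - y$
  • A sends $a' = a + x$ and $a'' = \langle a, b' \rangle - r - r_A$
  • Outputs: A holds $r_A$; B holds $r_B = \langle a', y \rangle + a'' - z$, so that $r_A + r_B = \langle a, b \rangle$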
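The two-party dot product exchange on this slide can be simulated in a few lines. This is a minimal sketch under assumptions: all roles run inside one function, the message directions are inferred from which values each formula needs, the setup randomness is taken to come from the trusted initializer that appears in the Phase 1 timing results, and the modulus `q` and helper names are hypothetical.

```python
import secrets

q = 2**64  # hypothetical modulus; all arithmetic is in Z_q

def dot(u, v):
    """Inner product modulo q."""
    return sum(ui * vi for ui, vi in zip(u, v)) % q

def rand_vec(n):
    """Uniformly random vector in Z_q^n."""
    return [secrets.randbelow(q) for _ in range(n)]

def secure_dot(a, b):
    """Return additive shares (r_A, r_B) of <a, b> modulo q."""
    n = len(a)
    # Setup: correlated randomness, independent of the inputs.
    x, y = rand_vec(n), rand_vec(n)
    r = secrets.randbelow(q)
    z = (dot(x, y) - r) % q           # A gets (x, r); B gets (y, z)
    # B -> A: b masked by y
    b_prime = [(bi - yi) % q for bi, yi in zip(b, y)]
    # A picks its own output share r_A, then replies to B
    r_A = secrets.randbelow(q)
    a_prime = [(ai + xi) % q for ai, xi in zip(a, x)]
    a_second = (dot(a, b_prime) - r - r_A) % q
    # B computes its output share
    r_B = (dot(a_prime, y) + a_second - z) % q
    return r_A, r_B

a = [3, 1, 4, 1, 5]
b = [2, 7, 1, 8, 2]
r_A, r_B = secure_dot(a, b)
```

Expanding $r_A + r_B$ shows why it works: the terms $\langle a, y \rangle$, $\langle x, y \rangle$, $r$, and $r_A$ all cancel, leaving $\langle a, b \rangle$. Each party only ever sees values masked by fresh randomness.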

SLIDE 12

Phase 2

  • Two party protocol
    • Inputs: additive shares of matrix $A$ and vector $b$
    • Outputs: additive shares of $\theta$ such that $A\theta = b$
  • Garbled circuits computation
  • Solution algorithms
    • Two exact algorithms: Cholesky, $LDL^\top$
    • One approximation algorithm: Conjugate Gradient Descent (CGD)
  • [NWIJBT13] implements Cholesky

SLIDE 13

Cholesky

  • Cholesky decomposition for positive definite matrices
    • $A = LL^\top$
    • $L$: $d \times d$ lower triangular matrix
  • Idea: solve $LL^\top\theta = b$
    • $L\theta' = b$ (forward substitution)
    • $L^\top\theta = \theta'$ (backward substitution)
  • Complexity: $O(d^3)$ floating point operations
  • Two properties:
    • Data-agnostic – no pivoting
    • Numerically robust – suitable for finite precision implementations
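The Cholesky approach can be sketched in the clear as below. This is plain floating-point NumPy, not the garbled-circuit implementation; the function name and test matrix are hypothetical. The loop bounds depend only on $d$, which is the data-agnostic property the slide refers to.

```python
import numpy as np

def cholesky_solve(A, b):
    """Solve A theta = b for symmetric positive definite A via A = L L^T."""
    d = A.shape[0]
    L = np.zeros((d, d))
    # Factorization: the control flow depends only on d (no pivoting).
    for j in range(d):
        L[j, j] = np.sqrt(A[j, j] - L[j, :j] @ L[j, :j])
        for i in range(j + 1, d):
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    # Forward substitution: L t = b
    t = np.zeros(d)
    for i in range(d):
        t[i] = (b[i] - L[i, :i] @ t[:i]) / L[i, i]
    # Backward substitution: L^T theta = t
    theta = np.zeros(d)
    for i in reversed(range(d)):
        theta[i] = (t[i] - L[i + 1:, i] @ theta[i + 1:]) / L[i, i]
    return theta

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5 * np.eye(5)   # symmetric positive definite test matrix
b = rng.normal(size=5)
theta = cholesky_solve(A, b)
```

Each column of the factorization needs one square root and a division by the resulting diagonal entry, both of which are costly inside a circuit.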

SLIDE 14

LDLT

  • Variant of Cholesky decomposition
    • $A = LDL^\top$
    • $L$ – lower triangular
    • $D$ – diagonal, non-negative entries
  • Idea: solve $LDL^\top\theta = b$
    • $L\theta'' = b$
    • $D\theta' = \theta''$
    • $L^\top\theta = \theta'$
  • Complexity: $O(d^3)$
  • No square root
  • Additional substitution phase
  • Same properties
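The three solves on this slide can be sketched in the clear as follows. This is a floating-point illustration with a unit-diagonal $L$ (a common convention, assumed here); the function name and test matrix are hypothetical.

```python
import numpy as np

def ldlt_solve(A, b):
    """Solve A theta = b for symmetric positive definite A via A = L D L^T."""
    d = A.shape[0]
    L = np.eye(d)          # unit lower triangular (assumed convention)
    D = np.zeros(d)        # diagonal of D
    for j in range(d):
        D[j] = A[j, j] - (L[j, :j] ** 2) @ D[:j]
        for i in range(j + 1, d):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ D[:j]) / D[j]
    # L t2 = b: forward substitution, no division since diag(L) = 1
    t2 = np.zeros(d)
    for i in range(d):
        t2[i] = b[i] - L[i, :i] @ t2[:i]
    # D t1 = t2: the additional (diagonal) substitution phase
    t1 = t2 / D
    # L^T theta = t1: backward substitution
    theta = np.zeros(d)
    for i in reversed(range(d)):
        theta[i] = t1[i] - L[i + 1:, i] @ theta[i + 1:]
    return theta

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5 * np.eye(5)   # symmetric positive definite test matrix
b = rng.normal(size=5)
theta = ldlt_solve(A, b)
```

Compared with Cholesky, the square roots disappear at the price of one extra diagonal solve.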

SLIDE 15

CGD

  • Approximate solution
  • Solving $A\theta = b$ by solving the optimization $\arg\min_\theta \|A\theta - b\|^2$
  • Iterative solution approach based on conjugate gradients
  • Complexity
    • Until convergence: $O(d^3)$
    • Early termination: $O(d^2)$ per iteration
  • Error: $\varepsilon$ after $O(\sqrt{\kappa} \log(1/\varepsilon))$ iterations
    • $\kappa$ – condition number
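A textbook conjugate gradient iteration with a fixed iteration count looks like the sketch below. Fixing the count in advance, rather than testing a residual threshold, is an assumption here, chosen because a circuit cannot branch on secret data. The function name and test matrix are hypothetical, and the code runs in the clear in floating point.

```python
import numpy as np

def conjugate_gradient(A, b, iters):
    """Run a fixed number of CG iterations on A theta = b (A symmetric PD)."""
    theta = np.zeros_like(b)
    r = b - A @ theta          # residual
    p = r.copy()               # search direction
    rs = r @ r
    for _ in range(iters):
        Ap = A @ p             # the O(d^2) matrix-vector product per iteration
        alpha = rs / (p @ Ap)
        theta = theta + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return theta

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5 * np.eye(5)   # well-conditioned symmetric positive definite matrix
b = rng.normal(size=5)
theta = conjugate_gradient(A, b, iters=5)
```

Each iteration is dominated by one matrix-vector product, so stopping after a small number of iterations keeps the total cost well below the $O(d^3)$ of the exact methods.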

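SLIDE 16

Fixed-Point Arithmetic

  • $\phi_\delta(r) = [r/\delta]$; $\tilde{\phi}_\delta(z) = z\delta$; $|r - \tilde{\phi}_\delta(\phi_\delta(r))| \le \delta$
  • $\varphi_q(z) = z$ if $z \ge 0$; $\varphi_q(z) = z + q$ if $z < 0$
  • $\tilde{\varphi}_q(u) = u$ if $0 \le u \le q/2$; $\tilde{\varphi}_q(u) = u - q$ if $q/2 < u \le q - 1$
  • Phase 1: $n$-dimensional vectors with entries of size $R$
    • Error: $n(2R\delta + \delta^2)$
    • Normalize $R \le 1/\sqrt{n}$ ⇒ error $\varepsilon$ with $\delta = \varepsilon/(2\sqrt{n})$ and $q = 8n/\varepsilon^2$
    • $O(\log(n/\varepsilon))$ bit representation
  • Phase 2 – experiments
    • $q = 2^{32}$ (4 bits integer part, 1 bit sign) ⇒ $\delta = 2^{-27}$
    • $q = 2^{64}$ (4 bits integer part, 1 bit sign) ⇒ $\delta = 2^{-59}$

Maps: $\phi_\delta: \mathbb{R} \to \mathbb{Z}$, $\tilde{\phi}_\delta: \mathbb{Z} \to \mathbb{R}$, $\varphi_q: \mathbb{Z} \to \mathbb{Z}_q$, $\tilde{\varphi}_q: \mathbb{Z}_q \to \mathbb{Z}$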
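The encoding maps on this slide can be sketched as a pair of functions. The names `encode` and `decode` are hypothetical, and the bracket $[r/\delta]$ is taken here to mean round-to-nearest, which is an assumption. The example uses the 32-bit parameters from the slide for single values and a larger modulus for a product, since multiplying two encodings squares the scale.

```python
def encode(r, delta, q):
    """Real -> integer -> Z_q: scale by 1/delta, then wrap negatives to z + q."""
    z = round(r / delta)          # [r/delta], assumed to be round-to-nearest
    assert -q // 2 < z <= q // 2, "value out of range for modulus q"
    return z % q                  # z if z >= 0, z + q if z < 0

def decode(u, delta, q):
    """Z_q -> integer -> real: unwrap the upper half to negatives, scale by delta."""
    z = u if u <= q // 2 else u - q
    return z * delta

delta = 2.0 ** -27
q32 = 2 ** 32                     # 1 sign bit, 4 integer bits, 27 fractional bits

u_pos = encode(1.5, delta, q32)
u_neg = encode(-2.25, delta, q32)
roundtrip_error = abs(decode(encode(0.1, delta, q32), delta, q32) - 0.1)

# A product of two encodings carries scale delta**2 and needs a larger modulus.
q64 = 2 ** 64
prod = (encode(1.5, delta, q64) * encode(-2.25, delta, q64)) % q64
prod_decoded = decode(prod, delta ** 2, q64)
```

The last two lines show why the Phase 1 modulus grows like $1/\delta^2$: the integer product must fit in the signed range of $\mathbb{Z}_q$ before it is decoded.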

SLIDE 17

Implementation and Evaluation

  • Obliv-C
    • Most recent optimizations: Free XOR, Garbled Row Reduction, Fixed Key Block Ciphers, Half Gates
  • Fixed point arithmetic on top of Obliv-C
    • Algorithms: multiplication (Karatsuba-Comba), division (Knuth’s algorithm D), square root (Newton’s method)
    • 32 bits: 4 bits (integral part) + 28 bits (fractional part)
  • Synthetic datasets (vs real datasets)
    • Generated with correct $\lambda$ parameter – sample from $d$-dimensional Gaussian distribution
    • Tuning $\lambda$ privately is a hard question – incorrect $\lambda$ makes the optimization too easy or too difficult
  • Amazon EC2 C4 (15GB RAM, 8 CPU cores)

SLIDE 18

Phase 1

[Figure: normalized computation time (0.0 to 1.2) vs. number of parties (2, 3, 4); series: Trusted Initializer, Parties (average)]

Database partitioned equally among parties. (n, d): column 1 (2000, 20), column 2 (10000, 100), column 3 (50000, 500)

        Number of parties
  d   | 2           | 3           | 4
  20  | 0.17, 0.033 | 0.22, 0.032 | 0.26, 0.030
  100 | 19, 1.7     | 26, 1.6     | 29, 1.4
  500 | 109, 146    | 149, 125    | 166, 104

SLIDE 19

Phase 2

[Figure: circuit size ($10^6$ to $10^{11}$) vs. size $d$ ($10^1$ to $10^2$); series: CGD 1, CGD 10, CGD 15, Cholesky]

SLIDE 20

Phase 2

[Figure: Convergence of CGD, Fixed vs Floating Point]

SLIDE 21

Conclusions

  • Machine learning algorithms – target for MPC
  • Ridge regression
  • Vertically partitioned datasets
  • Tailored protocol for Phase 1
  • Two party computation for solving systems of linear equations for Phase 2
  • Exact (Cholesky, $LDL^\top$) and approximation (CGD) algorithms
  • Approximation: more efficient with sufficient precision
  • Next steps – classification (logistic regression)

SLIDE 22

Thank You!