Privacy-preserving statistical analysis Liina Kamm liina@cyber.ee



slide-1
SLIDE 1

Privacy-preserving statistical analysis

Liina Kamm liina@cyber.ee http://sharemind.cyber.ee/

slide-2
SLIDE 2

Rmind: a tool for cryptographically secure statistical analysis Dan Bogdanov, Liina Kamm, Sven Laur and Ville Sokk ePrint report 2014/512

slide-3
SLIDE 3

Sharemind

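Input parties: IP1, ..., IPk

Computing parties: CP1, CP2, CP3

Result parties: RP1, ..., RPl

Step 1: secret sharing of inputs. Each input xi (i = 1, ..., k) is split into shares xi1, xi2, xi3, one for each computing party.

Step 2: secure multiparty computation. The computing parties obtain the result shares y1, y2, y3.

Step 3: reconstruction of results. The result parties receive the shares and reconstruct y.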
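The three steps can be illustrated with a minimal sketch. It assumes additive secret sharing modulo 2^32 among three computing parties; the slide does not state the sharing scheme, and the names `share`, `reconstruct` and `add_shared` are made up for this example.

```python
import secrets

MODULUS = 2 ** 32  # assumed ring size; not stated on the slide


def share(x, parties=3):
    """Step 1: split x into additive shares, one per computing party."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    # The last share is chosen so that all shares sum to x modulo MODULUS.
    shares.append((x - sum(shares)) % MODULUS)
    return shares


def add_shared(xs, ys):
    """Step 2 (simplest case): each party adds its own two shares locally."""
    return [(a + b) % MODULUS for a, b in zip(xs, ys)]


def reconstruct(shares):
    """Step 3: a result party sums the shares it receives."""
    return sum(shares) % MODULUS


x_shares = share(20)
y_shares = share(22)
print(reconstruct(add_shared(x_shares, y_shares)))
```

Addition needs no communication between the parties in this scheme; operations such as multiplication or comparison require dedicated protocols that this sketch does not cover.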

slide-4
SLIDE 4

Necessary functionality

  • Classification, declassification and publishing of values
  • Protected storage of a private value
  • Support for vectors and matrices
  • Integer, Boolean, floating-point arithmetic
  • Division, square root
  • Shuffling
  • Linking
  • Sorting
slide-5
SLIDE 5

Filtering

Filtered attribute in the usual setting

[Figure: a database with n records D1, D2, ..., Di, ..., Dn and attributes Attribute 1, Attribute 2, ..., Attribute j, ..., Attribute m. Filtering Attribute j (n elements) yields a shorter vector of k elements.]

Filtered attribute in the privacy-preserving setting

[Figure: Attribute j keeps all n elements and is accompanied by a mask vector of n elements that marks the selected records.]
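A plaintext sketch of why the mask vector is enough for many aggregates: the filtered count and sum can be computed over all n elements without ever producing the shorter vector. The function names are hypothetical, and in the real setting the values and the mask would be secret-shared rather than plain Python lists.

```python
def masked_count(mask):
    """Number of selected records: the sum of the 0/1 mask."""
    return sum(mask)


def masked_sum(values, mask):
    """Sum over selected records; unselected ones are multiplied by 0."""
    return sum(v * m for v, m in zip(values, mask))


def masked_mean(values, mask):
    """Mean of the selected records (assumes at least one is selected)."""
    return masked_sum(values, mask) / masked_count(mask)


ages = [25, 40, 31, 58]
mask = [1 if age > 30 else 0 for age in ages]  # filter: age > 30
print(masked_count(mask), masked_sum(ages, mask), masked_mean(ages, mask))
```

Every record is touched in the same way regardless of the filter outcome, so the access pattern does not reveal which records were selected.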

slide-6
SLIDE 6

Quantiles (1)

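$$Q(p, [[\vec{a}]]) = (1 - \gamma) \cdot [[a_j]] + \gamma \cdot [[a_{j+1}]]$$

where $j = \lfloor (n-1)p \rfloor + 1$, $n$ is the number of elements of $[[\vec{a}]]$, and $\gamma = np - \lfloor (n-1)p \rfloor - p$.

For finding the j-th elem... values, we can either use ...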
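The formula can be checked on public data. The sketch below evaluates Q on an already sorted plain Python list; the function name `quantile` and the handling of the case j = n (where a_{j+1} does not exist and γ is 0) are choices made for this example.

```python
import math


def quantile(p, a):
    """Q(p, a) for a sorted list a, with 1-based j as in the formula."""
    n = len(a)
    j = math.floor((n - 1) * p) + 1
    gamma = n * p - math.floor((n - 1) * p) - p
    if j == n:
        # p = 1: gamma is 0 and a_{j+1} is out of range, so return a_n.
        return float(a[n - 1])
    return (1 - gamma) * a[j - 1] + gamma * a[j]


print(quantile(0.5, [1, 2, 3, 4]))      # j = 2, gamma = 0.5
print(quantile(0.25, [1, 2, 3, 4, 5]))  # j = 2, gamma = 0
```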

slide-7
SLIDE 7

Quantiles (2)

Algorithm 2: Privacy-preserving algorithm for finding the five-number summary of a vector that leaks the size of the selected subset

Data: Input data vector [[a]] and corresponding mask vector [[m]].
Result: Minimum [[min]], lower quartile [[lq]], median [[me]], upper quartile [[uq]], and maximum [[max]] of [[a]] based on the mask vector [[m]]

1 [[x]] ← cut([[a]], [[m]])
2 [[b]] ← sort([[x]])
3 [[min]] ← [[b_1]]
4 [[max]] ← [[b_n]]
5 [[lq]] ← Q(0.25, [[b]])
6 [[me]] ← Q(0.5, [[b]])
7 [[uq]] ← Q(0.75, [[b]])
8 return ([[min]], [[lq]], [[me]], [[uq]], [[max]])
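A plaintext sketch of the same control flow, on public lists instead of secret-shared vectors. `cut` is written here as a plain filter, which is exactly the step that reveals the subset size. The quartiles come from Python's `statistics.quantiles` with `method="inclusive"`, which interpolates at position (n − 1)p and should therefore agree with the Q formula from the Quantiles (1) slide; it needs at least two selected elements.

```python
import statistics


def cut(a, m):
    """Keep only elements whose mask entry is 1 (reveals how many remain)."""
    return [value for value, keep in zip(a, m) if keep == 1]


def five_number_summary(a, m):
    """Plaintext analogue of Algorithm 2."""
    b = sorted(cut(a, m))
    lq, me, uq = statistics.quantiles(b, n=4, method="inclusive")
    return b[0], lq, me, uq, b[-1]


a = [5, 100, 1, 3, 2, 4]
m = [1, 0, 1, 1, 1, 1]  # the outlier 100 is filtered out
print(five_number_summary(a, m))
```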

slide-8
SLIDE 8

Quantiles (3)

Algorithm 3: Privacy-preserving algorithm for finding the five-number summary of a vector that hides the size of the selected subset.

Data: Input data vector [[a]] of size N and corresponding mask vector [[m]].
Result: Minimum [[min]], lower quartile [[lq]], median [[me]], upper quartile [[uq]], and maximum [[max]] of [[a]] based on the mask vector [[m]]

1 ([[b]], [[m′]]) ← sort*([[a]], [[m]])
2 [[n]] ← sum([[m]])
3 [[os]] ← N − [[n]]
4 [[min]] ← [[b_{[[1+os]]}]]
5 [[max]] ← [[b_N]]
6 [[lq]] ← Q*(0.25, [[a]], [[os]])
7 [[me]] ← Q*(0.5, [[a]], [[os]])
8 [[uq]] ← Q*(0.75, [[a]], [[os]])
9 return ([[min]], [[lq]], [[me]], [[uq]], [[max]])
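The slide does not define sort* or Q*, so the following plaintext sketch rests on two assumptions: sort* orders the vector so that the N − n unselected elements come first and the selected ones follow in ascending order, and Q* is the Q formula applied to the sorted vector with every index shifted by the offset os. In a real protocol the lookups at positions depending on os would have to be oblivious; here they are ordinary list accesses.

```python
import math


def sort_star(a, m):
    """Assumed behaviour: unselected elements first, then selected ones sorted."""
    pairs = sorted(zip(m, a))  # sorts by mask bit, then by value
    return [value for _, value in pairs], [bit for bit, _ in pairs]


def q_star(p, b, os):
    """Assumed behaviour: quantile of the last len(b) - os elements of b."""
    n = len(b) - os
    j = math.floor((n - 1) * p) + 1
    gamma = n * p - math.floor((n - 1) * p) - p
    low = b[os + j - 1]
    high = b[min(os + j, len(b) - 1)]  # clamp for p = 1, where gamma is 0
    return (1 - gamma) * low + gamma * high


def five_number_summary_hidden(a, m):
    """Plaintext analogue of Algorithm 3 (assumes at least one selected element)."""
    N = len(a)
    b, _ = sort_star(a, m)
    os = N - sum(m)
    return b[os], q_star(0.25, b, os), q_star(0.5, b, os), q_star(0.75, b, os), b[N - 1]


a = [5, 100, 1, 3, 2, 4]
m = [1, 0, 1, 1, 1, 1]
print(five_number_summary_hidden(a, m))
```

The vector keeps its full length N throughout, so nothing about the size of the selected subset is revealed by the shape of the data.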

slide-9
SLIDE 9

Descriptive statistics

  • Five-number summary and boxplot
  • Histogram, frequency table, heatmap
  • Mean, variance, standard deviation, covariance
slide-10
SLIDE 10

Statistical testing

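[Figure: four workflows for statistical testing, each marking which steps use public data and which use private data.
Public-data setting: Data → Test statistic → p-value → Comparison with Threshold.
Option 1: Data → Test statistic → p-value → Comparison with Threshold.
Option 2: Data → Test statistic → Comparison with Critical test statistic (obtained from Threshold).
Option 3: Data → Test statistic → p-value → Comparison with Threshold.
The options differ in where the boundary between private and public data lies.]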

slide-11
SLIDE 11

Statistical tests

  • t-test, paired t-test
  • Wilcoxon rank sum test, signed rank test
  • chi-square test
  • Multiple testing correction
      • Bonferroni correction
      • Benjamini-Hochberg procedure
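
The two correction procedures can be sketched on public p-values as follows. This is the textbook form of both methods, not the privacy-preserving protocol; the function names and the example p-values are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i if p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank
    with p_(k) <= k * alpha / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(p)), sum(benjamini_hochberg(p)))
```

On this input Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two, which reflects that the latter controls the false discovery rate rather than the family-wise error rate.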
slide-12
SLIDE 12

Linear regression (1)

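$$y_j = \beta_k X_{j,k} + \ldots + \beta_1 X_{j,1} + \beta_0 X_{j,0} + \varepsilon_j$$

  • k independent variables $x_k$, one dependent variable $y$
  • Want to find $\beta_i$ such that the model above holds, with the residuals written as $\vec{\varepsilon} = \vec{y} - X\vec{\beta}$
  • Minimise the square of residuals: $\|\vec{\varepsilon}\|^2 = \|\vec{y} - X\vec{\beta}\|^2$
  • Convert the task to its equivalent characterisation in terms of linear equations: $X^T X \vec{\beta} = X^T \vec{y}$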
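
The step from minimising the squared residuals to the system of linear equations can be spelled out as a short derivation. It assumes that X has full column rank, so that the minimiser is unique; that assumption is not stated on the slide.

```latex
\begin{align*}
\|\vec{\varepsilon}\|^2
  &= (\vec{y} - X\vec{\beta})^T (\vec{y} - X\vec{\beta})
   = \vec{y}^T \vec{y} - 2\,\vec{\beta}^T X^T \vec{y}
     + \vec{\beta}^T X^T X \vec{\beta}, \\
\nabla_{\vec{\beta}} \|\vec{\varepsilon}\|^2
  &= -2\,X^T \vec{y} + 2\,X^T X \vec{\beta} = 0, \\
X^T X \vec{\beta} &= X^T \vec{y}.
\end{align*}
```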

slide-13
SLIDE 13

Linear regression (2)

  • Simple linear regression (one variable)
  • Matrix inversion (up to four variables)
  • Gaussian elimination with back substitution
  • LU decomposition
  • Conjugate gradient method
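
As an illustration of one of the listed methods, here is a plaintext sketch of Gaussian elimination with back substitution applied to the linear equations X^T X β = X^T y. It works on public floating-point numbers and uses ordinary partial pivoting, which branches on the data; a privacy-preserving version would have to pick the pivot obliviously. All function names are hypothetical.

```python
def normal_equations(X, y):
    """Build A = X^T X and b = X^T y from a list-of-rows design matrix."""
    rows, cols = len(X), len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(rows)) for j in range(cols)]
         for i in range(cols)]
    b = [sum(X[r][i] * y[r] for r in range(rows)) for i in range(cols)]
    return A, b


def gaussian_solve(A, b):
    """Solve A x = b by elimination with partial pivoting and back substitution."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest absolute value in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


# Data generated from y = 1 + 2x; the first column of X is the intercept.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 5, 7]
beta = gaussian_solve(*normal_equations(X, y))
print(beta)
```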
slide-14
SLIDE 14

Max Loc

Algorithm 10: maxLoc: Finding the first maximum element and its location in a vector in a privacy-preserving setting

Data: A vector [[a]] of length n
Result: The maximum element [[b]] and its location [[l]] in the vector

1 Let π(j) be a permutation of indices j ∈ {1, ..., n}
2 [[b]] ← [[a_π(1)]] and [[l]] ← π(1)
3 for i ∈ {π(2), ..., π(n)} do
4   [[c]] ← (|[[a_π(i)]]| > |[[b]]|)
5   [[b]] ← [[b]] − [[c]] · [[b]] + [[c]] · [[a_π(i)]]
6   [[l]] ← [[l]] − [[c]] · [[l]] + [[c]] · π(i)
7 end
8 return ([[b]], [[l]])
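
The update rule in the loop is an oblivious selection: the comparison result c is 0 or 1, and b − c·b + c·a picks either the old or the new value without branching. A plaintext sketch, with 0-based indices and with the permutation passed in as an optional argument (both are choices made for this example; the comparison uses absolute values as in the algorithm):

```python
def max_loc(a, permutation=None):
    """Return (element with largest absolute value, its 0-based index).

    Indices are visited in the order given by `permutation`
    (identity order when omitted).
    """
    order = list(range(len(a))) if permutation is None else list(permutation)
    b = a[order[0]]
    l = order[0]
    for i in order[1:]:
        c = int(abs(a[i]) > abs(b))  # 1 if the new element wins, else 0
        b = b - c * b + c * a[i]     # arithmetic selection, no if-statement
        l = l - c * l + c * i
    return b, l


print(max_loc([3, -7, 5, 7]))
print(max_loc([2, 9, 4], permutation=[2, 0, 1]))
```

With the identity order and a strict comparison, ties are resolved in favour of the element seen first, so 7 does not replace −7 in the first call.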

slide-15
SLIDE 15

Rmind demo

slide-16
SLIDE 16

https://sharemind.cyber.ee/ sharemind@cyber.ee