privacy preserving statistical analysis

Privacy-preserving statistical analysis Liina Kamm liina@cyber.ee

Privacy-preserving statistical analysis Liina Kamm liina@cyber.ee http://sharemind.cyber.ee/ Rmind: a tool for cryptographically secure statistical analysis Dan Bogdanov, Liina Kamm, Sven Laur and Ville Sokk ePrint report 2014/512 Sharemind


  1. Privacy-preserving statistical analysis Liina Kamm liina@cyber.ee http://sharemind.cyber.ee/

  2. Rmind: a tool for cryptographically secure statistical analysis Dan Bogdanov, Liina Kamm, Sven Laur and Ville Sokk ePrint report 2014/512

  3. Sharemind [Diagram: input parties IP_1, ..., IP_k; computing parties CP_1, CP_2, CP_3; result parties RP_1, ..., RP_l. Each input x_i is split into shares x_i1, x_i2, x_i3, one per computing party; the computing parties hold result shares y_1, y_2, y_3, from which the result parties obtain y.] Step 1: secret sharing of inputs. Step 2: secure multiparty computation. Step 3: reconstruction of results.
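The three steps on this slide can be illustrated with a small plaintext sketch. The slide does not say which sharing scheme is used, so additive sharing over the ring of integers modulo 2^32 is an assumption here, and the names `share`, `reconstruct` and `add_shared` are hypothetical helpers, not Sharemind API calls.

```python
import secrets

MOD = 2**32  # ring size; an assumption made for this sketch


def share(x, parties=3):
    """Step 1: split x into additive shares that sum to x modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(parties - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares


def add_shared(xs, ys):
    """Step 2 (simplest case): each computing party adds its two shares locally."""
    return [(a + b) % MOD for a, b in zip(xs, ys)]


def reconstruct(shares):
    """Step 3: the result party sums the shares it receives."""
    return sum(shares) % MOD


x_shares = share(20)  # input party 1
y_shares = share(22)  # input party 2
total = reconstruct(add_shared(x_shares, y_shares))
```

Addition needs no communication between the computing parties; operations such as multiplication or comparison require interactive protocols that this sketch does not cover.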

  4. Necessary functionality • Classification, declassification and publishing of values • Protected storage of a private value • Support for vectors and matrices • Integer, Boolean, floating-point arithmetic • Division, square root • Shuffling • Linking • Sorting

  5. Filtering [Diagram: a database of n records D1, ..., Dn with attributes 1, ..., k. In the usual setting, the filtered attribute j contains only the m selected elements. In the privacy-preserving setting, attribute j keeps all n elements and is accompanied by a mask vector of n elements (for example 1 0 1 0) that marks the selected records.]
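A plaintext sketch of the mask-vector idea: the attribute keeps all n elements, and aggregates are computed by multiplying with the mask instead of dropping rows. In the private setting both vectors would be secret-shared; the helper names and the example data are hypothetical.

```python
def make_mask(values, predicate):
    """Return a 0/1 vector of the same length as values."""
    return [1 if predicate(v) else 0 for v in values]


def masked_sum(values, mask):
    """Sum only the selected elements without changing the vector length."""
    return sum(v * m for v, m in zip(values, mask))


def masked_count(mask):
    """Number of selected elements."""
    return sum(mask)


ages = [34, 51, 29, 62]
mask = make_mask(ages, lambda a: a > 30)
mean_over_30 = masked_sum(ages, mask) / masked_count(mask)
```

Because the output vector has the same length as the input, the number of records matching the filter is not revealed by the data layout.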

  6. Quantiles (1) $Q(p, [[\vec{a}]]) = (1 - \gamma) \cdot [[a_j]] + \gamma \cdot [[a_{j+1}]]$, where $j = \lfloor (n - 1) p \rfloor + 1$ and $\gamma = np - \lfloor (n - 1) p \rfloor - p$.
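The quantile formula on this slide can be checked on public data with a short sketch. It assumes that the input list is already sorted and that n is its length; the guard for p = 1 is an addition for this example, since a_{j+1} does not exist in that case.

```python
import math


def quantile(p, a):
    """Q(p, a) for a sorted list a, with the 1-based index j from the slide."""
    n = len(a)
    j = math.floor((n - 1) * p) + 1
    gamma = n * p - math.floor((n - 1) * p) - p
    if j == n:
        # p == 1: gamma is 0 and a_{j+1} does not exist
        return a[n - 1]
    # a[j - 1] is a_j and a[j] is a_{j+1} in 0-based indexing
    return (1 - gamma) * a[j - 1] + gamma * a[j]


median_even = quantile(0.5, [1, 2, 3, 4])
lower_quartile = quantile(0.25, [1, 2, 3, 4, 5])
```

For the four-element list, j = 2 and gamma = 0.5, so the median is the midpoint of the second and third elements.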

  7. Quantiles (2) Algorithm 2: Privacy-preserving algorithm for finding the five-number summary of a vector that leaks the size of the selected subset
     Data: Input data vector [[\vec{a}]] and corresponding mask vector [[\vec{m}]].
     Result: Minimum [[min]], lower quartile [[lq]], median [[me]], upper quartile [[uq]], and maximum [[max]] of [[\vec{a}]] based on the mask vector [[\vec{m}]]
     [[\vec{x}]] ← cut([[\vec{a}]], [[\vec{m}]])
     [[\vec{b}]] ← sort([[\vec{x}]])
     [[min]] ← [[b_1]]
     [[max]] ← [[b_n]]
     [[lq]] ← Q(0.25, [[\vec{b}]])
     [[me]] ← Q(0.5, [[\vec{b}]])
     [[uq]] ← Q(0.75, [[\vec{b}]])
     return ([[min]], [[lq]], [[me]], [[uq]], [[max]])
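A plaintext emulation of Algorithm 2 may make the data flow easier to follow. It works on public lists, so it shows only the order of operations and not the secure protocols; `cut` is written here as a plain filter, which is why the size of the selected subset becomes visible. The function names and the sample data are hypothetical.

```python
import math


def cut(a, m):
    """Keep the elements of a whose mask bit is 1 (this reveals the subset size)."""
    return [v for v, keep in zip(a, m) if keep == 1]


def quantile(p, b):
    """Q(p, b) for a sorted list b, following the Quantiles (1) formula."""
    n = len(b)
    j = math.floor((n - 1) * p) + 1
    gamma = n * p - math.floor((n - 1) * p) - p
    if j == n:
        return b[n - 1]
    return (1 - gamma) * b[j - 1] + gamma * b[j]


def five_number_summary(a, m):
    """Return (min, lq, me, uq, max) of the masked-in elements of a."""
    b = sorted(cut(a, m))
    return (b[0], quantile(0.25, b), quantile(0.5, b), quantile(0.75, b), b[-1])


summary = five_number_summary([7, 1, 9, 3, 5, 100], [1, 1, 1, 1, 1, 0])
```

The sketch assumes that at least one element is selected; an all-zero mask would leave `b` empty.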

  8. Quantiles (3) Algorithm 3: Privacy-preserving algorithm for finding the five-number summary of a vector that hides the size of the selected subset.
     Data: Input data vector [[\vec{a}]] of size N and corresponding mask vector [[\vec{m}]].
     Result: Minimum [[min]], lower quartile [[lq]], median [[me]], upper quartile [[uq]], and maximum [[max]] of [[\vec{a}]] based on the mask vector [[\vec{m}]]
     ([[\vec{b}]], [[\vec{m}']]) ← sort*([[\vec{a}]], [[\vec{m}]])
     [[n]] ← sum([[\vec{m}]])
     [[os]] ← N − [[n]]
     [[min]] ← [[b_{[[1+os]]}]]
     [[max]] ← [[b_N]]
     [[lq]] ← Q*(0.25, [[\vec{a}]], [[os]])
     [[me]] ← Q*(0.5, [[\vec{a}]], [[os]])
     [[uq]] ← Q*(0.75, [[\vec{a}]], [[os]])
     return ([[min]], [[lq]], [[me]], [[uq]], [[max]])
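The slide does not define sort* or Q*, so the following plaintext sketch rests on two assumptions: sort* orders the pairs so that masked-out elements come first (which matches the minimum sitting at position 1 + os), and Q* is the Quantiles (1) formula with every index shifted by the offset os. Unlike the slide, the sketch passes the sorted vector to Q*. All names are hypothetical.

```python
import math


def sort_star(a, m):
    """Assumed behaviour of sort*: masked-out elements first, then by value."""
    pairs = sorted(zip(m, a))
    return [v for _, v in pairs], [k for k, _ in pairs]


def quantile_star(p, b, n, offset):
    """Assumed behaviour of Q*: quantile of the last n elements of b."""
    j = math.floor((n - 1) * p) + 1
    gamma = n * p - math.floor((n - 1) * p) - p
    lo = b[offset + j - 1]
    hi = b[min(offset + j, len(b) - 1)]  # clamp for p == 1, where gamma is 0
    return (1 - gamma) * lo + gamma * hi


def five_number_summary_hidden(a, m):
    """Return (min, lq, me, uq, max) without shortening the vector."""
    N = len(a)
    b, _ = sort_star(a, m)
    n = sum(m)
    offset = N - n
    return (
        b[offset],
        quantile_star(0.25, b, n, offset),
        quantile_star(0.5, b, n, offset),
        quantile_star(0.75, b, n, offset),
        b[N - 1],
    )


summary = five_number_summary_hidden([7, 1, 9, 3, 5, 100], [1, 1, 1, 1, 1, 0])
```

In the private setting n and the offset are themselves secret, so the indexed reads would have to be done obliviously; the sketch uses ordinary list indexing and assumes at least one selected element.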

  9. Descriptive statistics • Five number summary and boxplot • Histogram, frequency table, heatmap • Mean, variance, standard deviation, covariance

  10. Statistical testing [Diagram: the testing pipeline Data → Test statistic → p-value → Comparison with Threshold, first drawn entirely on public data and then as Options 1, 2 and 3, which differ in which steps are carried out on private data and which on public data. One of the options compares the test statistic with a critical test statistic instead of comparing a p-value with the threshold.]

  11. Statistical tests • t-test, paired t-test • Wilcoxon rank sum test, signed rank test • chi-square test • Multiple testing correction • Bonferroni correction • Benjamini-Hochberg procedure
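The two multiple testing corrections listed on this slide can be sketched on public p-values as follows. This shows the standard textbook procedures, not their privacy-preserving versions; the function names, the default significance level of 0.05 and the sample p-values are assumptions for the example.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i if p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank
    with p_(k) <= k / m * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


p = [0.001, 0.02, 0.03, 0.5]
by_bonferroni = bonferroni(p)
by_bh = benjamini_hochberg(p)
```

On these four p-values Bonferroni rejects only the first hypothesis, while the Benjamini-Hochberg procedure rejects the first three, which illustrates that the latter is the less conservative of the two.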

  12. Linear regression (1) • k independent variables $x_1, \dots, x_k$, one dependent variable $y$ • Want to find $\beta_i$ such that $y_j = \beta_k X_{j,k} + \dots + \beta_1 X_{j,1} + \beta_0 X_{j,0} + \varepsilon_j$, written in matrix form as $\vec{\varepsilon} = \vec{y} - X\vec{\beta}$ • Minimise the square of residuals $\|\vec{\varepsilon}\|^2 = \|\vec{y} - X\vec{\beta}\|^2$ • Convert the task to its equivalent characterisation in terms of linear equations $X^T X \vec{\beta} = X^T \vec{y}$

  13. Linear regression (2) • Simple linear regression (one variable) • Matrix inversion (up to four variables) • Gaussian elimination with back substitution • LU decomposition • Conjugate gradient method
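As an illustration of one of the listed methods, the sketch below builds the normal equations X^T X β = X^T y and solves them by Gaussian elimination with back substitution. It runs on public floating-point data in pure Python; the partial pivoting step, the function names and the sample data are choices made for this example, and the first column of X is assumed to be the constant 1 that multiplies β_0.

```python
def normal_equations(X, y):
    """Build A = X^T X and c = X^T y from the rows of X."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    c = [sum(row[i] * yj for row, yj in zip(X, y)) for i in range(k)]
    return A, c


def solve_gauss(A, c):
    """Solve A * beta = c by elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [ci] for row, ci in zip(A, c)]  # augmented matrix
    for col in range(n):
        # move the row with the largest absolute entry in this column up
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (M[i][n] - s) / M[i][i]
    return beta


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]  # generated from y = 1 + 2x
beta = solve_gauss(*normal_equations(X, y))
```

The sketch assumes that X^T X is non-singular; a zero pivot would cause a division by zero.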

  14. Max Loc Algorithm 10: maxLoc: Finding the first maximum element and its location in a vector in a privacy-preserving setting
     Data: A vector [[\vec{a}]] of length n
     Result: The maximum element [[b]] and its location [[l]] in the vector
     Let π(j) be a permutation of indices j ∈ {1, ..., n}
     [[b]] ← [[a_{π(1)}]] and [[l]] ← π(1)
     for i ∈ {π(2), ..., π(n)} do
         [[c]] ← (|[[a_{π(i)}]]| > |[[b]]|)
         [[b]] ← [[b]] − [[c]] · [[b]] + [[c]] · [[a_{π(i)}]]
         [[l]] ← [[l]] − [[c]] · [[l]] + [[c]] · π(i)
     end
     return ([[b]], [[l]])
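The update rule of maxLoc replaces a branch with arithmetic on the comparison bit c, which is what allows it to run on secret-shared values. The plaintext sketch below mirrors that structure, including the comparison of absolute values shown on the slide; it uses 0-based locations, and the optional `rng` parameter is a hypothetical addition for reproducible shuffling.

```python
import random


def max_loc(a, rng=random):
    """Return the element with the largest absolute value and its 0-based location."""
    perm = list(range(len(a)))
    rng.shuffle(perm)  # random visiting order of the indices
    b, l = a[perm[0]], perm[0]
    for idx in perm[1:]:
        c = 1 if abs(a[idx]) > abs(b) else 0  # comparison bit
        # oblivious selection: keep the old value if c == 0, take the new one if c == 1
        b = b - c * b + c * a[idx]
        l = l - c * l + c * idx
    return b, l


element, location = max_loc([3, -8, 5, 2])
```

When the largest absolute value is unique, the result does not depend on the permutation; with ties, the element visited first wins.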

  15. Rmind demo

  16. https://sharemind.cyber.ee/ sharemind@cyber.ee
