
Privacy-preserving statistical analysis Liina Kamm liina@cyber.ee http://sharemind.cyber.ee/

Rmind: a tool for cryptographically secure statistical analysis Dan Bogdanov, Liina Kamm, Sven Laur and Ville Sokk ePrint report 2014/512

Sharemind
[Figure: input parties IP 1, ..., IP k split their inputs x_1, ..., x_k into shares (x_i1, x_i2, x_i3) for the computing parties CP 1, CP 2, CP 3. The computing parties produce result shares y_1, y_2, y_3, which the result parties RP 1, ..., RP l combine into y.]
Step 1: secret sharing of inputs
Step 2: secure multiparty computation
Step 3: reconstruction of results

Necessary functionality
• Classification, declassification and publishing of values
• Protected storage of a private value
• Support for vectors and matrices
• Integer, Boolean, floating-point arithmetic
• Division, square root
• Shuffling
• Linking
• Sorting

Filtering
[Figure: a database of n elements D1, ..., Dn with attributes 1, ..., m. In the usual setting, the filtered attribute j contains only the k selected elements. In the privacy-preserving setting, the filtered attribute j keeps all n elements and is accompanied by a mask vector of n elements (for example 1 0 1 0).]
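The mask-vector idea can be sketched in plain Python. This is only an illustration on public values: in Sharemind the column and the mask would be secret-shared, and the helper names (`make_mask`, `combine_masks`, `masked_count`, `masked_sum`) and the sample data are hypothetical, not part of Rmind.

```python
# Plain-Python stand-in for filtering with a mask vector.
# Assumption: all values are public here; in the privacy-preserving setting
# both the column and the mask would be secret-shared.

def make_mask(column, predicate):
    """Return a 0/1 mask with one entry per record (length n is preserved)."""
    return [1 if predicate(v) else 0 for v in column]

def combine_masks(mask_a, mask_b):
    """Conjunction of two filters: element-wise product of the masks."""
    return [a * b for a, b in zip(mask_a, mask_b)]

def masked_count(mask):
    """Number of selected records."""
    return sum(mask)

def masked_sum(column, mask):
    """Sum over selected records; unselected records contribute 0."""
    return sum(v * m for v, m in zip(column, mask))

ages = [34, 51, 27, 62]
incomes = [2100, 3400, 1800, 2900]

over_30 = make_mask(ages, lambda age: age > 30)
under_60 = make_mask(ages, lambda age: age < 60)
mask = combine_masks(over_30, under_60)

print(mask)
print(masked_count(mask), masked_sum(incomes, mask))
```

The filtered column never shrinks, so the number of matching records is not revealed by the shape of the data.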

Quantiles (1)
$Q(p, [\![\vec{a}]\!]) = (1 - \gamma) \cdot [\![a_j]\!] + \gamma \cdot [\![a_{j+1}]\!]$
where $j = \lfloor (n - 1) p \rfloor + 1$ and $\gamma = np - \lfloor (n - 1) p \rfloor - p$.
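A minimal sketch of the quantile formula on public values, assuming the input list is already sorted and using the slide's 1-based index j. The function name `quantile` and the handling of p = 1 (where a_{j+1} does not exist and the weight is 0) are choices made for this illustration.

```python
import math

def quantile(p, a):
    """Q(p, a) for an ascending sorted list a, following the slide's formula."""
    n = len(a)
    j = math.floor((n - 1) * p) + 1               # 1-based index of the lower element
    gamma = n * p - math.floor((n - 1) * p) - p   # interpolation weight
    if j == n:
        # p == 1: there is no a_{j+1}, and gamma is 0 anyway.
        return float(a[n - 1])
    return (1 - gamma) * a[j - 1] + gamma * a[j]

values = [1, 2, 3, 4]
print(quantile(0.25, values), quantile(0.5, values), quantile(1.0, values))
```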

Quantiles (2)
Algorithm 2: Privacy-preserving algorithm for finding the five-number summary of a vector that leaks the size of the selected subset
Data: Input data vector $[\![\vec{a}]\!]$ and corresponding mask vector $[\![\vec{m}]\!]$.
Result: Minimum $[\![min]\!]$, lower quartile $[\![lq]\!]$, median $[\![me]\!]$, upper quartile $[\![uq]\!]$, and maximum $[\![max]\!]$ of $[\![\vec{a}]\!]$ based on the mask vector $[\![\vec{m}]\!]$
1 $[\![\vec{x}]\!] \leftarrow \mathrm{cut}([\![\vec{a}]\!], [\![\vec{m}]\!])$
2 $[\![\vec{b}]\!] \leftarrow \mathrm{sort}([\![\vec{x}]\!])$
3 $[\![min]\!] \leftarrow [\![b_1]\!]$
4 $[\![max]\!] \leftarrow [\![b_n]\!]$
5 $[\![lq]\!] \leftarrow Q(0.25, [\![\vec{b}]\!])$
6 $[\![me]\!] \leftarrow Q(0.5, [\![\vec{b}]\!])$
7 $[\![uq]\!] \leftarrow Q(0.75, [\![\vec{b}]\!])$
8 return $([\![min]\!], [\![lq]\!], [\![me]\!], [\![uq]\!], [\![max]\!])$
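Algorithm 2 can be sketched on public values as follows. The names `cut`, `quantile` and `five_number_summary` are illustrative, and `cut` is assumed to simply drop the elements whose mask entry is 0, which is why this variant reveals the subset size.

```python
import math

def cut(a, m):
    """Keep only the elements selected by the mask (reveals how many there are)."""
    return [v for v, keep in zip(a, m) if keep == 1]

def quantile(p, b):
    """Q(p, b) for an ascending sorted list b, with the 1-based index j of the slides."""
    n = len(b)
    j = math.floor((n - 1) * p) + 1
    gamma = n * p - math.floor((n - 1) * p) - p
    if j == n:
        return float(b[n - 1])
    return (1 - gamma) * b[j - 1] + gamma * b[j]

def five_number_summary(a, m):
    """Steps 1-8 of Algorithm 2 on plain values; assumes at least one selected element."""
    x = cut(a, m)
    b = sorted(x)
    return (b[0], quantile(0.25, b), quantile(0.5, b), quantile(0.75, b), b[-1])

data = [7, 1, 9, 3, 5, 100]
mask = [1, 1, 1, 1, 1, 0]   # the outlier 100 is filtered out
print(five_number_summary(data, mask))
```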

Quantiles (3)
Algorithm 3: Privacy-preserving algorithm for finding the five-number summary of a vector that hides the size of the selected subset.
Data: Input data vector $[\![\vec{a}]\!]$ of size $N$ and corresponding mask vector $[\![\vec{m}]\!]$.
Result: Minimum $[\![min]\!]$, lower quartile $[\![lq]\!]$, median $[\![me]\!]$, upper quartile $[\![uq]\!]$, and maximum $[\![max]\!]$ of $[\![\vec{a}]\!]$ based on the mask vector $[\![\vec{m}]\!]$
1 $([\![\vec{b}]\!], [\![\vec{m}']\!]) \leftarrow \mathrm{sort}^*([\![\vec{a}]\!], [\![\vec{m}]\!])$
2 $[\![n]\!] \leftarrow \mathrm{sum}([\![\vec{m}]\!])$
3 $[\![os]\!] \leftarrow N - [\![n]\!]$
4 $[\![min]\!] \leftarrow [\![b_{[\![1 + os]\!]}]\!]$
5 $[\![max]\!] \leftarrow [\![b_N]\!]$
6 $[\![lq]\!] \leftarrow Q^*(0.25, [\![\vec{a}]\!], [\![os]\!])$
7 $[\![me]\!] \leftarrow Q^*(0.5, [\![\vec{a}]\!], [\![os]\!])$
8 $[\![uq]\!] \leftarrow Q^*(0.75, [\![\vec{a}]\!], [\![os]\!])$
9 return $([\![min]\!], [\![lq]\!], [\![me]\!], [\![uq]\!], [\![max]\!])$
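The size-hiding variant can be illustrated on public values under two assumptions that the slide does not spell out: that sort* places the unselected elements first and the selected ones after them in ascending order (suggested by min being read at position 1 + os and max at position N), and that Q* applies the quantile formula to the sorted vector shifted by the offset os. The names `sort_star` and `quantile_star` are hypothetical. In a real deployment the offset is private, so the indexing below would have to be done obliviously.

```python
import math

def sort_star(a, m):
    """Assumed behaviour of sort*: unselected elements first, then selected ones ascending."""
    pairs = sorted(zip(m, a))            # sorts by mask bit (0 before 1), then by value
    b = [value for _, value in pairs]
    m_sorted = [bit for bit, _ in pairs]
    return b, m_sorted

def quantile_star(p, b, offset):
    """Assumed behaviour of Q*: quantile of the last len(b) - offset elements of b."""
    n = len(b) - offset
    j = math.floor((n - 1) * p) + 1      # 1-based index within the selected part
    gamma = n * p - math.floor((n - 1) * p) - p
    lower = b[offset + j - 1]
    if j == n:
        return float(lower)
    return (1 - gamma) * lower + gamma * b[offset + j]

def five_number_summary_hidden(a, m):
    """Algorithm 3 on plain values; the vector keeps its full length N throughout."""
    N = len(a)
    b, _ = sort_star(a, m)
    n = sum(m)                           # assumes at least one selected element
    offset = N - n
    return (b[offset],
            quantile_star(0.25, b, offset),
            quantile_star(0.5, b, offset),
            quantile_star(0.75, b, offset),
            b[N - 1])

data = [7, 1, 9, 3, 5, 100]
mask = [1, 1, 1, 1, 1, 0]
print(five_number_summary_hidden(data, mask))
```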

Descriptive statistics
• Five number summary and boxplot
• Histogram, frequency table, heatmap
• Mean, variance, standard deviation, covariance
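As a rough illustration, mean, variance and standard deviation can be computed over a filtered column using only sums, products and one division, which is the shape such computations take on secret-shared data. The sketch works on public values, the function names are hypothetical, and the use of the sample variance (division by n - 1) is an assumption.

```python
import math

def masked_mean(a, m):
    """Mean of the selected elements; unselected elements are multiplied by 0."""
    n = sum(m)
    return sum(v * k for v, k in zip(a, m)) / n

def masked_variance(a, m):
    """Sample variance of the selected elements (assumes n >= 2)."""
    n = sum(m)
    mean = masked_mean(a, m)
    return sum(k * (v - mean) ** 2 for v, k in zip(a, m)) / (n - 1)

def masked_std(a, m):
    """Standard deviation as the square root of the sample variance."""
    return math.sqrt(masked_variance(a, m))

values = [2, 4, 4, 4, 5, 5, 7, 9, 1000]
mask = [1, 1, 1, 1, 1, 1, 1, 1, 0]    # the last record is filtered out

print(masked_mean(values, mask), masked_variance(values, mask), masked_std(values, mask))
```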

Statistical testing
[Figure: the testing pipeline Data → Test statistic → p-value → Comparison with Threshold, drawn once for fully public data and then as Options 1, 2 and 3, which differ in where the boundary between private data and public data lies. One of the variants compares the test statistic with a critical test statistic instead of computing a p-value.]

Statistical tests
• t-test, paired t-test
• Wilcoxon rank sum test, signed rank test
• chi-square test
• Multiple testing correction
  • Bonferroni correction
  • Benjamini-Hochberg procedure
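The two multiple testing corrections named above can be sketched on public p-values as follows. This shows the textbook procedures, not Rmind's implementation; the function names and the significance level 0.05 are illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank with p_(k) <= k/m * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p-value
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected

p_values = [0.03, 0.01, 0.5, 0.02]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

On this input Bonferroni rejects a single hypothesis, while Benjamini-Hochberg, which controls the false discovery rate rather than the family-wise error rate, rejects three.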

Linear regression (1)
• $k$ independent variables $\vec{x}_1, \dots, \vec{x}_k$, one dependent variable $\vec{y}$
• Want to find $\beta_i$ such that $y_j = \beta_k X_{j,k} + \dots + \beta_1 X_{j,1} + \beta_0 X_{j,0} + \varepsilon_j$, or in matrix form $\vec{\varepsilon} = \vec{y} - X\vec{\beta}$
• Minimise the square of residuals $\|\vec{\varepsilon}\|^2 = \|\vec{y} - X\vec{\beta}\|^2$
• Convert the task to its equivalent characterisation in terms of linear equations $X^T X \vec{\beta} = X^T \vec{y}$
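The step from the minimisation problem to the linear system is the standard least-squares derivation, sketched here for completeness: expand the squared norm, take the gradient with respect to the coefficient vector, and set it to zero.

```latex
\begin{aligned}
\|\vec{y} - X\vec{\beta}\|^2
  &= \vec{y}^T\vec{y} - 2\,\vec{\beta}^T X^T \vec{y} + \vec{\beta}^T X^T X \vec{\beta} \\
\nabla_{\vec{\beta}}\, \|\vec{y} - X\vec{\beta}\|^2
  &= -2\,X^T\vec{y} + 2\,X^T X \vec{\beta} = 0 \\
\Longrightarrow\quad X^T X \vec{\beta} &= X^T \vec{y}
\end{aligned}
```

Because $X^T X$ is symmetric and positive semidefinite, any solution of this system is a minimiser of the squared residual.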

Linear regression (2)
• Simple linear regression (one variable)
• Matrix inversion (up to four variables)
• Gaussian elimination with back substitution
• LU decomposition
• Conjugate gradient method
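One of the listed methods, Gaussian elimination with back substitution, can be sketched on public values as below. The code forms the system from the design matrix and solves it. The function names are hypothetical, the first column of the design matrix is assumed to be all ones (the intercept), and partial pivoting by largest absolute value is a choice made for this illustration.

```python
def normal_equations(X, y):
    """Return A = X^T X and b = X^T y for a design matrix X given as a list of rows."""
    rows, cols = len(X), len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(rows)) for j in range(cols)]
         for i in range(cols)]
    b = [sum(X[r][i] * y[r] for r in range(rows)) for i in range(cols)]
    return A, b

def gaussian_solve(A, b):
    """Solve A beta = b by Gaussian elimination with partial pivoting and back substitution."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]     # augmented matrix
    for col in range(n):
        # Pick the row with the largest absolute value in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        known = sum(M[i][c] * beta[c] for c in range(i + 1, n))
        beta[i] = (M[i][n] - known) / M[i][i]
    return beta

# Toy data lying exactly on y = 1 + 2x; the first column is the intercept column.
X = [[1.0, x] for x in [0.0, 1.0, 2.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

A, b = normal_equations(X, y)
beta = gaussian_solve(A, b)
print(beta)   # coefficients in the order [beta_0, beta_1]
```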

Max Loc
Algorithm 10: maxLoc: Finding the first maximum element and its location in a vector in a privacy-preserving setting
Data: A vector $[\![\vec{a}]\!]$ of length $n$
Result: The maximum element $[\![b]\!]$ and its location $[\![l]\!]$ in the vector
1 Let $\pi(j)$ be a permutation of indices $j \in \{1, \dots, n\}$
2 $[\![b]\!] \leftarrow [\![a_{\pi(1)}]\!]$ and $[\![l]\!] \leftarrow \pi(1)$
3 for $i \in \{\pi(2), \dots, \pi(n)\}$ do
4   $[\![c]\!] \leftarrow (|[\![a_{\pi(i)}]\!]| > |[\![b]\!]|)$
5   $[\![b]\!] \leftarrow [\![b]\!] - [\![c]\!] \cdot [\![b]\!] + [\![c]\!] \cdot [\![a_{\pi(i)}]\!]$
6   $[\![l]\!] \leftarrow [\![l]\!] - [\![c]\!] \cdot [\![l]\!] + [\![c]\!] \cdot \pi(i)$
7 end
8 return $([\![b]\!], [\![l]\!])$
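The update rule of maxLoc replaces branching with arithmetic on a comparison bit, so the control flow does not depend on the data. A minimal sketch on public values, assuming the comparison is on absolute values as the algorithm's step 4 appears to read, and using 0-based locations instead of the slide's 1-based ones:

```python
import random

def max_loc(a, permutation=None):
    """Return (element with the largest absolute value, its 0-based location)."""
    n = len(a)
    if permutation is None:
        # Visit the elements in random order, as in step 1 of the algorithm.
        permutation = list(range(n))
        random.shuffle(permutation)
    b = a[permutation[0]]
    l = permutation[0]
    for idx in permutation[1:]:
        # c would be a secret-shared bit in the privacy-preserving setting.
        c = 1 if abs(a[idx]) > abs(b) else 0
        # Oblivious selection: keep the old value when c == 0, take the new one when c == 1.
        b = b - c * b + c * a[idx]
        l = l - c * l + c * idx
    return b, l

print(max_loc([3, -8, 5, 2]))
```

When several elements share the largest absolute value, the strict comparison keeps the first one in the visiting order, so the returned location then depends on the permutation.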

Rmind demo

https://sharemind.cyber.ee/ sharemind@cyber.ee
