secure multiparty computation

Secure Multiparty Computation Introduction to Privacy Preserving - PowerPoint PPT Presentation

CS573 Data Privacy and Security Secure Multiparty Computation Introduction to Privacy Preserving Distributed Data Mining Li Xiong Slides credit: Chris Clifton, Purdue University; Murat Kantarcioglu, UT Dallas


  1. CS573 Data Privacy and Security Secure Multiparty Computation – Introduction to Privacy Preserving Distributed Data Mining Li Xiong Slides credit: Chris Clifton, Purdue University; Murat Kantarcioglu, UT Dallas

  2. Outline • Overview • Data partition – Horizontally partitioned – Vertically partitioned • Privacy preserving Distributed Data Mining • Approaches to preserve privacy • Privacy preserving data mining toolkit

  3. Overview • What is Data Mining? – Extracting implicit, non-obvious patterns and relationships from a warehouse of data sets • This information can help increase the efficiency of the organization and aid future plans • Can be done at an organizational level – By establishing a data warehouse

  4. Motivation • Huge databases exist in various applications – Medical data – Consumer purchase data – Census data – Communication and media-related data – Data gathered by government agencies • Can these data be utilized? – For medical research – For improving customer service – For homeland security

  5. Motivation • Data sharing is necessary for full utilization • Pooling medical data can improve the quality of medical research • The huge amount of data available means that it is possible to learn a lot of information about individuals from public data – Purchasing patterns – Family history – Medical data – …

  6. Horizontally Partitioned Data • Data can be unioned to create the complete set [Figure: a table with columns key, X1…Xd and rows k1…kn is split by rows across sites: Site 1 holds keys k1…ki, Site 2 holds keys ki+1…kj, …, Site r holds keys km+1…kn]
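As a rough illustration of horizontal partitioning, the sketch below uses made-up toy records (the keys, attribute values, and the helper name `union_rows` are all hypothetical): every site stores the same attributes for different entities, so the complete set is just the union of the rows.

```python
# Hypothetical toy records of the form (key, x1, x2).
# Every site stores the SAME attributes for DIFFERENT entities.
site_1 = [("k1", 5.1, 0), ("k2", 4.9, 1)]
site_2 = [("k3", 6.3, 1)]
site_r = [("k4", 5.8, 0), ("k5", 7.0, 1)]


def union_rows(*sites):
    """Concatenate the rows held by each site into the complete table."""
    complete = []
    for rows in sites:
        complete.extend(rows)
    return complete


complete = union_rows(site_1, site_2, site_r)
for row in complete:
    print(row)
```

In a privacy-preserving setting the sites would not actually pool their rows like this; the snippet only shows what "complete set" means for this kind of partitioning.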

  7. Vertically Partitioned Data • Data can be joined to create the complete set [Figure: a table with columns key, X1…Xd is split by columns across sites: Site 1 holds key, X1…Xi; Site 2 holds key, Xi+1…Xj; …; Site r holds key, Xm+1…Xd]
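A comparable sketch for vertical partitioning, again with invented toy data and a hypothetical helper `join_on_key`: every site stores different attributes for the same entities, so the complete set is obtained by joining on the key.

```python
# Hypothetical toy data: each site maps key -> the attributes it owns.
# Every site stores DIFFERENT attributes for the SAME entities.
site_1 = {"k1": {"x1": 5.1}, "k2": {"x1": 4.9}}
site_2 = {"k1": {"x2": 0}, "k2": {"x2": 1}}
site_r = {"k1": {"x3": "a"}, "k2": {"x3": "b"}}


def join_on_key(*sites):
    """Merge the attribute columns of all sites, matching records by key."""
    complete = {}
    for site in sites:
        for key, attrs in site.items():
            complete.setdefault(key, {}).update(attrs)
    return complete


complete = join_on_key(site_1, site_2, site_r)
print(complete["k1"])
```

As with the horizontal case, this shows only the logical join; the point of the later slides is to mine such data without materializing it at one site.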

  8. Distributed Data Mining • The setting: – Data is distributed at different sites – These sites may be third parties (e.g., hospitals, government bodies) or may be the individuals themselves [Figure: parties holding inputs x1, x2, x3, …, xn jointly compute f(x1, x2, …, xn)]

  9. Distributed Data Mining • Government / public agencies. Example: – The Centers for Disease Control want to identify disease outbreaks – Insurance companies have data on disease incidents, seriousness, patient background, etc. – But can/should they release this information? • Industry Collaborations / Trade Groups. Example: – An industry trade group may want to identify best practices to help members – But some practices are trade secrets – How do we provide “commodity” results to all (Manufacturing using chemical supplies from supplier X have high failure rates), while still preserving secrets (manufacturing process Y gives low failure rates)?

  10. Privacy and Security Restrictions • Individual Privacy – Nobody should know more about any entity after the data mining than they did before • Organization Privacy – Protect knowledge about a collection of entities • Individual entity values may be known to all parties • Which entities are at which site may be secret

  11. Privacy-Preserving Distributed Data Mining: Why? • Data needed for data mining may be distributed among parties – Credit card fraud data • Inability to share data due to privacy reasons – HIPAA • Even partial results may need to be kept private

  12. Approaches to preserve privacy • Restrict access to data (protect individual records) • Protect both the data and its source: – Secure Multi-party computation (SMC) – Input Data Randomization • No single solution fits all purposes

  13. Secure computation and privacy • Secure computation – Assume that there is a function that all parties wish to compute – Secure computation shows how to compute that function in the safest way possible – In particular, it guarantees minimal information leakage (the output only) • Privacy – Does the function output itself reveal “sensitive information”, or – Should the parties agree to compute this function?

  14. Secure Multi-Party Computation (SMC) • The goal is computing a function $f(x_1, x_2, \dots, x_n)$ without revealing any $x_i$ • Semi-Honest Model – Parties follow the protocol • Malicious Model – Parties may or may not follow the protocol • We cannot do better than the situation in which a trusted third party exists • Generic SMC is too inefficient for PPDDM

  15. Secure Multiparty Computation • Basic cryptographic tools – Oblivious transfer – Random shares – Oblivious circuit evaluation • Yao’s Millionaire’s problem (Yao ’86) – Secure computation possible if function can be represented as a circuit • Works for multiple parties as well (Goldreich, Micali, and Wigderson ’87)

  16. But we aren’t done yet • Circuit evaluation: Build a circuit that represents the computation – For all possible inputs – Impossibly large for typical data mining tasks • Next step: – Efficient techniques for specialized tasks and computations – Tradeoff between security, efficiency, and accuracy

  17. Secure computation tasks • Examples: – Authentication protocols – Online payments – Auctions – Elections – Privacy preserving data mining – Essentially any task…

  18. Application of SMC to Private Data Mining • Setting – Data is distributed at different sites – These sites may be third parties (e.g., hospitals, government bodies) or individuals • Aim – Compute the data mining algorithm on the data so that nothing but the output is learned – That is, carry out a secure computation

  19. Privacy preserving data mining toolkit (Clifton ’02) • Many different data mining techniques often perform similar computations at various stages (e.g., computing sum, counting the number of items) • Toolkit – simple computations – sum, union, intersection … – assemble them to solve specific mining tasks – association rule mining, bayes classifier, … • The protocols may not be truly secure but are more efficient than traditional SMC methods • Reference: Tools for Privacy Preserving Data Mining, Clifton, 2002

  20. Primitive protocols • Secure functions – Secure sum – Secure union – …

  21. Secure Sum • Distributed data mining algorithms frequently calculate the sum of values from individual sites • Suppose we have $s$ sites $1, \dots, s$ • Site $l$ has an integer $v_l$ • The sites want to know the value of $f(v_1, \dots, v_s) = \sum_{l=1}^{s} v_l$ • Easy: – One site is designated the master site, numbered 1 – Site $l$ sends $v_l$ to site 1 ($2 \le l \le s$) – Site 1 computes $f(v_1, \dots, v_s) = \sum_{l=1}^{s} v_l$ and broadcasts it

  22. Secure Sum • What they don’t like about this: – Site 1 now knows everyone’s values • Privacy constraint: – Site $l$ does not wish to reveal $v_l$

  23. Secure Sum II • Suppose we have $s$ sites $1, \dots, s$ • Site $l$ has an integer $v_l$ • The sites want to know the value of $f(v_1, \dots, v_s) = v_1 + \dots + v_s$ • Assume that the value $v = \sum_{l=1}^{s} v_l$ to be computed is known to lie in the range $[0..n]$

  24. Secure Sum II • Site 1: – generates a random number $R$, uniformly chosen from $[0..n]$ – adds $R$ to its local value $v_1$, and sends $(R + v_1) \bmod n$ to site 2 • For $l = 2, \dots, s-1$: – Site $l$ receives $V = (R + \sum_{j=1}^{l-1} v_j) \bmod n$ – Site $l$ then computes $(R + \sum_{j=1}^{l} v_j) \bmod n = (v_l + V) \bmod n$ – and passes it to site $l + 1$ • Site $s$ performs the above step, and sends the result to site 1 • Site 1, knowing $R$, can subtract $R$ to get the actual result: $(V - R) \bmod n$
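The ring protocol on this slide can be simulated in a single process, as in the minimal sketch below. It is not a networked implementation: the function name `secure_sum` and the returned transcript are illustrative choices, and the code assumes the true sum is strictly less than the modulus `n`, so that reducing mod `n` loses nothing.

```python
import secrets


def secure_sum(values, n):
    """Simulate the masked ring sum; values[0] belongs to site 1.

    Assumption: 0 <= sum(values) < n, so the result mod n is the true sum.
    Returns (sum, transcript), where transcript[i] is the message that
    site i+1 passes on to the next site in the ring.
    """
    r = secrets.randbelow(n)      # site 1's secret random mask R
    running = (r + values[0]) % n  # message from site 1 to site 2
    transcript = [running]
    for v_l in values[1:]:         # sites 2..s each add their own value
        running = (v_l + running) % n
        transcript.append(running)
    # The last message returns to site 1, which removes the mask.
    return (running - r) % n, transcript


total, messages = secure_sum([3, 5, 7], n=100)
print(total)
```

Each intermediate message is the running total shifted by the uniformly random `r`, which is why a single site looking only at the value it receives learns nothing about the values added before it.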

  25. Secure Sum II

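  26. Secure Sum - security • Does not reveal the real number • Is it secure? – Sites can collude! The two neighbors of a site can compare the value sent to it with the value it passes on, and so recover that site's input – Each site can divide its number into shares, and run the algorithm multiple times with permuted nodes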
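Both points on this slide can be illustrated with a small single-process simulation. This is a sketch under stated assumptions: all function names are hypothetical, the ring order is shuffled with `random.shuffle` purely for illustration, and the true sum is assumed to be smaller than the modulus `n`.

```python
import random
import secrets


def ring_round(values, n, order):
    """One masked ring sum visiting sites in `order` (order[0] initiates).

    Returns (sum mod n, seen), where seen[site] = (value_in, value_out).
    For the initiator, value_in is its own secret mask.
    """
    r = secrets.randbelow(n)
    running, seen = r, {}
    for site in order:
        value_in = running
        running = (running + values[site]) % n
        seen[site] = (value_in, running)
    return (running - r) % n, seen


def collude(seen, order, pos, n):
    """Neighbors of the site at ring position `pos` recover its input."""
    sent_by_prev = seen[order[pos - 1]][1]
    received_by_next = seen[order[pos + 1]][0]
    return (received_by_next - sent_by_prev) % n


def split_into_shares(value, parts, n):
    """Additive shares: random values that sum to `value` mod n."""
    shares = [secrets.randbelow(n) for _ in range(parts - 1)]
    shares.append((value - sum(shares)) % n)
    return shares


def secure_sum_with_shares(values, n, rounds):
    """Run one ring round per share, with a fresh site order each time."""
    shares = [split_into_shares(v, rounds, n) for v in values]
    total = 0
    for k in range(rounds):
        order = list(range(len(values)))
        random.shuffle(order)  # each site gets different neighbors
        partial, _ = ring_round([s[k] for s in shares], n, order)
        total = (total + partial) % n
    return total


values, n = [3, 5, 7, 11], 1000
total, seen = ring_round(values, n, order=[0, 1, 2, 3])
print(total, collude(seen, [0, 1, 2, 3], pos=2, n=n))
print(secure_sum_with_shares(values, n, rounds=3))
```

With shares, a colluding pair in any single round learns only one random-looking share of the target's value; recovering the value would require being the target's neighbors in every round.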

  27. Secure Union • Useful in DM where each party needs to give rules, frequent itemsets, etc., without revealing the owner • Can be evaluated using SMC methods if the domain of the items is small • Each party creates a binary vector where a 1 in the $i$-th entry represents that the party has the $i$-th item • After this point, a simple circuit that ORs the corresponding vectors can be built, and it can be securely evaluated using general SMC circuit evaluation protocols • However, in data mining the domain of the items is usually large
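The function that such a circuit would evaluate can be sketched in the clear as follows. This shows only the bit-vector encoding and the OR, not a secure evaluation; the item domain and the helper names are invented for illustration.

```python
# Hypothetical small item domain, agreed on by all parties in advance.
DOMAIN = ["beer", "bread", "eggs", "milk"]


def to_bit_vector(items, domain=DOMAIN):
    """Entry i is 1 exactly when the party holds the i-th domain item."""
    return [1 if item in items else 0 for item in domain]


def or_vectors(vectors):
    """Entry-wise OR of equal-length bit vectors."""
    return [int(any(column)) for column in zip(*vectors)]


def union_via_vectors(item_sets, domain=DOMAIN):
    """Decode the OR of all parties' vectors back into a set of items."""
    combined = or_vectors([to_bit_vector(s, domain) for s in item_sets])
    return {item for item, bit in zip(domain, combined) if bit}


party_sets = [{"beer", "milk"}, {"milk", "eggs"}, {"beer"}]
print(sorted(union_via_vectors(party_sets)))
```

Every vector has one entry per item in the domain, which makes concrete why this approach stops being practical once the domain is large.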

  28. Secure Union • Consider $k$ parties $P_1, \dots, P_k$ having local sets $S_1, \dots, S_k$; we wish to securely compute $U = S_1 \cup S_2 \cup \dots \cup S_k$ • Such that each party only knows $U$ and nothing else • Key: Commutative Encryption $E_a(E_b(x)) = E_b(E_a(x))$ – (the decryption function has the same property) • Multiple encryption and decryption operations can be performed over a value without any restriction on the order of these operations
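One well-known way to obtain a commutative cipher is modular exponentiation with a shared prime modulus (the Pohlig-Hellman / SRA style); the slide does not say which scheme is intended, so treat this choice as an assumption. The sketch below uses a toy Mersenne-prime modulus and hypothetical helper names, runs all parties in one process, and omits the message shuffling a real protocol would need to hide item owners.

```python
import math
import secrets

P = 2**127 - 1  # Mersenne prime used as a toy shared modulus (assumption)


def keygen(p=P):
    """Return (e, d) with e*d = 1 mod (p-1), so x^(e*d) = x mod p."""
    while True:
        e = secrets.randbelow(p - 3) + 2  # candidate in 2..p-2
        if math.gcd(e, p - 1) == 1:
            return e, pow(e, -1, p - 1)   # modular inverse, Python 3.8+


def encrypt(x, e, p=P):
    return pow(x, e, p)


def decrypt(c, d, p=P):
    return pow(c, d, p)


def encode(item):
    """Map a short non-empty string (at most 15 bytes) to an integer < P."""
    return int.from_bytes(item.encode("utf-8"), "big")


def decode(x):
    return x.to_bytes((x.bit_length() + 7) // 8, "big").decode("utf-8")


def union_sketch(item_sets):
    """Every item is encrypted by every party, starting with its owner."""
    keys = [keygen() for _ in item_sets]
    fully_encrypted = set()
    for i, items in enumerate(item_sets):
        ring = keys[i:] + keys[:i]  # encryption order differs per owner
        for item in items:
            c = encode(item)
            for e, _ in ring:
                c = encrypt(c, e)
            fully_encrypted.add(c)  # equal items collide, hiding duplicates
    result = set()
    for c in fully_encrypted:
        for _, d in keys:           # decryption order does not matter either
            c = decrypt(c, d)
        result.add(decode(c))
    return result


(e_a, d_a), (e_b, d_b) = keygen(), keygen()
x = encode("milk")
print(encrypt(encrypt(x, e_a), e_b) == encrypt(encrypt(x, e_b), e_a))
print(sorted(union_sketch([{"beer", "milk"}, {"milk", "eggs"}, {"beer"}])))
```

Because the same item encrypts to the same value no matter which party encrypted it first, duplicates can be removed while everything is still encrypted, without learning who contributed them.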
