CS573 Data Privacy and Security Midterm Review Li Xiong Department of Mathematics and Computer Science Emory University

Principles of Data Security – CIA Triad • Confidentiality – Prevent the disclosure of information to unauthorized users • Integrity – Prevent improper modification • Availability – Make data available to legitimate users

Privacy vs. Confidentiality • Confidentiality – Prevent disclosure of information to unauthorized users • Privacy – Prevent disclosure of personal information to unauthorized users – Control of how personal information is collected and used – Prevent identification of individuals 11/8/2016 3

Data Privacy and Security Measures • Access control – Restrict access to the data (or a subset or view of it) to authorized users • Cryptography – Use encryption to encode information so that it can be read only by authorized users (protected in transit and in storage) • Inference control – Restrict inference from accessible data to sensitive (non-accessible) data

Inference Control • Inference control: Prevent inference from accessible information to individual (non-accessible) information • Technologies – De-identification and anonymization (input perturbation) – Differential privacy (output perturbation)

Traditional De-identification and Anonymization • Attribute suppression, encoding, perturbation, generalization • Subject to re-identification and disclosure attacks [Figure: Original Data → De-identification / anonymization → Sanitized Records]

Statistical Data Sharing with Differential Privacy • Macro data (as opposed to micro data) • Output perturbation (as opposed to input perturbation) • More rigorous guarantee [Figure: Original Data → Differentially Private Data Sharing → Statistics / Models / Synthetic Records]

Cryptography • Encoding data in a way that only authorized users can read it [Figure: Original Data → Encryption → Encrypted Data] 11/9/2016

Applications of Cryptography • Secure data outsourcing – Support computation and queries on encrypted data [Figure: Computation / Queries → Encrypted Data]

Applications of Cryptography • Multi-party secure computations (secure function evaluation) – Securely compute a function without revealing private inputs [Figure: parties holding private inputs x1, x2, x3, …, xn jointly compute f(x1, x2, …, xn)]

Applications of Cryptography • Private information retrieval (access privacy) – Retrieve data without revealing the query (access pattern)

Course Topics • Inference control – De-identification and anonymization – Differential privacy foundations – Differential privacy applications • Histograms • Data mining • Local differential privacy • Location privacy • Cryptography • Access control • Applications

k-Anonymity
Original records → anonymized records (race, zipcode, diagnosis):
Caucas, 78712, Flu → Caucas, 787XX, Flu
Asian, 78705, Shingles → Asian/AfrAm, 78705, Shingles
Caucas, 78754, Flu → Caucas, 787XX, Flu
Asian, 78705, Acne → Asian/AfrAm, 78705, Acne
AfrAm, 78705, Acne → Asian/AfrAm, 78705, Acne
Caucas, 78705, Flu → Caucas, 787XX, Flu
• Quasi-identifiers (QID) = race, zipcode • Sensitive attribute = diagnosis • k-anonymity: the size of each QID group is at least k
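The k-anonymity condition can be checked mechanically by grouping records on their quasi-identifiers. The following is a minimal sketch using the anonymized records from this slide; the function names `qid_group_sizes` and `is_k_anonymous` are illustrative and not from the course material.

```python
from collections import Counter

# Anonymized records from the slide: (race, zipcode, diagnosis).
records = [
    ("Caucas", "787XX", "Flu"),
    ("Asian/AfrAm", "78705", "Shingles"),
    ("Caucas", "787XX", "Flu"),
    ("Asian/AfrAm", "78705", "Acne"),
    ("Asian/AfrAm", "78705", "Acne"),
    ("Caucas", "787XX", "Flu"),
]


def qid_group_sizes(rows, qid_indices=(0, 1)):
    """Count how many rows share each quasi-identifier combination."""
    return Counter(tuple(row[i] for i in qid_indices) for row in rows)


def is_k_anonymous(rows, k, qid_indices=(0, 1)):
    """Return True if every QID group contains at least k rows."""
    return all(size >= k for size in qid_group_sizes(rows, qid_indices).values())


print(qid_group_sizes(records))
print(is_k_anonymous(records, 3))  # True: both QID groups have 3 rows
print(is_k_anonymous(records, 4))  # False
```

Both QID groups have exactly three records, so the table is 3-anonymous but not 4-anonymous.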

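Problem of k-anonymity
Anonymized records (race, zipcode, diagnosis):
Caucas, 787XX, Flu
Asian/AfrAm, 78705, Shingles
Caucas, 787XX, Flu
Asian/AfrAm, 78705, Acne
Asian/AfrAm, 78705, Acne
Caucas, 787XX, Flu
External information known to the attacker (name, race, zipcode): …; Rusty Shackleford, Caucas, 78705; …
• Every record in the Caucas/787XX group has Flu, so the attacker learns Rusty Shackleford's diagnosis even though each QID group has three records
• Problem: sensitive attributes are not “diverse” within each quasi-identifier group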

l-Diversity [Machanavajjhala et al. ICDE ‘06]
Anonymized records (race, zipcode, diagnosis):
Caucas, 787XX, Flu
Caucas, 787XX, Shingles
Caucas, 787XX, Acne
Caucas, 787XX, Flu
Caucas, 787XX, Acne
Caucas, 787XX, Flu
Asian/AfrAm, 78XXX, Flu
Asian/AfrAm, 78XXX, Flu
Asian/AfrAm, 78XXX, Acne
Asian/AfrAm, 78XXX, Shingles
Asian/AfrAm, 78XXX, Acne
• Entropy of sensitive attributes within each quasi-identifier group must be at least log(l)
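As a rough illustration, the entropy condition can be evaluated per quasi-identifier group. This sketch assumes the entropy l-diversity definition from the cited paper, in which each group's entropy (natural log) must be at least log(l); the helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Anonymized records from the slide: (race, zipcode, diagnosis).
records = [
    ("Caucas", "787XX", "Flu"),
    ("Caucas", "787XX", "Shingles"),
    ("Caucas", "787XX", "Acne"),
    ("Caucas", "787XX", "Flu"),
    ("Caucas", "787XX", "Acne"),
    ("Caucas", "787XX", "Flu"),
    ("Asian/AfrAm", "78XXX", "Flu"),
    ("Asian/AfrAm", "78XXX", "Flu"),
    ("Asian/AfrAm", "78XXX", "Acne"),
    ("Asian/AfrAm", "78XXX", "Shingles"),
    ("Asian/AfrAm", "78XXX", "Acne"),
]


def group_entropies(rows):
    """Entropy of the sensitive attribute within each QID group."""
    groups = defaultdict(list)
    for race, zipcode, diagnosis in rows:
        groups[(race, zipcode)].append(diagnosis)
    entropies = {}
    for qid, values in groups.items():
        n = len(values)
        entropies[qid] = -sum(
            (c / n) * math.log(c / n) for c in Counter(values).values()
        )
    return entropies


def is_entropy_l_diverse(rows, l):
    """Assumed definition: every group's entropy is at least log(l)."""
    return all(h >= math.log(l) for h in group_entropies(rows).values())


print(group_entropies(records))
print(is_entropy_l_diverse(records, 2))  # True
print(is_entropy_l_diverse(records, 3))  # False
```

Each group contains three distinct diagnoses, but because they are not evenly spread, the table is entropy 2-diverse and falls short of entropy 3-diversity.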

Problem with l-diversity
Original dataset    Anonymization A    Anonymization B
… HIV-              Q1 HIV+            Q1 HIV-
… HIV-              Q1 HIV-            Q1 HIV-
… HIV-              Q1 HIV+            Q1 HIV-
… HIV-              Q1 HIV-            Q1 HIV+
… HIV-              Q1 HIV+            Q1 HIV-
… HIV+              Q1 HIV-            Q1 HIV-
… HIV-              Q2 HIV-            Q2 HIV-
… HIV-              Q2 HIV-            Q2 HIV-
… HIV-              Q2 HIV-            Q2 HIV-
… HIV-              Q2 HIV-            Q2 HIV-
… HIV-              Q2 HIV-            Q2 HIV-
… HIV-              Q2 HIV-            Q2 Flu
• Original dataset: 99% have HIV-
• Anonymization A: 50% HIV- → quasi-identifier group is “diverse”. This leaks a ton of information
• Anonymization B: 99% HIV- → quasi-identifier group is not “diverse” …yet anonymized database does not leak anything

t-Closeness [Li et al. ICDE ‘07]
Anonymized records (race, zipcode, diagnosis):
Caucas, 787XX, Flu
Caucas, 787XX, Shingles
Caucas, 787XX, Acne
Caucas, 787XX, Flu
Caucas, 787XX, Acne
Caucas, 787XX, Flu
Asian/AfrAm, 78XXX, Flu
Asian/AfrAm, 78XXX, Flu
Asian/AfrAm, 78XXX, Acne
Asian/AfrAm, 78XXX, Shingles
Asian/AfrAm, 78XXX, Acne
• Distribution of sensitive attributes within each quasi-identifier group should be “close” to their distribution in the entire original database
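To make "close" concrete, one can compare each group's distribution of diagnoses with the overall distribution. The sketch below uses total variation distance as a simple stand-in; this is an assumption, since the slide does not name a distance and the cited paper works with Earth Mover's Distance. The function names are hypothetical.

```python
from collections import Counter, defaultdict

# Anonymized records from the slide: (race, zipcode, diagnosis).
records = [
    ("Caucas", "787XX", "Flu"),
    ("Caucas", "787XX", "Shingles"),
    ("Caucas", "787XX", "Acne"),
    ("Caucas", "787XX", "Flu"),
    ("Caucas", "787XX", "Acne"),
    ("Caucas", "787XX", "Flu"),
    ("Asian/AfrAm", "78XXX", "Flu"),
    ("Asian/AfrAm", "78XXX", "Flu"),
    ("Asian/AfrAm", "78XXX", "Acne"),
    ("Asian/AfrAm", "78XXX", "Shingles"),
    ("Asian/AfrAm", "78XXX", "Acne"),
]


def distribution(values):
    """Empirical distribution of a list of categorical values."""
    n = len(values)
    return {v: c / n for v, c in Counter(values).items()}


def total_variation(p, q):
    """Total variation distance between two categorical distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def max_group_distance(rows):
    """Largest distance between any QID group's distribution and the overall one."""
    overall = distribution([diagnosis for _, _, diagnosis in rows])
    groups = defaultdict(list)
    for race, zipcode, diagnosis in rows:
        groups[(race, zipcode)].append(diagnosis)
    return max(total_variation(distribution(v), overall) for v in groups.values())


def is_t_close(rows, t):
    """Assumed check: every group is within distance t of the overall distribution."""
    return max_group_distance(rows) <= t


print(max_group_distance(records))
print(is_t_close(records, 0.1))
```

Here both groups mirror the overall mix of diagnoses fairly well, so the table passes for a threshold such as t = 0.1.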

Problems with Syntactic Privacy Notions • Syntactic – Focuses on data transformation, not on what can be learned from the anonymized dataset • “Quasi-identifier” fallacy – Assumes a priori that the attacker will not know certain information about the target – Attacker may know the records in the database or external information

Course Topics • Inference control – De-identification and anonymization – Differential privacy foundations – Differential privacy applications • Histograms • Data mining • Location privacy • Cryptography • Access control • Applications

Differential Privacy • The statistical outcome is indistinguishable regardless of whether a particular user (record) is included in the data

Statistical Data Release: disclosure risk [Figure: original records → original histogram]

Statistical Data Release: differential privacy [Figure: original records → original histogram → perturbed histogram with differential privacy]

Differential Privacy • D and D’ are neighboring databases if they differ in one record • A privacy mechanism A gives ε-differential privacy if, for all neighboring databases D, D’ and for any possible output S ∈ Range(A), Pr[A(D) = S] ≤ exp(ε) × Pr[A(D’) = S]

Laplace Mechanism • Add Laplace noise to the true output f(D): A(D) = f(D) + Laplace(Δf/ε) • Global sensitivity: Δf = max over neighboring databases D, D’ of |f(D) − f(D’)|

Example: Laplace Mechanism • For a single counting query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy (a count changes by at most 1 between neighboring databases, so its sensitivity is 1).
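A counting query with Laplace noise can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library; the data, the predicate and the names `laplace_noise` and `private_count` are hypothetical. It samples Laplace(0, b) as the difference of two independent exponential variables with mean b.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(data, predicate, epsilon, rng=random):
    """Counting query (sensitivity 1) perturbed with Laplace(1/epsilon) noise."""
    true_count = sum(1 for row in data if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of seven people; query: how many are at least 30?
ages = [42, 31, 28, 55, 19, 63, 37]
rng = random.Random(0)  # seeded only to make the demo repeatable
print(private_count(ages, lambda age: age >= 30, epsilon=0.5, rng=rng))
```

A smaller ε means a larger noise scale 1/ε, so the released count is more private but less accurate.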

Exponential Mechanism • Sample an output r with probability determined by a utility score function u(D, r) [Figure: inputs → outputs]

Exponential Mechanism • For a database D, output space R and a utility score function u : D × R → ℝ, the algorithm A with Pr[A(D) = r] ∝ exp(ε × u(D, r) / (2Δu)) satisfies ε-differential privacy • Δu is the sensitivity of the utility score function: Δu = max over r and neighboring D, D’ of |u(D, r) − u(D’, r)|

Example: Exponential Mechanism • Scoring/utility function u: Inputs × Outputs → ℝ • D: nationalities of a set of people • f(D): most frequent nationality in D • u(D, O) = #(D, O), the number of people in D with nationality O (Source: Module 2, Tutorial: Differential Privacy in the Wild)
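The nationality example can be sketched as follows: each candidate nationality is sampled with probability proportional to exp(ε × u / (2Δu)), where u is its count and Δu = 1. This is a minimal illustration; the data, the candidate list and the function names are hypothetical.

```python
import math
import random
from collections import Counter


def count_utility(data, nationality):
    """u(D, O): number of people in D with nationality O (sensitivity 1)."""
    return Counter(data)[nationality]


def exponential_mechanism(data, candidates, utility, epsilon, sensitivity, rng=random):
    """Sample r with probability proportional to exp(epsilon * u(D, r) / (2 * sensitivity))."""
    scores = [utility(data, r) for r in candidates]
    best = max(scores)
    # Subtracting the maximum score avoids overflow and leaves the
    # normalized sampling probabilities unchanged.
    weights = [math.exp(epsilon * (s - best) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical data: country codes for 100 people.
people = ["US"] * 50 + ["CN"] * 30 + ["IN"] * 20
candidates = ["US", "CN", "IN", "FR"]  # the output space must not depend on the data

rng = random.Random(0)  # seeded only to make the demo repeatable
print(exponential_mechanism(people, candidates, count_utility, 1.0, 1.0, rng))
```

Fixing the candidate list in advance matters: if the output space were derived from the data, the mere presence of a rare nationality in the output could reveal that someone with that nationality is in D.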

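Composition theorems • Sequential composition: mechanisms with budgets ε_i applied to the same data together give ∑_i ε_i-differential privacy • Parallel composition: mechanisms with budgets ε_i applied to disjoint subsets of the data together give max(ε_i)-differential privacy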
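The two composition rules amount to simple budget arithmetic. The following sketch, with hypothetical function names and ε values, shows how the total privacy cost would be tallied in each case.

```python
def sequential_epsilon(epsilons):
    """Total budget when all mechanisms run on the same data: sum of the epsilons."""
    return sum(epsilons)


def parallel_epsilon(epsilons):
    """Total budget when mechanisms run on disjoint subsets of the data: the largest epsilon."""
    return max(epsilons)


# Two queries answered over the same records with budgets 0.3 and 0.2.
print(sequential_epsilon([0.3, 0.2]))

# Three queries, each over a different disjoint partition, each with budget 0.5.
print(parallel_epsilon([0.5, 0.5, 0.5]))
```

The practical consequence is that partitioning the data before querying is much cheaper in privacy budget than asking repeated questions about the same records.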

Differential Privacy • Differential privacy ensures that an attacker can’t infer the presence or absence of a single record in the input based on any output. • Building blocks – Laplace mechanism, exponential mechanism • Composition rules help build complex algorithms from these building blocks

Course Topics • Inference control – De-identification and anonymization – Differential privacy foundations – Differential privacy applications • Histograms • Data mining • Location privacy • Cryptography • Access control • Applications

Baseline: Laplace Mechanism • For the counting query Q on each histogram bin, returning Q(D) + Laplace(1/ε) gives ε-differential privacy.
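A baseline differentially private histogram adds independent Laplace(1/ε) noise to every bin count. The sketch below assumes that neighboring databases differ by adding or removing one record, so each record affects exactly one bin and the bins compose in parallel; the bin labels, data and function names are hypothetical.

```python
import random
from collections import Counter


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_histogram(values, bins, epsilon, rng=random):
    """Add Laplace(1/epsilon) noise to the count of every bin, including empty ones."""
    counts = Counter(values)
    return {b: counts.get(b, 0) + laplace_noise(1.0 / epsilon, rng) for b in bins}


# Hypothetical data: each record already mapped to an age-range bin.
bins = ["20-29", "30-39", "40-49", "50-59"]  # fixed in advance, independent of the data
values = ["20-29", "30-39", "30-39", "40-49"]

rng = random.Random(0)  # seeded only to make the demo repeatable
print(private_histogram(values, bins, epsilon=1.0, rng=rng))
```

Noisy counts can be negative or fractional; any rounding or clamping applied afterwards is post-processing and does not weaken the guarantee.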

DPCube [SecureDM 2010, ICDE 2012 demo] • Compute unit histogram counts with differential privacy • Use DP unit histogram for partitioning • Compute V-optimal histogram counts with differential privacy
Original Records (Name, Age, Income, HIV+): Frank, 42, 30K, Y; Bob, 31, 60K, Y; Mary, 28, 20K, Y; …
[Figure: Original Records → (ε/2-DP) DP unit histogram → multi-dimensional partitioning → (ε/2-DP) DP V-optimal histogram → DP interface]

Private Spatial Decompositions [CPSSY 12] • Examples: quadtree, kd-tree • Need to ensure both the partitioning boundaries and the counts of each partition are differentially private

Histogram methods vs parametric methods • Non-parametric methods (only work well for low-dimensional data) – Learn empirical distribution through histograms, e.g. PSD, Privelet, FP, P-HP • Parametric methods (joint distribution difficult to model) – Fit the data to a distribution, make inferences about parameters, e.g. PrivacyOnTheMap [Figure: Original data → Histogram → Perturbation → Synthetic data]
