Marco Gaboardi
Boston University
Data
Private Queries?
[Diagram: an analyst sends the query "medical correlation?" to the data curator and receives an answer.]
Private Queries?
[Diagram: the analyst asks "Does Joe have cancer?" and receives an answer.]
Anonymization?
[Diagram: the analyst queries "medical correlation" on anonymized data and receives an answer.]
Anonymization?
[Diagram: the analyst asks "Does Joe have cancer?" of the anonymous data (?!?).]
Attacks on Anonymization
(Narayanan, Shmatikov: Robust De-anonymization of Large Sparse Datasets, IEEE S&P 2008)
[Diagram: anonymized data combined with additional data reveals correlations that re-identify individuals.]
A Possible Solution: randomization
Adding noise
[Diagram: the curator answers the query "medical correlation?" with answer + noise.]
Adding noise
[Diagram: the analyst asks "Does Joe have cancer?"; the noisy answer reveals nothing useful (?!?).]
Adding noise
[Diagram: the data analyst interacts with the data only through noisy answers.]
Privacy vs Utility
[Diagram: a scale balancing utility against privacy.]
Differential privacy: understanding the mathematical and computational meaning of this trade-off.
[Dwork, McSherry, Nissim, Smith, TCC06]
Some Official Users
For example, the US Census Bureau adopted differential privacy for its 2020 Census data releases.
The rest of the class
- Today: fundamentals of reconstruction attacks and the definition of differential privacy.
- Basic methods to guarantee differential privacy and how to support them in programming languages.
- Verification.
Data
Is this data private?
[Table: a 15×15 binary matrix; rows I1–I15 (individuals), columns D1–D15 (diseases); a 1 marks that an individual has that disease.]
How about if we also have this data?
ID    Name        ID    Disease
I1    Alice       D1    AMAN
I2    Bob         D2    Behcet
I3    Cynthia     D3    Celiac
I4    Dan         D4    Dermatitis
I5    Eve         D5    Evans synd.
I6    Frank       D6    Fibrosis
I7    Guy         D7    Graves' dis.
I8    Hannah      D8    Henoch-Schonlein
I9    Ivan        D9    IGA Neph.
I10   Jon         D10   …
I11   Ken         D11   Kawasaki dis.
I12   Lou         D12   Lichen planus
I13   Mike        D13   Myositis
I14   Noa         D14   Narcolepsy
I15   Omer        D15   Optic Neuritis
How about this?
[Table: aggregate counts of individuals per disease, in the source's column order: D1: 6, D2: 7, D3: 6, D4: 7, D5: 8, D6: 7, D6: 9 [sic], D7: 7, D8: 5, D9: 5, D10: 1, D11: 6, D12: 6, D13: 5, D14: 7, D15: 5.]
The answers to these kinds of questions depend on the additional information we have available.
Database
A database D is a collection of n records from some universe set X:
D ∈ Xⁿ = DB,  D[i] ∈ X
We will often write its records explicitly: (x1, …, xn) ∈ DB.
(Normalized) Counting Queries
A (normalized) counting query is given by a predicate q : X → {0,1} and counts the proportion of elements in a dataset satisfying the predicate:
q(D) = (1/n) ∑ᵢ q(D[i])
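As a concrete illustration, here is a minimal Python sketch (the dataset and predicate below are invented for illustration):

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")

def counting_query(q: Callable[[X], int], D: Sequence[X]) -> float:
    """Normalized counting query: the fraction of records satisfying predicate q."""
    return sum(q(x) for x in D) / len(D)

# Example: the fraction of records equal to (1, 0, 1) in a toy dataset over {0,1}^3.
D = [(0, 0, 0), (1, 0, 1), (1, 0, 1), (0, 1, 0)]
print(counting_query(lambda x: int(x == (1, 0, 1)), D))  # 0.5
```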
Example 1
Let's consider an arbitrary universe domain X and, for y ∈ X, the following predicate:
qy(x) = 1 if y = x, and 0 otherwise.
We call the associated counting query qy : Xⁿ → [0,1] a point function.
Question: Suppose that we answer all the point function queries for y ∈ X. What well-known statistic do we obtain?
Example 1
D ∈ X¹⁰, X = {0,1}³
[Table: the example database D; rows I1–I10, attributes D1–D3.]
q000(D) = .3   q001(D) = .1   q010(D) = .2   q011(D) = 0
q100(D) = 0    q101(D) = .3   q110(D) = .1   q111(D) = 0
[Bar chart: the histogram of D over {0,1}³.]
Example 1
Question: Suppose that we answer all the point function queries for y ∈ X. What well-known statistic do we obtain? Answer: the histogram of the database.
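A minimal sketch of how point queries yield the histogram; the dataset below is one (assumed) arrangement consistent with the slide's query answers:

```python
from itertools import product

def point_query(y, D):
    # q_y(D): the fraction of records equal to y.
    return sum(x == y for x in D) / len(D)

# A 10-record dataset over {0,1}^3 consistent with the slide's answers.
D = [(0,0,0), (1,0,1), (0,1,0), (1,0,1), (0,0,0),
     (0,1,0), (1,1,0), (0,0,0), (0,0,1), (1,0,1)]

# Answering all point queries gives the histogram of the database.
histogram = {y: point_query(y, D) for y in product([0, 1], repeat=3)}
print(histogram[(0, 0, 0)], histogram[(1, 0, 1)])  # 0.3 0.3
```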
Example II
Let's consider an arbitrary ordered universe domain X and, for y ∈ X, the following predicate:
qy(x) = 1 if x ≤ y, and 0 otherwise.
We call the associated counting query qy : Xⁿ → [0,1] a threshold function.
Question: Suppose that we answer all the threshold function queries for y ∈ X. What well-known statistic do we obtain?
Example II
D ∈ X¹⁰, X = {0,1}³, with the order given by the corresponding binary encoding.
q000(D) = .3   q001(D) = .4   q010(D) = .6   q011(D) = .6
q100(D) = .6   q101(D) = .9   q110(D) = 1    q111(D) = 1
[Plot: the cumulative distribution of D over {0,1}³.]
[Table: the same example database D as before.]
Example II
Question: Suppose that we answer all the threshold function queries for y ∈ X. What well-known statistic do we obtain? Answer: the CDF of the database.
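The same assumed toy dataset, with threshold queries recovering the CDF (lexicographic order on bit tuples coincides with the binary-encoding order):

```python
from itertools import product

def threshold_query(y, D):
    # q_y(D): the fraction of records x with x <= y.
    return sum(x <= y for x in D) / len(D)

D = [(0,0,0), (1,0,1), (0,1,0), (1,0,1), (0,0,0),
     (0,1,0), (1,1,0), (0,0,0), (0,0,1), (1,0,1)]

# Answering all threshold queries gives the CDF of the database.
cdf = {y: threshold_query(y, D) for y in product([0, 1], repeat=3)}
print(cdf[(0, 1, 1)], cdf[(1, 0, 1)])  # 0.6 0.9
```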
Example III
Let's consider the universe domain X = {0,1}ᵈ and, for an index 1 ≤ j ≤ d, the following predicate:
qj(x) = x[j]
We call the associated counting query qj : ({0,1}ᵈ)ⁿ → [0,1] an attribute counting function.
Question: Which statistic corresponds to releasing all the attribute counting functions?
Example III
D ∈ X¹⁰, X = {0,1}³
q1(D) = .4   q2(D) = .3   q3(D) = .4
[Table: the example database D with its column sums (margins) 4, 3, 4.]
Example III
Question: Which statistic corresponds to releasing all the attribute counting functions? Answer: the (1-way) marginals of the database.
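A sketch of attribute counting queries recovering the 1-way marginals (same assumed toy dataset as above):

```python
def attribute_query(j, D):
    # q_j(D): the fraction of records whose j-th attribute is 1 (0-indexed here).
    return sum(x[j] for x in D) / len(D)

D = [(0,0,0), (1,0,1), (0,1,0), (1,0,1), (0,0,0),
     (0,1,0), (1,1,0), (0,0,0), (0,0,1), (1,0,1)]

print([attribute_query(j, D) for j in range(3)])  # [0.4, 0.3, 0.4]
```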
Example IV
Let's consider the universe domain X = {0,1}ᵈ and a vector of (possibly negated) attribute indices
v⃗ ∈ List[k]{1, /1, 2, /2, …, d, /d}
where qj(x) = xj and q/j(x) = ¬xj (we write /j for the negated attribute). Define the conjunction
qv⃗(x) = qv1(x) ∧ qv2(x) ∧ ⋯ ∧ qvk(x)
We call the associated counting query qv⃗ : ({0,1}ᵈ)ⁿ → [0,1] a conjunction or k-way marginal.
Question: Which statistic corresponds to releasing all the conjunctions?
Example IV
D ∈ X¹⁰, X = {0,1}³
[Table: the same example database D as before.]
k = 2:  q12(D) = .1,  q1/2(D) = .3,  q13(D) = .3,  q1/3(D) = .1,
        q/12(D) = .2,  q/13(D) = .1,  q/1/2(D) = .4,  q/1/3(D) = .5

         D1     /D1
  D2     0.1    0.2
  /D2    0.3    0.4
Example IV
Question: Which statistic corresponds to releasing all the conjunctions? Answer: the contingency tables of the database.
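A sketch of 2-way marginals reproducing the slide's contingency table (same assumed toy dataset; literals are (attribute, value) pairs):

```python
def conjunction_query(literals, D):
    # q_v(D): the fraction of records satisfying every literal (j, b), i.e. x[j] == b.
    return sum(all(x[j] == b for j, b in literals) for x in D) / len(D)

D = [(0,0,0), (1,0,1), (0,1,0), (1,0,1), (0,0,0),
     (0,1,0), (1,1,0), (0,0,0), (0,0,1), (1,0,1)]

# The 2x2 contingency table for attributes D1 and D2.
for b1 in (1, 0):
    for b2 in (1, 0):
        print(b1, b2, conjunction_query([(0, b1), (1, b2)], D))
# prints: 1 1 0.1 / 1 0 0.3 / 0 1 0.2 / 0 0 0.4
```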
Linear Queries
A linear query averages the value of a function q : X → [0,1] over the elements of the dataset:
q(D) = (1/n) ∑ᵢ q(D[i])
Sum queries
A sum query sums the records of the dataset whose indices belong to a subset I ⊆ {1, …, n}:
qI(D) = ∑_{i ∈ I} D[i]
Example
D ∈ X¹⁰, X = List[3]{0,1}
[Table: the example database D; rows I1–I10, attributes D1–D3.]
q{1,2,3}(D) = (1,1,1)    q{1,2,4}(D) = (2,0,2)
q{5,8}(D) = (0,0,0)      q{2,4,7,10}(D) = (4,1,3)
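A sketch of the sum queries above on the same assumed toy dataset (indices are 1-based, as on the slide):

```python
def sum_query(I, D):
    # q_I(D): component-wise sum of the records whose (1-based) index is in I.
    return tuple(sum(vals) for vals in zip(*(D[i - 1] for i in I)))

D = [(0,0,0), (1,0,1), (0,1,0), (1,0,1), (0,0,0),
     (0,1,0), (1,1,0), (0,0,0), (0,0,1), (1,0,1)]

print(sum_query({1, 2, 3}, D))      # (1, 1, 1)
print(sum_query({2, 4, 7, 10}, D))  # (4, 1, 3)
```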
Question: Is releasing the result private?
Question: How can we make statistical queries private?
Private Statistical database
[Diagram: the curator answers each statistical query with answer + noise.]
Question: What kind of noise?
Additive Noise Perturbation
A mechanism M adds noise if, for every query q, M creates a new randomized query
q*(D) = q(D) + Y
for some random variable Y. M is within perturbation E iff for every q and every D:
|q*(D) − q(D)| ≤ E
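One simple way to realize a mechanism within perturbation E is bounded uniform noise (a minimal sketch; the choice of uniform noise is an assumption, not from the slides):

```python
import random

def bounded_noise_mechanism(q, D, E):
    # Answer q(D) plus noise drawn uniformly from [-E, E], so that
    # |q*(D) - q(D)| <= E holds for every query q and every database D.
    return q(D) + random.uniform(-E, E)

# Example: a noisy average over a small bit database.
ans = bounded_noise_mechanism(lambda D: sum(D) / len(D), [0, 1, 1, 0, 1], E=0.05)
```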
Question: Does this approach protect privacy?
Reconstruction attack
[Diagram: an attacker asks queries q1, q2, …, qk and receives noisy answers.]
Reconstruction attack
We say that the attacker wins if the candidate database D' is close to the real database D. In this case we can use the Hamming distance.
Blatant non-privacy
A privacy mechanism M is blatantly non-private if an adversary can build a candidate database D' that agrees with the private database D of size n in all but o(n) entries: dH(D,D') ∈ o(n).
Little o-notation
f(n) = o(g(n)) iff lim_{n→∞} f(n)/g(n) = 0
An exponential reconstruction attack
Let M : {0,1}ⁿ → R be a privacy mechanism adding noise within perturbation E. Then there is an adversary that can reconstruct the database up to 4E positions.
[DinurNissim’02]
With E=n/401 we reconstruct 99% of the entries.
Proof
Query phase: for each subset of indices S, let aS* = qS*(D).
Rule-out phase: for each D' ∈ List[n]{0,1}: if there exists S such that |qS(D') − aS*| > E, then rule out D'.
Output phase: output a database D' that was not ruled out.
Notice that, since for the real database we clearly have |qS(D) − qS*(D)| ≤ E, the procedure is guaranteed to return a candidate output (in an exponential number of steps).
We now want to show that dH(D,D') ≤ 4E.
Proof
Let D be the real dataset and D' the outputted one. Consider the sets of indices R = { i | D(i)=0 } and T = { i | D(i)=1 }.
Since D' was not ruled out, we have |qS*(D) − qS(D')| ≤ E, but by definition we also have |qS*(D) − qS(D)| ≤ E, so by the triangle inequality |qS(D) − qS(D')| ≤ 2E for every S.
Since qR(D) = 0, on the indices in R the Hamming distance between D and D' is at most 2E. We can apply a similar reasoning to T. So overall D and D' differ in at most 4E positions.
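A toy demonstration of the attack for very small n (it enumerates all 2^n candidate databases, so it is exponential, as in the theorem; names and parameters are illustrative):

```python
import itertools
import random

def reconstruction_attack(answers, E, n):
    # Rule-out phase: keep any candidate D' that is consistent, within E,
    # with every noisy subset-sum answer; the real D always survives.
    for cand in itertools.product([0, 1], repeat=n):
        if all(abs(sum(cand[i] for i in S) - a) <= E for S, a in answers.items()):
            return cand  # differs from the real D in at most 4E positions

n, E = 8, 0.45
D = [random.randint(0, 1) for _ in range(n)]
subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
# Query phase: a*_S = q_S(D) + noise with |noise| <= E, for every subset S.
answers = {S: sum(D[i] for i in S) + random.uniform(-E, E) for S in subsets}
print(D, reconstruction_attack(answers, E, n))
```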
Exponential reconstruction attack
Let M : {0,1}ⁿ → R be a privacy mechanism adding noise within perturbation E = o(n). Then M is blatantly non-private against an adversary asking an exponential number of queries.
[DinurNissim’02]
Question: Can we have a more realistic noise perturbation?
Sample error
Suppose the dataset is a sample of size n drawn uniformly at random from a population of size N ≫ n, and consider a condition satisfied by a fraction p of the population.
Then the number of records in the dataset satisfying the condition is np ± Θ(√n).
We would like the noise we introduce for privacy to be smaller than (or at most as big as) the sampling error.
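A quick empirical check of the √n scaling (a sketch; since N ≫ n, the sampling is approximated here by i.i.d. draws):

```python
import random

p, trials = 0.3, 2000
for n in (100, 400, 1600):
    # Standard deviation of the count of sampled records satisfying the condition.
    counts = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
    mean = sum(counts) / trials
    sd = (sum((c - mean) ** 2 for c in counts) / trials) ** 0.5
    print(n, round(sd, 1), round((n * p * (1 - p)) ** 0.5, 1))  # sd ≈ sqrt(np(1-p))
```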
Polynomial reconstruction attack
Let M : {0,1}ⁿ → R be a privacy mechanism adding noise within perturbation E = o(√n). Then we can show M blatantly non-private against an adversary running in polynomial time and asking a number of queries linear in n.
[DworkYekhanin’08,DinurNissim’02]
Number of queries
Consequently, a privacy mechanism with perturbation √n can answer at most a number of queries sublinear in n.
Fundamental Law of Information Reconstruction
The release of too many overly accurate statistics leads to privacy violations.
Privacy vs Utility
[Diagram: a scale balancing utility against privacy.]
Quantitative notions of Privacy
We want a quantitative notion of privacy, one that accounts for the perturbation of the answers and the number of queries that are allowed.
Privacy-preserving data analysis?
First attempt: the data analyst should know no more about an individual after the analysis than what she knew before the analysis.
[Diagram: the analyst asks queries q1, q2, …, qk and receives noisy answers.]
Privacy-preserving data analysis?
Prior Knowledge ~ Posterior Knowledge
Question: What is the problem with this requirement?
Privacy-preserving data analysis?
If nothing can be learned about an individual, then nothing can be learned at all: this requirement is incompatible with utility. [DworkNaor10]
Privacy-preserving data analysis?
Revised goal: the data analyst should learn (almost) the same from the analysis as what she would have learnt if I didn't contribute my data.
[Diagram: the analyst asks queries q1, q2, …, qk and receives noisy answers.]
Adjacent databases
We formalize the idea of contributing one's data or not in terms of a notion of distance between datasets. The distance between D, D' ∈ Xⁿ is defined as:
DΔD' = |{ k ≤ n | D(k) ≠ D'(k) }|
When DΔD' ≤ 1 we say that D and D' are adjacent, and we will write D~D'.
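A tiny sketch of this distance on list-represented databases:

```python
def distance(D1, D2):
    # Hamming-style distance: the number of positions where the records differ.
    return sum(a != b for a, b in zip(D1, D2))

# D ~ D' iff distance(D, D') <= 1
print(distance([0, 1, 1, 0], [0, 1, 0, 0]))  # 1, so the two databases are adjacent
```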
(ε,δ)-Differential Privacy
Definition. Given ε, δ ≥ 0, a probabilistic query Q: Xⁿ → R is (ε,δ)-differentially private iff for all adjacent databases b1, b2 and for every S ⊆ R:
Pr[Q(b1) ∈ S] ≤ exp(ε)·Pr[Q(b2) ∈ S] + δ
The ingredients of the definition:
- Q is a probabilistic query: it returns a probability distribution over R.
- ε and δ are the privacy parameters.
- There is a quantification over all the databases, via a notion of adjacency (distance).
- There is also a quantification over all the possible sets of outcomes S ⊆ R.
ε-Differential Privacy
Definition. Given ε ≥ 0, a probabilistic query Q: Xⁿ → R is ε-differentially private iff for all adjacent databases b1, b2 and for every S ⊆ R:
Pr[Q(b1) ∈ S] ≤ exp(ε)·Pr[Q(b2) ∈ S]
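As a first concrete mechanism satisfying this definition, here is a sketch of classic randomized response on a single bit (an illustration not taken from these slides; adjacency here means changing one individual's bit):

```python
import math
import random

def randomized_response(bit, eps):
    # Report the true bit with probability e^eps / (1 + e^eps), otherwise flip it.
    # For adjacent inputs (bit = 0 vs bit = 1), the probability of any output
    # changes by a factor of at most e^eps, so this is eps-differentially private.
    p_true = math.exp(eps) / (1 + math.exp(eps))
    return bit if random.random() < p_true else 1 - bit

print(randomized_response(1, eps=0.5))
```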
ε-Differential Privacy
Let's substitute a concrete instance:
Pr[Q(b∪{x}) ∈ S] ≤ exp(ε)·Pr[Q(b∪{y}) ∈ S]
Using the quantification over all adjacent pairs in both directions:
exp(−ε)·Pr[Q(b∪{y}) ∈ S] ≤ Pr[Q(b∪{x}) ∈ S] ≤ exp(ε)·Pr[Q(b∪{y}) ∈ S]
ε-Differential Privacy
And for small ε, since exp(±ε) ≈ 1 ± ε:
(1−ε)·Pr[Q(b∪{y}) ∈ S] ≤ Pr[Q(b∪{x}) ∈ S] ≤ (1+ε)·Pr[Q(b∪{y}) ∈ S]
Differential Privacy
In general, we can think of the following quantity as the privacy loss incurred by observing r on the databases b and b':
Lb,b'(r) = log( Pr[Q(b)=r] / Pr[Q(b')=r] )
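A worked check of the privacy loss for the randomized response sketch above, on the adjacent single-bit databases b = 0 and b' = 1:

```python
import math

eps = 0.5
p = math.exp(eps) / (1 + math.exp(eps))  # probability of reporting the true bit
for r in (0, 1):
    pr_b  = p if r == 0 else 1 - p       # Pr[Q(0) = r]
    pr_b2 = p if r == 1 else 1 - p       # Pr[Q(1) = r]
    print(r, math.log(pr_b / pr_b2))     # privacy loss: +eps or -eps
```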
[Diagram: a probabilistic query Q : DB → R maps the adjacent databases b∪{x} and b∪{y} to two nearby output distributions.]
Differential Privacy
d(Q(b∪{x}),Q(b∪{y}))≤ ε
Differential Privacy
With probability 1−δ over the observed output r:
| log( Pr[Q(b1)=r] / Pr[Q(b2)=r] ) | ≤ ε
This is, roughly, the guarantee of (ε,δ)-Differential Privacy.
The rest of the class
Understanding some basic methods to guarantee differential privacy and how they provide an answer to the privacy vs utility trade-off.
Summary