Mobile Data Collection and Analysis with Local Differential Privacy - - PowerPoint PPT Presentation

SLIDE 1

Mobile Data Collection and Analysis with Local Differential Privacy - Part 1

Ninghui Li (Purdue University)


SLIDE 2

Outline

  • Motivation of Differential Privacy and Local Differential Privacy (LDP)
  • Frequency Oracles in LDP

SLIDE 3

Tradeoff between Privacy and Utility

  • A privacy notion provides the privacy protection guarantee
  • Design a mechanism under such a notion with high utility

6/13/2019

SLIDE 4

AOL Data Release [NYTimes 2006]

  • In August 2006, AOL released search keywords of 650,000 users over a 3-month period.
  • User IDs were replaced by random numbers.
  • 3 days later, AOL pulled the data from public access.

Queries of AOL searcher #4417749: “landscapers in Lilburn, GA”, queries on the last name “Arnold”, “homes sold in shadow lake subdivision Gwinnett County, GA”, “num fingers”, “60 single men”, “dog that urinates on everything”. The NYT identified her as Thelma Arnold, a 62-year-old widow who lives in Lilburn, GA, has three dogs, and frequently searches her friends’ medical ailments.

Re-identification occurs!

SLIDE 5

Differential Privacy [Dwork et al. 2006]

  • Idea: any output should be about as likely regardless of whether or not I am in the dataset
  • Def. Algorithm A satisfies ε-differential privacy if for any neighboring datasets D and D′ and any possible output t,

        e^(−ε) ≤ Pr[A(D) = t] / Pr[A(D′) = t] ≤ e^ε

Parameter ε: strength of privacy protection, known as the privacy budget.

SLIDE 6

Key Assumption Behind DP: The Personal Data Principle

  • After removing one individual’s data, that individual’s privacy is protected perfectly.
  • Even if correlation can still reveal individual info, that is not considered a privacy violation.
  • In other words, for each individual, the world after removing the individual’s data is an ideal world of privacy for that individual. The goal is to simulate all these ideal worlds.

SLIDE 7

Differential Privacy in the Centralized Setting

Data mining and statistical queries run over a central database, with noise added to the outputs.

Differential Privacy interpretation: the decision to include/exclude an individual’s record has limited (ε) influence on the outcome. Smaller ε ➔ stronger privacy.

SLIDE 8

Differential Privacy in the Centralized Setting

All users’ data is collected by a trusted party; the trust boundary encloses the database. Data mining and statistical queries run over the database, with noise added to the outputs.

SLIDE 9

Local Differential Privacy

Each user adds noise to their own data before it leaves the device; the trust boundary is at each user. The aggregator runs data mining and statistical queries over the noisy reports, with no need to worry about an untrusted server.

SLIDE 10

Outline

  • Motivation of Differential Privacy and Local Differential Privacy (LDP)
  • Frequency Oracles in LDP

SLIDE 11

The Frequency Oracle Protocols under LDP

  • y := PE(v): the user-side algorithm (Perturb ∘ Encode) takes an input value v from domain D and outputs a report y
  • c := Est({y}): the aggregator-side algorithm takes the reports {y} from all users and outputs estimates c(v) for any value v in domain D

An FO is ε-LDP iff for any v and v′ from D, and any valid output y,

    Pr[PE(v) = y] / Pr[PE(v′) = y] ≤ e^ε

SLIDE 12

Random Response (Warner ’65)

  • Survey technique for private questions
  • Survey people: “Do you have a disease?”
  • Each person:
  • Flip a secret coin
  • Answer truthfully if head (w/p 0.5)
  • Answer randomly if tail
  • E.g., a patient will answer “yes” w/p 75%, and “no” w/p 25%
  • To get an unbiased estimate of the distribution:
  • If n_v out of n people have the disease, we expect to see E[I_v] = 0.75·n_v + 0.25·(n − n_v) “yes” answers
  • c(n_v) = (I_v − 0.25·n) / (0.75 − 0.25) is the unbiased estimate of the number of patients

Provides deniability: seeing the answer, one is not certain about the secret.

SLIDE 13

Concrete Example

An individual will answer “yes” w/p 75%, and “no” w/p 25%.

          truth   expected yes   expected no
  yes     80      60             20
  no      20      5              15

Observed: 65 “yes”, 35 “no”.  Estimate c(n_v) = (I_v − 0.25·n) / (0.75 − 0.25): 80 “yes”, 20 “no”.
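
The arithmetic on this slide can be replayed with a short sketch (illustrative Python; the function names are mine, not from the talk):

```python
def rr_expected_yes(n_yes, n, p_truth=0.75):
    # Each person answers "yes" truthfully w.p. 0.75 and oppositely w.p. 0.25,
    # so E[observed "yes"] = 0.75*n_yes + 0.25*(n - n_yes).
    return p_truth * n_yes + (1 - p_truth) * (n - n_yes)

def rr_estimate(observed_yes, n, p_truth=0.75):
    # Unbiased estimator from the slide: c = (I - 0.25*n) / (0.75 - 0.25)
    return (observed_yes - (1 - p_truth) * n) / (p_truth - (1 - p_truth))

# Slide 13's numbers: 80 true "yes", 20 true "no", n = 100.
expected = rr_expected_yes(80, 100)   # 60 + 5 = 65 expected "yes" answers
estimate = rr_estimate(65, 100)       # (65 - 25) / 0.5 = 80
```

Plugging the expected count back into the estimator recovers the ground truth exactly, which is what unbiasedness means here.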

SLIDE 14

From Two to Any Categories

Random Response generalizes along three lines: Generalized Random Response, Unary Encoding, and Local Hash.

  • RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. Ú. Erlingsson, V. Pihur, A. Korolova. CCS 2014
  • Local, Private, Efficient Protocols for Succinct Histograms. R. Bassily, A. Smith. STOC 2015
  • Locally Differentially Private Protocols for Frequency Estimation. T. Wang, J. Blocki, N. Li, S. Jha. USENIX Security 2017

SLIDE 15

Generalized Random Response

  • User:
  • Given v ∈ D = {1, 2, …, d}
  • Toss a coin with bias p
  • If it is head, report the true value y = v
  • Otherwise, report any other value, each with probability q = (1 − p)/(d − 1) (uniformly at random)
  • p = e^ε / (e^ε + d − 1), q = 1 / (e^ε + d − 1) ⇒ Pr[PE(v) = v] / Pr[PE(v′) = v] = p/q = e^ε
  • Aggregator:
  • Suppose n_v users possess value v, and I_v is the number of reports of v
  • E[I_v] = n_v·p + (n − n_v)·q
  • Unbiased estimate: c(v) = (I_v − n·q) / (p − q)

Intuitively, the higher p, the more accurate. However, when d is large, p becomes small (for the same ε):

  ε     p (d = 2)   p (d = 8)   p (d = 128)   p (d = 1024)
  0.1   0.52        0.13        0.016         0.001
  1     0.73        0.27        0.027         0.002
  2     0.88        0.51        0.057         0.007
  4     0.98        0.88        0.307         0.05

To get rid of the dependency on domain size, we move to the other protocols.
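
A minimal sketch of GRR in Python (illustrative names, not the authors' code). The perturbation reports the true value with probability p, and the estimator inverts the expected counts:

```python
import math
import random

def grr_params(eps, d):
    # p = e^eps / (e^eps + d - 1), q = 1 / (e^eps + d - 1)
    e = math.exp(eps)
    return e / (e + d - 1), 1 / (e + d - 1)

def grr_perturb(v, eps, d, rng=random):
    # Report the true value w.p. p, otherwise one of the other
    # d - 1 values uniformly at random (each w.p. q).
    p, _ = grr_params(eps, d)
    if rng.random() < p:
        return v
    other = rng.randrange(d - 1)
    return other if other < v else other + 1

def grr_estimate(count_v, n, eps, d):
    # Unbiased estimate: c(v) = (I_v - n*q) / (p - q)
    p, q = grr_params(eps, d)
    return (count_v - n * q) / (p - q)
```

Values are 0-indexed here (v ∈ {0, …, d−1}). Plugging the expected count E[I_v] = n_v·p + (n − n_v)·q into `grr_estimate` returns n_v exactly.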

SLIDE 16

Unary Encoding (Basic RAPPOR)

  • Encode the value v into a bit string: y := [0, …, 0], y[v] := 1
  • e.g., D = {1, 2, 3, 4}, v = 3, then y = [0, 0, 1, 0]
  • Perturb each bit, preserving it with probability p:
  • p(1→1) = p(0→0) = p = e^(ε/2) / (e^(ε/2) + 1)
  • p(1→0) = p(0→1) = q = 1 / (e^(ε/2) + 1)
  • ⇒ Pr[PE(v) = y] / Pr[PE(v′) = y] ≤ (p(1→1) / p(0→1)) × (p(0→0) / p(1→0)) = e^ε
  • Since y is the unary encoding of v, the encodings of v and v′ differ in two locations
  • Intuition:
  • By unary encoding, each location can only be 0 or 1, effectively reducing d in each location to 2 (but the privacy budget is halved)
  • When d is large, UE is better than DE
  • To estimate the frequency of each value, do it for each bit
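
A sketch of symmetric Unary Encoding (illustrative Python; names are mine). Each bit is kept with probability p = e^(ε/2)/(e^(ε/2)+1), and each bit position is estimated like a two-value GRR:

```python
import math
import random

def ue_params(eps):
    # The budget is split over the two positions where encodings differ,
    # hence eps/2: p = e^(eps/2)/(e^(eps/2)+1), q = 1/(e^(eps/2)+1).
    e = math.exp(eps / 2)
    return e / (e + 1), 1 / (e + 1)

def ue_perturb(v, eps, d, rng=random):
    # Encode v (0-indexed) as a one-hot vector, keep each bit w.p. p,
    # flip it w.p. q = 1 - p.
    p, _ = ue_params(eps)
    return [(1 if i == v else 0) if rng.random() < p else
            (0 if i == v else 1) for i in range(d)]

def ue_estimate(count_i, n, eps):
    # Per-bit unbiased estimate: c(i) = (I_i - n*q) / (p - q)
    p, q = ue_params(eps)
    return (count_i - n * q) / (p - q)
```

The ratio p/q equals e^(ε/2) per bit; squared over the two differing positions it gives the e^ε bound from the slide.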

SLIDE 17

Binary Local Hash

  • The original protocol uses a shared random matrix; this is an equivalent description
  • Each user uses a random hash function H from D to {0, 1}
  • The user then perturbs the bit with probabilities p = e^ε / (e^ε + 1), q = 1 / (e^ε + 1)
    ⇒ Pr[PE(v) = b] / Pr[PE(v′) = b] = p/q = e^ε
  • The user then reports the bit and the hash function
  • The aggregator increments the reported group
  • E[I_v] = n_v·p + (n − n_v)·(½·q + ½·p) = n_v·p + (n − n_v)·½
  • Unbiased estimate: c(v) = (I_v − n·½) / (p − ½)
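
A sketch of BLH (illustrative Python; the hash function is a stand-in and the names are mine). A report (seed, b) "supports" a value v iff v hashes to b under that seed:

```python
import math
import random

def blh_hash(seed, v):
    # Illustrative stand-in for a pairwise-independent hash D -> {0, 1}.
    return random.Random(seed * 1000003 + v).getrandbits(1)

def blh_perturb(v, eps, rng=random):
    # Each user draws a fresh hash (here: a random seed), hashes v to a
    # bit, keeps the bit w.p. p = e^eps/(e^eps+1), and flips it otherwise.
    seed = rng.getrandbits(32)
    b = blh_hash(seed, v)
    p = math.exp(eps) / (math.exp(eps) + 1)
    if rng.random() >= p:
        b = 1 - b
    return seed, b

def blh_estimate(support_v, n, eps):
    # E[I_v] = n_v*p + (n - n_v)/2, so c(v) = (I_v - n/2) / (p - 1/2).
    p = math.exp(eps) / (math.exp(eps) + 1)
    return (support_v - n / 2) / (p - 0.5)
```

Here `support_v` is the number of reports (seed, b) with `blh_hash(seed, v) == b`; a user holding a different value supports v with probability ½, which is where the n/2 correction comes from.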

SLIDE 18

Optimization

  • We measure the utility of a mechanism by its variance
  • E.g., in Random Response, Var[c(v)] = Var[(I_v − n·q) / (p − q)] = Var[I_v] / (p − q)² ≈ n·q·(1 − q) / (p − q)²
  • We propose a framework called ‘pure’ and cast existing mechanisms into the framework
  • Each output y “supports” a set of inputs v
  • E.g., in Unary Encoding, a binary vector supports each value with a corresponding 1
  • E.g., in BLH, Support((H, b)) = { v | H(v) = b }
  • A pure protocol is specified by p′ and q′
  • Each input is perturbed into a value “supporting it” with probability p′, and into a value not supporting it with probability q′
  • Minimize the variance over q′: min_{q′} Var[c(v)] ≈ min_{q′} n·q′·(1 − q′) / (p′ − q′)², where p′, q′ satisfy ε-LDP
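
The approximate variance expressions above can be compared numerically (illustrative Python). GRR's q shrinks with d, which blows up its variance; BLH, cast as a pure protocol with p′ = e^ε/(e^ε+1) and q′ = ½, is independent of d:

```python
import math

def var_grr(n, eps, d):
    # Var[c(v)] ≈ n*q*(1-q)/(p-q)^2 with GRR's p and q
    e = math.exp(eps)
    p, q = e / (e + d - 1), 1 / (e + d - 1)
    return n * q * (1 - q) / (p - q) ** 2

def var_blh(n, eps):
    # BLH as a pure protocol: p' = e^eps/(e^eps+1), q' = 1/2
    p = math.exp(eps) / (math.exp(eps) + 1)
    return n * 0.5 * 0.5 / (p - 0.5) ** 2

# GRR's variance grows with the domain size d; BLH's does not.
```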

SLIDE 19

Frequency Estimation Protocols

  • Randomised response: a survey technique for eliminating evasive answer bias. S.L. Warner, Journal of the American Statistical Association, 1965
  • Direct Encoding (Generalized Random Response)
  • RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. Ú. Erlingsson, V. Pihur, A. Korolova. CCS 2014
  • Unary Encoding: encode into a bit-vector
  • Local, Private, Efficient Protocols for Succinct Histograms. R. Bassily, A. Smith. STOC 2015
  • Binary Local Hash: encode by hashing and then perturb
  • Locally Differentially Private Protocols for Frequency Estimation. T. Wang, J. Blocki, N. Li, S. Jha. USENIX Security 2017

SLIDE 20

Optimized Local Hash (OLH)

  • In the original BLH, the secret is compressed into a bit, perturbed and transmitted.
  • Both steps cause information loss:
  • Compressing: loses much
  • Perturbation: information loss depends on ε
  • Key insight: we want to strike a balance between the two steps:
  • By compressing into more groups (g instead of 2), the first step carries more information
  • Variance is optimized when g = e^ε + 1
  • See our paper for details.

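
A sketch of OLH along these lines (illustrative Python; the hash function is a stand-in and the names are mine). It replaces BLH's bit with a hash into g = e^ε + 1 groups, then applies GRR over the g groups; a report (seed, x) supports v iff v hashes to x:

```python
import math
import random

def olh_g(eps):
    # Variance-optimal number of groups: g = e^eps + 1 (rounded).
    return int(round(math.exp(eps))) + 1

def olh_hash(seed, v, g):
    # Illustrative stand-in for a shared hash family D -> {0, ..., g-1}.
    return random.Random(seed * 1000003 + v).randrange(g)

def olh_perturb(v, eps, rng=random):
    g = olh_g(eps)
    seed = rng.getrandbits(32)
    x = olh_hash(seed, v, g)
    p = math.exp(eps) / (math.exp(eps) + g - 1)
    if rng.random() >= p:               # report another group w.p. 1 - p
        other = rng.randrange(g - 1)
        x = other if other < x else other + 1
    return seed, x

def olh_estimate(support_v, n, eps):
    # A report from another value hashes to x w.p. 1/g, so q* = 1/g and
    # c(v) = (I_v - n/g) / (p - 1/g).
    g = olh_g(eps)
    p = math.exp(eps) / (math.exp(eps) + g - 1)
    return (support_v - n / g) / (p - 1 / g)
```
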
SLIDE 21

Other Topics

  • Dealing with numerical data, estimating the mean:
  • Goal: find the mean of continuous values
  • Assumption: each user has a single value x within the range [−1, +1]
  • Intuition: report +1 with higher probability if x is closer to +1
  • [https://arxiv.org/abs/1606.05053, https://arxiv.org/pdf/1712.01524]
  • Frequent itemset mining:
  • Zhan Qin, et al.: Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy. ACM CCS 2016
  • Tianhao Wang, Ninghui Li, Somesh Jha: Locally Differentially Private Frequent Itemset Mining. IEEE Symposium on Security and Privacy 2018
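
The "report +1 with higher probability if x is closer to +1" idea can be sketched as follows (illustrative Python, a simplified variant of the cited mechanisms: discretize to ±1, then apply randomized response):

```python
import math
import random

def mean_perturb(x, eps, rng=random):
    # x in [-1, +1]: discretize to +1 w.p. (1 + x)/2 and -1 otherwise,
    # then keep the bit w.p. p = e^eps/(e^eps + 1), flip it otherwise.
    b = 1 if rng.random() < (1 + x) / 2 else -1
    p = math.exp(eps) / (math.exp(eps) + 1)
    return b if rng.random() < p else -b

def mean_estimate(reports, eps):
    # E[report] = (2p - 1) * x, so rescale the empirical mean by (2p - 1).
    p = math.exp(eps) / (math.exp(eps) + 1)
    return sum(reports) / len(reports) / (2 * p - 1)
```

Each user sends a single ±1 bit; averaging many reports and dividing by (2p − 1) gives an unbiased estimate of the mean of x.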

SLIDE 22

Other interesting problems

  • Stochastic gradient descent
  • Goal: find the optimal machine learning model
  • Assumption: each user has a vector x
  • Intuition: bolt-on SGD with noisy updates
  • [https://arxiv.org/abs/1606.05053]
  • Bound the privacy leakage
  • Goal: make multiple, periodic collection possible
  • Assumption: each user has a value x(t) that changes with time
  • Intuition: decide whether to participate based on the current result
  • [https://arxiv.org/abs/1802.07128]
  • Many more

SLIDE 23

Mobile Data Collection and Analysis with Local Differential Privacy - Part 2

Qingqing Ye (Renmin University of China / Hong Kong Polytechnic University)

SLIDE 24

Outline

  • Current Research Problem
  • Marginal Release
  • Graph Data Mining
  • Key-Value Data Collection
  • Open Problems and New Directions
  • Iterative Interaction
  • Privacy-Preserving Machine Learning
  • Theoretical underpinning

SLIDE 25

Outline

  • Current Research Problem
  • Marginal Release
  • Graph Data Mining
  • Key-Value Data Collection
  • Open Problems and New Directions
  • Iterative Interaction
  • Privacy-Preserving Machine Learning
  • Theoretical underpinning

SLIDE 26

Marginal Release

  • Full contingency table: distribution of all attribute combinations
  • Marginal table: distribution of a subset of attribute combinations

Dataset:

  User    Gender   Smoke
  Alice   female   smoker
  Bob     male     non-smoker
  Tom     male     smoker
  …
  Lily    female   non-smoker

2-way marginal:

  v                           F(v)
  < female, non-smoker >      0.35
  < female, smoker >          0.15
  < male, non-smoker >        0.1
  < male, smoker >            0.4

1-way marginals:

  v                  F(v)
  < female, * >      0.5
  < male, * >        0.5

  v                      F(v)
  < *, non-smoker >      0.45
  < *, smoker >          0.55
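
The relationship between the tables above (a 1-way marginal is the 2-way marginal with one attribute summed out) can be shown in a few lines of Python:

```python
# The 2-way marginal from the slide, keyed by (Gender, Smoke).
two_way = {
    ("female", "non-smoker"): 0.35,
    ("female", "smoker"):     0.15,
    ("male",   "non-smoker"): 0.10,
    ("male",   "smoker"):     0.40,
}

def marginal(table, attr):
    # Sum out every attribute except `attr` (0 = Gender, 1 = Smoke).
    out = {}
    for key, freq in table.items():
        out[key[attr]] = out.get(key[attr], 0.0) + freq
    return out

gender = marginal(two_way, 0)   # female: 0.5, male: 0.5
smoke = marginal(two_way, 1)    # non-smoker: 0.45, smoker: 0.55
```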

SLIDE 27

Marginal Release

  • Each marginal is a frequency distribution, which can be seen as a frequency oracle problem
  • Marginal release in the local setting:
  • Users: each user (e.g., Alice < female, smoker >, Bob < male, non-smoker >, Tom < male, smoker >, Sally < female, non-smoker >, Lily < female, non-smoker >) sends a report through an FO
  • Aggregator: calculates all k-way marginals from the FO reports
  • Challenge: large number of attributes d

SLIDE 28

Marginal Release

  • Straightforward method (1): apply a Frequency Oracle over all attributes at once, estimate the full contingency table, then compute all k-way marginals from it
  • Drawback:
  • Estimation error is exponential in d: Var = O(2^d)
  • Time and space complexity are exponential in d

SLIDE 29

Marginal Release

  • Straightforward method (2): divide the user population into C(d, k) disjoint groups; users in each group report, via a Frequency Oracle, the attributes corresponding to one k-way marginal
  • Drawback:
  • When C(d, k) becomes large, each user contributes less information to each marginal
  • Still causes a large estimation error: Var = O(2^k · C(d, k))

SLIDE 30

Marginal Release

  • Fourier Transformation Method [SIGMOD ’18]
  • Key observation: calculating a k-way marginal requires only a few coefficients in the Fourier domain (values in marginals → Fourier coefficients)
  • Pipeline: all attributes → Fourier transformation → sample and randomize → unary encoding → all k-way marginals
  • Better than the two straightforward methods, in theory and in practice
  • Drawback:
  • To reconstruct all k-way marginals, many coefficients must be estimated: Var = O(Σ_{s=0}^{k} C(d, s))
  • Performs poorly for large k

SLIDE 31

Marginal Release

  • CALM: Consistent Adaptive Local Marginal [CCS ’18]
  • Intuition:
  • First construct a set of candidate marginals with an FO
  • Use the above marginals to reconstruct other unknown marginals

SLIDE 32

Marginal Release

  • CALM: Consistent Adaptive Local Marginal [CCS ’18]
  • The estimation error of CALM decreases by 1–2 orders of magnitude compared with the Fourier Transformation method, on both full contingency tables and k-way marginal tables

SLIDE 33

Outline

  • Current Research Problem
  • Marginal Release
  • Graph Data Mining
  • Key-Value Data Collection
  • Open Problems and New Directions
  • Iterative Interaction
  • Privacy-Preserving Machine Learning
  • Theoretical underpinning

SLIDE 34

Graph Data Mining

  • Graph data mining has numerous applications in the web, social networks, transportation and knowledge bases.
  • Each user’s local view of the graph is an adjacency bit vector
  • Node-LDP: the LDP definition applies to any two adjacency bit vectors
  • Edge-LDP: the LDP definition applies to any two adjacency bit vectors that differ in only one bit
  • Results so far are only for the edge-LDP definition

SLIDE 35

Graph Data Mining

  • Synthetic social graph generation [CCS ’17]
  • Randomized Neighbor List (RNL)
  • Perturb each bit of the adjacency bit vector with RR
  • Retains some neighborhood information, but introduces a lot of fake edges
  • E.g., on the Facebook graph (4,039 nodes, 88,234 edges) with ε = 1, RNL yields 4,427,047 edges (98% fake edges)
  • Degree-based Graph Generation (DGG)
  • Perturb the degree of each node with edge-LDP (Laplace noise)
  • Generate a synthetic graph by a graph generation model (BTER)
  • Accurately collects statistics, but loses neighborhood information

SLIDE 36

Graph Data Mining

  • RNL vs. DGG: neither baseline is very satisfying
  • LDPGen: group-based graph generation
  • Strikes a balance between noise and information loss
  • An iterative solution
  • Each user sends more information to the aggregator (a single degree → a degree vector)

SLIDE 37

Graph Data Mining

  • Three phases of LDPGen
  • 1. Initial grouping: the aggregator randomly partitions users into k groups (e.g., k = 2)
  • Users report noisy degree vectors of their links to these groups
  • The aggregator optimizes k and refines the grouping

SLIDE 38

Graph Data Mining

  • Three phases of LDPGen
  • 1. Initial grouping: the aggregator randomly partitions users into k groups
  • Users report noisy degree vectors of their links to these groups
  • The aggregator optimizes k and refines the grouping
  • 2. Grouping refinement: the aggregator partitions users with similar degree distributions into new groups (e.g., k = 3)

SLIDE 39

Graph Data Mining

  • Three phases of LDPGen
  • 1. Initial grouping: the aggregator randomly partitions users into k groups
  • Users report noisy degree vectors of their links to these groups
  • The aggregator optimizes k and refines the grouping
  • 2. Grouping refinement: the aggregator partitions users with similar degree distributions into new groups (k = 3)
  • Users report again noisy degree vectors of their links to the new groups

SLIDE 40

Graph Data Mining

  • Three phases of LDPGen
  • 1. Initial grouping: the aggregator randomly partitions users into k groups
  • Users report noisy degree vectors of their links to these groups
  • The aggregator optimizes k and refines the grouping
  • 2. Grouping refinement: the aggregator partitions users with similar degree distributions into new groups (k = 3)
  • Users report again noisy degree vectors of their links to the new groups
  • 3. Graph generation: sample a corresponding graph from the BTER model

SLIDE 41

Outline

  • Current Research Problem
  • Marginal Release
  • Graph Data Mining
  • Key-Value Data Collection
  • Open Problems and New Directions
  • Iterative Interaction
  • Privacy-Preserving Machine Learning
  • Theoretical underpinning

SLIDE 42

Key-Value Data Collection

  • The key-value pair < key, value > is a popular data model
  • Example: estimate the average screen-on time of each app, where each user reports per-app times (e.g., 2.1h, 2.8h, 0.5h, …)

SLIDE 43

Key-Value Data Collection

  • There is correlation between keys and values, which independent perturbation (a Frequency Oracle for keys, a Mean Oracle for values) can break

  Disease   Value domain
  Cancer    [0, 0.35]
  HIV       [0.3, 0.6]
  Fever     [0.5, 1.0]

E.g., a true pair < Cancer, 0.2 > may be perturbed into < Fever, 0.4 >, yet 0.4 ∉ [0.5, 1.0]

SLIDE 44

Key-Value Data Collection

  • PrivKV: iterative model [S&P ’19]
  • Perturbation protocol: each user holds an existence/value pair for a key, e.g., Alice < 0, 0 >, Bob < 1, 0.6 >, Chris < 0, 0 >, Tom < 1, 0.8 >
  • A possessed pair < 1, v > is reported with probability p and flipped to < 0, 0 > with probability 1 − p
  • A non-possessed pair < 0, 0 > is reported with probability p and flipped to < 1, v* > with probability 1 − p, where v* is a randomly drawn value

SLIDE 45

Key-Value Data Collection

  • Iterative model
  • Analysis
  • High accuracy: the estimated mean gradually approaches the ground truth
  • High communication bandwidth with multiple iterations

SLIDE 46

Key-Value Data Collection

  • Batch processing and virtual iterations: users send perturbed data to the aggregator once per batch (a real iteration); the aggregator then runs virtual iterations that predict the mean without contacting users again
  • Analysis
  • No user involvement in virtual iterations — reduces network transmission overhead
  • No privacy budget cost in virtual iterations — improves accuracy

SLIDE 47

Key-Value Data Collection

  • Key-value correlation: preserving it gives an estimated value distribution similar to the real one; ignoring it deviates from the true distribution

SLIDE 48

Outline

  • Current Research Problem
  • Marginal Release
  • Graph Data Mining
  • Key-Value Data Collection
  • Open Problems and New Directions
  • Iterative Interaction
  • Privacy-Preserving Machine Learning
  • Theoretical underpinning

SLIDE 49

Iterative Interactions

  • Access the original data multiple times → multiple rounds of interactions
  • In each round, the aggregator poses new queries in light of previous responses
  • Existing works:
  • Heavy hitter estimation [CCS ’16]
  • Synthetic graph generation [CCS ’17]
  • Key-value data collection [S&P ’19]
  • Machine learning models [ICDE ’19]
  • The effectiveness of iterations? A trade-off between estimation accuracy and communication bandwidth

SLIDE 50

Privacy-Preserving Machine Learning

  • Machine learning needs to learn from real data, but LDP incurs heavy perturbation
  • Traditional machine learning assumes centralized data, but each user only has a local view under LDP
  • Existing works:
  • Simple machine learning models, e.g., linear regression, logistic regression and support vector machines [ICDE ’19]
  • Single-round machine learning [S&P ’17] [ICML ’17]
  • Machine learning with LDP: moving from simple statistics to full machine learning?

SLIDE 51

Theoretical Underpinnings

  • LDP emerged relatively recently from the theory literature
  • What Can We Learn Privately? [FOCS ’08]
  • Local privacy and statistical minimax rates [FOCS ’13]
  • Still many theoretical questions about LDP
  • What are the lower bounds of the accuracy guarantee?
  • Is there any benefit from adding an additive “relaxation” δ to the privacy definition: Pr[PE(v) = y] ≤ e^ε · Pr[PE(v′) = y] + δ ?
  • Can the amount of data collected from each user be reduced to a single bit?

SLIDE 52

Conclusions

  • Privacy-preserving data release is an important and challenging problem.
  • Local Differential Privacy is a promising privacy model and has been widely adopted.
  • Lots of current research that can be applied to mobile:
  • Histogram estimation, frequent itemset mining
  • Marginal release, graph data mining
  • Key-value data collection, private spatial data aggregation
  • Lots of opportunity for new work:
  • Optimal mechanisms for local differential privacy
  • High-dimensional data perturbation protocols
  • Unstructured data: text, image, video

SLIDE 53

Thank you!