Privacy in a Mobile-Social World
CompSci 590.03 Instructor: Ashwin Machanavajjhala
1 Lecture 1 : 590.03 Fall 12
Administrivia
Wed/Fri 1:25-2:40 PM
Reading Course + Project
http://www.cs.duke.edu/courses/fall12/compsci590.3/
– No exams!
– Every class is based on 1 (or 2) assigned papers that students must read.
– Individual or groups of size 2-3
– Theory/algorithms for privacy
– Implement/adapt existing work to new domains
– Participate in WSDM Data Challenge: De-anonymization

– Literature review
– Some original research/implementation

– ≤ Sep 28: Choose project (ideas will be posted … new ideas welcome)
– Oct 12: Project proposal (1-4 pages describing the project)
– Nov 16: Mid-project review (2-3 page report on progress)
– Dec 5 & 7: Final presentations and submission (6-10 page conference-style paper + 10-15 minute talk)
1. Privacy is (one of) the most important grand challenges in managing today’s data!
1. “What Next? A Half-Dozen Data Management Research Goals for Big Data and Cloud”, Surajit Chaudhuri, Microsoft Research
2. “Big data: The next frontier for innovation, competition, and productivity”, McKinsey Global Institute Report, 2011

2. Very active field with tons of interesting research. We will read papers in:
– Data Management (SIGMOD, VLDB, ICDE)
– Theory (STOC, FOCS)
– Cryptography/Security (TCC, SSP, NDSS)
– Machine Learning (KDD, NIPS)
– Statistics (JASA)
3. Intro to research by working on a cool project
– Read scientific papers about an exciting data application
– Formulate a problem
– Perform a scientific evaluation
Estimated User Data Generated per day [Ramakrishnan 2007]
[Figure: click gains from personalization. Panels: Recommended links, Personalized News Interests, Top Searches. Reported gains: +250% clicks, +79% clicks, +43% clicks.]
“… creatively and effectively to drive efficiency and quality, the sector could create more than $300 billion in value every year.”
McKinsey Global Institute Report
[Figure: Persons 1, 2, 3, …, N contribute records r1, r2, r3, …, rN to databases (Census DB, Hospital DB). The databases are used by doctors, medical researchers, economists, information retrieval researchers, and recommendation algorithms.]
[Figure: Medical Data linked with a Voter List. Voter List fields visible in the figure: date registered, affiliation, date voted.]
Uniquely identified using ZipCode, Birth Date, and Sex: Name linked to Diagnosis.
Quasi Identifier: 87% of the US population is uniquely identified using ZipCode, Birth Date, and Sex.
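The linkage above can be sketched as a join on the quasi-identifier. This is only a minimal illustration: every record, name, and field name below is invented for the example, and the rule "link only when exactly one voter matches" is an assumption about how an attacker might proceed.

```python
# Hypothetical toy records; all values are invented for illustration.
medical = [
    {"zip": "27705", "birth_date": "1970-01-15", "sex": "F", "diagnosis": "flu"},
    {"zip": "27708", "birth_date": "1982-06-30", "sex": "M", "diagnosis": "asthma"},
]
voters = [
    {"name": "Alice Example", "zip": "27705", "birth_date": "1970-01-15", "sex": "F"},
    {"name": "Bob Example", "zip": "27708", "birth_date": "1982-06-30", "sex": "M"},
    {"name": "Carol Example", "zip": "27708", "birth_date": "1990-03-02", "sex": "F"},
]

QUASI_IDENTIFIER = ("zip", "birth_date", "sex")


def qi_key(record):
    """Project a record onto the quasi-identifier attributes."""
    return tuple(record[attr] for attr in QUASI_IDENTIFIER)


def link(medical_rows, voter_rows):
    """Return (name, diagnosis) pairs for medical rows matching exactly one voter."""
    names_by_key = {}
    for voter in voter_rows:
        names_by_key.setdefault(qi_key(voter), []).append(voter["name"])
    linked = []
    for row in medical_rows:
        names = names_by_key.get(qi_key(row), [])
        if len(names) == 1:  # the quasi-identifier value is unique in the voter list
            linked.append((names[0], row["diagnosis"]))
    return linked


print(link(medical, voters))
```

Neither table alone connects a name to a diagnosis; the join does.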
“… Last week AOL did another stupid thing … … but, at least it was in the name of science…” Alternet, August 2006
AOL “anonymously” released a list of 21 million web search queries.
UserID      Query
Ashwin222   Uefa cup
Ashwin222   Uefa champions league
Ashwin222   Champions league final
Ashwin222   Champions league final 2007
Pankaj156   exchangeability
Pankaj156   Proof of de Finetti’s theorem
Cox12345    Zombie games
Cox12345    Warcraft
Cox12345    Beatles anthology
Cox12345    Ubuntu breeze
Ashwin222   Grammy 2008 nominees
Ashwin222   Amy Winehouse rehab
UserIDs were replaced by random numbers …

UserID      Query
865712345   Uefa cup
865712345   Uefa champions league
865712345   Champions league final
865712345   Champions league final 2007
236712909   exchangeability
236712909   Proof of de Finetti’s theorem
112765410   Zombie games
112765410   Warcraft
112765410   Beatles anthology
112765410   Ubuntu breeze
865712345   Grammy 2008 nominees
865712345   Amy Winehouse rehab
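Replacing each UserID with a random number still leaves all of one user's queries grouped under a single pseudonym. The sketch below illustrates this on a few rows from the slide; the functions `pseudonymize` and `profiles` and the 9-digit ID range are assumptions made for the example, not a description of AOL's actual procedure.

```python
import random
from collections import defaultdict

# A few rows from the example log above.
log = [
    ("Ashwin222", "Uefa cup"),
    ("Ashwin222", "Uefa champions league"),
    ("Pankaj156", "exchangeability"),
    ("Pankaj156", "Proof of de Finetti's theorem"),
    ("Cox12345", "Zombie games"),
    ("Cox12345", "Warcraft"),
    ("Ashwin222", "Grammy 2008 nominees"),
]


def pseudonymize(rows, seed=0):
    """Replace each distinct user ID with a distinct random 9-digit number."""
    rng = random.Random(seed)
    mapping, used, released = {}, set(), []
    for user, query in rows:
        if user not in mapping:
            pid = rng.randrange(10**8, 10**9)
            while pid in used:  # keep pseudonyms distinct
                pid = rng.randrange(10**8, 10**9)
            used.add(pid)
            mapping[user] = pid
        released.append((mapping[user], query))
    return released


def profiles(rows):
    """Group queries by pseudonym, as anyone holding the released log can."""
    grouped = defaultdict(list)
    for pid, query in rows:
        grouped[pid].append(query)
    return dict(grouped)


released = pseudonymize(log)
for pid, queries in profiles(released).items():
    print(pid, queries)
```

Each pseudonym carries a full search history, which is the information an attacker uses to re-identify a user.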
[NYTimes 2006]
A data sharing mechanism M that allows an unauthorized party to learn sensitive information about any individual, which the party could not have learned without access to M.
Statistical Privacy (Trusted Collector) Problem
[Figure: Individuals 1, 2, 3, …, N send records r1, r2, r3, …, rN to a Server, which stores them in a DB.]
Utility:
Privacy: No breach about any individual
Statistical Privacy (Untrusted Collector) Problem
[Figure: Individuals 1, 2, 3, …, N send records r1, r2, r3, …, rN to a Server, which stores them in a DB.]
Application | Data Collector | Third Party (adversary) | Private Information | Function (utility)
Medical | Hospital | Epidemiologist | Disease | Correlation between disease and geography
Genome analysis | Hospital | Statistician/Researcher | Genome | Correlation between genome and disease
Advertising | Google/FB/Y! | Advertiser | Clicks/Browsing | Number of clicks on an ad by age/region/gender …
Social Recommendations | Facebook | Another user | Friend links / profile | Recommend other users [...] social network
Application | Data Collector | Private Information | Function (utility)
Location Services | Verizon/AT&T | Location | Local Search
Recommendations | Amazon/Google | Purchase history | Product Recommendations
Traffic Shaping | Internet Service Provider | Browsing history | Traffic pattern of groups of users

What is the right definition of privacy?
How can we develop mechanisms that trade off privacy for utility?
extent information about us is communicated to others …” Westin, 1967
individual is released. Parent, 1983
many other individuals’ records.
– Social network anonymization
– Location privacy
– Anonymous routing
patients
– We can infer Bob has Cancer!
– Privacy is breached if releasing D (or f(D)) allows an adversary to learn sufficient new information.
– New Information = distance(adversary’s prior belief, adversary’s posterior belief after seeing D)
– New Information can’t be 0 if the output D or f(D) should be useful.
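A small numeric sketch of "new information" as a distance between prior and posterior. The slide does not fix a particular distance; the absolute difference used here, the prior value, and the released groups are all assumptions chosen for illustration.

```python
def posterior_from_release(released_group, condition="cancer"):
    """Posterior that Bob has `condition`, assuming the adversary only knows
    that Bob's record is one of the rows in `released_group`."""
    matches = sum(1 for diagnosis in released_group if diagnosis == condition)
    return matches / len(released_group)


def new_information(prior, posterior):
    """Assumed distance: absolute difference between prior and posterior."""
    return abs(posterior - prior)


prior = 0.1  # hypothetical prior belief that Bob has cancer

# Every patient in Bob's group has cancer: the release reveals Bob's diagnosis.
homogeneous = ["cancer", "cancer", "cancer", "cancer"]
# A mixed group shifts the adversary's belief much less.
mixed = ["cancer", "flu", "asthma", "flu"]

for group in (homogeneous, mixed):
    posterior = posterior_from_release(group)
    print(posterior, new_information(prior, posterior))
```

A definition of privacy then amounts to choosing the distance and a threshold on it.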
– L-diversity, T-closeness, M-invariance, ε-Differential privacy, E-Privacy, …
– What information is considered sensitive
– What is the adversary’s prior
individuals
– How is new information measured
– For every adversarial prior and every property about an individual, new information is bounded by some constant.
with even a sliver of utility, there is some adversary with a prior such that privacy is not guaranteed.
– Generalization or coarsening of attributes
– Suppression of outliers
– Perturbation
– Adding noise
– Sampling
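The techniques listed above can each be sketched in a few lines. These are toy versions under stated assumptions: the helper names, the uniform noise distribution, and the parameter defaults are all illustrative choices, not something the slides prescribe.

```python
import random


def generalize_zip(zip_code, keep=3):
    """Generalization: keep the first `keep` digits, mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def generalize_age(age, width=10):
    """Coarsening: replace an exact age with a range of `width` years."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def suppress_outliers(values, low, high):
    """Suppression: drop values outside [low, high]."""
    return [v for v in values if low <= v <= high]


def add_noise(value, scale, rng):
    """Noise addition: uniform noise in [-scale, scale] (assumed distribution)."""
    return value + rng.uniform(-scale, scale)


def sample(rows, fraction, rng):
    """Sampling: keep each row independently with probability `fraction`."""
    return [row for row in rows if rng.random() < fraction]


rng = random.Random(0)
print(generalize_zip("27705"), generalize_age(34))
print(suppress_outliers([30_000, 45_000, 9_000_000], 0, 200_000))
print(add_noise(45_000, 1_000, rng))
print(sample(list(range(10)), 0.5, rng))
```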
And, information disclosure may not add up linearly.
– If A1 releases the fact that Bob’s salary is <= 50,000, while A2 releases the fact that Bob’s salary is >= 50,000, then we know Bob’s salary is exactly 50,000.
– Composition of Privacy
– If an algorithm perturbs x by adding 1, then x can be reconstructed.
– Simulatability of Algorithms
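Both examples above are easy to check mechanically. In this sketch a release is modelled as a closed interval (low, high) for Bob's salary, and the function names are hypothetical.

```python
def combine(release_a, release_b):
    """Intersect two released intervals (low, high) about the same value."""
    low = max(release_a[0], release_b[0])
    high = min(release_a[1], release_b[1])
    return (low, high)


# A1: salary <= 50,000; A2: salary >= 50,000.
a1 = (float("-inf"), 50_000)
a2 = (50_000, float("inf"))
low, high = combine(a1, a2)
print(low, high, "exact value known" if low == high else "still uncertain")


def perturb(x):
    """The 'add 1' perturbation from the slide."""
    return x + 1


def reconstruct(y):
    """An adversary who knows the algorithm simply inverts it."""
    return y - 1


print(reconstruct(perturb(41)))
```

Each release on its own leaves a wide range of salaries possible; only the combination pins the value down. Likewise, a deterministic perturbation hides nothing once the algorithm is known.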
– Medical/Census Data, Search Logs, Social Networks, Location GPS traces
– Number of students enrolled in this class categorized by gender, nationality
– Data Cubes (database), Marginals (statistics)
– Measures of centrality (what is the degree distribution? How many triangles?)
– Continuously monitor number of cars crossing a toll booth.
– Location Privacy, Health …
– Can I participate in an auction without the output of the auction revealing my private utility function?
– Modern advertising is based on auction design.
– Auctions and Mechanism Design
– Regress disease and gender/location/age
– Inside tip: Big open area. Much of the theory doesn’t work in practice
– Think Netflix, Amazon …
http://www.cs.duke.edu/courses/fall12/compsci590.3/
Theory/Algorithms (Lectures 1-18)
Applications (Lectures 19-25)
Project Presentations (Lectures 26, 27)
– heads with probability p, and
– tails with probability 1-p (p > ½)
         True Answer = Yes   True Answer = No
Heads    Yes                 No
Tails    No                  Yes

$Y_i = 1$, if the ith respondent says “yes”
$Y_i = 0$, if the ith respondent says “no”
$P(Y_i = 1) = P(\text{true answer = yes AND coin = heads}) + P(\text{true answer = no AND coin = tails}) = \pi p + (1-\pi)(1-p) = p_{yes}$
$P(Y_i = 0) = \pi(1-p) + (1-\pi)p = p_{no}$
Here $\pi$ is the fraction of respondents whose true answer is “yes”.
         Yes   No
Heads    Yes   No
Tails    No    Yes

Likelihood of $n_1$ “yes” responses out of $n$: $L = p_{yes}^{\,n_1}\, p_{no}^{\,(n - n_1)}$
$\hat{\pi} = \dfrac{n_1/n - (1-p)}{2p - 1}$
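A short simulation helps check the estimator. The sketch below assumes the response rule from the table (report the truth on heads, the opposite on tails); the population size, the true fraction, and the seed are arbitrary choices for the example.

```python
import random


def randomized_response(true_answer, p, rng):
    """Report the true answer on heads (probability p), the opposite on tails."""
    heads = rng.random() < p
    return true_answer if heads else not true_answer


def estimate_pi(responses, p):
    """pi_hat = (n1/n - (1 - p)) / (2p - 1), where n1 counts 'yes' responses."""
    n = len(responses)
    n1 = sum(responses)  # True counts as 1
    return (n1 / n - (1 - p)) / (2 * p - 1)


rng = random.Random(0)
true_pi, p, n = 0.3, 0.75, 100_000  # hypothetical parameters
truth = [rng.random() < true_pi for _ in range(n)]
responses = [randomized_response(t, p, rng) for t in truth]

print("fraction saying yes:", sum(responses) / n)
print("estimate of pi:", estimate_pi(responses, p))
```

The raw fraction of "yes" responses is close to p_yes rather than to π; the estimator undoes the known bias introduced by the coin.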
$P(\text{Bob's true answer is “yes”} \mid \text{Bob says “yes”})$
$= \dfrac{P(\text{Bob says “yes” AND Bob's true answer is “yes”})}{P(\text{Bob says “yes”})}$
$= \dfrac{P(\text{Bob says “yes”} \mid \text{true answer is “yes”})\,P(\text{true answer is “yes”})}{P(\text{Bob says “yes”} \mid \text{true answer is “yes”})\,P(\text{true answer is “yes”}) + P(\text{Bob says “yes”} \mid \text{true answer is “no”})\,P(\text{true answer is “no”})}$
$= \dfrac{p\theta}{p\theta + (1-p)(1-\theta)} \le \dfrac{p}{1-p}\,\theta$
Adversary’s prior belief: $P(\text{Bob's true answer is “yes”}) = \theta$
Adversary’s posterior belief: $P(\text{Bob's true answer is “yes”} \mid \text{Bob says “yes”}) \le \dfrac{p}{1-p}\,\theta$
Adversary’s posterior belief is always bounded by $p/(1-p)$ times the adversary’s prior belief (irrespective of what the prior is)
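The bound can be checked numerically with Bayes' rule. In this sketch the value p = 0.75 and the priors tried are arbitrary examples; note that for large priors the bound exceeds 1 and is therefore trivially satisfied.

```python
def posterior_yes(theta, p):
    """P(true answer is yes | respondent says yes), by Bayes' rule."""
    return p * theta / (p * theta + (1 - p) * (1 - theta))


def bound(theta, p):
    """The slide's bound: p / (1 - p) times the prior."""
    return p / (1 - p) * theta


p = 0.75  # hypothetical coin bias, p > 1/2
for theta in (0.01, 0.1, 0.5, 0.9):
    post = posterior_yes(theta, p)
    print(theta, round(post, 4), round(bound(theta, p), 4), post <= bound(theta, p))
```

The bound holds because, for p > ½, the denominator pθ + (1-p)(1-θ) is at least (1-p)θ + (1-p)(1-θ) = 1-p.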
– $p/(1-p) = \infty$ (that is, $p = 1$): no privacy
– $\hat{\pi} = n_1/n$ = true answer

– $p/(1-p) = 1$ (that is, $p = 1/2$): perfect privacy
– We cannot estimate $\pi$ since the answers are independent of the input.
– $p_{yes} = \pi p + (1-\pi)(1-p) = \tfrac{1}{2}(\pi + 1 - \pi) = \tfrac{1}{2} = p_{no}$
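The two extremes can be seen directly from the formula for p_yes. A minimal sketch, with arbitrary example values of π:

```python
def p_yes(pi, p):
    """Probability that a respondent says 'yes': pi*p + (1 - pi)*(1 - p)."""
    return pi * p + (1 - pi) * (1 - p)


fractions = (0.0, 0.3, 1.0)  # hypothetical values of pi

# p = 1/2: the response distribution does not depend on pi at all.
print([p_yes(pi, 0.5) for pi in fractions])

# p = 1: every respondent tells the truth, so p_yes equals pi.
print([p_yes(pi, 1.0) for pi in fractions])
```

At p = ½ the estimator's denominator 2p - 1 is zero, which is the algebraic form of "cannot estimate π"; values of p strictly between ½ and 1 trade estimation accuracy against the posterior bound p/(1-p).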
– Netflix recommendations
– Social networks