Privacy Definitions: Beyond Anonymity
CompSci 590.03 Instructor: Ashwin Machanavajjhala
Lecture 5 : 590.03 Fall 12
Announcements
– Some new project ideas added
– Please meet with me at least once before you finalize your project
[Figure: Patients 1 through N contribute records r1, r2, …, rN to the hospital database (DB), which publishes properties of {r1, r2, …, rN}.]
Zip    Age  Nationality  Disease
13053  28   Russian      Heart
13068  29   American     Heart
13068  21   Japanese     Flu
13053  23   American     Flu
14853  50   Indian       Cancer
14853  55   Russian      Heart
14850  47   American     Flu
14850  59   American     Flu
13053  31   American     Cancer
13053  37   Indian       Cancer
13068  36   Japanese     Cancer
13068  32   American     Cancer
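– Quasi-identifier (QID): attributes that, taken together, can identify individuals in the population
– Table T* is k-anonymous if the number of tuples in each group sharing the same QID values is ≥ k
– Parameter k indicates the “degree” of anonymity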
Zip    Age    Nationality  Disease
130**  <30    *            Heart
130**  <30    *            Heart
130**  <30    *            Flu
130**  <30    *            Flu
1485*  >40    *            Cancer
1485*  >40    *            Heart
1485*  >40    *            Flu
1485*  >40    *            Flu
130**  30-40  *            Cancer
130**  30-40  *            Cancer
130**  30-40  *            Cancer
130**  30-40  *            Cancer
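The k-anonymity condition can be checked mechanically by grouping rows on their QID values and taking the smallest group size. The sketch below does this for the generalized table above; the function names and the tuple layout are illustrative assumptions, not part of the lecture.

```python
from collections import Counter

# Generalized table T* from the slide: (zip, age, nationality, disease).
T_STAR = [
    ("130**", "<30", "*", "Heart"),
    ("130**", "<30", "*", "Heart"),
    ("130**", "<30", "*", "Flu"),
    ("130**", "<30", "*", "Flu"),
    ("1485*", ">40", "*", "Cancer"),
    ("1485*", ">40", "*", "Heart"),
    ("1485*", ">40", "*", "Flu"),
    ("1485*", ">40", "*", "Flu"),
    ("130**", "30-40", "*", "Cancer"),
    ("130**", "30-40", "*", "Cancer"),
    ("130**", "30-40", "*", "Cancer"),
    ("130**", "30-40", "*", "Cancer"),
]


def anonymity_level(rows, qid_columns=(0, 1, 2)):
    """Size of the smallest group of rows that share the same QID values."""
    groups = Counter(tuple(row[i] for i in qid_columns) for row in rows)
    return min(groups.values())


def is_k_anonymous(rows, k, qid_columns=(0, 1, 2)):
    """True if every QID group contains at least k rows."""
    return anonymity_level(rows, qid_columns) >= k


print(anonymity_level(T_STAR))  # 4
```

Every QID group in T* has exactly four rows, so the table is 4-anonymous but not 5-anonymous.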
– Finding an optimal k-anonymization is NP-hard
– An O(log k)-approximation algorithm exists
– Incognito (uses monotonicity to prune the generalization lattice)
– Mondrian (multidimensional partitioning)
– Hilbert (converts the multidimensional problem into a 1d problem)
– …
Name  Zip    Age  Nat.
Bob   13053  35   ??

Bob's record falls in the group (130**, 30-40, *) of the k-anonymous table, where every tuple has Cancer.
Name   Zip    Age  Nat.
Umeko  13068  24   Japan

Umeko's record falls in the group (130**, <30, *) of the k-anonymous table, which contains only Heart and Flu.
Zip    Age   Nat.  Disease
1306*  <=40  *     Heart
1306*  <=40  *     Flu
1306*  <=40  *     Cancer
1306*  <=40  *     Cancer
1485*  >40   *     Cancer
1485*  >40   *     Heart
1485*  >40   *     Flu
1485*  >40   *     Flu
1305*  <=40  *     Heart
1305*  <=40  *     Flu
1305*  <=40  *     Cancer
1305*  <=40  *     Cancer

Name   Zip    Age  Nat.
Bob    13053  35   ??
Umeko  13068  24   Japan
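To see what each published table lets an adversary conclude, one can look up the group that matches a person's known Zip and Age and list the diseases in it. The sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, the Nationality column is omitted because it is fully suppressed in both tables, and the age patterns handled are only the four that appear on the slides.

```python
def zip_matches(pattern, zip_code):
    """A '*' in the pattern matches any digit at that position."""
    return len(pattern) == len(zip_code) and all(
        p in ("*", z) for p, z in zip(pattern, zip_code)
    )


def age_matches(pattern, age):
    """Supports the patterns used on the slides: '<30', '<=40', '>40', '30-40'."""
    if pattern.startswith("<="):
        return age <= int(pattern[2:])
    if pattern.startswith("<"):
        return age < int(pattern[1:])
    if pattern.startswith(">"):
        return age > int(pattern[1:])
    low, high = pattern.split("-")
    return int(low) <= age <= int(high)


# Each group: (zip pattern, age pattern, diseases of the tuples in the group).
K_ANONYMOUS_TABLE = [
    ("130**", "<30", ["Heart", "Heart", "Flu", "Flu"]),
    ("1485*", ">40", ["Cancer", "Heart", "Flu", "Flu"]),
    ("130**", "30-40", ["Cancer", "Cancer", "Cancer", "Cancer"]),
]
DIVERSE_TABLE = [
    ("1306*", "<=40", ["Heart", "Flu", "Cancer", "Cancer"]),
    ("1485*", ">40", ["Cancer", "Heart", "Flu", "Flu"]),
    ("1305*", "<=40", ["Heart", "Flu", "Cancer", "Cancer"]),
]


def candidate_diseases(table, zip_code, age):
    """Diseases that remain possible for a person with this Zip and Age."""
    candidates = set()
    for zip_pattern, age_pattern, diseases in table:
        if zip_matches(zip_pattern, zip_code) and age_matches(age_pattern, age):
            candidates.update(diseases)
    return candidates


bob_before = candidate_diseases(K_ANONYMOUS_TABLE, "13053", 35)
bob_after = candidate_diseases(DIVERSE_TABLE, "13053", 35)
umeko_before = candidate_diseases(K_ANONYMOUS_TABLE, "13068", 24)
umeko_after = candidate_diseases(DIVERSE_TABLE, "13068", 24)
```

With the first table, Bob is pinned to Cancer and Umeko to Heart or Flu. With the second table, both have three candidate diseases.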
– What does the parameter L signify?
[Machanavajjhala et al ICDE 2006]
– Data Publisher may not know exact adversarial knowledge
        World 1  World 2  World 3  World 4  World 5
Sasha   Cancer   Heart    Heart    Flu      Heart
Tom     Cancer   Heart    Flu      Heart    Flu
Umeko   Cancer   Flu      Flu      Heart    Heart
Van     Cancer   Flu      Heart    Flu      Flu
Amar    Cancer   Cancer   Heart    Cancer   Flu
Boris   Cancer   Heart    Cancer   Flu      Heart
Carol   Cancer   Flu      Flu      Heart    Flu
Dave    Cancer   Flu      Flu      Flu      Cancer
Bob     Cancer   Cancer   Cancer   Cancer   Cancer
Charan  Cancer   Cancer   Cancer   Cancer   Cancer
Daiki   Cancer   Cancer   Cancer   Cancer   Cancer
Ellen   Cancer   Cancer   Cancer   Cancer   Cancer
Published counts for each group of four individuals:

Group                      Cancer  Heart  Flu
Sasha, Tom, Umeko, Van     0       2      2
Amar, Boris, Carol, Dave   1       1      2
Bob, Charan, Daiki, Ellen  4       0      0

World 1 is not consistent with these counts; World 2, World 3, World 4, and World 5 remain.
– The adversary knows ≤ (L-2) negation statements of the form “Umeko does not have heart disease.”
– Consider all possible conjunctions of ≤ (L-2) statements
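The effect of negation statements can be sketched numerically. The code below assumes that every world consistent with the published counts is equally likely, so that a person's probability of having a value is that value's share of the group, and that each negation statement is about the target individual. Both assumptions, and the function names, are illustrative and not taken from the slides.

```python
from fractions import Fraction
from itertools import combinations


def posterior(group_counts, ruled_out=()):
    """Belief about one person in the group after ruling out some values."""
    remaining = {
        value: count
        for value, count in group_counts.items()
        if count > 0 and value not in ruled_out
    }
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("the statements rule out every value in the group")
    return {value: Fraction(count, total) for value, count in remaining.items()}


def worst_case_posterior(group_counts, max_negations):
    """Largest belief reachable with at most max_negations negation statements."""
    present = [value for value, count in group_counts.items() if count > 0]
    worst = Fraction(0)
    for r in range(max_negations + 1):
        for ruled_out in combinations(present, r):
            if len(ruled_out) == len(present):
                continue  # ruling out every value is contradictory
            worst = max(worst, max(posterior(group_counts, ruled_out).values()))
    return worst


umeko_group = {"Cancer": 0, "Heart": 2, "Flu": 2}
no_knowledge = posterior(umeko_group)
with_negation = posterior(umeko_group, ruled_out={"Heart"})
```

Without background knowledge, the adversary believes Heart and Flu with probability 1/2 each. The single statement “Umeko does not have heart disease” raises the belief in Flu to 1.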
Example sensitive-value counts in a group:
Cancer 10, Heart 5, Hepatitis 2, Jaundice 1

Cancer 1000, Heart 5, Hepatitis 2, Jaundice 1, Malaria 1
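One reading of these two sets of counts is that the number of distinct sensitive values says little when one value dominates. The sketch below compares the two groups on both measures; the helper names are hypothetical.

```python
def distinct_values(counts):
    """Number of sensitive values that actually occur in the group."""
    return sum(1 for count in counts.values() if count > 0)


def largest_share(counts):
    """Fraction of the group's tuples that carry the most frequent value."""
    return max(counts.values()) / sum(counts.values())


group_a = {"Cancer": 10, "Heart": 5, "Hepatitis": 2, "Jaundice": 1}
group_b = {"Cancer": 1000, "Heart": 5, "Hepatitis": 2, "Jaundice": 1, "Malaria": 1}

for name, group in (("A", group_a), ("B", group_b)):
    print(name, distinct_values(group), round(largest_share(group), 3))
```

The second group has more distinct values (5 versus 4), yet more than 99% of its tuples have Cancer, compared with about 56% in the first group.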
– In every group g, check the L-Diversity condition.
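The exact condition checked on the slide is not preserved in this transcript. As an illustration, the sketch below uses recursive (c, L)-diversity, one instantiation from the cited ICDE 2006 paper: with the counts in a group sorted in decreasing order as r1 ≥ r2 ≥ … ≥ rm, the group passes if r1 < c · (rL + … + rm). The function names and the choice c = 3 are assumptions made for the example.

```python
def is_recursive_cl_diverse(counts, c, l):
    """Check r1 < c * (r_l + ... + r_m) on the sorted counts of one group."""
    r = sorted((n for n in counts if n > 0), reverse=True)
    if len(r) < l:
        return False  # fewer than l distinct sensitive values
    return r[0] < c * sum(r[l - 1:])


def table_is_diverse(groups, c, l):
    """A table passes only if every group passes."""
    return all(is_recursive_cl_diverse(group, c, l) for group in groups)


# Sensitive-value counts per group, taken from the two published tables.
K_ANONYMOUS_GROUPS = [[2, 2], [1, 1, 2], [4]]
DIVERSE_GROUPS = [[2, 1, 1], [2, 1, 1], [2, 1, 1]]

print(table_is_diverse(K_ANONYMOUS_GROUPS, c=3, l=3))  # False
print(table_is_diverse(DIVERSE_GROUPS, c=3, l=3))      # True
```

The k-anonymous table fails because two of its groups contain fewer than three distinct values. Lowering c to 2 makes the second table fail as well, since the most frequent value then has to be strictly less common than twice the tail.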
Generalization lattice over (Nationality, Zip): <N0, Z0>, <N1, Z0>, <N0, Z1>, <N1, Z1>, <N0, Z2>, <N1, Z2>

<N1, Z1>:
Nationality  Zip
*            1306*
*            1305*
*            1485*

<N0, Z2>:
Nationality  Zip
American     130**
Japanese     130**
Japanese     148**
<N0, Z1>:
Nationality  Zip
American     1306*
Japanese     1305*
Japanese     1485*

Moving up the lattice suppresses strictly more information.

– Incognito
– Mondrian
– Hilbert
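Monotonicity means that once a lattice node satisfies the privacy condition, every more general node does too, so those nodes need not be checked. The sketch below walks the six-node lattice bottom-up and skips such nodes. It is a simplified illustration of the pruning idea, not the Incognito algorithm itself, and it uses k-anonymity over (Nationality, Zip) as the check for brevity; an L-diversity test could be plugged into the same place. All names are hypothetical.

```python
from collections import Counter
from itertools import product

# (Nationality, Zip) of the 12 tuples in the original table.
ROWS = [
    ("Russian", "13053"), ("American", "13068"), ("Japanese", "13068"),
    ("American", "13053"), ("Indian", "14853"), ("Russian", "14853"),
    ("American", "14850"), ("American", "14850"), ("American", "13053"),
    ("Indian", "13053"), ("Japanese", "13068"), ("American", "13068"),
]


def generalize(row, n_level, z_level):
    """N1 suppresses nationality; Zk replaces the last k zip digits with '*'."""
    nationality, zip_code = row
    if n_level == 1:
        nationality = "*"
    if z_level > 0:
        zip_code = zip_code[:-z_level] + "*" * z_level
    return nationality, zip_code


def is_k_anonymous(rows, node, k):
    counts = Counter(generalize(row, *node) for row in rows)
    return min(counts.values()) >= k


def minimal_generalizations(rows, k, n_levels=2, z_levels=3):
    """Return the minimal satisfying nodes and the number of checks performed."""
    nodes = sorted(product(range(n_levels), range(z_levels)), key=sum)
    satisfied, minimal, checks = set(), [], 0
    for node in nodes:
        # Monotonicity: a more specific node already passed, so this one passes.
        if any(s[0] <= node[0] and s[1] <= node[1] for s in satisfied):
            satisfied.add(node)
            continue
        checks += 1
        if is_k_anonymous(rows, node, k):
            satisfied.add(node)
            minimal.append(node)
    return minimal, checks


print(minimal_generalizations(ROWS, k=2))  # ([(1, 0)], 4)
print(minimal_generalizations(ROWS, k=4))  # ([(1, 1)], 5)
```

For k = 2 the node <N1, Z0> already suffices, so <N1, Z1> and <N1, Z2> are never checked. For k = 4 the minimal node is <N1, Z1>, which matches the table shown for that node above.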
[Xiao, Tao SIGMOD 2007]
… propositional formula over all tuples in the table.
… implications is #P-hard, but ensuring privacy against the worst-case k implications is tractable.
[Machanavajjhala et al ICDE 2006] [Martin et al ICDE 2007]
– If Alice has the flu, then her husband Bob very likely also has the flu.
… attribute in the table is public information.
… sensitive attribute in a QID block is “t-close” to the distribution of the sensitive attribute in the whole table.
[Machanavajjhala et al ICDE 2006] [Martin et al ICDE 2007]
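The t-closeness requirement compares the sensitive-value distribution of each QID block with that of the whole table. The sketch below applies it to the 4-anonymous table from earlier in the lecture. As an assumption for categorical values, it measures distance as half the sum of absolute differences (variational distance); the function names are hypothetical.

```python
from collections import Counter

# Diseases in each QID block of the 4-anonymous table.
BLOCKS = {
    ("130**", "<30"): ["Heart", "Heart", "Flu", "Flu"],
    ("1485*", ">40"): ["Cancer", "Heart", "Flu", "Flu"],
    ("130**", "30-40"): ["Cancer", "Cancer", "Cancer", "Cancer"],
}


def distribution(values, domain):
    """Relative frequency of each domain value, in domain order."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in domain]


def variational_distance(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def max_block_distance(blocks):
    """Largest distance between any block's distribution and the table's."""
    all_values = [v for values in blocks.values() for v in values]
    domain = sorted(set(all_values))
    overall = distribution(all_values, domain)
    return max(
        variational_distance(distribution(values, domain), overall)
        for values in blocks.values()
    )


def is_t_close(blocks, t):
    return max_block_distance(blocks) <= t


print(round(max_block_distance(BLOCKS), 4))  # 0.5833
```

The all-Cancer block is the farthest from the whole-table distribution (distance 7/12), so under this distance the table is t-close only for t of at least about 0.58.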
[Li et al ICDE 2007]
– 52 years old
– Earns 11K
– Lives in 47909
– Pr[Bob has Flu] = 1/9
– 52 years old
– Earns 11K
– Lives in 47909
– Pr[Bob has Flu] = 1/3
[Figure: two histograms over the values v1, v2, v3, v4, v5]
Distance = cost of moving mass from v2 to v1 (f21) + cost of moving mass from v5 to v1 (f51)
If the values are numeric, the cost can depend not only on the amount of “earth” moved, but also on the distance it is moved (d21 and d51).
Original probability mass in the two distributions p and q which are being compared
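For ordered values the earth mover's distance has a simple closed form. The sketch below assumes that the ground distance between the i-th and j-th of m ordered values is |i − j| / (m − 1); under that assumption the distance equals the sum of the absolute running differences between p and q, divided by m − 1. The example distributions are made up to mirror the figure: mass moves from v2 to v1 and from v5 to v1.

```python
def ordered_emd(p, q):
    """Earth mover's distance between two distributions over m ordered values."""
    if len(p) != len(q):
        raise ValueError("p and q must cover the same values")
    m = len(p)
    running = 0.0  # surplus mass carried over to the next value
    total = 0.0
    for p_i, q_i in zip(p, q):
        running += p_i - q_i
        total += abs(running)
    return total / (m - 1)


p = [0.2, 0.2, 0.2, 0.2, 0.2]  # mass on v1..v5 before moving
q = [0.4, 0.1, 0.2, 0.2, 0.1]  # mass on v1..v5 after moving

# Moving 0.1 from v2 to v1 costs 0.1 * 1/4; moving 0.1 from v5 to v1 costs 0.1 * 4/4.
print(round(ordered_emd(p, q), 3))  # 0.125
```

The move from v5 contributes four times as much as the move from v2, even though the same amount of mass is moved, because it travels four steps instead of one.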
[Machanavajjhala et al ICDE 2006] [Martin et al ICDE 2007]
[Li et al ICDE 2007]
… attributes (e.g., any stomach-related diseases).
[Xiao et al SIGMOD 2006]
… two tables that differ in one entry based on the …
… satisfies differential privacy.
– State of the art: Differential privacy (lecture 8)
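– Machanavajjhala et al., “l-Diversity: Privacy beyond k-anonymity”, ICDE 2006
– Martin et al., “Worst-Case Background Knowledge”, ICDE 2007
– Li et al., “t-Closeness: Privacy beyond k-anonymity and l-diversity”, ICDE 2007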