An Automated Social Graph De-anonymization Technique
Kumar Sharad, George Danezis
November 3, 2014
Workshop on Privacy in the Electronic Society, Scottsdale, Arizona, USA
This Talk
1. The Art of Data Anonymization
2. The D4D Challenge
3. An Ad-hoc Attack
4. Learning De-anonymization
5. Results
Population – 22.4 million. Mobile phone users – 17.3 million. Telco subscribers – 5 million. A country fraught with civil war.
[1] http://www.d4d.orange.com/
8300 egonets. Edge attributes: call volume, duration and directionality.
5000 egonets. All edges between 2-hop nodes are removed. Edge attributes: redacted.
[Figure: example egonet (ego, 1-hop and 2-hop nodes) shown under Scheme 1: Pre-review and Scheme 2: Post-review]
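The two release schemes can be illustrated with a small sketch. This is not the organisers' code: the adjacency-dict representation, the function name `two_hop_egonet` and the `drop_2hop_edges` flag are assumptions made for illustration. With the flag set, edges between two 2-hop nodes are removed, as described for the 5000-egonet release.

```python
from collections import deque


def two_hop_egonet(adj, ego, drop_2hop_edges=False):
    """Return (hops, edges) for the 2-hop egonet around `ego`.

    adj: dict mapping each node to the set of its neighbours (undirected).
    hops: dict mapping each egonet node to its distance from the ego (0, 1 or 2).
    edges: set of frozenset({u, v}) pairs kept in the egonet.
    """
    hops = {ego: 0}
    queue = deque([ego])
    while queue:
        u = queue.popleft()
        if hops[u] == 2:
            continue  # do not expand beyond the 2-hop boundary
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)

    edges = set()
    for u in hops:
        for v in adj[u]:
            if v not in hops:
                continue
            # Assumed reading of the second scheme: drop 2-hop to 2-hop edges.
            if drop_2hop_edges and hops[u] == 2 and hops[v] == 2:
                continue
            edges.add(frozenset((u, v)))
    return hops, edges
```

Calling the function twice on the same ego, once with `drop_2hop_edges=False` and once with `True`, gives a direct view of how much structure the extra removal step takes away.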
Hard to merge the egonets. Difficulty of linking egonets should be quantifiable.
Show that a significant fraction of egonets can be re-linked. Discern real world identities. Recover full communication graph.
[Figure: egonet with ego, 1-hop and 2-hop nodes. Node degrees deg: 1, 3, 2, 2 are sorted into the signature sig: [1, 2, 2, 3]; a second example with deg: 1, 4, 1, 1 gives sig: [1, 1, 1, 4].]
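A minimal sketch of the degree signature from the figure, assuming a signature is simply the sorted list of the degrees on display. The helper names `signature` and `candidate_matches` are hypothetical, and exact equality is used as the matching rule for illustration only.

```python
from collections import defaultdict


def signature(degrees):
    """Sorted degree list, e.g. [1, 3, 2, 2] -> (1, 2, 2, 3)."""
    return tuple(sorted(degrees))


def candidate_matches(nodes_a, nodes_b):
    """Pair nodes from two egonets whose signatures are identical.

    nodes_a, nodes_b: dict mapping a node label to its list of degrees.
    Returns a dict mapping each label in nodes_a to the matching labels in nodes_b.
    """
    index = defaultdict(list)
    for label, degrees in nodes_b.items():
        index[signature(degrees)].append(label)
    return {
        label: index.get(signature(degrees), [])
        for label, degrees in nodes_a.items()
    }
```

A signature that occurs only once on each side yields a single candidate, while common signatures such as `(1, 1, 1, 4)` may produce several candidates that need further disambiguation.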
1. An anonymization strategy is designed.
2. Manually construct an attack.
3. Strategy is tweaked.
4. GO TO 2.

1. An anonymization strategy is designed.
2. Generate training and test data based on the algorithm.
3. Extract features.
4. Train the model.
5. Evaluate the performance.
[Figure: pipeline diagram with the labels Anonymized Egonets, Training Set, Known node pairs, Evaluation Set, and Identical node pair?]
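One way to obtain known node pairs is to generate the egonets yourself, so that true node identities are available as labels. The sketch below assumes exactly that setting. The function name `labelled_pairs`, the `negatives_per_positive` parameter and the negative-sampling step are illustrative assumptions, not details taken from the talk.

```python
import random
from itertools import product


def labelled_pairs(egonet_a, egonet_b, negatives_per_positive=1, seed=0):
    """Build (pair, label) examples from two egonets with known node identities.

    egonet_a, egonet_b: sets of true node identifiers.
    Label 1 means the two nodes are the same individual, label 0 means they are not.
    Negatives are subsampled so that the classes are not wildly imbalanced.
    """
    pairs = list(product(sorted(egonet_a), sorted(egonet_b)))
    positives = [(a, b) for a, b in pairs if a == b]
    negatives = [(a, b) for a, b in pairs if a != b]

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    k = min(len(negatives), negatives_per_positive * len(positives))
    sampled = rng.sample(negatives, k)

    return [(p, 1) for p in positives] + [(n, 0) for n in sampled]
```

Pairs produced this way can be split into a training set and an evaluation set, with the classifier asked the question from the diagram: is this an identical node pair?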
[Figure: node feature vector with 70 bins (size = 15) and example counts c0 = 8, c1 = 4, c2 = 0, ..., c4 = 3, ..., c69 = 2]
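The binned feature vector in the figure can be sketched as a histogram over degrees. The sketch assumes that each count c_k is the number of neighbours whose degree falls in bin k, that `size = 15` is the bin width, and that degrees beyond the last bin are clipped into c69. These readings, and the name `degree_histogram`, are assumptions rather than details stated on the slide.

```python
def degree_histogram(neighbour_degrees, n_bins=70, bin_size=15):
    """Count neighbour degrees into fixed-width bins.

    Bin k covers degrees in [k * bin_size, (k + 1) * bin_size).
    Degrees past the final bin are clipped into the last bin (an assumption).
    """
    counts = [0] * n_bins
    for degree in neighbour_degrees:
        k = min(degree // bin_size, n_bins - 1)
        counts[k] += 1
    return counts


# Example: seven neighbours with assorted degrees.
vector = degree_histogram([1, 3, 14, 15, 29, 30, 2000])
print(vector[0], vector[1], vector[2], vector[69])
```

A fixed-length vector like this lets nodes with very different neighbourhood sizes be compared by a standard classifier.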
Ethical concerns. Lack of ground truth.
D4D (5M nodes) – 5000 egonets released. Epinions (75K nodes) – 100 egonets extracted. Pokec (1.6M nodes) – 1000 egonets extracted.
[Figure: ROC curves (True Positive vs. False Positive), Pokec: Scheme 1 (self-validation). 1-hop: AUC = 0.952; 1,2-hop: AUC = 0.914; 2-hop: AUC = 0.802; Complete: AUC = 0.793]
[Figure: ROC curves (True Positive vs. False Positive), Pokec: Scheme 2 (self-validation). 1-hop: AUC = 0.978; 1,2-hop: AUC = 0.930; 2-hop: AUC = 0.984; Complete: AUC = 0.891]
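The AUC values summarise each ROC curve as a single number. As a reminder of what that number means, here is a minimal sketch that computes AUC as the probability that a randomly chosen identical pair receives a higher score than a randomly chosen non-identical pair, with ties counted as one half. The function name `roc_auc` is hypothetical and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 values (1 = identical node pair).
    scores: classifier scores, higher meaning "more likely identical".
    """
    positives = [s for l, s in zip(labels, scores) if l]
    negatives = [s for l, s in zip(labels, scores) if not l]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

The quadratic loop is fine for a small illustration. For large evaluation sets a rank-based computation would be the usual choice.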
[Figures: further result plots with panels Scheme 1 and Scheme 2]
ksharad.com
cs.ucl.ac.uk/staff/G.Danezis