Towards Plausible Graph Anonymization
Yang Zhang, Mathias Humbert, Bartlomiej Surma, Praveen Manoharan, Jilles Vreeken, Michael Backes
Graph sharing
Graph anonymization
[Figure: example graph with eight nodes, id 1 to id 8]
Our work
▪ Find a fundamental flaw in graph anonymization designs
▪ Exploit it to recover the original graph
▪ Use our findings to enhance anonymization designs
▪ Evaluate privacy and usability of the enhanced techniques on 3 real-life datasets: Enron, NO, SNAP
Graph anonymization methods
▪ ’08 Liu et al. - k-anonymity (k-DA)
▪ ’08 Zhou et al. - k-anonymity (k-NA)
▪ ’10 Cheng et al. - k-anonymity (k-iso)
▪ ’11 Sala et al. - differential privacy
▪ ’12 Mittal et al. - random walk privacy
▪ ’14 Xiao et al. - differential privacy
k-DA algorithm
[Figure: example graph (nodes id 1 to id 8) and its degree histogram (x-axis: node degree, y-axis: # nodes), shown before and after 2-DA]
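As a rough illustration of the property that k-DA targets, the sketch below checks whether every degree value occurring in a graph is shared by at least k nodes. The toy edge lists and the function names are made up for this example and do not come from the slides. The code only tests the property; it does not implement the k-DA edge-addition algorithm itself.

```python
from collections import Counter


def degree_histogram(edges):
    """Map each node degree to the number of nodes that have that degree.

    Nodes without any edge do not appear in the edge list and are ignored.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def is_k_degree_anonymous(edges, k):
    """Return True if every occurring degree is shared by at least k nodes."""
    return all(count >= k for count in degree_histogram(edges).values())


# Hypothetical toy graph: a path 1-2-3-4 (degrees 1, 2, 2, 1).
path = [(1, 2), (2, 3), (3, 4)]
# Hypothetical toy graph: a star centred on node 1 (degrees 3, 1, 1, 1).
star = [(1, 2), (1, 3), (1, 4)]

print(is_k_degree_anonymous(path, 2))  # True
print(is_k_degree_anonymous(star, 2))  # False
```

In the star, the centre is the only node of degree 3, so an adversary who knows a target's degree could single it out; this is the situation the degree histogram in the figure is meant to expose.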
SalaDP algorithm
[Figure: the dK-2 series of the example graph is perturbed under ε-DP, and the perturbed dK-2 series yields the anonymized graph]
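To make the dK-2 series concrete, here is a minimal sketch that counts, for each pair of degrees, how many edges join nodes of those degrees, and then adds Laplace noise to each count. The function names, the toy graph, and the `scale` parameter are assumptions for illustration only. In particular, the sketch does not reproduce how SalaDP calibrates the noise to ε, and it omits the step that rebuilds a graph from the perturbed series.

```python
import random
from collections import Counter


def dk2_series(edges):
    """Count edges per (degree, degree) pair, with the smaller degree first."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    series = Counter()
    for u, v in edges:
        pair = tuple(sorted((degree[u], degree[v])))
        series[pair] += 1
    return series


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(series, scale, rng):
    """Add independent Laplace noise to every entry (scale is a placeholder)."""
    return {pair: count + laplace_noise(scale, rng) for pair, count in series.items()}


# Hypothetical toy graph: a path 1-2-3-4.
path = [(1, 2), (2, 3), (3, 4)]
series = dk2_series(path)
print(dict(series))  # {(1, 2): 2, (2, 2): 1}

noisy = perturb(series, scale=1.0, rng=random.Random(0))
print(noisy)
```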
Social network graph properties
[Figure: example graph with nodes id 1 to id 8]
Graph recovery attack - overview
Graph recovery attack - graph embedding
▪ Node embeddings with node2vec (Grover and Leskovec, ’16)
▪ Map users into a continuous vector space
▪ A user’s vector reflects its structural properties
▪ Plausibility is cosine similarity between embeddings
[Figure: histogram of edge plausibility (x-axis: edge plausibility, from −0.2 to 1.0; y-axis: number of edges, ×10^4) for original edges and fake edges]
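The plausibility score can be sketched in a few lines: given one vector per user, the plausibility of an edge is the cosine similarity of its two endpoint vectors. The tiny two-dimensional vectors below are invented for illustration; real ones would come from a node2vec model trained on the anonymized graph. The function names are likewise hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def edge_plausibility(embedding, edges):
    """Score every edge by the cosine similarity of its endpoint embeddings."""
    return {(u, v): cosine_similarity(embedding[u], embedding[v]) for u, v in edges}


# Hypothetical embeddings; a real attack would learn these with node2vec.
embedding = {
    "id 1": [1.0, 0.0],
    "id 2": [2.0, 0.0],
    "id 3": [0.0, 1.0],
}
scores = edge_plausibility(embedding, [("id 1", "id 2"), ("id 1", "id 3")])
print(scores[("id 1", "id 2")])  # 1.0
print(scores[("id 1", "id 3")])  # 0.0
```

Cosine similarity ignores vector length, so "id 1" and "id 2" score 1.0 even though their norms differ.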
[Figure: AUC (from 0.0 to 1.0) on Enron, NO and SNAP for six plausibility metrics: Cosine, Euclidean, Bray-Curtis, Embeddedness, Jaccard, Adamic-Adar]
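For readers unfamiliar with the AUC used to compare the metrics, the sketch below computes it directly from its definition: the probability that a randomly chosen original edge receives a higher plausibility than a randomly chosen fake edge, with ties counted as one half. The scores are invented for illustration, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(original_scores, fake_scores):
    """Probability that an original edge outscores a fake edge (ties count 0.5)."""
    wins = 0.0
    for original in original_scores:
        for fake in fake_scores:
            if original > fake:
                wins += 1.0
            elif original == fake:
                wins += 0.5
    return wins / (len(original_scores) * len(fake_scores))


# Hypothetical plausibility scores.
original_scores = [0.9, 0.8, 0.4]
fake_scores = [0.5, 0.1]

# 5 of the 6 (original, fake) pairs are ranked correctly.
print(auc(original_scores, fake_scores))
```

An AUC of 0.5 would mean the metric cannot tell original edges from fake ones; values near 1.0 mean the two groups are almost perfectly separated.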
▪ Find a cutoff point and remove non-plausible edges
[Figure: histogram of edge plausibility (x-axis: edge plausibility; y-axis: number of edges, ×10^4) for original edges and fake edges, with the F1 score overlaid]
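The cutoff step can be sketched as follows: edges whose plausibility falls below a threshold are treated as fake and dropped. To mirror the F1 curve in the figure, the sketch also scores each candidate cutoff against known fake edges. This is an illustration under assumptions: the function names and data are hypothetical, F1 is computed here for the "fake" class, and a real attacker has no ground-truth labels and would need another way to choose the cutoff.

```python
def recover_graph(plausibility, cutoff):
    """Keep only the edges whose plausibility reaches the cutoff."""
    return [edge for edge, score in plausibility.items() if score >= cutoff]


def f1_for_fake_edges(plausibility, fake_edges, cutoff):
    """F1 score of flagging every edge below the cutoff as fake."""
    flagged = {edge for edge, score in plausibility.items() if score < cutoff}
    true_positive = len(flagged & fake_edges)
    if true_positive == 0:
        return 0.0
    precision = true_positive / len(flagged)
    recall = true_positive / len(fake_edges)
    return 2 * precision * recall / (precision + recall)


def best_cutoff(plausibility, fake_edges):
    """Try every observed score as a cutoff and return the one with the best F1."""
    candidates = sorted(set(plausibility.values()))
    return max(candidates, key=lambda c: f1_for_fake_edges(plausibility, fake_edges, c))


# Hypothetical scored edges of an anonymized graph and the truly fake ones.
plausibility = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.7,
    ("a", "d"): 0.2,
    ("c", "d"): 0.1,
    ("b", "d"): 0.6,
}
fake_edges = {("a", "d"), ("c", "d")}

cutoff = best_cutoff(plausibility, fake_edges)
print(cutoff)                               # 0.6
print(recover_graph(plausibility, cutoff))  # [('a', 'b'), ('b', 'c'), ('b', 'd')]
```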
Enhancing anonymization
▪ Pick the fake edges with the highest plausibility?
▪ The distribution will look unnatural
▪ Draw fake edges from the same plausibility distribution as the original edges?
[Figure: two panels, "k-DA (k=100)" and "Enhanced k-DA (k=100)"]
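One way to picture the second idea is the sketch below: for each fake edge needed, sample a target plausibility from the empirical distribution of the original edges, then pick the unused candidate edge whose plausibility is closest to that target. This is an illustrative assumption about how such sampling could look, not the authors' procedure. The names are hypothetical, and the sketch ignores the constraints that k-DA or SalaDP place on which edges may be added.

```python
import random


def pick_fake_edges(original_scores, candidates, n_fake, rng):
    """Choose fake edges whose plausibility mimics the original distribution.

    original_scores: plausibility values of the original edges.
    candidates: dict mapping each candidate non-edge to its plausibility.
    """
    remaining = dict(candidates)
    chosen = []
    for _ in range(min(n_fake, len(remaining))):
        # Sample a target score from the empirical original distribution.
        target = rng.choice(original_scores)
        # Take the unused candidate whose score is closest to the target.
        edge = min(remaining, key=lambda e: abs(remaining[e] - target))
        chosen.append(edge)
        del remaining[edge]
    return chosen


# Hypothetical data: all original edges score 0.8.
original_scores = [0.8, 0.8, 0.8]
candidates = {("x", "y"): 0.1, ("x", "z"): 0.79, ("y", "z"): 0.5}

print(pick_fake_edges(original_scores, candidates, 2, random.Random(0)))
# [('x', 'z'), ('y', 'z')]
```

Unlike always taking the highest-scoring candidates, this keeps the plausibility histogram of the added edges close to that of the original edges whenever enough suitable candidates exist.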
Resilience to graph recovery attack
▪ F1 score for original anonymizations
▪ F1 score for enhanced anonymizations
k-DA drops by: 26~51%
SalaDP drops by: 37~48%
Utility of Enhanced anonymization
[Figure: utility of GA against utility of GF (both axes from 0.6 to 1.0) for eigencentrality, degree distribution and triangle count on Enron, NO and SNAP]
Resilience to deanonymization attack
[Figure: anonymity gain (%) on Enron, NO and SNAP for k-DA (k = 50, 75, 100) and SalaDP (ε = 100, 50, 10)]
Conclusion
We find flaws in current graph anonymizations
We recover the original, pre-anonymized graph
We enhance the anonymization techniques
We evaluate privacy and utility of enhanced anonymization