SYBILFUSE: Combining Local Attributes with Global Structure to Perform Robust Sybil Detection
Peng Gao 1 Binghui Wang 2 Neil Zhenqiang Gong 2 Sanjeev R. Kulkarni 1 Kurt Thomas 3 Prateek Mittal 1
1Princeton University 2Iowa State University 3Google
1. Introduction to Sybil Attack
2. Background and Related Work
3. The SYBILFUSE Framework
4. Evaluation on Labeled Twitter Networks
5. Conclusion
Peng Gao SYBILFUSE 2 / 45
Sybil Attack: A single adversary injects multiple colluding identities in the system to compromise security and privacy.
Sybil Attack
Figure labels: fake news, fake reviews, malware, spam messages, scams, unsolicited friend requests, private data, others.
Limitations:
their profiles and connections.
Integro [Boshmaf et al. NDSS’15]
Limitations:
Integro requires that the number of victims be small and that the victims be accurately predicted.
SybilFuse Framework
Input: social network data (directed/undirected graph) and known labels.
Local attributes: structural attributes and content attributes feed local classifiers, which output local trust scores.
Global structure: trust score propagation (weighted random walk or weighted loopy belief propagation) turns the local trust scores into final scores.
Output: predicted labels and node ranking.
$S_v$ for node $v$: probability that $v$ is benign, predicted from local attributes (e.g., degree, local clustering coefficient, profile info).
$S_{u,v}$ for edge $(u, v)$: probability that $u$ and $v$ take the same label (i.e., it models homophily strength).
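Set the initial score of every node $v$:

$$S^{(0)}(v) = \begin{cases} 0.9 & \text{if } v \text{ is a training benign node} \\ 0.1 & \text{if } v \text{ is a training Sybil node} \\ S_v & \text{otherwise} \end{cases}$$

Score update equation:

$$S^{(i)}(v) = \sum_{(u,v) \in E} S^{(i-1)}(u)\, \frac{S_{u,v}}{\sum_{(u,w) \in E} S_{u,w}}$$

After $d = O(\log n)$ iterations, we obtain the final score $S^F_v$:

$$S^F_v = S^{(d)}(v)$$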
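The random-walk propagation above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name `propagate_scores`, the dictionary-based inputs, and the choice of a base-2 logarithm for the default iteration count are assumptions. It also assumes that each node splits its current score among its outgoing edges in proportion to the edge scores $S_{u,v}$, which is what makes the update a weighted random walk.

```python
import math
from collections import defaultdict


def propagate_scores(edges, local_scores, edge_scores,
                     benign_train, sybil_train, num_iters=None):
    """Weighted random-walk trust propagation (illustrative sketch).

    edges        : iterable of directed (u, v) pairs
    local_scores : dict node -> S_v (must contain every node that appears in edges)
    edge_scores  : dict (u, v) -> S_{u,v}
    benign_train : set of nodes labeled benign in the training set
    sybil_train  : set of nodes labeled Sybil in the training set
    """
    edges = list(edges)
    nodes = set(local_scores)

    # Initial scores: 0.9 / 0.1 for labeled nodes, local score S_v otherwise.
    score = {}
    for v in nodes:
        if v in benign_train:
            score[v] = 0.9
        elif v in sybil_train:
            score[v] = 0.1
        else:
            score[v] = local_scores[v]

    # Total outgoing edge weight of each node (assumed normalization).
    out_weight = defaultdict(float)
    for u, v in edges:
        out_weight[u] += edge_scores[(u, v)]

    # d = O(log n) iterations; the base-2 logarithm is an arbitrary choice here.
    if num_iters is None:
        num_iters = max(1, math.ceil(math.log2(len(nodes))))

    for _ in range(num_iters):
        nxt = dict.fromkeys(nodes, 0.0)
        for u, v in edges:
            if out_weight[u] > 0:
                nxt[v] += score[u] * edge_scores[(u, v)] / out_weight[u]
        score = nxt
    return score
```

Stopping after a logarithmic number of iterations keeps the scores from converging to a stationary distribution that no longer depends on the initial labels.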
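Node & edge potentials: $X_v \in \{1, -1\}$ represents the label of node $v$.

$$\psi_v(X_v) = \begin{cases} S_v & \text{if } X_v = 1 \\ 1 - S_v & \text{if } X_v = -1 \end{cases}$$

$$\psi_{u,v}(X_u, X_v) = \begin{cases} S_{u,v} & \text{if } X_u X_v = 1 \\ 1 - S_{u,v} & \text{if } X_u X_v = -1 \end{cases}$$

$(G, \Psi)$ defines a pairwise Markov Random Field.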
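Belief update equation:

$$m_{u \to v}(X_v) = \sum_{X_u} \psi_u(X_u)\, \psi_{u,v}(X_u, X_v) \prod_{s \in \mathrm{Nei}(u) \setminus \{v\}} m_{s \to u}(X_u)$$

After $d = 5 \sim 10$ iterations, we obtain the final score $S^F_v$:

$$bel_v(X_v = x_v) \propto \psi_v(X_v = x_v) \prod_{u \in \mathrm{Nei}(v)} m_{u \to v}(X_v = x_v)$$

$$S^F_v = \frac{bel_v(X_v = 1)}{bel_v(X_v = 1) + bel_v(X_v = -1)}$$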
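A minimal Python sketch of the belief propagation step follows. It is not the authors' code: the function name `loopy_bp`, the undirected adjacency-dict input, uniform message initialization, and per-message normalization are all assumptions. It uses the standard sum-product form, in which the messages arriving at $u$ are evaluated at $X_u$, and it assumes all scores lie strictly between 0 and 1 so that no normalizer is zero.

```python
def loopy_bp(neighbors, node_score, edge_score, num_iters=5):
    """Sum-product loopy belief propagation on a pairwise MRF (illustrative sketch).

    neighbors  : dict node -> set of neighboring nodes (undirected graph)
    node_score : dict v -> S_v
    edge_score : dict frozenset({u, v}) -> S_{u,v}
    Returns a dict v -> final score S^F_v.
    """
    labels = (1, -1)

    def psi_node(v, x):
        return node_score[v] if x == 1 else 1.0 - node_score[v]

    def psi_edge(u, v, xu, xv):
        s_uv = edge_score[frozenset((u, v))]
        return s_uv if xu * xv == 1 else 1.0 - s_uv

    # msg[(u, v)][x_v] is the message from u to v; start uniform (assumption).
    msg = {(u, v): {x: 0.5 for x in labels}
           for u in neighbors for v in neighbors[u]}

    for _ in range(num_iters):
        new_msg = {}
        for (u, v) in msg:
            out = {}
            for xv in labels:
                total = 0.0
                for xu in labels:
                    incoming = 1.0
                    for s in neighbors[u]:
                        if s != v:
                            incoming *= msg[(s, u)][xu]
                    total += psi_node(u, xu) * psi_edge(u, v, xu, xv) * incoming
                out[xv] = total
            z = out[1] + out[-1]
            # Normalizing each message only rescales beliefs; it avoids underflow.
            new_msg[(u, v)] = {x: out[x] / z for x in labels}
        msg = new_msg

    final = {}
    for v in neighbors:
        bel = {}
        for x in labels:
            b = psi_node(v, x)
            for u in neighbors[v]:
                b *= msg[(u, v)][x]
            bel[x] = b
        final[v] = bel[1] / (bel[1] + bel[-1])
    return final
```

On a tree-shaped graph this procedure yields exact marginals; on graphs with cycles it is an approximation, which is why only a small fixed number of iterations is run.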
The label $L_v$ of node $v$ is predicted as $L_v = \mathrm{sign}(S^F_v - \text{threshold})$.
We can also rank nodes according to $S^F_v$, so that Sybil nodes, which have low scores, are ranked first.
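The prediction and ranking step could look like the following sketch. The function names, the default threshold of 0.5, and the decision to label a score exactly equal to the threshold as Sybil are assumptions made for illustration.

```python
def predict_labels(final_scores, threshold=0.5):
    """Map each final score to a label: 1 (benign) or -1 (Sybil).

    A score exactly at the threshold is labeled -1 here (arbitrary tie rule).
    """
    return {v: (1 if s - threshold > 0 else -1)
            for v, s in final_scores.items()}


def rank_nodes(final_scores):
    """Return nodes sorted by ascending final score, most suspicious first."""
    return sorted(final_scores, key=lambda v: final_scores[v])


scores = {"a": 0.9, "b": 0.2, "c": 0.5}
labels = predict_labels(scores)   # {"a": 1, "b": -1, "c": -1}
ranking = rank_nodes(scores)      # ["b", "c", "a"]
```

Ranking is useful when a threshold is hard to choose: an operator can inspect only the top of the list.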
edges (40,001 attack edges)
We have the following observations:
benign nodes.
average per Sybil.
Thus, the benign region and the Sybil region can hardly be viewed as separate communities.
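$$\mathrm{Req}_{in}(v) = \frac{\cdots}{|\mathrm{In}(v)|}, \quad \mathrm{Req}_{out}(v) = \frac{\cdots}{|\mathrm{Out}(v)|}, \quad \mathrm{CC}(v) = \frac{\cdots}{|\mathrm{Nei}(v)|\,(|\mathrm{Nei}(v)| - 1)}$$

We randomly sample 50 benign nodes and 50 Sybil nodes as the training set, and train an SVM classifier with an RBF kernel using LIBSVM.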
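As a rough illustration of a structural feature with the denominator $|\mathrm{Nei}(v)|(|\mathrm{Nei}(v)| - 1)$, the sketch below computes a local clustering coefficient on a directed graph. Two points are assumptions rather than details from the slides: $\mathrm{Nei}(v)$ is taken to be the union of in- and out-neighbors, and the numerator is taken to be the number of directed edges among those neighbors (the usual definition of a directed local clustering coefficient). The function names are hypothetical.

```python
def neighbor_sets(edges):
    """Build in- and out-neighbor sets from directed (u, v) edges, skipping self-loops."""
    in_nbrs, out_nbrs = {}, {}
    for u, v in edges:
        if u == v:
            continue
        out_nbrs.setdefault(u, set()).add(v)
        in_nbrs.setdefault(v, set()).add(u)
    return in_nbrs, out_nbrs


def clustering_coefficient(v, in_nbrs, out_nbrs):
    """Directed edges among the neighbors of v, divided by |Nei(v)| * (|Nei(v)| - 1)."""
    nei = in_nbrs.get(v, set()) | out_nbrs.get(v, set())  # assumed definition of Nei(v)
    k = len(nei)
    if k < 2:
        return 0.0  # convention for nodes with fewer than two neighbors
    links = sum(1 for a in nei for b in out_nbrs.get(a, ()) if b in nei)
    return links / (k * (k - 1))


in_nbrs, out_nbrs = neighbor_sets([("v", "a"), ("v", "b"), ("a", "b")])
cc_v = clustering_coefficient("v", in_nbrs, out_nbrs)  # one of two possible edges
```

Feature vectors built this way would then be passed to the classifier; the training call itself is omitted because the slides do not give the LIBSVM parameters.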
[Figure: fraction of Sybil nodes among the top-K ranked nodes, for K from 100 to 500. (a) Random walk-based approaches: RG, SVM, SR, CIA, INT, INT-PF, SF-RW. (b) LBP-based approaches and ensemble methods: EnC-SR, EnC-CIA, EnC-SB, EnC-SS, SB, SS, SF-LBP.]
edges)
We have the following observations:
per Sybil).
We use the same set of features: $\mathrm{Req}_{in}(v)$, $\mathrm{Req}_{out}(v)$, $\mathrm{CC}(v)$.
[Figure: (a) scatter plot; (b) empirical CDF of the clustering coefficient for benign nodes and Sybil nodes.]
We randomly sample 3000 benign nodes and 3000 Sybil nodes as the training set, and train an SVM classifier with an RBF kernel using LIBSVM.
AUC

Method   SR    CIA   INT   INT-PF  SB    SS    SF-RW  SF-LBP
AUC      0.57  0.80  0.48  0.54    0.74  0.74  0.81   0.85

Sybil ranking
[Figure: fraction of Sybil nodes among the top-K ranked nodes, for K = 1K, 10K, 50K, 100K, 1M, 10M. Methods: SR, CIA, INT, INT-PF, SB, SS, SF-RW, SF-LBP.]
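The two evaluation metrics used here, AUC and the fraction of Sybils among the top-K ranked nodes, can be computed as in the sketch below. The function names are hypothetical, and the AUC orientation is an assumption: it is taken as the probability that a randomly chosen Sybil receives a lower final score than a randomly chosen benign node, with ties counted as one half. The pairwise loop is quadratic and is meant only for small examples.

```python
def auc(scores, is_sybil):
    """AUC as P(score of random Sybil < score of random benign), ties count 0.5."""
    sybil = [scores[v] for v in scores if is_sybil[v]]
    benign = [scores[v] for v in scores if not is_sybil[v]]
    wins = 0.0
    for s in sybil:
        for b in benign:
            if s < b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(sybil) * len(benign))


def sybil_fraction_top_k(scores, is_sybil, k):
    """Fraction of Sybils among the k lowest-scored (most suspicious) nodes."""
    top = sorted(scores, key=lambda v: scores[v])[:k]
    return sum(1 for v in top if is_sybil[v]) / len(top)


scores = {"a": 0.9, "b": 0.8, "c": 0.2, "d": 0.85}
truth = {"a": False, "b": False, "c": True, "d": True}
print(auc(scores, truth))                      # 0.75
print(sybil_fraction_top_k(scores, truth, 2))  # 0.5
```

An AUC of 0.5 corresponds to a random ranking, which gives a reference point for reading the table above.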
local attributes with global structure.
networks, and demonstrated that SYBILFUSE outperforms existing approaches.