CATCHSYNC: CATCHING SYNCHRONIZED BEHAVIOR IN LARGE DIRECTED GRAPHS



SLIDE 1

CATCHSYNC: CATCHING SYNCHRONIZED BEHAVIOR IN LARGE DIRECTED GRAPHS

Meng Jiang, Tsinghua University, Beijing, China
Joint work with Peng Cui, Alex Beutel, Christos Faloutsos and Shiqiang Yang
August 26, 2014 – NYC, USA

SLIDE 2

Fraud Detection: Graph Analysis Problem

[www.buyfollowz.org] [buymorelikes.com]

SLIDE 3

Fraud Detection: Graph Analysis Problem

[buycheaplikes.com] [reviewsteria.com]

SLIDE 4

Our Goals

Given: A graph (large-scale, directed, etc.)
Find: Frauds = Anomalous edges
Goals:
G1. Find patterns that distinguish fraudsters from normal users

G2. Design algorithms that catch fraudsters

SLIDE 5

OUTLINE

1. Background
2. Fraudulent Pattern
3. The Algorithm
4. Experiments

SLIDE 6

Anomalies in Degree Distributions

Power-law distribution

DBLP (author-publication), Flickr (user-user), Twitter (who-follows-whom)

[konect.uni-koblenz.de/networks/]

SLIDE 7

Anomalies in Degree Distributions

[Plot: degree distribution, 2009, 41M; annotations: 0.41M, 3.17M, d=20]
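The kind of spike annotated at d=20 can be looked for with a simple histogram check. The sketch below is only illustrative: the function names, the `ratio` threshold, and the neighbour-comparison rule are assumptions made here, not part of the talk.

```python
from collections import Counter

def out_degree_histogram(edges):
    """Map each out-degree value to the number of nodes that have it."""
    out_deg = Counter(src for src, _ in edges)
    return Counter(out_deg.values())

def find_spikes(hist, ratio=5.0):
    """Return degree values whose node count is more than `ratio` times the
    larger neighbouring count (a hypothetical rule, chosen for illustration)."""
    spikes = []
    for d, count in sorted(hist.items()):
        neighbours = max(hist.get(d - 1, 0), hist.get(d + 1, 0), 1)
        if count > ratio * neighbours:
            spikes.append(d)
    return spikes

# Toy graph: a decaying degree distribution plus 12 nodes that all follow exactly 20 targets.
edges = []
node = 0
for degree, n_nodes in [(1, 8), (2, 4), (3, 2), (4, 2), (20, 12)]:
    for _ in range(n_nodes):
        edges += [(node, f"t{k}") for k in range(degree)]
        node += 1

hist = out_degree_histogram(edges)
spikes = find_spikes(hist)
print(spikes)
```

A check like this only locates the spike; as the next slides argue, the degree value alone does not separate fraudsters from normal users who happen to have the same degree.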

SLIDE 8

Linear Classifier with “Degree”: Fail

Feature: Out-degree → classifier → Label (+1, -1)
Rule: Out-degree = 20? → +1 (Fraud) ×

[Plot annotations: 0.41M, 3.17M, d=20]

SLIDE 9

Graph Structure Distorted

[Plot: degree distribution, 2011, 117M; annotations: 0.44M, 1.91M, d=64]

SLIDE 10

Traditional Fraud Detection

Features → classifier → Label (+1, -1)

Out-degree: Big?
In-degree: Small?
#tweet: Big?
#url in tweets: Big?
#hashtag in tweets: Big?
→ +1 (Fraud)

Content-based features

SLIDE 11

Empty Profile?

SLIDE 12

Few Followers?

SLIDE 13

Many Followings?

SLIDE 14

Content: Unavailable? Look Normal?

Features → classifier → Label (+1, -1)

Features: Out-degree, In-degree, #tweet, #url in tweets, #hashtag in tweets
Content-based features: 0, 0, 0… sorry

SLIDE 15

Behavior is the Key

Monetary Incentive
Content: how they appear to behave
Behavior/Links: how they have to behave

SLIDE 16

OUTLINE

1. Background
2. Fraudulent Pattern
3. The Algorithm
4. Experiments

SLIDE 17

Behavior-based Features

Follower behavior ≈ Out-degree, 1st left singular vector (Hubness), 2nd left singular vector, …
Followee behavior ≈ In-degree, 1st right singular vector (Authoritativeness), 2nd right singular vector, …
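As a rough illustration of these features, the sketch below computes out-degree and hubness per follower and in-degree and authoritativeness per followee. It is an assumption-laden stand-in: it uses plain power iteration (HITS-style) to approximate only the first pair of singular vectors, assumes the edge list has no duplicates, and the name `behavior_features` is made up for this example.

```python
import math

def _normalize(scores):
    """Scale a score dict to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in scores.values()))
    return {k: x / norm for k, x in scores.items()}

def behavior_features(edges, iters=50):
    """edges: list of (follower, followee) pairs.
    Returns (follower_feats, followee_feats), where
    follower_feats[u] = (out_degree, hubness) and
    followee_feats[v] = (in_degree, authoritativeness)."""
    out_deg, in_deg = {}, {}
    for u, v in edges:
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1
    hub = {u: 1.0 for u in out_deg}
    auth = {v: 1.0 for v in in_deg}
    for _ in range(iters):
        # Authority update: sum the hub scores of each node's followers.
        auth = dict.fromkeys(in_deg, 0.0)
        for u, v in edges:
            auth[v] += hub[u]
        auth = _normalize(auth)
        # Hub update: sum the authority scores of each node's followees.
        hub = dict.fromkeys(out_deg, 0.0)
        for u, v in edges:
            hub[u] += auth[v]
        hub = _normalize(hub)
    follower = {u: (out_deg[u], hub[u]) for u in out_deg}
    followee = {v: (in_deg[v], auth[v]) for v in in_deg}
    return follower, followee

# Toy graph: a and b both follow x and y; c follows only z.
edges = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "z")]
follower, followee = behavior_features(edges)
print(follower["a"], follower["c"])
```

Each followee then becomes a point (in-degree, authoritativeness) in the followee feature space, and each follower a point (out-degree, hubness) in the follower feature space. For large graphs a sparse SVD routine would be the more usual choice than this pure-Python loop.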

SLIDE 18

Behavior-based Feature Space

[Panels: Follower behavior, Followee behavior]

SLIDE 19

Fraudulent Behavior Patterns

SLIDE 20

Fraudulent Behavior Patterns

SLIDE 21

Fraudulent Behavior Patterns

SLIDE 22

Fraudulent Behavior Patterns

SLIDE 23

Fraudulent Behavior Patterns

SLIDE 24

Fraudulent Behavior Patterns

Synchronized
Abnormal

SLIDE 25

OUTLINE

1. Background
2. Fraudulent Pattern
3. The Algorithm
4. Experiments

SLIDE 26

Synchronicity and Normality

Synchronicity

SLIDE 27

Synchronicity and Normality

Normality
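The slides name the two measures without defining them, so the following is only one plausible reading, stated as an assumption: the followee feature space is divided into grid cells, synchronicity is the chance that two of a node's followees land in the same cell, and normality is the chance that one of its followees shares a cell with a node drawn from the whole graph. All function names are hypothetical.

```python
from collections import Counter

def cell_fractions(cells):
    """Fraction of the given points that fall into each grid cell."""
    counts = Counter(cells)
    total = len(cells)
    return {c: k / total for c, k in counts.items()}

def synchronicity(followee_cells):
    """Sum of squared cell fractions: high when followees cluster in few cells."""
    p = cell_fractions(followee_cells)
    return sum(f * f for f in p.values())

def normality(followee_cells, background):
    """Overlap between a node's followee cells and the background distribution."""
    p = cell_fractions(followee_cells)
    return sum(f * background.get(c, 0.0) for c, f in p.items())

# Toy data: cell labels stand in for grid cells of the followee feature space.
all_cells = ["A"] * 60 + ["B"] * 30 + ["C"] * 10
background = cell_fractions(all_cells)

fraud_followees = ["C", "C", "C", "C"]    # all followees in one rare cell
normal_followees = ["A", "A", "B", "A"]   # followees spread like the background

fraud_point = (synchronicity(fraud_followees), normality(fraud_followees, background))
normal_point = (synchronicity(normal_followees), normality(normal_followees, background))
print(fraud_point, normal_point)
```

In this toy case the fraud-like node gets high synchronicity and low normality, which is the "Synchronized, Abnormal" corner described on the earlier slides.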

SLIDE 28

Synchronicity-Normality Plot

SLIDE 29

Theorem

For any distribution, there is a parabolic lower limit in the synchronicity-normality plot.

Proof. See our paper.

[Plot axes: synchronicity, normality]
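The slide defers the proof to the paper. As a sketch of why a parabola can appear, assume a grid-based formulation that is not stated on the slides: the feature space has M cells, p_g is the fraction of a node's followees in cell g, and q_g is the fraction of all nodes in cell g. Minimizing synchronicity for a fixed normality, with the constraint p_g ≥ 0 dropped (which can only lower the minimum), gives a quadratic in n:

```latex
% Assumed notation (not from the slides):
%   p_g : fraction of a node's followees in grid cell g, g = 1..M
%   q_g : fraction of all nodes in grid cell g
%   s, n : synchronicity and normality of the node; s_b : background synchronicity
\begin{align*}
s &= \sum_{g=1}^{M} p_g^2, \qquad
n = \sum_{g=1}^{M} p_g q_g, \qquad
s_b = \sum_{g=1}^{M} q_g^2 \\
&\text{minimize } s \ \text{ subject to } \ \sum_{g} p_g = 1, \ \ \sum_{g} p_g q_g = n
\;\Longrightarrow\; p_g = a + b\, q_g \\
a &= \frac{n - s_b}{1 - M s_b}, \qquad
b = \frac{1 - M n}{1 - M s_b} \\
s_{\min}(n) &= a + b\, n = \frac{M n^2 - 2n + s_b}{M s_b - 1}
\end{align*}
```

Under these assumptions the bound holds whenever M s_b > 1, that is, whenever the background is not perfectly uniform over the cells.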

SLIDE 30

CatchSync Algorithm

Distance-based anomaly detection

Fraudsters
Big synchronicity
Small normality
Away from the densest
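The three criteria above can be turned into a toy detector. The sketch below is a stand-in, not the rule used by CatchSync: it approximates the densest region by the coordinate-wise median of the synchronicity-normality plot and uses a hypothetical fixed distance `threshold`.

```python
import math
from statistics import median

def flag_suspicious(points, threshold=0.3):
    """points: dict mapping node -> (synchronicity, normality).
    Flags nodes with larger synchronicity and smaller normality than the
    median point, and farther than `threshold` from it (illustrative rule)."""
    centre_s = median(s for s, _ in points.values())
    centre_n = median(n for _, n in points.values())
    flagged = []
    for node, (s, n) in points.items():
        dist = math.hypot(s - centre_s, n - centre_n)
        if s > centre_s and n < centre_n and dist > threshold:
            flagged.append(node)
    return sorted(flagged)

# Toy synchronicity-normality plot: four ordinary users and two synchronized ones.
points = {
    "u1": (0.10, 0.50), "u2": (0.12, 0.55),
    "u3": (0.15, 0.45), "u4": (0.11, 0.52),
    "f1": (0.95, 0.05), "f2": (0.90, 0.08),
}
suspects = flag_suspicious(points)
print(suspects)
```

A median centre is used here only because it is robust and easy to compute; any density estimate of the plot could play the same role.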

SLIDE 31

OUTLINE

1. Background
2. Fraudulent Pattern Mining
3. The Algorithm
4. Experiments

SLIDE 32

Experiments

Q1: Does CatchSync remove anomalies?
Degree distribution
Feature space
Q2: Is CatchSync actually catching fraudulent users?

Q3: Is CatchSync robust?

SLIDE 33

Q1: Does CatchSync Remove Anomalies?

[Plot: degree distribution, 2009, 41M; annotations: 0.41M, 3.17M, d=20]

SLIDE 34

Q1: Does CatchSync Remove Anomalies?

[Plot: degree distribution, 2011, 117M]

SLIDE 35

Before CatchSync

[Panels: Follower behavior, Followee behavior]

SLIDE 36

After CatchSync

[Panels: Follower behavior, Followee behavior]

SLIDE 37

Q2: Is CatchSync Actually Catching Fraudulent Users?

173/1,000 237/1,000

SLIDE 38

Q2: Is CatchSync Actually Catching Fraudulent Users?

[Bar chart, axis from 0.2 to 1. Values: 0.412, 0.597, 0.751, 0.813. Methods: CatchSync+SPOT, CatchSync, SPOT, OutRank]

SLIDE 39

Q2: Is CatchSync Actually Catching Fraudulent Users?

[Bar chart, axis from 0.2 to 1. Values: 0.377, 0.653, 0.694, 0.785. Methods: CatchSync+SPOT, CatchSync, SPOT, OutRank]

SLIDE 40

Q2: Is CatchSync Actually Catching Fraudulent Users?

Recall = 80%
Precision in Twitter: 83.5%
Precision in Tencent Weibo: 79.4%

SLIDE 41

Q3: Is CatchSync Robust to Camouflage?

[Diagram labels: Target, Popular camouflage, Random camouflage]
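To make the two camouflage types concrete, the sketch below injects extra edges for a set of fraudsters. It assumes, without confirmation from the slides, that "random camouflage" means following nodes picked uniformly at random and "popular camouflage" means following the nodes with the highest in-degree. The function name and parameters are hypothetical.

```python
import random
from collections import Counter

def add_camouflage(edges, fraudsters, kind, n_extra, seed=0):
    """Return a new edge list in which each fraudster also follows up to
    `n_extra` camouflage nodes, chosen by popularity or at random."""
    rng = random.Random(seed)
    existing = set(edges)
    in_deg = Counter(v for _, v in edges)
    candidates = sorted(in_deg)  # every node that has at least one follower
    new_edges = []
    for f in fraudsters:
        if kind == "popular":
            picks = [v for v, _ in in_deg.most_common(n_extra)]
        elif kind == "random":
            picks = rng.sample(candidates, min(n_extra, len(candidates)))
        else:
            raise ValueError("kind must be 'popular' or 'random'")
        # Skip self-loops and edges that already exist.
        new_edges += [(f, v) for v in picks if v != f and (f, v) not in existing]
    return edges + new_edges

# Toy graph: fraudsters f1 and f2 follow the target t; "star" is a popular node.
edges = [("f1", "t"), ("f2", "t"), ("u1", "star"),
         ("u2", "star"), ("u3", "star"), ("u1", "u2")]
popular = add_camouflage(edges, ["f1", "f2"], "popular", n_extra=1)
randomised = add_camouflage(edges, ["f1", "f2"], "random", n_extra=2)
print(len(popular), len(randomised))
```

Running a detector on the graph before and after such injections is one way to probe the robustness question posed in Q3.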

SLIDE 42

Q3: Is CatchSync Robust to Camouflage?

SLIDE 43

Q3: Is CatchSync Robust to Camouflage?

SLIDE 44

Q3: Is CatchSync Robust to Camouflage?

[Panels: Random camouflage, Popular camouflage]

SLIDE 45

Conclusion

Goals
G1. Find patterns that distinguish fraudulent user behavior from normal behavior

A1: Synchronized & Abnormal!
G2. Design algorithms that catch fraudsters
A2: CatchSync!
Remove spikes
Content free
Robust to camouflage

SLIDE 46