CATCHSYNC: CATCHING SYNCHRONIZED BEHAVIOR IN LARGE DIRECTED GRAPHS

SLIDE 1

CATCHSYNC: CATCHING SYNCHRONIZED BEHAVIOR IN LARGE DIRECTED GRAPHS

Meng Jiang, Tsinghua University, Beijing, China
Joint work with Peng Cui, Alex Beutel, Christos Faloutsos and Shiqiang Yang
August 26, 2014, NYC, USA

SLIDE 2

Fraud Detection: Graph Analysis Problem

[www.buyfollowz.org] [buymorelikes.com]

SLIDE 3

Fraud Detection: Graph Analysis Problem

[buycheaplikes.com] [reviewsteria.com]

SLIDE 4

Our Goals

  • Given: A graph (large-scale, directed, etc.)
  • Find: Frauds = Anomalous edges
  • Goals:
    • G1. Find patterns that distinguish fraudsters from normal users
    • G2. Design algorithms that catch fraudsters

SLIDE 5

OUTLINE

  • 1. Background
  • 2. Fraudulent Pattern
  • 3. The Algorithm
  • 4. Experiments

SLIDE 6

Anomalies in Degree Distributions

  • Power-law distribution

[Figure: degree distributions of three networks: DBLP (author-publication), Flickr (user-user), Twitter (who-follows-whom). Source: konect.uni-koblenz.de/networks/]

SLIDE 7

Anomalies in Degree Distributions

[Figure: degree distribution plot labeled 2009, 41M; annotations: 0.41M, 3.17M, d=20]
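One way to look for the kind of spike annotated at d=20 is to build the out-degree histogram and compare each count with its neighbors. The sketch below is only an illustration: the function names, the neighbor-comparison rule, and the `ratio` threshold are assumptions, not something the slides specify.

```python
from collections import Counter


def out_degree_histogram(edges):
    """Map each out-degree value to the number of nodes that have it.

    `edges` is an iterable of (source, target) pairs.
    """
    out_degree = Counter(src for src, _dst in edges)
    return Counter(out_degree.values())


def find_spikes(hist, ratio=3.0):
    """Return degrees whose count exceeds `ratio` times the mean count of
    the two neighboring degrees (d-1 and d+1).

    This is a hypothetical heuristic: a smooth power-law histogram changes
    slowly between adjacent degrees, so a large jump stands out.
    """
    spikes = []
    for d, count in sorted(hist.items()):
        neighbors = (hist.get(d - 1, 0) + hist.get(d + 1, 0)) / 2
        if neighbors > 0 and count > ratio * neighbors:
            spikes.append(d)
    return spikes
```

A spike found this way only says that many nodes share one degree; the later slides argue that degree alone cannot separate fraudsters from normal users.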

SLIDE 8

Linear Classifier with “Degree”: Fail

Feature: Out-degree
Label: (+1, -1)
Classifier: out-degree = 20? +1 (Fraud) ×

[Figure: degree distribution plot; annotations: 0.41M, 3.17M, d=20]

SLIDE 9

Graph Structure Distorted

[Figure: degree distribution plot labeled 2011, 117M; annotations: 0.44M, 1.91M, d=64]

SLIDE 10

Traditional Fraud Detection

A classifier over the following features (content-based features included) outputs a label (+1, -1):

  • Out-degree: Big?
  • In-degree: Small?
  • #tweet: Big?
  • #url in tweets: Big?
  • #hashtag in tweets: Big?

Output: +1 (Fraud)

SLIDE 11

Empty Profile?

SLIDE 12

Few Followers?

SLIDE 13

Many Followings?

SLIDE 14

Content: Unavailable? Look Normal?

Features: Out-degree, In-degree, #tweet, #url in tweets, #hashtag in tweets
Label: (+1, -1)
Content-based features: 0, 0, 0… sorry

SLIDE 15

Behavior is the Key

  • Monetary incentive
  • Content: how fraudsters choose to appear
  • Behavior/links: how fraudsters have to behave

SLIDE 16

OUTLINE

  • 1. Background
  • 2. Fraudulent Pattern
  • 3. The Algorithm
  • 4. Experiments

SLIDE 17

Behavior-based Features

Follower behavior ≈
  • Out-degree
  • 1st left singular vector (Hubness)
  • 2nd left singular vector
  • …

Followee behavior ≈
  • In-degree
  • 1st right singular vector (Authoritativeness)
  • 2nd right singular vector
  • …
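The feature pairs above can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions: edges are (follower, followee) pairs, hubness and authoritativeness are approximated with HITS-style power iteration (which typically converges to the first left and right singular vectors of the adjacency matrix), and the names `hits` and `behavior_features` are made up for this example.

```python
import math
from collections import Counter


def _normalize(scores):
    """Scale a score dict to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(edges, iterations=50):
    """Power iteration for hub and authority scores over (src, dst) edges."""
    nodes = {node for edge in edges for node in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority of a node: sum of hub scores of the nodes pointing at it.
        auth = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            auth[dst] += hub[src]
        auth = _normalize(auth)
        # Hub score of a node: sum of authority scores of the nodes it points at.
        hub = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            hub[src] += auth[dst]
        hub = _normalize(hub)
    return hub, auth


def behavior_features(edges):
    """Return (follower_features, followee_features).

    follower_features[node] = (out-degree, hubness)
    followee_features[node] = (in-degree, authoritativeness)
    """
    out_deg = Counter(src for src, _dst in edges)
    in_deg = Counter(dst for _src, dst in edges)
    hub, auth = hits(edges)
    nodes = set(hub)
    follower = {n: (out_deg[n], hub[n]) for n in nodes}
    followee = {n: (in_deg[n], auth[n]) for n in nodes}
    return follower, followee
```

On a real graph with millions of nodes one would use a sparse SVD routine instead of pure-Python loops; the loops here only make the definitions explicit.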

SLIDE 18

Behavior-based Feature Space

[Figure: feature-space plots for follower behavior and followee behavior]

SLIDE 19

Fraudulent Behavior Patterns

SLIDE 20

Fraudulent Behavior Patterns

SLIDE 21

Fraudulent Behavior Patterns

SLIDE 22

Fraudulent Behavior Patterns

SLIDE 23

Fraudulent Behavior Patterns

SLIDE 24

Fraudulent Behavior Patterns

  • Synchronized
  • Abnormal

SLIDE 25

OUTLINE

  • 1. Background
  • 2. Fraudulent Pattern
  • 3. The Algorithm
  • 4. Experiments

SLIDE 26

Synchronicity and Normality

  • Synchronicity

SLIDE 27

Synchronicity and Normality

  • Normality
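The slides name the two measures without giving formulas. A minimal sketch, assuming a grid-based definition: the followee feature space is cut into cells, synchronicity is the chance that two of a user's followees fall in the same cell, and normality is the chance that one of the user's followees shares a cell with a node drawn from the whole population. The function names and the `cell_size` parameter are hypothetical.

```python
from collections import Counter


def grid_cell(point, cell_size):
    """Map a feature point to the index tuple of its grid cell."""
    return tuple(int(x // cell_size) for x in point)


def sync_and_norm(followees, features, cell_size=1.0):
    """Compute (synchronicity, normality) for each source node.

    followees: dict mapping a source node to the list of nodes it follows
    features:  dict mapping every node to its feature point (tuple of floats)
    """
    # Background distribution: how all nodes spread over the grid cells.
    background = Counter(grid_cell(p, cell_size) for p in features.values())
    total = sum(background.values())

    result = {}
    for u, targets in followees.items():
        cells = Counter(grid_cell(features[v], cell_size) for v in targets)
        d = len(targets)
        # Synchronicity: sum over cells of (share of u's followees in the cell)^2.
        sync = sum((c / d) ** 2 for c in cells.values())
        # Normality: overlap between u's followee distribution and the background.
        norm = sum((c / d) * (background[g] / total) for g, c in cells.items())
        result[u] = (sync, norm)
    return result
```

Under this assumed definition, a user whose followees all sit in one rarely occupied cell gets synchronicity 1 and a small normality, which is the "synchronized and abnormal" pattern the earlier slides describe.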

SLIDE 28

Synchronicity-Normality Plot

SLIDE 29

Theorem

  • For any distribution, there is a parabolic lower limit in the synchronicity-normality plot.
  • Proof. See our paper.

[Figure axes: synchronicity, normality]
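One elementary way to see why a parabola-shaped lower bound can arise is the Cauchy-Schwarz inequality. The derivation below is a sketch under an assumed grid-based definition of the two measures (the symbols q_g, p_g, s, n and s_b are introduced here for illustration); it is not necessarily the bound proved in the paper, which may be tighter.

```latex
% Assumed notation (not from the slides): the feature space is cut into grid cells g.
%   q_g = fraction of a user's followees that fall in cell g
%   p_g = fraction of all nodes that fall in cell g
\begin{align*}
s   &= \sum_g q_g^2      && \text{(synchronicity of the user)} \\
n   &= \sum_g q_g \, p_g && \text{(normality of the user)} \\
s_b &= \sum_g p_g^2      && \text{(a constant for a given graph)} \\
n^2 &= \Big(\sum_g q_g \, p_g\Big)^2
     \le \Big(\sum_g q_g^2\Big)\Big(\sum_g p_g^2\Big) = s \, s_b
                         && \text{(Cauchy-Schwarz)} \\
\Rightarrow \quad s &\ge \frac{n^2}{s_b}
\end{align*}
```

So for any user, synchronicity is at least a quadratic function of normality, whatever the background distribution looks like.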

SLIDE 30

CatchSync Algorithm

  • Distance-based anomaly detection
  • Fraudsters
    • Big synchronicity
    • Small normality
    • Away from the densest
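The bullets above can be turned into a small distance-based detector. This is a simplified sketch, not the authors' procedure: it uses the per-axis median as a stand-in for the densest region of the plot, and the `min_distance` threshold and the function name are assumptions made for this example.

```python
import math
from statistics import median


def flag_suspicious(points, min_distance):
    """Flag nodes with big synchronicity, small normality, and a large
    distance from the bulk of the synchronicity-normality plot.

    points: dict mapping node -> (synchronicity, normality)
    min_distance: hypothetical threshold on Euclidean distance from the center
    """
    # Median per axis as a rough proxy for where most users sit.
    center_sync = median(s for s, _n in points.values())
    center_norm = median(n for _s, n in points.values())

    flagged = []
    for node, (s, n) in points.items():
        distance = math.hypot(s - center_sync, n - center_norm)
        if s > center_sync and n < center_norm and distance >= min_distance:
            flagged.append(node)
    return sorted(flagged)
```

In practice the threshold would be chosen from the data, for example from the spread of distances among all users, rather than fixed by hand.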

SLIDE 31

OUTLINE

  • 1. Background
  • 2. Fraudulent Pattern Mining
  • 3. The Algorithm
  • 4. Experiments

SLIDE 32

Experiments

  • Q1: Does CatchSync remove anomalies?
    • Degree distribution
    • Feature space
  • Q2: Is CatchSync catching actually fraudulent users?
  • Q3: Is CatchSync robust?

SLIDE 33

Q1: Does CatchSync Remove Anomalies?

[Figure: degree distribution plot labeled 2009, 41M; annotations: 0.41M, 3.17M, d=20]

SLIDE 34

Q1: Does CatchSync Remove Anomalies?

[Figure: degree distribution plot labeled 2011, 117M]

SLIDE 35

Before CatchSync

[Figure: feature-space plots for follower behavior and followee behavior]

SLIDE 36

After CatchSync

[Figure: feature-space plots for follower behavior and followee behavior]

SLIDE 37

Q2: Is CatchSync Catching Actually Fraudulent Users?

[Figure annotations: 173/1,000 and 237/1,000]

SLIDE 38

Q2: Is CatchSync Catching Actually Fraudulent Users?

[Bar chart. Methods compared: CatchSync+SPOT, CatchSync, SPOT, OutRank. Bar values shown: 0.412, 0.597, 0.751, 0.813. Axis ticks: 0.2 to 1]

SLIDE 39

Q2: Is CatchSync Catching Actually Fraudulent Users?

[Bar chart. Methods compared: CatchSync+SPOT, CatchSync, SPOT, OutRank. Bar values shown: 0.377, 0.653, 0.694, 0.785. Axis ticks: 0.2 to 1]

SLIDE 40

Q2: Is CatchSync Catching Actually Fraudulent Users?

Recall = 80%
Precision in Twitter / Precision in Tencent Weibo: 83.5% / 79.4%

SLIDE 41

Q3: Is CatchSync Robust to Camouflage?

[Figure panels: Target, Popular camouflage, Random camouflage]
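To make the two camouflage settings concrete, the sketch below adds extra follow edges from fraudulent accounts either to randomly chosen nodes or to the most-followed nodes. The slides only name the settings, so this reading of them, the function name, and the parameters `k`, `mode` and `seed` are all assumptions for illustration.

```python
import random
from collections import Counter


def add_camouflage(edges, fraudsters, k, mode="random", seed=0):
    """Return a new edge list where each fraudster follows k extra nodes.

    edges:      list of (follower, followee) pairs
    fraudsters: nodes that add camouflage edges
    mode:       "random" picks extra followees uniformly at random,
                "popular" picks the nodes with the highest in-degree
    """
    rng = random.Random(seed)
    in_degree = Counter(dst for _src, dst in edges)
    # Camouflage targets are any nodes that are not fraudsters themselves.
    candidates = sorted({n for e in edges for n in e} - set(fraudsters))
    existing = set(edges)

    new_edges = list(edges)
    for f in fraudsters:
        # Skip nodes the fraudster already follows.
        pool = [n for n in candidates if (f, n) not in existing]
        if mode == "popular":
            picks = sorted(pool, key=lambda n: (-in_degree[n], n))[:k]
        else:
            picks = rng.sample(pool, min(k, len(pool)))
        new_edges.extend((f, n) for n in picks)
    return new_edges
```

Re-running a detector on the returned edge list and comparing results with the uncamouflaged graph is one way to probe robustness in the sense of Q3.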

SLIDE 42

Q3: Is CatchSync Robust to Camouflage?

SLIDE 43

Q3: Is CatchSync Robust to Camouflage?

SLIDE 44

Q3: Is CatchSync Robust to Camouflage?

[Figure panels: Random camouflage, Popular camouflage]

SLIDE 45

Conclusion

  • Goals
    • G1. Find patterns that distinguish fraudulent user behavior from normal behavior
      • A1: Synchronized & Abnormal!
    • G2. Design algorithms that catch fraudsters
      • A2: CatchSync!
        • Remove spikes
        • Content free
        • Robust to camouflage

SLIDE 46

Questions?

Meng Jiang
mjiang89@gmail.com
http://www.meng-jiang.com