Collective Spammer Detection in Evolving Multi-Relational Social Networks



slide-1
SLIDE 1

Collective Spammer Detection in Evolving Multi-Relational Social Networks

Shobeir Fakhraei (University of Maryland) James Foulds (University of California, Santa Cruz) Madhusudana Shashanka (if(we) Inc., Currently Niara Inc.) Lise Getoor (University of California, Santa Cruz)

slide-4
SLIDE 4

Spam in Social Networks

n Recent study by Nexgate in 2013:

n Spam grew by more than 300% in half a year n 1 in 200 social messages are spam n 5% of all social apps are spammy

4


slide-8
SLIDE 8

Spam in Social Networks

n What’s different about social networks?

n Spammers have more ways to interact with users n Messages, comments on photos, winks,… n They can split spam across multiple messages n More available info about users on their profiles!

8

slide-13
SLIDE 13

Spammers are getting smarter!

Traditional Spam (George to Shobeir):

"Want some replica luxury watches? Click here: http://SpammyLink.com" [Report Spam]

(Intelligent) Social Spam (Mary and Shobeir), a realistic-looking conversation:

Mary: "Hey Shobeir! Nice profile photo. I live in Bay Area too. Wanna chat?"
Shobeir: "Sure! :)"
Mary: "I’m logging off here, too many people pinging me! I really like you, let’s chat more here: http://SpammyLink.com"

slide-14
SLIDE 14

Tagged.com

n Founded in 2004, is a social networking site which

connects people through social interactions and games

n Over 300 million registered members n Data sample for experiments (on a laptop):

n 5.6 Million users (3.9% Labeled Spammers) n 912 Million Links

14

slide-18
SLIDE 18

Social Networks: Multi-relational and Time-Evolving

Link = Action at time t
Actions = Profile view, message, poke, report abuse, etc.

[Figure: users connected by links created at times t(1) through t(11). Legend: legitimate users, spammers]

slide-23
SLIDE 23

Social Networks: Multi-relational and Time-Evolving

Link = Action at time t
Actions = Profile view, message, poke, report abuse, etc.

[Figure: links created at times t(1) through t(11), distinguished by action type. Legend: profile view, message, poke, report spammer]
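The "link = action at time t" view can be made concrete with a small data structure. The following is a minimal sketch, not the authors' code: the `Event` type, the helper names, and the toy user IDs are all assumptions made for illustration. It shows how one stream of time-stamped actions yields both a separate edge list per relation and a chronological action sequence per user.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """One time-stamped link: `src` performed `action` on `dst` at time `t`."""
    t: int
    src: str
    action: str
    dst: str


def split_by_relation(events):
    """Group directed (src, dst) edges by action type, one edge list per relation."""
    graphs = defaultdict(list)
    for e in sorted(events):  # NamedTuples sort by their first field, t
        graphs[e.action].append((e.src, e.dst))
    return dict(graphs)


def action_sequences(events):
    """Chronological list of the actions each user performed."""
    seqs = defaultdict(list)
    for e in sorted(events):
        seqs[e.src].append(e.action)
    return dict(seqs)


# Toy, made-up events (hypothetical users u1..u3).
events = [
    Event(3, "u1", "message", "u2"),
    Event(1, "u1", "profile_view", "u2"),
    Event(2, "u3", "poke", "u1"),
    Event(4, "u2", "report_abuse", "u1"),
]
relation_graphs = split_by_relation(events)
sequences = action_sequences(events)
```

Keeping the raw event stream as the single source of truth means the graph view and the sequence view never drift apart.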

slide-24
SLIDE 24

Our Approach

[Figure: network with links at times t(1) through t(11)]

Predict spammers based on:

• Graph structure
• Action sequences
• Reporting behavior

slide-27
SLIDE 27

Graph Structure Feature Extraction

[Figure: one graph per relation, each mapped to a set of features]

Graphs for each relation: "Are you interested?", Meet Me, Play Pets, Message, Wink, Report Abuse, Friend Request

Features: PageRank, k-core, graph coloring, triangle count, connected components, in/out degree

slide-36
SLIDE 36

Graph Structure Features

n Extract features for each relation graph es for each of 10 rel

n PageRank n Degree statistics

n Total degree n In degree n Out degree

n k-Core n Graph coloring n Connected components n Triangle count

36

n Viewing profile n Friend requests n Message n Luv n Wink n Pets game n Buying n Wishing n MeetMe game n Yes n No n Reporting abuse

X

(8 features for each of 10 relations)
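The per-relation feature extraction can be sketched in plain Python. This is an illustrative subset under stated assumptions: it computes only the degree statistics and the triangle count (PageRank, k-core, graph coloring, and connected components are omitted for brevity), and the function names and toy edge lists are hypothetical rather than taken from the talk.

```python
from collections import defaultdict
from itertools import combinations


def degree_features(edges):
    """In-, out-, and total degree per node, counting each directed edge once."""
    feats = defaultdict(lambda: {"in": 0, "out": 0, "total": 0})
    for src, dst in edges:
        feats[src]["out"] += 1
        feats[dst]["in"] += 1
        feats[src]["total"] += 1
        feats[dst]["total"] += 1
    return feats


def triangle_counts(edges):
    """Number of triangles each node belongs to, ignoring edge direction."""
    nbrs = defaultdict(set)
    for src, dst in edges:
        if src != dst:
            nbrs[src].add(dst)
            nbrs[dst].add(src)
    nbrs = dict(nbrs)
    return {
        node: sum(1 for a, b in combinations(sorted(adj), 2) if b in nbrs[a])
        for node, adj in nbrs.items()
    }


def feature_vector(relation_graphs, node):
    """Concatenate the per-relation features of `node` into one flat dict."""
    row = {}
    for relation, edges in sorted(relation_graphs.items()):
        deg = degree_features(edges).get(node, {"in": 0, "out": 0, "total": 0})
        tri = triangle_counts(edges).get(node, 0)
        row[f"{relation}:in_degree"] = deg["in"]
        row[f"{relation}:out_degree"] = deg["out"]
        row[f"{relation}:total_degree"] = deg["total"]
        row[f"{relation}:triangles"] = tri
    return row


# Toy, made-up relation graphs.
relation_graphs = {
    "message": [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")],
    "wink": [("d", "a")],
}
row = feature_vector(relation_graphs, "a")
```

Each relation contributes its own block of columns, which is how 8 features over 10 relations would give 80 columns per user.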

slide-37
SLIDE 37

Graph Structure Features

[Figure: graph-structure features (PageRank, triangle count, in-degree, out-degree, k-core, graph coloring) extracted from each relation graph (viewing profile, …, reporting abuse) over time steps t(1) … t(10)]

Classification method: Gradient Boosted Trees

slide-38
SLIDE 38

Graph Structure Features

Experiments                      AU-PR            AU-ROC
1 relation, 8 feature types      0.187 ± 0.004    0.803 ± 0.001
10 relations, 1 feature type     0.285 ± 0.002    0.809 ± 0.001
10 relations, 8 feature types    0.328 ± 0.003    0.817 ± 0.001

Multiple relations and features give better performance!

slide-43
SLIDE 43

Sequence of Actions

n Sequential Bigram Features:

Short sequence segment of 2 consecutive actions, to capture sequential information

43

User1 ¡Ac'ons: ¡ ¡ Message, ¡Profile_view, ¡Message, ¡Friend_Request, ¡…. ¡
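Bigram extraction over an action sequence is only a few lines. The sketch below assumes the features are raw counts of consecutive action pairs; the slides do not say whether counts, indicators, or normalized frequencies were used, and the function name is hypothetical.

```python
from collections import Counter


def bigram_features(actions):
    """Count each pair of consecutive actions in one user's action sequence."""
    return Counter(zip(actions, actions[1:]))


# The visible part of the User1 sequence from the slide.
user1 = ["Message", "Profile_view", "Message", "Friend_Request"]
features = bigram_features(user1)
```

A sequence of n actions yields n - 1 bigrams, so the four actions above produce three pairs.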

slide-44
SLIDE 44

Sequence of Actions

n Mixture of Markov Models (MMM):

A.k.a. chain-augmented, tree-augmented naive Bayes

44

L

a1 a2 an-1 an ...

y

x1 x2 xn-1 xn ...

P(y, x) = P(y)P(x1|y)

n

Y

i=2

P(xi|xi−1, y) ,
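The factorization P(y, x) = P(y) P(x1|y) ∏ P(xi|xi−1, y) can be turned into a small classifier by fitting one first-order Markov chain per class. The following is a minimal sketch under assumptions the slides do not state: add-one smoothing, non-empty sequences, and a hypothetical class name and toy training data.

```python
import math
from collections import Counter, defaultdict


class MarkovMixture:
    """One first-order Markov chain per class label, with add-one smoothing."""

    def __init__(self, actions):
        self.actions = sorted(actions)          # action vocabulary
        self.class_counts = Counter()           # class_counts[y]
        self.start = defaultdict(Counter)       # start[y][x1]
        self.trans = defaultdict(Counter)       # trans[(y, prev)][cur]

    def fit(self, sequences, labels):
        for seq, y in zip(sequences, labels):
            self.class_counts[y] += 1
            self.start[y][seq[0]] += 1
            for prev, cur in zip(seq, seq[1:]):
                self.trans[(y, prev)][cur] += 1
        return self

    def _log_prob(self, counter, key):
        # Add-one (Laplace) smoothing over the action vocabulary.
        total = sum(counter.values()) + len(self.actions)
        return math.log((counter[key] + 1) / total)

    def log_joint(self, seq, y):
        """log P(y, x) = log P(y) + log P(x1|y) + sum_i log P(xi|x(i-1), y)."""
        n = sum(self.class_counts.values())
        lp = math.log(self.class_counts[y] / n)
        lp += self._log_prob(self.start[y], seq[0])
        for prev, cur in zip(seq, seq[1:]):
            lp += self._log_prob(self.trans[(y, prev)], cur)
        return lp

    def predict(self, seq):
        """Label with the highest joint probability for this sequence."""
        return max(self.class_counts, key=lambda y: self.log_joint(seq, y))


# Toy, made-up training data.
sequences = [
    ["view", "message", "message"],
    ["view", "message", "message"],
    ["view", "wink", "view"],
    ["view", "view", "wink"],
]
labels = ["spammer", "spammer", "legit", "legit"]
model = MarkovMixture({"view", "message", "wink"}).fit(sequences, labels)
```

Working in log space avoids underflow on long sequences, and the per-class log-likelihoods could also be passed to a downstream classifier as extra features.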

slide-45
SLIDE 45

Sequence of Actions

[Figure: Action Sequence (t(1) … t(9)) → Bigram Features + Chain Augmented NB]

slide-46
SLIDE 46

Sequence of Actions

Experiments       AU-PR            AU-ROC
Bigram features   0.471 ± 0.004    0.859 ± 0.001
MMM               0.246 ± 0.009    0.821 ± 0.003
Bigram + MMM      0.468 ± 0.012    0.860 ± 0.002

Little benefit from MMM (although little overhead)

slide-47
SLIDE 47

Results

[Figure: precision-recall and ROC curves]

We can classify 70% of the spammers that need manual labeling with about 90% accuracy

slide-48
SLIDE 48

Deployment and Example Runtimes

n We can:

n Run the model on short intervals, with new snapshots

  • f the network

n Update the features as events occur

n Example runtimes with Graphlab CreateTM on a

Macbook Pro:

n 5.6 million vertices and 350 million edges: n PageRank: 6.25 minutes n Triangle counting: 17.98 minutes n k-core: 14.3 minutes

48
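Updating features as events occur can be illustrated for the cheap, local features (event counts and bigrams); global features such as PageRank would still need periodic recomputation on a fresh snapshot. This is a hedged sketch, not the deployed system: the class and attribute names are hypothetical.

```python
from collections import Counter, defaultdict


class StreamingFeatures:
    """Keeps per-user event counts and bigram counts current as events arrive."""

    def __init__(self):
        self.out_events = defaultdict(Counter)  # out_events[user][action]
        self.in_events = defaultdict(Counter)   # in_events[user][action]
        self.bigrams = defaultdict(Counter)     # bigrams[user][(prev, cur)]
        self._last_action = {}                  # most recent action per user

    def observe(self, src, action, dst):
        """Fold one new event (src performed action on dst) into the features."""
        self.out_events[src][action] += 1
        self.in_events[dst][action] += 1
        prev = self._last_action.get(src)
        if prev is not None:
            self.bigrams[src][(prev, action)] += 1
        self._last_action[src] = action


# Toy, made-up event stream in chronological order.
stream = StreamingFeatures()
for src, action, dst in [
    ("u1", "profile_view", "u2"),
    ("u1", "message", "u2"),
    ("u1", "message", "u3"),
]:
    stream.observe(src, action, dst)
```

Each update touches only the two users involved, so the cost per event is constant.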

slide-50
SLIDE 50

Refining the abuse reporting systems

n Abuse report systems are very noisy

n People have different standards n Spammers report random people to increase noise n Personal gain in social games

n Goal is to clean up the system using:

n Reporters’ previous history n Collective reasoning over reports

50

slide-51
SLIDE 51

Collective Classification with Reports

[Figure: Report Subgraph (t(1) … t(9)) → Probabilistic Soft Logic]

slide-52
SLIDE 52

HL-MRFs & Probabilistic Soft Logic (PSL)

• Probabilistic Soft Logic (PSL) is a declarative modeling language based on first-order logic
• Weighted logical rules define a probabilistic graphical model:

  ω : P(A, B) ∧ Q(B, C) → R(A, C)

• Instantiated rules reduce the probability of any state that does not satisfy the rule, as measured by its distance to satisfaction
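Distance to satisfaction can be worked through numerically. The sketch below assumes the Łukasiewicz relaxation that PSL is generally described as using (the slide itself does not spell it out): truth values lie in [0, 1], a conjunction is max(0, a + b − 1), and a rule body → head is violated by max(0, body − head). The truth values and the weight are made up for illustration.

```python
def soft_and(*values):
    """Lukasiewicz conjunction of truth values in [0, 1]."""
    return max(0.0, sum(values) - (len(values) - 1))


def distance_to_satisfaction(body, head):
    """How far the rule body -> head is from being satisfied (0 means satisfied)."""
    return max(0.0, body - head)


# Ground rule  w : P(a, b) AND Q(b, c) -> R(a, c)  with illustrative truth values.
p_ab, q_bc, r_ac = 0.9, 0.8, 0.3
weight = 2.0  # hypothetical rule weight

body = soft_and(p_ab, q_bc)                               # 0.9 + 0.8 - 1 = 0.7
penalty = weight * distance_to_satisfaction(body, r_ac)   # 2.0 * (0.7 - 0.3)
```

Under this relaxation, a state's unnormalized probability falls off exponentially in the sum of such weighted penalties, so raising R(a, c) toward 0.7 would remove the penalty entirely.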

slide-53
SLIDE 53

Collective Classification with Reports

n Model using only reports:

53

REPORTED(v1, v2) → SPAMMER(v2) ¬SPAMMER(v)

slide-54
SLIDE 54

Collective Classification with Reports

n Model using reports and credibility of

the reporter:

54

CREDIBLE(v1) ∧ REPORTED(v1, v2) → SPAMMER(v2) PRIOR-CREDIBLE(v) → CREDIBLE(v) ¬PRIOR-CREDIBLE(v) →¬CREDIBLE(v) ¬SPAMMER(v)

slide-55
SLIDE 55

Collective Classification with Reports

n Model using reports, credibility of the reporter,

and collective reasoning:

55

L

a1 a2 an-1 an ...

y

x1 x2 xn-1 xn ...

CREDIBLE(v1) ∧ REPORTED(v1, v2) → SPAMMER(v2) SPAMMER(v2) ∧ REPORTED(v1, v2) → CREDIBLE(v1) ¬SPAMMER(v2) ∧ REPORTED(v1, v2) →¬CREDIBLE(v1) PRIOR-CREDIBLE(v) → CREDIBLE(v) ¬PRIOR-CREDIBLE(v) →¬CREDIBLE(v) ¬SPAMMER(v)
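To see how these rules interact, the sketch below evaluates the total weighted distance to satisfaction of the six rules for a candidate assignment of soft truth values. It is a sketch under loud assumptions: the Łukasiewicz relaxation, linear penalties, the rule weights, the user names, and the truth values are all made up for illustration, and the code only scores assignments; it does not perform the inference that would search for the best one.

```python
def soft_and(a, b):
    """Lukasiewicz conjunction of two truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def dist(body, head):
    """Distance to satisfaction of the rule body -> head."""
    return max(0.0, body - head)


def total_penalty(reports, prior_credible, spammer, credible, w):
    """Weighted sum of distances to satisfaction over all ground rules."""
    total = 0.0
    for v1, v2 in reports:  # REPORTED(v1, v2) is observed, so its truth value is 1
        s2, c1 = spammer[v2], credible[v1]
        total += w["report"] * dist(soft_and(c1, 1.0), s2)
        total += w["confirm"] * dist(soft_and(s2, 1.0), c1)
        total += w["refute"] * dist(soft_and(1.0 - s2, 1.0), 1.0 - c1)
    for v, prior in prior_credible.items():
        total += w["prior_pos"] * dist(prior, credible[v])
        total += w["prior_neg"] * dist(1.0 - prior, 1.0 - credible[v])
    for s in spammer.values():
        total += w["not_spammer"] * s  # distance of the prior rule NOT SPAMMER(v)
    return total


# Hypothetical weights and toy data: two credible users report mallory,
# and mallory (low prior credibility) reports alice.
w = {"report": 2.0, "confirm": 1.0, "refute": 1.0,
     "prior_pos": 1.0, "prior_neg": 1.0, "not_spammer": 0.5}
reports = [("alice", "mallory"), ("bob", "mallory"), ("mallory", "alice")]
prior_credible = {"alice": 0.9, "bob": 0.8, "mallory": 0.2}
credible = dict(prior_credible)

mallory_is_spammer = {"alice": 0.0, "bob": 0.0, "mallory": 1.0}
nobody_is_spammer = {"alice": 0.0, "bob": 0.0, "mallory": 0.0}

penalty_a = total_penalty(reports, prior_credible, mallory_is_spammer, credible, w)
penalty_b = total_penalty(reports, prior_credible, nobody_is_spammer, credible, w)
```

With these made-up numbers, labeling mallory a spammer incurs a much lower total penalty than labeling nobody, because two credible reports outweigh both the prior against spammers and mallory's own low-credibility report.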

slide-56
SLIDE 56

Results of Classification Using Reports

Experiments                                     AU-PR            AU-ROC
Reports only                                    0.674 ± 0.008    0.611 ± 0.007
Reports & credibility                           0.869 ± 0.006    0.862 ± 0.004
Reports & credibility & collective reasoning    0.884 ± 0.005    0.873 ± 0.004

slide-63
SLIDE 63

Conclusion

• Graph structure (PageRank, triangle count, in-degree, out-degree, k-core, graph coloring): multiple relations are more predictive than multiple features. AUPR: 0.187 → 0.328
• Action sequence (bigram features + chain-augmented NB): even simple bigrams are highly predictive. AUPR: 0.471
• Can classify 70% of the spammers that needed manual labeling with 90% accuracy. AUPR: 0.779
• Report subgraph (Probabilistic Soft Logic): jointly refining the credibility of the source is highly effective! AUPR: 0.674 → 0.884

Code and part of the data will be released soon: https://github.com/shobeir/fakhraei_kdd2015

slide-64
SLIDE 64

Acknowledgements

n Collaborators: n If(we) Inc. (Formerly Tagged Inc.):

Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill

n Dato (Formerly Graphlab):

Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng

64

Shobeir Fakhraei

  • Univ. of Maryland

Lise Getoor

  • Univ. California, Santa Cruz

Madhusudana Shashanka if(we) Inc., currently Niara Inc.
