Combating Friend Spam Using Social Rejections, Michael Sirivianos




slide-1
SLIDE 1

Combating Friend Spam Using Social Rejections

Michael Sirivianos, Cyprus University of Technology

Joint work with Q. Cao, Xiaowei Yang, and K. Munagala at Duke University

Dagstuhl: Cybersafety in Modern Online Social Networks

slide-2
SLIDE 2

Friend spam in online social networks (OSNs)

§ Friend spam: unwanted friend requests

Ø Degrades user experience (e.g., annoying)

Ø Introduces false OSN links

[Figure: fake account]

slide-3
SLIDE 3

False OSN links are harmful

§ Pollute the underlying social graph

Ø Detrimental to social search and online ad targeting

Ø Jeopardize online privacy and safety

slide-4
SLIDE 4

False OSN links undermine the effectiveness of Sybil defense

§ The defense relies on genuine social links

Ø SybilLimit [S&P’08], SybilInfer [NDSS’09], SybilRank [NSDI’12]

Ø The number of undetected Sybils (fake accounts) is bounded by O(log |V|) per link between Sybils and legitimate users

[Figure: non-Sybil region and Sybil region connected by OSN links]

slide-5
SLIDE 5

Existing countermeasures

§ Privacy settings for OSN users

Ø Restrict requests to friends of friends only

Ø Detract from the openness of the OSN

§ Spam request filtering using machine learning (ML)

Ø Facebook Immune System (SNS’11)

Ø Individual user features are manipulable

slide-6
SLIDE 6

Rejecto: Combating friend spam using social rejections


slide-7
SLIDE 7

Observation: the cost of connecting to real users

§ False OSN links come with social rejections

Ø Social rejections: rejected, ignored, and reported requests

Ø Spam requests are less likely to be accepted

[Figure: friend spammers receive many rejections from legitimate users]

slide-8
SLIDE 8

Live fake accounts in the wild

§ Each has a significant number of pending requests

Ø Fake Facebook accounts from an underground market

Ø More measurement results in the paper

[Figure: number of requests (pending requests and friends) per anonymized fake account ID]

slide-9
SLIDE 9

How reliable is social rejection?

§ Attackers inevitably trigger rejections

Ø Disproportionally large number of accounts and requests

Ø Requests inevitably hit cautious users

§ Rejection towards innocent users is non-manipulable

Ø A rejection is guarded by a feedback loop between the request sender and the receiver

Ø Legitimate users rarely receive rejections

Ø Fundamentally different from negative ratings on online services (e.g., YouTube)

slide-10
SLIDE 10

Challenges in using social rejection

§ Attack strategies

Ø Collusion: fake accounts collude to accept requests

Ø Arbitrarily boosts the request acceptance rate of an individual account

Ø Self-rejection: mimic legitimate users rejecting others

Ø Whitewashes the fake accounts that reject other fake accounts

§ System challenge

Ø Enormous user base with a large number of requests and rejections

slide-11
SLIDE 11

Rejecto in a nutshell

§ A strategy-proof formulation

Ø Graph cut on a rejection-augmented social graph

Ø Low aggregate acceptance rate of the requests from spammers to legitimate users

§ An effective and near-linear algorithm

Ø Based on the Kernighan-Lin (KL) algorithm [The Bell System Technical Journal, 1970]

§ A scalable implementation

Ø Layered on top of Apache Spark [Zaharia et al. NSDI’12]

slide-12
SLIDE 12

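Rejecto’s formulation of spammer detection

§ Main idea: put spamming accounts into groups

§ Aggregate acceptance rate (AAR) of the requests from a group S to the remaining users H:

$$\mathrm{AAR} = \frac{F(H, S)}{F(H, S) + \vec{R}(H, S)}$$

Here F(H, S) counts the friend links between H and S, and \vec{R}(H, S) counts the rejections that users in H cast on requests from S.

Fake accounts cannot arbitrarily improve their AAR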
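The AAR of a candidate group can be computed directly from the two edge sets. The following is a minimal sketch, not Rejecto's implementation: the function name `aggregate_acceptance_rate`, the edge-list representation, and the toy accounts are assumptions made for illustration, and accepted requests are approximated by the friendship links that cross the cut.

```python
def aggregate_acceptance_rate(friendships, rejections, suspects):
    """AAR of the requests sent from `suspects` (S) to everyone else (H).

    friendships: iterable of undirected (u, v) pairs, i.e. accepted requests.
    rejections:  iterable of directed (rejecter, rejected) pairs.
    suspects:    the candidate group S; H is every other account.
    """
    suspects = set(suspects)
    # F(H, S): friend links with exactly one endpoint in S.
    accepted = sum(1 for u, v in friendships
                   if (u in suspects) != (v in suspects))
    # R(H, S): rejections cast by an account in H on an account in S.
    rejected = sum(1 for rejecter, target in rejections
                   if rejecter not in suspects and target in suspects)
    total = accepted + rejected
    return accepted / total if total else None


# Toy data (hypothetical): a, b, c are legitimate; x, y are spammers.
friendships = [("a", "b"), ("b", "c"), ("c", "x"), ("x", "y")]
rejections = [("a", "x"), ("b", "x"), ("a", "y"), ("c", "y")]
aar = aggregate_acceptance_rate(friendships, rejections, {"x", "y"})
print(aar)  # one accepted link against four rejections
```

Only the link between c and x crosses the cut, so the group {x, y} has one accepted request out of five.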

slide-13
SLIDE 13

Spam requests lead to a low aggregate acceptance rate

§ Lower than that of the requests from a set of legitimate users

Ø Spam requests are less likely to be accepted

[Figure: a small AAR ratio cut]

slide-14
SLIDE 14

A graph cut model

§ Augments a social graph with rejections

Ø Directed rejection edges

§ Finds the cut with the minimum aggregate acceptance rate (MAAR)

Ø Graph partitioning based on requests and rejections

§ Iteratively cuts off groups of suspicious accounts

Ø Prunes their links and rejections from the social graph

slide-15
SLIDE 15

Immune to attack strategies

§ Collusion among spammers cannot improve MAAR

Ø MAAR does not count the colluding links

§ Self-rejection only exposes the rejected accounts to Rejecto earlier

Ø Iteratively identifies groups of spamming accounts
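A small numeric check illustrates the collusion claim. This is a sketch under stated assumptions: `aar` and `individual_rate` are hypothetical helpers, the accounts are invented, and an account's accepted requests are approximated by its friend links. Links added inside the fake group raise an individual account's acceptance rate but never cross the cut, so the group's AAR stays the same.

```python
def aar(friendships, rejections, group):
    # Accepted cross-cut links over all cross-cut requests (hypothetical helper).
    f = sum((u in group) != (v in group) for u, v in friendships)
    r = sum(a not in group and b in group for a, b in rejections)
    return f / (f + r)


def individual_rate(friendships, rejections, account):
    # Per-account acceptance rate, the quantity that collusion can inflate.
    accepted = sum(account in edge for edge in friendships)
    rejected = sum(target == account for _, target in rejections)
    return accepted / (accepted + rejected)


fakes = {"s1", "s2", "s3"}
friendships = [("u1", "u2"), ("u2", "s1")]
rejections = [("u1", "s1"), ("u1", "s2"), ("u2", "s3")]  # (rejecter, rejected)

before = aar(friendships, rejections, fakes)
# Collusion: the fakes accept each other's requests, adding links inside the group.
colluding = friendships + [("s1", "s2"), ("s2", "s3"), ("s1", "s3")]
after = aar(colluding, rejections, fakes)

s2_before = individual_rate(friendships, rejections, "s2")
s2_after = individual_rate(colluding, rejections, "s2")
print(before, after, s2_before, s2_after)
```

In this toy case the rate of s2 rises from 0 to 2/3, while the group's AAR remains 1/4.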

slide-16
SLIDE 16

Finding the MAAR cut is challenging

§ Finding the MAAR cut is NP-hard

Ø Reduced from the MIN-RATIO-CUT problem [Leighton & Rao, JACM’99]

Ø Detailed reduction in the paper

slide-17
SLIDE 17

Existing work on cut-based problems in undirected graphs

§ State of the art: O(log |V|) approximation algorithms with a complexity of O(|V|^2)

Ø Summarized by Madry [FOCS’10]

§ Shortcomings in the OSN context

Ø The approximation factor O(log |V|) is too loose

Ø O(|V|^2) complexity is prohibitive

Ø Do not support parallel graph processing

slide-18
SLIDE 18

Our approach: an effective and efficient search algorithm

§ Finds a MAAR cut by interchanging misplaced nodes

Ø Based on the Kernighan-Lin (KL) algorithm

Ø O(|V|) complexity

Ø Can scale up to multimillion-node social graphs

slide-19
SLIDE 19

A primer on the Kernighan-Lin (KL) algorithm

§ Searches for a balanced cut in undirected graphs

Ø Minimizes the number of cross-partition edges

Ø Reduces cross-partition edges by swapping nodes

Ø Fiduccia et al. improved the complexity to O(|V|) [DAC’82]

Ø Widely used in VLSI layout design

[Figure: nodes swapped between the partitions U and V − U]

§ How to use KL to find the MAAR cut?

Ø Additional directed rejection edges

Ø Non-linear MAAR objective function
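The swap step at the heart of KL can be sketched on a plain undirected graph. This is a simplified greedy variant, not the full algorithm and not Rejecto's extension. Full KL locks swapped nodes and keeps the best prefix of a whole swap sequence, whereas this sketch only applies the single best positive-gain swap until none remains. The names `cut_size`, `swap_gain`, and `greedy_swaps` are hypothetical.

```python
from itertools import product


def cut_size(edges, left):
    # Number of edges with exactly one endpoint in `left`.
    return sum((u in left) != (v in left) for u, v in edges)


def swap_gain(adj, left, a, b):
    """Reduction in cut size from swapping a (in left) with b (not in left)."""
    def d(node, on_left):
        external = sum((n in left) != on_left for n in adj[node])
        internal = len(adj[node]) - external
        return external - internal
    # Classic KL gain: D_a + D_b - 2 * c_ab, with unit edge weights.
    return d(a, True) + d(b, False) - 2 * (b in adj[a])


def greedy_swaps(edges, left):
    nodes = {n for e in edges for n in e}
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    left = set(left)
    while True:
        right = nodes - left
        gain, a, b = max(
            ((swap_gain(adj, left, a, b), a, b)
             for a, b in product(sorted(left), sorted(right))),
            key=lambda t: t[0],
        )
        if gain <= 0:
            return left
        left = (left - {a}) | {b}  # swapping keeps both sides the same size


# Two triangles joined by one edge, starting from a poor balanced split.
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
start = {1, 2, 4}
result = greedy_swaps(edges, start)
print(cut_size(edges, start), cut_size(edges, result), sorted(result))
```

Swapping nodes 4 and 3 removes four crossing edges at once, leaving only the bridge between the two triangles in the cut.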

slide-20
SLIDE 20

Transforming the MAAR cut problem

§ Convert to a set of bipartition problems

Ø Each with a linear objective function

Ø Rejection and social links can be unified

The ratio objective

$$\frac{F(V - S, S)}{F(V - S, S) + \vec{R}(V - S, S)}$$

is replaced by the linear objective

$$F(V - S, S) - k \times \vec{R}(V - S, S)$$

Solvable by KL after unifying the rejections and OSN links according to the parameter k
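One way to see why a linear objective can stand in for the ratio is the short derivation below. It is a sketch under the assumptions k > 0 and F + R > 0 for the cut in question, and the paper's actual reduction may differ. F and R abbreviate F(V − S, S) and \vec{R}(V − S, S).

```latex
\begin{align*}
\frac{F}{F + R} \le \frac{k}{1 + k}
  &\iff (1 + k)\,F \le k\,(F + R) \\
  &\iff F \le k\,R \\
  &\iff F - k \times R \le 0 .
\end{align*}
```

Under these assumptions, a cut whose AAR is at most k / (1 + k) exists exactly when the minimum of F − k × R over all cuts is non-positive, which is the kind of linear bipartition objective that a KL-style search can handle.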

slide-21
SLIDE 21

Putting it all together

§ Transform to a family of bipartition problems, each minimizing

$$F(V - U, U) - k \times \vec{R}(V - U, U)$$

Ø Iterate k through a geometric sequence to cover the MAAR cut ratio k*

§ Extend KL to solve each of the converted problems

Ø Unify rejections and social links using k

Ø Detailed algorithm in the paper

§ Pick the cut with the lowest aggregate acceptance rate
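The overall loop can be sketched on a toy graph. This is an illustration, not the paper's algorithm: a brute-force search over all groups stands in for the extended KL step, and the names `counts`, `best_linear_cut`, `maar_sweep` and the parameters `k0`, `ratio`, `steps` are assumptions chosen for the example.

```python
from itertools import combinations


def counts(friendships, rejections, group):
    # F: friend links crossing the cut; R: rejections cast on the group from outside.
    f = sum((u in group) != (v in group) for u, v in friendships)
    r = sum(a not in group and b in group for a, b in rejections)
    return f, r


def best_linear_cut(nodes, friendships, rejections, k):
    # Brute-force stand-in for the extended KL step: minimise F - k * R.
    best, best_val = None, None
    for size in range(1, len(nodes)):
        for group in combinations(sorted(nodes), size):
            f, r = counts(friendships, rejections, set(group))
            val = f - k * r
            if best_val is None or val < best_val:
                best, best_val = set(group), val
    return best


def maar_sweep(nodes, friendships, rejections, k0=0.01, ratio=2.0, steps=12):
    best, best_aar = None, None
    k = k0
    for _ in range(steps):  # k runs through a geometric sequence
        group = best_linear_cut(nodes, friendships, rejections, k)
        f, r = counts(friendships, rejections, group)
        if f + r > 0:
            aar = f / (f + r)
            if best_aar is None or aar < best_aar:
                best, best_aar = group, aar
        k *= ratio
    return best, best_aar


# Toy data (hypothetical): a, b, c are legitimate; x, y are spammers.
nodes = {"a", "b", "c", "x", "y"}
friendships = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "x"), ("x", "y")]
rejections = [("a", "x"), ("b", "x"), ("a", "y"), ("b", "y")]
group, aar = maar_sweep(nodes, friendships, rejections)
print(sorted(group), aar)
```

The brute-force step is exponential in the number of accounts and is usable only on toy inputs. It is there to make the sweep over k and the final selection by AAR concrete.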

slide-22
SLIDE 22

Evaluation

§ Extensive simulations on real social networks

Ø Sensitivity analysis

Ø Resilience to attack strategies

Ø Comparison with VoteTrust

§ Simulations under Sybil attack

Ø Defense in depth with social-graph-based Sybil defense

§ A Rejecto prototype on an Amazon EC2 cluster

Ø Performance analysis on large graph processing

slide-23
SLIDE 23

Rejecto is insensitive to the spam request volume

§ Request flooding attacks on a Facebook sample graph

Ø Fake accounts connect with each other as normal users do

[Figure, two panels: precision/recall of Rejecto and VoteTrust vs. number of requests per fake account (5 to 50). Panel titles: "Only half of the fake accounts send out spam requests" and "All fake accounts send out spam requests".]

Rejecto uncovers fakes behind the active spamming ones

slide-24
SLIDE 24

Rejecto is resilient to attack strategies

§ Our MAAR cut model is immune to manipulation

[Figure, two panels: precision/recall of Rejecto and VoteTrust. Panel 1, x-axis: number of non-attack edges per fake account (collusion strategy to form dense connections among fake accounts). Panel 2, x-axis: self-rejection rate among fake accounts (self-rejection strategy to let half of the fakes reject the rest like legitimate users do).]

slide-25
SLIDE 25

Rejecto and social-graph-based Sybil detection form a defense in depth

§ Rejecto makes it hard for fakes to get additional links

Ø Defense in depth with SybilRank

[Figure: area under the ROC curve vs. number of accounts removed by Rejecto (1000 to 5000), for the Facebook and ca-AstroPh graphs; annotation: "Improvement".]

slide-26
SLIDE 26

Rejecto can handle multimillion-user social graphs

§ Performance on an EC2 cluster

Ø Spark 0.9.2

Ø 5 c3.8xlarge VMs

Ø A larger cluster yields better performance

# Users          0.5M      1M        2M         5M         10M
# Edges          ~8M       ~16M      ~32M       ~80M       ~160M
Execution time   288 sec   669 sec   1767 sec   8049 sec   7.7 hours

The execution time grows gracefully with the graph size

slide-27
SLIDE 27

Conclusion

Rejecto: uncovers friend spammers using social rejections

Ø Immune to attack strategies

Ø Efficient

Ø Scalable