Combating Friend Spam Using Social Rejections (Michael Sirivianos)

Dagstuhl: Cybersafety in Modern Online Social Networks
Combating Friend Spam Using Social Rejections
Michael Sirivianos, Cyprus University of Technology
Joint work with Q. Cao, Xiaowei Yang and K. Munagala at Duke University

Friend Spam in online social networks (OSNs)
§ Friend spam: unwanted friend requests
  Ø Degrades user experience (e.g., annoying)
  Ø Introduces false OSN links
[Figure: fake account]

False OSN links are harmful
§ Pollute the underlying social graph
  Ø Detrimental to social search and online ad targeting
  Ø Jeopardize online privacy and safety

False OSN links undermine the effectiveness of Sybil defense
§ The defense relies on genuine social links
  Ø SybilLimit [S&P’08], SybilInfer [NDSS’09], SybilRank [NSDI’12]
  Ø The number of undetected Sybils (fake accounts) is bounded by O(log |V|) per link between Sybils and legitimate users
[Figure: OSN links between the non-Sybil region and the Sybil region]

Existing counter-measures
§ Privacy settings for OSN users
  Ø Restrict requests to friends of friends
  Ø Detract from the openness of the OSN
§ Spam request filtering using machine learning (ML)
  Ø Facebook Immune System (SNS’11)
  Ø Individual user features are manipulable

Rejecto: Combating friend spam using social rejections

Observation: the cost of connecting to real users
§ False OSN links come with social rejections
  Ø Social rejections: rejected, ignored, and reported requests
  Ø Spam requests are less likely to be accepted
[Figure: friend spammers receive many rejections from legitimate users]

Live fake accounts in the wild
§ Each has a significant number of pending requests
  Ø Fake Facebook accounts from an underground market
  Ø More measurement results in the paper
[Figure: pending requests and friends per account; x-axis: anonymized fake account ID (0 to 40); y-axis: number of requests (0 to 120)]

How reliable is social rejection?
§ Attackers inevitably trigger rejections
  Ø Disproportionately large number of accounts and requests
  Ø Requests inevitably hit cautious users
§ Rejection towards innocent users is non-manipulable
  Ø A rejection is guarded by a feedback loop between the request sender and the receiver
  Ø Legitimate users rarely receive rejections
  Ø Fundamentally different from negative ratings on online services (e.g., YouTube)

Challenges in using social rejection
§ Attack strategies
  Ø Collusion: fake accounts collude to accept requests
  Ø Arbitrarily boosts the request acceptance rate of an individual account
  Ø Self-rejection: mimic legitimate users rejecting others
  Ø Whitewashes the fake accounts that reject other fake accounts
§ System challenge
  Ø Enormous user base with a large number of requests and rejections

Rejecto in a nutshell
§ A strategy-proof formulation
  Ø Graph cut on a rejection-augmented social graph
  Ø Low aggregate acceptance rate of the requests from spammers to legitimate users
§ An effective and near-linear algorithm
  Ø Based on the Kernighan-Lin (KL) algorithm [The Bell System Technical Journal, 1970]
§ A scalable implementation
  Ø Layered on top of Apache Spark [Zaharia et al. NSDI’12]

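Rejecto’s formulation of spammer detection
§ Main idea: put spamming accounts into groups
  Ø Fake accounts cannot arbitrarily improve their AAR
§ Aggregate acceptance rate (AAR) of the requests from a group S to the remaining users H
  Ø AAR = \frac{F(H, S)}{F(H, S) + \vec{R}(H, S)}
  Ø F(H, S): OSN links between H and S; \vec{R}(H, S): rejections cast by H on S
[Figure: the cut between H and S]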
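The aggregate acceptance rate can be illustrated with a minimal Python sketch. It assumes friendships are undirected pairs and rejections are directed (rejecter, rejected) pairs; the function name `aggregate_acceptance_rate` and the toy accounts are hypothetical and not taken from the talk.

```python
def aggregate_acceptance_rate(suspects, friendships, rejections):
    """AAR of the requests sent by `suspects` to everyone else.

    friendships: iterable of (u, v) undirected accepted links (assumed encoding).
    rejections:  iterable of (rejecter, rejected) directed edges (assumed encoding).
    """
    suspects = set(suspects)
    # F(H, S): accepted links with exactly one endpoint inside the suspect group.
    accepted = sum(1 for u, v in friendships if (u in suspects) != (v in suspects))
    # R(H, S): rejections cast by an outsider on a member of the suspect group.
    rejected = sum(1 for src, dst in rejections
                   if src not in suspects and dst in suspects)
    total = accepted + rejected
    return accepted / total if total else None


# Toy graph: a, b, c are legitimate; x, y are fake.
friendships = [("a", "b"), ("b", "c"), ("c", "x"), ("x", "y")]
rejections = [("a", "x"), ("b", "x"), ("a", "y"), ("c", "y")]

# One accepted cross link (c-x) against four rejections.
print(aggregate_acceptance_rate({"x", "y"}, friendships, rejections))  # 0.2
```

In this toy graph the legitimate group {a, b, c} receives no rejections, so its own rate is 1.0, while the fake group sits at 0.2.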

Spam requests lead to a low aggregate acceptance rate
§ Lower than the requests from a set of legitimate users
  Ø Spam requests are less likely to be accepted
[Figure: a cut with a small AAR]

A graph cut model
§ Augments a social graph with rejections
  Ø Directed rejection edges
§ Finds the cut with the minimum aggregate acceptance rate (MAAR)
  Ø Graph partitioning based on requests and rejections
§ Iteratively cuts off groups of suspicious accounts
  Ø Prunes their links and rejections from the social graph

Immune to attack strategies
§ Collusion among spammers cannot improve MAAR
  Ø MAAR does not count the colluding links
§ Self-rejection only exposes the rejected fake accounts to Rejecto earlier
  Ø Rejecto iteratively identifies groups of spamming accounts
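The collusion argument can be checked on a toy graph. The sketch below is illustrative only: the helper `aar`, the edge encoding, and the accounts are assumptions, not the authors' code. It shows that links among fake accounts raise the acceptance rate of a single account but leave the group's aggregate rate unchanged, because links inside the group never cross the cut.

```python
def aar(group, friendships, rejections):
    """Aggregate acceptance rate of `group` (hypothetical helper)."""
    group = set(group)
    # Accepted links crossing the cut between the group and everyone else.
    f = sum(1 for u, v in friendships if (u in group) != (v in group))
    # Rejections cast by outsiders on group members.
    r = sum(1 for src, dst in rejections if src not in group and dst in group)
    return f / (f + r) if f + r else None


fakes = {"x", "y", "z"}
friendships = [("a", "b"), ("b", "c"), ("c", "x")]
rejections = [("a", "x"), ("b", "x"), ("a", "y"), ("b", "z")]

# Collusion: the fake accounts befriend each other.
colluding = friendships + [("x", "y"), ("x", "z"), ("y", "z")]

# Viewed alone, account x looks better after collusion...
solo_before = aar({"x"}, friendships, rejections)
solo_after = aar({"x"}, colluding, rejections)

# ...but the aggregate rate of the whole fake group does not move.
group_before = aar(fakes, friendships, rejections)
group_after = aar(fakes, colluding, rejections)

print(solo_before, solo_after)
print(group_before, group_after)
```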

Finding the MAAR cut is challenging
§ Finding the MAAR cut is NP-hard
  Ø Reduced from the MIN-RATIO-CUT problem [Leighton & Rao, JACM’99]
  Ø Detailed reduction in the paper

Existing work on cut-based problems in undirected graphs
§ State of the art: O(log |V|) approximation algorithms with a complexity of O(|V|^2)
  Ø Summarized by Madry [FOCS’10]
§ Shortcomings in the OSN context
  Ø The approximation factor O(log |V|) is too loose
  Ø O(|V|^2) complexity is prohibitive
  Ø Do not support parallel graph processing

Our approach: an effective and efficient search algorithm
§ Finds a MAAR cut by interchanging misplaced nodes
  Ø Based on the Kernighan-Lin (KL) algorithm
  Ø O(|V|) complexity
  Ø Can scale up to multimillion-node social graphs

A primer on the Kernighan-Lin (KL) algorithm
§ Searches for a balanced cut in undirected graphs
  Ø Minimizes the number of cross-partition edges
  Ø Reduces cross-partition edges by swapping nodes
  Ø Fiduccia et al. improved it to O(|V|) [DAC’82]
  Ø Widely used in VLSI layout design
§ How to use KL to find the MAAR cut?
  Ø Additional directed rejection edges
  Ø Non-linear MAAR objective function
[Figure: a partition of the nodes into U and V-U]
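The node-swapping idea can be sketched as a greedy pairwise-swap refinement. This is a deliberately simplified illustration in the spirit of KL: it omits KL's node locking and best-prefix selection, recomputes the cut from scratch, and is far slower than the real algorithm. All names here are hypothetical.

```python
from itertools import product


def cut_size(edges, left):
    """Number of undirected edges with exactly one endpoint in `left`."""
    return sum(1 for u, v in edges if (u in left) != (v in left))


def swap_refine(nodes, edges, left):
    """Repeatedly apply the single swap that shrinks the cut the most.

    Swapping one node from each side keeps both partition sizes fixed,
    so the cut stays balanced. Stops when no swap improves the cut.
    """
    left = set(left)
    right = set(nodes) - left
    while True:
        best = cut_size(edges, left)
        best_pair = None
        for a, b in product(sorted(left), sorted(right)):
            size = cut_size(edges, (left - {a}) | {b})
            if size < best:
                best, best_pair = size, (a, b)
        if best_pair is None:
            return left, right
        a, b = best_pair
        left = (left - {a}) | {b}
        right = (right - {b}) | {a}


# Two triangles joined by the single edge (3, 4); nodes 3 and 4 start misplaced.
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
left, right = swap_refine(nodes, edges, {1, 2, 4})
print(sorted(left), sorted(right), cut_size(edges, left))
```

Each accepted swap strictly decreases the cut size, so the loop terminates. On this toy graph one swap of nodes 4 and 3 recovers the two triangles.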

Transforming the MAAR cut problem
§ Convert to a set of bipartition problems
  Ø Each with a linear objective function
  Ø Rejection and social links can be unified
§ Ratio objective: \frac{F(V - S, S)}{F(V - S, S) + \vec{R}(V - S, S)}
§ Linear objective: F(V - S, S) - k \times \vec{R}(V - S, S)
  Ø Solvable by KL after unifying the rejections and OSN links according to the parameter k

Putting it all together
§ Transform to a family of bipartition problems, each minimizing F(\bar{U}, U) - k \times \vec{R}(\bar{U}, U), where \bar{U} = V - U
  Ø Iterate k through a geometric sequence to cover the MAAR cut ratio k*
§ Extend KL to solve each of the converted problems
  Ø Unify rejections and social links using k
  Ø Detailed algorithm in the paper
§ Pick the cut with the lowest aggregate ratio
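The sweep over k can be sketched on a toy graph. In the sketch below, an exhaustive search over subsets stands in for the extended KL algorithm, which is only described in the paper, so this is an illustration of the parametric idea rather than the authors' method. The function names, the starting value of k, the factor, and the number of steps are all assumptions.

```python
from itertools import combinations


def counts(U, friendships, rejections):
    """Return (F, R): links crossing the cut and rejections cast on U from outside."""
    f = sum(1 for u, v in friendships if (u in U) != (v in U))
    r = sum(1 for src, dst in rejections if src not in U and dst in U)
    return f, r


def best_linear_cut(nodes, friendships, rejections, k):
    """Brute-force stand-in for KL: the proper subset U minimizing F - k * R."""
    best_u, best_val = None, float("inf")
    for size in range(1, len(nodes)):
        for combo in combinations(nodes, size):
            U = set(combo)
            f, r = counts(U, friendships, rejections)
            if f - k * r < best_val:
                best_u, best_val = U, f - k * r
    return best_u


def maar_cut(nodes, friendships, rejections, k0=0.125, factor=2.0, steps=8):
    """Sweep k geometrically and keep the candidate cut with the lowest AAR."""
    best_u, best_aar = None, float("inf")
    k = k0
    for _ in range(steps):
        U = best_linear_cut(nodes, friendships, rejections, k)
        f, r = counts(U, friendships, rejections)
        if f + r > 0 and f / (f + r) < best_aar:
            best_u, best_aar = U, f / (f + r)
        k *= factor
    return best_u, best_aar


# a, b, c, d are legitimate; x, y are fake and linked to d by one false link.
nodes = ["a", "b", "c", "d", "x", "y"]
friendships = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"),
               ("a", "c"), ("d", "x"), ("x", "y")]
rejections = [("a", "x"), ("b", "x"), ("c", "y"), ("b", "y"), ("c", "b")]

group, rate = maar_cut(nodes, friendships, rejections)
print(sorted(group), rate)
```

The exhaustive search is exponential in the number of nodes and is only usable on toy inputs; it is there to make the objective concrete, not to suggest how the search should be implemented at scale.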

Evaluation
§ Extensive simulations on real social networks
  Ø Sensitivity analysis
  Ø Resilience to attack strategies
  Ø Compared to VoteTrust
§ Simulations under Sybil attack
  Ø In-depth defense with social-graph-based Sybil defense
§ A Rejecto prototype on an Amazon EC2 cluster
  Ø Performance analysis on large graph processing

Rejecto is insensitive to the spam request volume
§ Request flooding attacks on a Facebook sample graph
  Ø Fake accounts connect with each other as normal users do
  Ø Rejecto uncovers fakes behind the active spamming ones
[Figure: precision/recall of Rejecto and VoteTrust vs. number of requests per fake account (5 to 50). Panels: (a) all fake accounts send out spam requests; (b) only half of the fake accounts send out spam requests]

Rejecto is resilient to attack strategies
§ Our MAAR cut model is immune to manipulation
  Ø Collusion strategy to form dense connections among fake accounts
  Ø Self-rejection strategy to let half of the fakes reject the rest like legitimate users do
[Figure: precision/recall of Rejecto and VoteTrust. Collusion panel x-axis: # of non-attack edges per fake account (0 to 40). Self-rejection panel x-axis: self-rejection rate among fake accounts (0 to 1)]

Rejecto and social-graph-based Sybil detection form a defense in depth
§ Rejecto makes it hard for fakes to get additional links
  Ø Defense in depth with SybilRank
[Figure: area under the ROC curve vs. number of accounts removed by Rejecto (1000 to 5000), for the Facebook and ca-AstroPh graphs, with the improvement marked]

Rejecto can handle multimillion-user social graphs
§ Performance on an EC2 cluster
  Ø Spark 0.9.2
  Ø 5 c3.8xlarge VMs
  Ø A larger cluster yields better performance
§ The execution time grows gracefully with the graph size

# Users          0.5M      1M        2M         5M         10M
# Edges          ~8M       ~16M      ~32M       ~80M       ~160M
Execution time   288 sec   669 sec   1767 sec   8049 sec   7.7 hours

Conclusion
§ Rejecto: uncovers friend spammers using social rejections
  Ø Immune to attack strategies
  Ø Efficient
  Ø Scalable
