The rise of novel Twitter social spambots (SoBigData day @EUI)



SLIDE 1

SoBigData day @EUI, Florence, 11-10-2017

The rise of novel Twitter social spambots

Marinella Petrocchi

IIT-CNR, Pisa, Italy

SLIDE 2

SPAMBOTS & SOCIAL NETWORKS AN OPEN PROBLEM

Spambots: (semi-)automated accounts with (often) harmful intentions

Examples: spreading misinformation, theft of personal data, stock market manipulation, infiltration of political discourse

SLIDE 3

THE RISE OF THE SOCIAL BOTS

They escape detection techniques by evolving. On Twitter:

  • fake followers (until 2012)
  • 1st evolution (2012-2014)
  • current (?) wave (2015-2017)

New spambots are almost indistinguishable from genuine accounts

  • E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini, “The rise of social bots,” Communications of the ACM, vol. 59, no. 7, pp. 96–104, 2016
SLIDE 4

FAKE FOLLOWERS

SLIDE 5

NAIVE FAKE ACCOUNTS WERE EASY TO BUY

SLIDE 6

SOCIAL SPAMBOTS

The new wave

SLIDE 7

SOCIAL SPAMBOTS

Indistinguishable from genuine accounts if analyzed one by one

Analysis of the online behavior of large groups of users, with the goal of detecting possible spambots among them

SLIDE 8

MODELING THE ONLINE BEHAVIOR OF USERS

The idea

Behaviour: the sequence of actions performed by an account

Digital DNA: each type of action is associated with a character (e.g., A, B, C). The online behaviour of an account is modeled as a sequence of characters (i.e., a string, similar to biological DNA) that follows the sequence of actions performed by that account

SLIDE 9

MODELING THE ONLINE BEHAVIOR OF USERS

The idea

Encoding: T tweet, R retweet, P reply

Timeline of a Twitter account: T R R R R P

Digital DNA sequence: …RRTRPR
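The encoding on this slide can be illustrated with a short Python sketch. The action labels ("tweet", "retweet", "reply") and the function name `encode_timeline` are assumptions made for illustration; the slide only fixes the letters T, R and P.

```python
# Hypothetical action labels; the slide only defines the letters T, R and P.
ACTION_TO_BASE = {"tweet": "T", "retweet": "R", "reply": "P"}


def encode_timeline(actions):
    """Return the digital DNA string for a chronologically ordered list of actions."""
    return "".join(ACTION_TO_BASE[action] for action in actions)


# The six actions shown on the slide's timeline: T R R R R P
timeline = ["tweet", "retweet", "retweet", "retweet", "retweet", "reply"]
print(encode_timeline(timeline))  # TRRRRP
```

An unknown action label raises a `KeyError` here, which is one possible design choice; a richer alphabet would only require more entries in the mapping.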

SLIDE 10

DIGITAL DNA VS BIOLOGICAL DNA

Digital DNA (T tweet, R retweet, P reply):

…RRTRPRTPRRPRTPRPTPRRTRPR
…RPRTPTTRPTRPTPRRRRTPPRPP
…TTTRRRPPTPRPTPRTRPTRRRTP
…PRTRPRTPPPPRTPRRPRTPPRRT
…TRTRPRTPRRPRTPRPTPTPPRTT
…TRPPRTPPTRPPTPRRTTTPPRPR

Biological DNA (A adenine, G guanine, T thymine, C cytosine):

…AGTCTCCATTTTCAGGTCGTA
…GTTTAAGATCGCCTCATCACC
…AGGCAATTCGCCTGAACTGG
…AGTCTCGATCCTTTCCTCGTT
…AAAATCGAACGCCTTGTCGG
…ATTCTCCATCGCCTAAACAAC

SLIDE 11

SIMILARITY BETWEEN DIGITAL DNA SEQUENCES

Intuition: automated accounts (spambots) have similar DNA sequences

LCS (longest common substring): the longest substring common to N sequences of digital DNA

…TRRRPRRTRRPRTPRPTPRRTRPR
…RPRTPTTRRRPRRTPRRRRTPPRP
…TTTRRRPRRRPRRTRTRPTRRRTP
…PRTRPRTPPPPRTPRRRRRPRRTR

LCS: RRRPRRT (length: 7 characters)

Spambots characterization

  • M. Arnold and E. Ohlebusch, “Linear time algorithms for generalizations of the longest common substring problem,” Algorithmica, vol. 60, no. 4, pp. 806–818, 2011
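As a rough illustration of the LCS measure, the following Python sketch finds the longest substring shared by all given sequences by brute force. It is not the linear-time algorithm cited on this slide, only a simple stand-in that is practical for short strings; the function name `longest_common_substring` is an assumption.

```python
def longest_common_substring(sequences):
    """Return the longest string that occurs contiguously in every sequence.

    Brute force: every substring of the shortest sequence is a candidate.
    """
    if not sequences:
        return ""
    shortest = min(sequences, key=len)
    # Try candidate lengths from longest to shortest and stop at the first hit.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in seq for seq in sequences):
                return candidate
    return ""


# The four digital DNA fragments shown on the slide (leading ellipses dropped).
slide_sequences = [
    "TRRRPRRTRRPRTPRPTPRRTRPR",
    "RPRTPTTRRRPRRTPRRRRTPPRP",
    "TTTRRRPRRRPRRTRTRPTRRRTP",
    "PRTRPRTPPPPRTPRRRRRPRRTR",
]
print(longest_common_substring(slide_sequences))  # RRRPRRT
```

When several substrings tie for the maximum length, this sketch returns the one that appears first in the shortest sequence.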
SLIDE 12

LCS: SPAMBOTS VS HUMANS

LCS: similarity measure

Spambots characterization

SLIDE 13

LCS: SPAMBOTS + HUMANS (MIXED GROUP)

Spambots detection

  • 1. accounts with high similarity
  • 2. steep decrease in similarity
  • 3. accounts with low similarity
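The three regions above can be made concrete with a small Python sketch that computes, for each group size k, the length of the longest substring shared by at least k accounts, and then looks for the steepest drop. Both the brute-force computation and the "largest single drop" rule are assumptions made for illustration; the slide does not state how the decrease is located. The names `lcs_curve` and `steepest_drop` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def lcs_curve(sequences):
    """Map k (2..N) to the length of the longest substring shared by at least k sequences."""
    owners = defaultdict(set)  # substring -> indices of the sequences containing it
    for idx, seq in enumerate(sequences):
        for start in range(len(seq)):
            for end in range(start + 1, len(seq) + 1):
                owners[seq[start:end]].add(idx)

    n = len(sequences)
    best_exact = [0] * (n + 1)  # longest substring shared by exactly c sequences
    for substring, who in owners.items():
        best_exact[len(who)] = max(best_exact[len(who)], len(substring))

    curve, running = {}, 0
    for k in range(n, 1, -1):  # "at least k" is a running maximum from the top
        running = max(running, best_exact[k])
        curve[k] = running
    return dict(sorted(curve.items()))


def steepest_drop(curve):
    """Return the k after which the curve falls the most (assumed heuristic)."""
    ks = sorted(curve)
    return max(ks[:-1], key=lambda k: curve[k] - curve[k + 1])


# Toy mixed group: three near-identical "bots" and two short "human" timelines.
mixed = ["TTTTTTTR", "RTTTTTTT", "TTTTTTTP", "RP", "PR"]
curve = lcs_curve(mixed)
print(curve)                 # {2: 7, 3: 7, 4: 1, 5: 0}
print(steepest_drop(curve))  # 3
```

In this toy group the curve stays high up to k = 3 and then collapses, so the three accounts sharing the long substring would be the suspected spambots.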

SLIDE 14

DETECTION TECHNIQUES

Unsupervised approach

Spambots detection

SLIDE 15
DETECTION TECHNIQUES

  • 2. Supervised approach

Spambots detection

SLIDE 16

DATASETS

Evaluation datasets:

  • 1. Mixed1 (1982 accounts): 50% Bot1, 50% human
  • 2. Mixed2 (928 accounts): 50% Bot2, 50% human

Spambots detection

SLIDE 17

EVALUATION

  • C. Yang, R. Harkreader, and G. Gu, “Empirical evaluation and new design for fighting evolving Twitter spammers,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 8, pp. 1280–1293, 2013
  • F. Ahmed and M. Abulaish, “A generic statistical approach for spam detection in online social networks,” Computer Communications, vol. 36, no. 10, pp. 1120–1129, 2013
  • Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang, “Twitter spammer detection using data stream clustering,” Information Sciences, vol. 260, pp. 64–73, 2014

Spambots detection

SLIDE 18

TAKE-HOME MESSAGES

  • New evolutionary wave: social spambots
  • Current techniques fail in detecting them
  • Detection via digital DNA analysis: effective and efficient (lightweight features, no graphs, linear-complexity algorithms)

  • Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, Maurizio Tesconi: “The Paradigm-Shift of social spambots: Evidence, theories, and tools for the arms race”, WWW 2017
  • Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, Maurizio Tesconi: “Social Fingerprinting: Detection of spambot groups through DNA-inspired behavioral modeling”, IEEE Transactions on Dependable and Secure Computing, 2017
  • Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, Maurizio Tesconi: “Exploiting digital DNA for the analysis of similarities in Twitter behaviours”, IEEE Data Science and Analytics, 2017

SLIDE 19

THANK YOU!

Questions?

Marinella Petrocchi
marinella.petrocchi@iit.cnr.it
http://mib.projects.iit.cnr.it/dataset.html