The spread of misinformation in social media, Filippo Menczer



slide-1
SLIDE 1

The spread of misinformation in social media

Filippo Menczer

Center for Complex Networks and Systems Research School of Informatics and Computing Indiana University, Bloomington

slide-2
SLIDE 2

cnets.indiana.edu

slide-3
SLIDE 3
slide-4
SLIDE 4
slide-5
SLIDE 5
  • 1. Detection of misinformation
  • 2. How misinformation spreads
  • 3. Can the spread of misinformation be mitigated?

slide-6
SLIDE 6
slide-7
SLIDE 7
slide-8
SLIDE 8
slide-9
SLIDE 9

http://osome.iuni.iu.edu/truthy-preview/tools/networks/#?hashtag=IceBucketChallenge&network_type=rm&start_date=8-8-2014&end_date=8-14-2014

slide-10
SLIDE 10

EPJ Data Science 2014

slide-11
SLIDE 11

#snow on 22 Jan 2016

slide-12
SLIDE 12

Social Media Observatory

[Architecture diagram labels: Stream Sample; Data Collection System; Long-term Backup; NoSQL Distrib. DB; Analytics Middleware; Geo Maps; Timeline; Network Vis; Videos; Dynamic Vis; Algorithm API]

slide-13
SLIDE 13
slide-14
SLIDE 14
slide-15
SLIDE 15

politics celebrities spam astroturf

slide-16
SLIDE 16

Astroturf Detection

Feature            Description
nodes              Number of nodes
edges              Number of edges
mean_k             Mean degree
mean_s             Mean strength
mean_w             Mean edge weight in largest connected component
max_k(i,o)         Maximum (in,out)-degree
max_k(i,o)_user    User with max. (in,out)-degree
max_s(i,o)         Maximum (in,out)-strength
max_s(i,o)_user    User with max. (in,out)-strength
std_k(i,o)         Std. dev. of (in,out)-degree
std_s(i,o)         Std. dev. of (in,out)-strength
skew_k(i,o)        Skew of (in,out)-degree dist.
skew_s(i,o)        Skew of (in,out)-strength dist.
mean_cc            The mean size of connected components
max_cc             The size of the largest connected component
entry_nodes        Number of unique injections
num_truthy         Number of times ‘truthy’ button was clicked for the meme
sentiment scores   The six GPOMS sentiment dimensions

Classifier   Accuracy   AUC
AdaBoost     96.4%      0.99
SVM          95.6%      0.95

ICWSM 2011
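
As a rough illustration of the structural features listed on this slide, the sketch below computes a handful of them (nodes, edges, mean_k, mean_s, maximum in-degree and in-strength, std. dev. of in-degree, mean_cc, max_cc) from a weighted, directed edge list. The function name `network_features`, the edge-list format, and the sample edges are assumptions made for illustration; this is not the feature extractor used in the ICWSM 2011 work, and it assumes at least one edge.

```python
from collections import defaultdict
from statistics import pstdev


def network_features(edges):
    """Compute a few diffusion-network features from (source, target, weight) edges."""
    in_k, out_k = defaultdict(int), defaultdict(int)      # in/out degree
    in_s, out_s = defaultdict(float), defaultdict(float)  # in/out strength
    neighbors = defaultdict(set)                          # undirected view for components
    total_weight = 0.0
    n_edges = 0
    for src, dst, w in edges:
        out_k[src] += 1
        in_k[dst] += 1
        out_s[src] += w
        in_s[dst] += w
        neighbors[src].add(dst)
        neighbors[dst].add(src)
        total_weight += w
        n_edges += 1

    nodes = sorted(neighbors)

    # Sizes of weakly connected components, found by depth-first search.
    seen, component_sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        component_sizes.append(size)

    top_in = max(nodes, key=lambda u: in_k[u])
    return {
        "nodes": len(nodes),
        "edges": n_edges,
        "mean_k": n_edges / len(nodes),       # mean in-degree equals mean out-degree
        "mean_s": total_weight / len(nodes),  # mean in-strength equals mean out-strength
        "max_k_i": in_k[top_in],
        "max_k_i_user": top_in,
        "max_s_i": max(in_s[u] for u in nodes),
        "std_k_i": pstdev([in_k[u] for u in nodes]),
        "mean_cc": sum(component_sizes) / len(component_sizes),
        "max_cc": max(component_sizes),
    }


# Hypothetical retweet edges: (retweeting user, retweeted user, number of retweets).
edges = [("a", "b", 2), ("c", "b", 1), ("b", "d", 1), ("e", "f", 3)]
feats = network_features(edges)
```

A feature dictionary like `feats`, computed per meme, is the kind of input a classifier such as AdaBoost or an SVM could be trained on.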

slide-17
SLIDE 17
slide-18
SLIDE 18

Real-time analysis and classification

  • Real-time query (Twitter search API)
  • Real-time feature (>1K) extraction
  • ~ About ~ seconds!

truthy.indiana.edu/botornot

slide-19
SLIDE 19

[Figure panels: A, B, C, D, E]

slide-20
SLIDE 20

AUC: 0.95

Comm. ACM 2016
slide-21
SLIDE 21
slide-22
SLIDE 22

WWW 2016 Developers Day

slide-23
SLIDE 23

DARPA Twitter bot detection challenge

Team            Score
Sentimetrix     50.75
USC             45.00
IU              43.25
IBM             43.00
Boston Fusion   41.75
Georgia Tech    24.00

IEEE Computer, June 2016

slide-24
SLIDE 24

#SB277

slide-25
SLIDE 25
  • 1. Detection of misinformation
  • 2. How misinformation spreads (Truthy case study)
  • 3. Can the spread of misinformation be mitigated?

slide-26
SLIDE 26

[Timeline figure; date labels: 28 Aug, 3 Sep, 26 Aug, 28 Aug, 25 Aug 2014]

slide-27
SLIDE 27
slide-28
SLIDE 28

[Timeline figure; date labels: 28 Aug, 3 Sep, 26 Aug, 28 Aug, 25 Aug 2014, 18 Oct, 21 Oct, 24 Oct, 23 Oct, 22 Oct]

slide-29
SLIDE 29
slide-30
SLIDE 30
slide-31
SLIDE 31

[Timeline figure; date labels: 28 Aug, 3 Sep, 26 Aug, 28 Aug, 25 Aug 2014, 18 Oct, 21 Oct, 24 Oct, 23 Oct, 3 Nov, 4 Nov, 10 Nov, 22 Oct]

slide-32
SLIDE 32
slide-33
SLIDE 33

September 2014 1 March 2016

slide-34
SLIDE 34

Competition between hoaxes and fact checking

chemtrails anti-vax

slide-35
SLIDE 35

Hoax vs. fact checking: model

slide-36
SLIDE 36

[Figure axes: Segregation; Credibility; Number of active believers]

slide-37
SLIDE 37

Hoaxy

[Architecture diagram labels: DATABASE; Store; Monitors; URL Tracker (stream api); RSS Parser; Scrapy Spider; Fetch; News Sites; Social Networks; API; Crawler; Analysis; Dashboard]

slide-38
SLIDE 38
[Figure panels: cumulative distributions Pr(A ≥ a) vs. a, Pr(N ≥ n) vs. n, Pr(P ≥ p) vs. p; ρ; Tweets/Retweets]

Source          Sites   Tweets      Users     URLs
fake news       71      1,287,769   171,035   96,400
fact checking   6       154,526     78,624    11,183

WWW SNOW 2016
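
The plots on this slide are labeled as cumulative distributions of the form Pr(X ≥ x). A minimal sketch of how such an empirical distribution can be computed from a list of counts follows. The function name `ccdf` and the sample data are hypothetical, and the slide does not define what the quantities a, n and p measure.

```python
from collections import Counter


def ccdf(values):
    """Return sorted (x, Pr(X >= x)) pairs for the observed values."""
    counts = Counter(values)
    total = len(values)
    remaining = total  # number of observations that are >= the current x
    points = []
    for x in sorted(counts):
        points.append((x, remaining / total))
        remaining -= counts[x]
    return points


# Hypothetical per-URL tweet counts.
sample = [1, 1, 2, 5]
points = ccdf(sample)  # [(1, 1.0), (2, 0.5), (5, 0.25)]
```

Plotting these points on log-log axes is a common way to show broad, heavy-tailed distributions.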

slide-39
SLIDE 39
  • 1. Detection of misinformation
  • 2. How misinformation spreads
  • 3. Can the spread of misinformation be mitigated?

slide-40
SLIDE 40

computational fact-checking?

slide-41
SLIDE 41

[Knowledge graph figure; nodes: animal, cat, carnivorous, bird; edge labels: is a, eat]

slide-42
SLIDE 42

[Figure labels; scale ticks 0.1 to 0.45]

First ladies: Edith Bolling Galt Wilson, Florence Harding, Grace Coolidge, Lou Henry Hoover, Eleanor Roosevelt, Bess Truman, Mamie Eisenhower, Jacqueline Kennedy Onassis, Lady Bird Johnson, Pat Nixon, Betty Ford, Rosalynn Carter, Nancy Reagan, Barbara Bush, Hillary Rodham Clinton, Laura Bush, Michelle Obama

Presidents: Woodrow Wilson, Warren G. Harding, Calvin Coolidge, Herbert Hoover, Franklin D. Roosevelt, Harry S. Truman, Dwight D. Eisenhower, John F. Kennedy, Lyndon B. Johnson, Richard Nixon, Gerald Ford, Jimmy Carter, Ronald Reagan, George H. W. Bush, Bill Clinton, George W. Bush, Barack Obama

Relations: is the spouse of; is the capital of

PLoS ONE 2015

slide-43
SLIDE 43
slide-44
SLIDE 44

[Knowledge graph path figure, panels a and b; nodes: Barack Obama, Islam, Naheed Nenshi, Calgary, Stephen Harper, Canada, Association of American Universities, Columbia University; statement checked: Obama, Muslim]
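
One way to turn paths in a knowledge graph into a score for a statement is to prefer short paths that avoid very general (high-degree) nodes. The sketch below assumes a degree-penalized path score, 1 / (1 + sum of log-degrees of the intermediate nodes), maximized over all paths. The slides do not state the formula, so treat it as an assumption in the spirit of the cited PLoS ONE 2015 work. The function names and the toy triples (including the added `dog` node) are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def build_graph(triples):
    """Build an undirected adjacency map from (subject, predicate, object) triples."""
    graph = defaultdict(set)
    for subj, _pred, obj in triples:
        graph[subj].add(obj)
        graph[obj].add(subj)
    return graph


def truth_score(graph, source, target):
    """Best path score 1 / (1 + sum(log degree of intermediate nodes)); 0.0 if no path."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return 1.0 / (1.0 + cost)
        if cost > dist[node]:
            continue  # stale heap entry
        for nb in graph[node]:
            # Entering the target is free; any other node costs log(degree).
            step = 0.0 if nb == target else math.log(len(graph[nb]))
            new_cost = cost + step
            if new_cost < dist.get(nb, math.inf):
                dist[nb] = new_cost
                heapq.heappush(heap, (new_cost, nb))
    return 0.0


# Hypothetical toy triples, loosely following the animal example.
triples = [
    ("cat", "is a", "animal"),
    ("bird", "is a", "animal"),
    ("dog", "is a", "animal"),
    ("cat", "is a", "carnivorous"),
    ("dog", "is a", "carnivorous"),
]
graph = build_graph(triples)
direct = truth_score(graph, "cat", "animal")   # adjacent nodes score 1.0
indirect = truth_score(graph, "cat", "bird")   # path through the hub "animal"
```

Because node costs are non-negative, Dijkstra's algorithm finds the path with the smallest total penalty, which is the path with the highest score.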

slide-45
SLIDE 45
slide-46
SLIDE 46

ICWSM 2011

  • Echo chambers
  • Selective exposure
  • Confirmation bias

Does fact-checking work?

slide-47
SLIDE 47

Predicting political alignment

Features          Accuracy
Text (TF-IDF)     79%
Hashtags          91%
Retweet network   95%
Tags + Network    95%

SocialCom 2011

slide-48
SLIDE 48

Activity K-core follower network K-core retweet network

EPJ Data Science 2012

slide-49
SLIDE 49
slide-50
SLIDE 50

Homogeneous Exposure

PeerJ CS 2015

Social bubbles

[Figure: Bh (axis 0.0 to 1.0) by traffic source: pinterest, google, yahoo search, google news, aol search, bing, ask, twitter, wikipedia, facebook, aol mail, tumblr, reddit, yahoo mail, youtube, live mail, gmail; random walker baseline; categories: email, news aggregator, search, social media, wiki]

slide-51
SLIDE 51

Competition for attention

Hashtag Popularity: # daily retweets [Twitter]
User Popularity: # followers [Yahoo! Meme]

55M Followers
2B Views

slide-52
SLIDE 52

Can the competition for limited user attention help explain the broad heterogeneity of meme popularity and our vulnerability to misinformation?

slide-53
SLIDE 53

Toy agent-based model

[Model diagram labels: agents A, B, C, D; Before; After; Follower; Post; Post existing topics (1 - Pn); Post a new topic (Pn); Screen (Pr); Memory; (Pm)]
[Example memes: #jobs, #justinbieber, #ladygaga, #apple, #jan25]
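
A heavily simplified sketch of a limited-attention meme model follows. It keeps only two ingredients named on this slide: posting a new topic with probability Pn, and otherwise reposting something from a finite screen. The memory mechanism (Pr, Pm) is omitted, and the function name `simulate`, the parameter names, the default values and the follower graph are all assumptions made for illustration, not the model from the talk.

```python
import random
from collections import Counter, deque


def simulate(followers, steps, p_new=0.1, screen_size=5, seed=0):
    """Run a toy meme-diffusion process and return posts per meme.

    followers maps each agent to the list of agents who see its posts;
    every agent must appear as a key.
    """
    rng = random.Random(seed)
    agents = sorted(followers)
    # Limited attention: each screen holds only the most recent memes.
    screens = {a: deque(maxlen=screen_size) for a in agents}
    popularity = Counter()
    next_meme = 0
    for _ in range(steps):
        agent = rng.choice(agents)
        if rng.random() < p_new or not screens[agent]:
            meme = next_meme  # post a new topic
            next_meme += 1
        else:
            meme = rng.choice(list(screens[agent]))  # repost an existing topic
        popularity[meme] += 1
        for follower in followers[agent]:
            screens[follower].appendleft(meme)  # oldest entry falls off the end
    return popularity


# Hypothetical four-agent network in which everyone follows everyone else.
followers = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}
popularity = simulate(followers, steps=1000)
```

Sorting `popularity.values()` gives the meme popularity distribution, whose spread can be compared across screen sizes or network structures.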

slide-54
SLIDE 54

Toy model predictions: Role of social network

[Figure panels: a, b]

slide-55
SLIDE 55

Toy model predictions: Role of limited attention

[Figure panels: a, b]

Nature Sci. Rep. 2012

slide-56
SLIDE 56
  • Spread among agents with limited attention on social network is sufficient to explain meme virality
  • Not necessary to invoke more complicated explanations based on intrinsic meme value or external factors

slide-57
SLIDE 57

Do the best ideas win?

[Diagram: memes m1 to m9]
[Figure: mean popularity (ticks 1 to 128) vs. fitness (0.0 to 1.0) for µ = 0.1, 0.2, 0.4, 0.6, 0.8, 1.0]
[Figure (a): average popularity vs. fitness (0.2 to 1) for α = 1 to 10, with µ = 0.1]

slide-58
SLIDE 58

efficiency vs diversity

Intensity of competition

[Figure panels (a), (b), (c) for parameter values 0.01, 0.2, 0.9; axis range 0.0 to 1.0]

slide-59
SLIDE 59

limited attention

[Figure: τ (0.05 to 0.25) vs. H (0.5 to 3) for α = 1 to 10; axis names: Diversity, Efficiency]

slide-60
SLIDE 60
  • Structural, temporal, content, and user features can be used to detect astroturf and social bots.
  • Social media and traditional media work together to spread misinformation; gullible people are exposed through gullible connections.
  • Social network structure and limited attention may amplify our natural biases and make us more vulnerable to misinformation.

slide-61
SLIDE 61

Thanks!

cnets.indiana.edu

Marcella Tambuscio