The Web Centipede: Understanding How Web Communities Influence Each Other Through the Lens of Mainstream and Alternative News Sources




slide-1
SLIDE 1

The Web Centipede: Understanding How Web Communities Influence Each Other Through the Lens of Mainstream and Alternative News Sources

Savvas Zannettou, Tristan Caulfield, Emiliano De Cristofaro, Nicolas Kourtellis, Ilias Leontiadis, Michael Sirivianos, Gianluca Stringhini, Jeremy Blackburn

slide-2
SLIDE 2
slide-3
SLIDE 3
slide-4
SLIDE 4
slide-5
SLIDE 5
slide-6
SLIDE 6

Information ecosystem

slide-7
SLIDE 7

Motivation

slide-8
SLIDE 8

4chan → Twitter

slide-9
SLIDE 9

Reddit → Twitter

slide-10
SLIDE 10

The Pizzagate Conspiracy Theory

slide-11
SLIDE 11

Pizzagate evolution and spread

  • Data Provider
  • Theory Generator
  • Theory Incubators & Gateway to mainstream “world”
  • Large-scale Disseminator

slide-12
SLIDE 12

4chan Background
slide-13
SLIDE 13

4chan basics

  • Anonymous conversations grouped into threads
  • Original Poster (OP) creates a new thread by making a post with an image

  • Other users can reply with or without images
  • No likes, shares, favorites, etc.
slide-14
SLIDE 14

4chan boards and moderation

  • Threads are separated into different areas of interest known as boards
  • Areas range from politics to sports
  • Extremely lax moderation by volunteers
  • We focus on the Politically Incorrect board (/pol/)

slide-15
SLIDE 15

Why do we care about 4chan?

slide-16
SLIDE 16

Reddit Background
slide-17
SLIDE 17

Reddit basics

  • Popular news aggregator
  • “Front page of the Internet”
  • A user can start a new thread by creating a submission with a URL
  • Other users can reply in a structured way with or without URLs
  • Users can upvote/downvote submissions and replies

slide-18
SLIDE 18

Subreddits

  • Thousands of user-created subreddits
  • Interests range from video games to news and pornography
  • Each subreddit has its own moderation policy
  • We focus on 6 subreddits: The_Donald, conspiracy, news, worldnews, politics, and AskReddit
slide-19
SLIDE 19

Why do we care about Reddit?

slide-20
SLIDE 20

Datasets and Analysis

slide-21
SLIDE 21

Datasets

  • Compiled a list of 99 mainstream and alternative news sources

Platform                          Posts/Comments  Alternative URLs  Mainstream URLs
Twitter                           486K            42K               236K
Reddit (six selected subreddits)  620K            40K               301K
4chan (/pol/)                     90K             9K                40K
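To illustrate how a post's URL might be matched against a list of news sources, here is a minimal Python sketch. The two domain sets are a small illustrative subset and not the actual list of 99 sources, and the function name `news_category` is hypothetical.

```python
from urllib.parse import urlparse

# Illustrative subset only (assumption); the full list of 99 sources is not shown here.
MAINSTREAM = {"cnn.com", "nytimes.com", "bbc.com", "reuters.com"}
ALTERNATIVE = {"infowars.com", "breitbart.com", "naturalnews.com"}


def _matches(host, domains):
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def news_category(url):
    """Return 'mainstream', 'alternative', or None, based on the URL's host."""
    host = (urlparse(url).hostname or "").lower()
    if _matches(host, MAINSTREAM):
        return "mainstream"
    if _matches(host, ALTERNATIVE):
        return "alternative"
    return None


print(news_category("https://www.cnn.com/politics/some-story"))
print(news_category("http://infowars.com/some-story"))
```

Matching on the host suffix lets `www.` and other subdomains count toward the same source. URLs behind link shorteners would need to be expanded first.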

slide-22
SLIDE 22

Temporal analysis

  • Studied the appearance of alternative and mainstream URLs within the platforms

  • Built a sequence of appearance for each URL according to the timestamps
  • Built a graph with the sequences
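The two steps above can be sketched in Python as follows. This is a minimal sketch, assuming that each URL's sequence orders platforms by their first appearance and that graph edges connect consecutive platforms in a sequence. The input format and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def appearance_sequences(posts):
    """posts: iterable of (url, platform, timestamp) tuples.

    Returns {url: [platforms ordered by the first time the URL appeared there]}.
    """
    first_seen = defaultdict(dict)
    for url, platform, ts in posts:
        prev = first_seen[url].get(platform)
        if prev is None or ts < prev:
            first_seen[url][platform] = ts
    return {
        url: [p for p, _ in sorted(seen.items(), key=lambda kv: kv[1])]
        for url, seen in first_seen.items()
    }


def sequence_graph(sequences):
    """Count directed edges between consecutive platforms across all sequences."""
    edges = Counter()
    for seq in sequences.values():
        for src, dst in zip(seq, seq[1:]):
            edges[(src, dst)] += 1
    return edges


# Hypothetical posts: (url, platform, timestamp in seconds).
posts = [
    ("http://a.example/1", "/pol/", 10),
    ("http://a.example/1", "Reddit", 25),
    ("http://a.example/1", "Twitter", 40),
    ("http://a.example/1", "Reddit", 50),  # later repeat, ignored for ordering
    ("http://b.example/2", "Reddit", 5),
    ("http://b.example/2", "Twitter", 7),
]
sequences = appearance_sequences(posts)
print(sequences["http://a.example/1"])  # ['/pol/', 'Reddit', 'Twitter']
print(sequence_graph(sequences)[("Reddit", "Twitter")])  # 2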
slide-23
SLIDE 23

Graph representation of the news ecosystem

[Figure: two graphs connecting news domains to the platforms Twitter, the 6 subreddits, and /pol/]

  • First graph (mainstream domains): foxnews.com, forbes.com, thehill.com, huffingtonpost.com, reuters.com, theguardian.com, cnn.com, nytimes.com, cbc.ca, bbc.com
  • Second graph (alternative domains): redflagnews.com, naturalnews.com, veteranstoday.com, beforeitsnews.com, infowars.com, clickhole.com, therealstrategy.com, activistpost.com, dcclothesline.com, breitbart.com

slide-24
SLIDE 24

Hawkes processes

  • Consists of K processes
  • Each with a rate of events (i.e., posting of a URL), called background rate
  • An event can cause impulse responses to other processes
  • Increases the rates of other processes for a period of time
  • Enable us to estimate with confidence the number of events on one process caused by an event on the source process (weight)

  • Reveal causal relationships
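As a rough illustration of the bullets above, the following Python sketch computes the rate of one process at a given time: its background rate plus the impulse responses from earlier events on any process. The exponential shape of the impulse response and the `decay` parameter are assumptions made for this sketch, not something the slides specify. All numbers are hypothetical.

```python
import math


def intensity(k, t, events, background, weights, decay=1.0):
    """Rate of process k at time t.

    events:      list of (time, process_index) for past events
    background:  background[k] is the base rate of process k
    weights:     weights[j][k] is the expected number of events on
                 process k caused by a single event on process j
    decay:       speed at which an impulse fades (assumed exponential kernel)
    """
    rate = background[k]
    for s, j in events:
        if s < t:
            # The kernel integrates to 1, so weights[j][k] is the total effect.
            rate += weights[j][k] * decay * math.exp(-decay * (t - s))
    return rate


# Two hypothetical processes: events on process 0 excite process 1.
background = [0.1, 0.2]
weights = [[0.0, 0.5], [0.0, 0.0]]
events = [(1.0, 0)]

print(intensity(1, 0.5, events, background, weights))  # before the event: base rate only
print(intensity(1, 2.0, events, background, weights))  # shortly after: elevated
print(intensity(1, 20.0, events, background, weights))  # much later: back near base rate
```

The rate of process 1 rises after the event on process 0 and then decays back toward its background rate, which is the "for a period of time" behavior described above.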
slide-25
SLIDE 25

Hawkes processes example

[Figure: example with events numbered 1 to 7 on three processes: Reddit, Twitter, and /pol/]

slide-26
SLIDE 26

Hawkes processes for influence estimation

  • Hawkes model with 8 processes
  • One for each platform
  • Distinct model for each URL
  • Fit each model with Gibbs sampling
  • Calculate the percentage of events on each process that were caused by events in each of the other processes
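One way to turn fitted weights into such percentages is sketched below in Python. It assumes that a weight is the expected number of events on the destination caused by one event on the source, so the events attributed to a source are its weight times its event count, divided by the destination's total. The function name, the process names A and B, and all numbers are hypothetical. The model fitting itself (Gibbs sampling) is not shown.

```python
def influence_percentages(weights, event_counts):
    """Share of each destination's events attributed to each other source.

    weights[src][dst]:  expected events on dst caused by one event on src
    event_counts[p]:    number of observed events on process p
    Returns pct[src][dst] in percent, skipping src == dst.
    """
    pct = {}
    for src, row in weights.items():
        pct[src] = {}
        for dst, w in row.items():
            if src == dst or event_counts[dst] == 0:
                continue
            caused = w * event_counts[src]  # expected events on dst due to src
            pct[src][dst] = 100.0 * caused / event_counts[dst]
    return pct


# Hypothetical fitted weights and event counts for two processes.
weights = {"A": {"A": 0.1, "B": 0.02}, "B": {"A": 0.5, "B": 0.0}}
event_counts = {"A": 100, "B": 50}

for src, row in influence_percentages(weights, event_counts).items():
    for dst, value in row.items():
        print(f"{src} -> {dst}: {value:.2f}%")
```

With a distinct model per URL, percentages like these would be computed per URL and then aggregated across URLs.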

slide-27
SLIDE 27

Influence Estimation Findings

  • Twitter top influencers for alternative URLs:

  • The_Donald (2.72%)
  • /pol/ (1.96%)
  • Politics (1.1%)
  • Twitter top influencers for mainstream URLs:

  • Politics (4.29%)
  • /pol/ (3.01%)
  • The_Donald (2.97%)
slide-28
SLIDE 28

Conclusions & Future Work

Analyzed how news propagates across Web communities

  • Considered URLs from 99 mainstream and alternative news sources

Provided quantifiable influence between

  • Six subreddits within Reddit
  • Twitter
  • Politically Incorrect (/pol/) board of 4chan

Future Work

  • Investigate the use of NLP and Image Recognition to associate events that appear in multiple modalities

slide-29
SLIDE 29

Thank you! Questions?