The spread of misinformation in social media
Filippo Menczer
Center for Complex Networks and Systems Research School of Informatics and Computing Indiana University, Bloomington
cnets.indiana.edu
1. Detection of misinformation
2. How
http://osome.iuni.iu.edu/truthy-preview/tools/networks/#?hashtag=IceBucketChallenge&network_type=rm&start_date=8-8-2014&end_date=8-14-2014
EPJ Data Science 2014
#snow on 22 Jan 2016
Stream Sample Data Collection System Long-term Backup NoSQL
Analytics Middleware Geo Maps Timeline Network Vis Videos Dynamic Vis Algorithm API
politics celebrities spam astroturf
nodes               Number of nodes
edges               Number of edges
mean_k              Mean degree
mean_s              Mean strength
mean_w              Mean edge weight in largest connected component
max_k(i,o)          Maximum (in,out)-degree
max_k(i,o)_user     User with max. (in,out)-degree
max_s(i,o)          Maximum (in,out)-strength
max_s(i,o)_user     User with max. (in,out)-strength
std_k(i,o)          Std. dev. of (in,out)-degree
std_s(i,o)          Std. dev. of (in,out)-strength
skew_k(i,o)         Skew of (in,out)-degree dist.
skew_s(i,o)         Skew of (in,out)-strength dist.
mean_cc             The mean size of connected components
max_cc              The size of the largest connected component
entry_nodes         Number of unique injections
num_truthy          Number of times ‘truthy’ button was clicked for the meme
sentiment scores    The six GPOMS sentiment dimensions
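A minimal sketch of how a few of the network features listed above could be computed from a diffusion network. It assumes the network is given as (source, target, weight) triples with at least one edge; the function name `diffusion_features` and the output keys are illustrative, not taken from the original system. For simplicity, `mean_w` is averaged over all edges rather than over the largest connected component, and components are treated as weakly connected.

```python
from collections import defaultdict
from statistics import mean, pstdev


def diffusion_features(edges):
    """Summarize a directed, weighted network given as (source, target, weight) triples."""
    in_k, out_k = defaultdict(int), defaultdict(int)      # in/out degree
    in_s, out_s = defaultdict(float), defaultdict(float)  # in/out strength
    neighbors = defaultdict(set)                          # undirected adjacency
    weights = []
    for src, dst, w in edges:
        out_k[src] += 1
        in_k[dst] += 1
        out_s[src] += w
        in_s[dst] += w
        neighbors[src].add(dst)
        neighbors[dst].add(src)
        weights.append(w)
    nodes = sorted(neighbors)  # sorted so that ties resolve deterministically

    # Sizes of weakly connected components, found by depth-first search.
    seen, cc_sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        cc_sizes.append(size)

    def top(table):
        # Return (maximum value, user holding it); missing users count as 0.
        user = max(nodes, key=lambda u: table.get(u, 0))
        return table.get(user, 0), user

    return {
        "nodes": len(nodes),
        "edges": len(weights),
        "mean_k": len(weights) / len(nodes),  # mean in-degree equals mean out-degree
        "mean_s": sum(weights) / len(nodes),
        "mean_w": mean(weights),              # assumption: averaged over all edges
        "max_k_in": top(in_k),
        "max_k_out": top(out_k),
        "max_s_in": top(in_s),
        "max_s_out": top(out_s),
        "std_k_in": pstdev([in_k.get(u, 0) for u in nodes]),
        "std_k_out": pstdev([out_k.get(u, 0) for u in nodes]),
        "mean_cc": mean(cc_sizes),
        "max_cc": max(cc_sizes),
    }
```

For example, `diffusion_features([("a", "b", 2), ("a", "c", 1), ("b", "c", 3), ("d", "e", 1)])` describes a five-user network with two components. Features such as `entry_nodes`, `num_truthy`, and the sentiment scores depend on data outside the edge list and are not covered by this sketch.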
Classifier  Accuracy  AUC
AdaBoost    96.4%     0.99
SVM         95.6%     0.95
ICWSM 2011
Real-time analysis and classification
Real-time query (Twitter search API)
Real-time feature (>1K) extraction
About ~ seconds!
WWW 2016 Developers Day
Sentimetrix    50.75
USC            45.00
IU             43.25
IBM            43.00
Boston Fusion  41.75
Georgia Tech   24.00
IEEE Computer, June 2016
28 Aug 3 Sep 26 Aug 28 Aug 25 Aug 2014 18 Oct 21 Oct 24 Oct 23 Oct 3 Nov 4 Nov 10 Nov 22 Oct
September 2014 1 March 2016
chemtrails anti-vax
Segregation Credibility Number of active believers
DATABASE Store Monitors URL Tracker (stream api) RSS Parser Scrapy Spider Fetch News Sites Social Networks API Crawler Analysis Dashboard
source         sites  tweets     users    URLs
fake news      71     1,287,769  171,035  96,400
fact checking  6      154,526    78,624   11,183
WWW SNOW 2016
[Figure: knowledge graph example. Nodes: animal, cat, carnivorous, bird. Edge labels: is a, eat, is the spouse of, is the capital of.]
PLoS ONE 2015
Barack Obama, Islam, Naheed Nenshi, Calgary, Stephen Harper, Canada, Association of American Universities, Columbia University
Obama Muslim
ICWSM 2011
Features         Accuracy
Text (TF-IDF)    79%
Hashtags         91%
Retweet network  95%
Tags + Network   95%
SocialCom 2011
Activity K-core follower network K-core retweet network
EPJ Data Science 2012
PeerJ CS 2015
[Figure: Bh (scale 0.0 to 1.0) by traffic source, with a random walker baseline. Sources: pinterest, google, yahoo search, google news, aol search, bing, ask, twitter, wikipedia, facebook, aol mail, tumblr, reddit, yahoo mail, youtube, live mail, gmail. Categories: email, news aggregator, search, social media, wiki.]
Hashtag Popularity: # daily retweets [Twitter]
User Popularity: # followers [Yahoo! Meme]
55M Followers 2B Views
Can the competition for limited user attention help explain the broad heterogeneity of meme popularity and our vulnerability to misinformation?
Post existing topics (1 - Pn)
Post a new topic (Pn)
#jobs #justinbieber #ladygaga #apple
#jan25 #apple #jobs #justinbieber #apple #jan25 #apple #jobs #jan25 #ladygaga #jan25 #jobs #jan25 #jobs
Before After
Follower Post
Screen (Pr), Memory (1 - Pr), (Pm)
Nature Sci. Rep. 2012
network is sufficient to explain meme virality
based on intrinsic meme value or external factors
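The idea of memes competing for limited attention on a network can be illustrated with a small agent-based simulation. This is only a rough sketch under stated assumptions, not the model from the cited paper: the random follower network, the fixed-length screen and memory, the function name `simulate`, and all default parameter values are hypothetical. The parameters `p_new` and `p_screen` loosely follow the slide labels Pn and Pr; the Pm step on the slide is not reproduced.

```python
import random
from collections import Counter, deque


def simulate(n_agents=200, n_follow=5, steps=20000, p_new=0.1, p_screen=0.8,
             screen_size=10, seed=0):
    """Return a Counter mapping each meme id to the number of times it was posted."""
    rng = random.Random(seed)

    # Assumption: each agent follows n_follow others chosen uniformly at random.
    # followers[a] lists the agents who see what agent a posts.
    followers = [[] for _ in range(n_agents)]
    for agent in range(n_agents):
        others = [a for a in range(n_agents) if a != agent]
        for followed in rng.sample(others, n_follow):
            followers[followed].append(agent)

    # Limited attention: only the most recent screen_size memes are kept.
    screens = [deque(maxlen=screen_size) for _ in range(n_agents)]
    memories = [deque(maxlen=screen_size) for _ in range(n_agents)]

    popularity = Counter()
    next_meme = 0
    for _ in range(steps):
        agent = rng.randrange(n_agents)
        screen, memory = screens[agent], memories[agent]
        if rng.random() < p_new or not (screen or memory):
            # Post a new topic (also forced when the agent has nothing to repost).
            meme = next_meme
            next_meme += 1
        else:
            # Post an existing topic, drawn from the screen or from memory.
            use_screen = bool(screen) and (not memory or rng.random() < p_screen)
            pool = screen if use_screen else memory
            meme = rng.choice(list(pool))
        popularity[meme] += 1
        memory.append(meme)
        for follower in followers[agent]:
            screens[follower].append(meme)  # old memes fall off the screen
    return popularity
```

Calling `simulate().most_common(5)` lists the most reposted memes in one run. No meme is given any intrinsic quality here, so any differences in popularity arise only from the network and the finite screen and memory.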
[Figure: mean popularity versus fitness, for µ = 0.1, 0.2, 0.4, 0.6, 0.8, 1.0.]
[Figure (a): average popularity versus fitness, for α = 1 to 10, with µ = 0.1.]
Intensity of competition
[Figure: panels (a), (b), (c) for parameter values 0.01, 0.2, 0.9.]
[Figure: axes H (0.5 to 3) and τ (0.05 to 0.25), for α = 1 to 10. Labels: Diversity, Efficiency.]
Marcella Tambuscio