The Spread of Misinformation in Social Media
Filippo Menczer



  1. The spread of misinformation in social media. Filippo Menczer, Center for Complex Networks and Systems Research, School of Informatics and Computing, Indiana University, Bloomington

  2. cnets.indiana.edu

  3. Outline: (1) Detection of misinformation; (2) How misinformation spreads; (3) Can the spread of misinformation be mitigated?

  4. http://osome.iuni.iu.edu/truthy-preview/tools/networks/#?hashtag=IceBucketChallenge&network_type=rm&start_date=8-8-2014&end_date=8-14-2014

  5. EPJ Data Science 2014

  6. #snow on 22 Jan 2016

  7. Social Media Observatory (system architecture): Data Collection (Stream, Sample), NoSQL Distrib. DB, Long-term Backup, Analytics, Algorithm, Middleware, API, and visualization modules (Timeline, Network Vis, Geo Maps, Dynamic Vis, Videos)

  8. Politics, celebrities, spam, astroturf

  9. Astroturf detection (ICWSM 2011)
     Features:
       nodes              Number of nodes
       edges              Number of edges
       mean_k             Mean degree
       mean_s             Mean strength
       mean_w             Mean edge weight in largest connected component
       max_k(i,o)         Maximum (in,out)-degree
       max_k(i,o)_user    User with max. (in,out)-degree
       max_s(i,o)         Maximum (in,out)-strength
       max_s(i,o)_user    User with max. (in,out)-strength
       std_k(i,o)         Std. dev. of (in,out)-degree
       std_s(i,o)         Std. dev. of (in,out)-strength
       skew_k(i,o)        Skew of (in,out)-degree dist.
       skew_s(i,o)        Skew of (in,out)-strength dist.
       mean_cc            Mean size of connected components
       max_cc             Size of the largest connected component
       entry_nodes        Number of unique injections
       num_truthy         Number of times 'truthy' button was clicked for the meme
       sentiment scores   The six GPOMS sentiment dimensions
     Classifier   Accuracy   AUC
       AdaBoost   96.4%      0.99
       SVM        95.6%      0.95
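Slide 9 lists features computed on each meme's diffusion network. A minimal Python sketch of how a subset of them could be computed, assuming the network is given as a non-empty list of (source, target, weight) edges (for example retweets or mentions, with the weight counting repeats). Connected components are taken as weakly connected, and mean_k / mean_s as mean total (in plus out) degree and strength; both readings, like the function names, are assumptions of this sketch. Features that need data beyond the edge list (entry_nodes, num_truthy, sentiment scores) are left out.

```python
import math
from collections import defaultdict


def _std(values):
    """Population standard deviation."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def _skew(values):
    """Population skewness; 0.0 when all values are equal."""
    mu = sum(values) / len(values)
    sd = _std(values)
    if sd == 0:
        return 0.0
    return sum((v - mu) ** 3 for v in values) / (len(values) * sd ** 3)


def _weak_components(nodes, edges):
    """Weakly connected components via union-find (edge direction ignored)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v, _ in edges:
        parent[find(u)] = find(v)
    groups = defaultdict(set)
    for v in nodes:
        groups[find(v)].add(v)
    return list(groups.values())


def diffusion_features(edges):
    """Compute a subset of the slide's features from a list of (source, target, weight) edges."""
    nodes = set()
    k_in, k_out = defaultdict(int), defaultdict(int)
    s_in, s_out = defaultdict(float), defaultdict(float)
    for u, v, w in edges:
        nodes.update((u, v))
        k_out[u] += 1
        k_in[v] += 1
        s_out[u] += w
        s_in[v] += w
    n, m = len(nodes), len(edges)
    kin = [k_in[v] for v in nodes]
    kout = [k_out[v] for v in nodes]
    sin = [s_in[v] for v in nodes]
    comps = _weak_components(nodes, edges)
    largest = max(comps, key=len)
    lcc_weights = [w for u, _, w in edges if u in largest]
    return {
        "nodes": n,
        "edges": m,
        "mean_k": 2 * m / n,  # mean total (in + out) degree
        "mean_s": 2 * sum(w for _, _, w in edges) / n,  # mean total strength
        "mean_w": sum(lcc_weights) / len(lcc_weights),
        "max_k_in": max(kin),
        "max_k_out": max(kout),
        "max_s_in": max(sin),
        "max_s_out": max(s_out[v] for v in nodes),
        "std_k_in": _std(kin),
        "skew_k_in": _skew(kin),
        "mean_cc": n / len(comps),
        "max_cc": len(largest),
    }
```

The returned dictionary is a feature vector that any classifier could consume; the slide reports AdaBoost and SVM results, which this sketch does not attempt to reproduce.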

  10. truthy.indiana.edu/botornot: real-time query (Twitter search API); real-time feature extraction (>1K features); real-time analysis and classification. About ~ seconds!

  11. [Figure: panels A to E]

  12. AUC 0.95 (Comm. ACM 2016)
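Slide 12 reports an AUC of 0.95. The AUC (area under the ROC curve) equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A short Python sketch of that definition, assuming bots are the positive class; it is the textbook metric, not the evaluation code behind the slide.

```python
def roc_auc(scores, labels):
    """ROC AUC via its rank (Mann-Whitney) interpretation.

    scores: classifier outputs, higher meaning "more likely a bot".
    labels: 1 for positive (bot), 0 for negative (human).
    Returns the probability that a random positive outscores a random
    negative, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0])` returns 0.75: three of the four positive/negative pairs are ranked correctly. The pairwise loop is quadratic; a sort-based rank computation scales better for large evaluations.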

  13. WWW 2016 Developers Day

  14. DARPA Twitter bot detection challenge (IEEE Computer, June 2016)
       Team            Score
       Sentimetrix     50.75
       USC             45.00
       IU              43.25
       IBM             43.00
       Boston Fusion   41.75
       Georgia Tech    24.00

  15. #SB277

  16. Outline: (1) Detection of misinformation; (2) How misinformation spreads (Truthy case study); (3) Can the spread of misinformation be mitigated?

  17. [Timeline, 2014: dates marked 25 Aug, 26 Aug, 28 Aug (twice), 3 Sep]

  18. [Timeline, 2014, continued: adds 18 Oct, 21 Oct, 22 Oct, 23 Oct, 24 Oct]

  19. [Timeline, 2014, continued: adds 3 Nov, 4 Nov, 10 Nov]

  20. September 2014; 1 March 2016

  21. Competition between hoaxes and fact checking: chemtrails, anti-vax

  22. Hoax vs. fact checking: model

  23. [Plot: number of active believers; segregation; credibility]

  24. Hoaxy (architecture diagram). Components: Social Networks, News Sites, API, Crawler, RSS Parser, URL Tracker (stream api), Scrapy Spider, Monitors, Store, Database, Fetch, Analysis Dashboard

  25. Hoaxy: collected data (WWW SNOW 2016)
       source          sites   tweets      users     URLs
       fake news       71      1,287,769   171,035   96,400
       fact checking   6       154,526     78,624    11,183
     [Figure: Tweets/Retweets; complementary cumulative distributions Pr(N ≥ n), Pr(A ≥ a), Pr(P ≥ p); ρ]
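Slides 24 and 25 describe tracking tweets that link to monitored fake-news and fact-checking sites, and counting tweets, users, and URLs per source type. A minimal Python sketch of that tallying step, assuming tweets arrive as (user, URL) pairs and that sites are matched by host name. The domain lists use placeholder `.example` names and the function names are hypothetical; the crawler, database, and dashboard are not modeled.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical site lists; the sites actually monitored are not listed on the slides.
MONITORED = {
    "fake_news": {"hoax-site.example"},
    "fact_checking": {"factcheck-site.example"},
}


def source_type(url):
    """Return the source type of a URL by host name, or None if the site is not monitored."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    for label, domains in MONITORED.items():
        if host in domains:
            return label
    return None


def tally(tweets):
    """tweets: iterable of (user_id, url) pairs, e.g. parsed from a streaming filter.

    Returns {source type: {"tweets": n, "users": n, "urls": n}} for monitored sites.
    """
    acc = defaultdict(lambda: {"tweets": 0, "users": set(), "urls": set()})
    for user, url in tweets:
        label = source_type(url)
        if label is None:
            continue  # link to a site that is not tracked
        acc[label]["tweets"] += 1
        acc[label]["users"].add(user)
        acc[label]["urls"].add(url)
    return {
        label: {"tweets": a["tweets"], "users": len(a["users"]), "urls": len(a["urls"])}
        for label, a in acc.items()
    }
```

URLs are counted as raw strings here; a real tracker would likely canonicalize them (scheme, query parameters, redirects) before counting unique URLs.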

  26. Outline: (1) Detection of misinformation; (2) How misinformation spreads; (3) Can the spread of misinformation be mitigated?

  27. Computational fact-checking?

  28. [Knowledge graph example: nodes cat, bird, carnivorous, animal; edges labeled "is a" and "eat"]

  29. [Figure: "is the spouse of" scores (color scale 0.1 to 0.45) for U.S. presidents Woodrow Wilson, Warren G. Harding, Calvin Coolidge, Herbert Hoover, Franklin D. Roosevelt, Harry S. Truman, Dwight D. Eisenhower, John F. Kennedy, Lyndon B. Johnson, Richard Nixon, Gerald Ford, Jimmy Carter, Ronald Reagan, George H. W. Bush, Bill Clinton, George W. Bush, Barack Obama against Edith Bolling Galt Wilson, Florence Harding, Grace Coolidge, Lou Henry Hoover, Eleanor Roosevelt, Bess Truman, Mamie Eisenhower, Jacqueline Kennedy Onassis, Lady Bird Johnson, Pat Nixon, Betty Ford, Rosalynn Carter, Nancy Reagan, Barbara Bush, Hillary Rodham Clinton, Laura Bush, Michelle Obama; second panel: "is the capital of"] PLoS ONE 2015
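Slides 28 to 30 frame fact-checking as path finding in a knowledge graph. A minimal Python sketch of one such scheme, assuming a statement is scored by its best connecting path, where each path is penalized by the log-degree of its intermediate nodes: score = 1 / (1 + sum of log-degrees), maximized over paths. Maximizing that score is a shortest-path search with node costs. The scoring rule, the triples, and the function names are assumptions of this sketch; edge labels such as "is a" are ignored, which is a simplification.

```python
import heapq
import math
from collections import defaultdict


def build_graph(triples):
    """Undirected adjacency sets from (subject, predicate, object) triples.

    Predicates are ignored in this sketch; only connectivity is used.
    """
    adj = defaultdict(set)
    for s, _, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    return adj


def truth_value(adj, subject, obj):
    """Score in [0, 1] for the statement linking `subject` and `obj`.

    A path's cost is the sum of log-degrees of its intermediate nodes, so paths
    through generic, highly connected nodes count for less. The score is
    1 / (1 + minimum cost): 1.0 for a direct edge, 0.0 if no path exists.
    """
    best = {subject: 0.0}
    heap = [(0.0, subject)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best[u]:
            continue  # stale heap entry
        if obj in adj[u]:
            # Nodes are popped in order of cost, so this is the cheapest path.
            return 1.0 / (1.0 + cost)
        for v in adj[u]:
            new_cost = cost + math.log(len(adj[v]))  # entering v as an intermediate node
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                heapq.heappush(heap, (new_cost, v))
    return 0.0
```

Node names should be mutually comparable (for example, all strings), because heap entries with equal cost fall back to comparing nodes.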

  30. [Figure, panels (a) and (b), with labels: Obama, Barack Obama, Columbia University, Association of American Universities, Canada, Stephen Harper, Calgary, Naheed Nenshi, Islam, Muslim]

  31. Does fact-checking work?
       ‣ Echo chambers
       ‣ Selective exposure
       ‣ Confirmation bias
     ICWSM 2011

  32. Predicting political alignment (SocialCom 2011)
       Features           Accuracy
       Text (TF-IDF)      79%
       Hashtags           91%
       Retweet network    95%
       Tags + Network     95%
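Slide 32 reports that retweet-network features alone reach 95% accuracy. The slide does not say how the network was used; one simple way to exploit it is label propagation, sketched here in Python as an assumed, illustrative technique rather than the method behind the reported numbers. The seed labels, node names, and function name are hypothetical.

```python
from collections import Counter, defaultdict


def propagate_labels(edges, seeds, max_iter=20):
    """Label propagation on an undirected retweet graph.

    edges: (u, v) pairs of users connected by retweets.
    seeds: {user: label} for users whose alignment is known; these never change.
    Each other user repeatedly takes the most common label among its labeled
    neighbors (ties broken alphabetically) until nothing changes.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    labels = dict(seeds)
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):  # fixed visiting order keeps results reproducible
            if node in seeds:
                continue
            counts = Counter(labels[n] for n in adj[node] if n in labels)
            if not counts:
                continue  # no labeled neighbor yet
            top = max(counts.values())
            choice = min(lab for lab, c in counts.items() if c == top)
            if labels.get(node) != choice:
                labels[node] = choice
                changed = True
        if not changed:
            break
    return labels
```

The approach leans on the network being polarized: if users mostly retweet like-minded accounts, a node's neighbors are a strong signal of its own alignment.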

  33. [Figure: activity; k-core in the follower network; k-core in the retweet network] EPJ Data Science 2012
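Slide 33 relates user activity to k-core position in the follower and retweet networks. The k-core of a graph is the maximal subgraph in which every node has degree at least k; a node's core number is the largest such k. A minimal Python sketch of the standard peeling procedure, treating the network as undirected (an assumption; the slide does not say how edge direction was handled):

```python
from collections import defaultdict


def core_numbers(edges):
    """k-core decomposition of an undirected graph by repeated peeling.

    Returns {node: core number}, the largest k such that the node belongs
    to a subgraph in which every node has degree >= k.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    core = {}
    remaining = set(adj)
    k = 0
    while remaining:
        # Remove every node whose current degree is at most k; removals can
        # lower neighbors' degrees, so keep peeling until none qualify.
        stack = [v for v in remaining if degree[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already peeled via another path
            remaining.discard(v)
            core[v] = k
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] <= k:
                        stack.append(w)
        k += 1  # every remaining node now has degree > k
    return core
```

For a triangle a-b-c with a pendant node d attached to a, this returns core number 2 for a, b, c and 1 for d.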

  34. Homogeneous exposure, social bubbles [Figure: B_h on a 0.0 to 1.0 scale for gmail, live mail, youtube, yahoo mail, reddit, tumblr, aol mail, facebook, wikipedia, twitter, ask, bing, email, aol search, news aggregator, google news, search, yahoo search, social media, google, random walker, wiki, pinterest, baseline] PeerJ CS 2015

  35. Competition for attention [Figure panels: Hashtag Popularity (# daily retweets), User Popularity (# followers); data: Twitter, Yahoo! Meme; annotations: 2B Views, 55M Followers]

  36. Can the competition for limited user attention help explain the broad heterogeneity of meme popularity and our vulnerability to misinformation?

  37. Toy agent-based model [Diagram: follower links among agents A, B, C, D. An agent posts a new topic (P_n) or posts existing topics (1 - P_n), taken from its screen (P_r) or from its memory (1 - P_r); P_m is also labeled. Example screens and memories hold #jan25, #jobs, #apple, #justinbieber, #ladygaga, shown before and after a post.]
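Slide 37 sketches the toy agent-based model. A minimal Python simulation of the mechanics shown in the diagram, under stated assumptions: p_new plays the role of P_n (post a brand-new meme), p_screen the role of P_r (reuse a meme from the screen rather than from memory), and screens and memories are bounded FIFO queues with hypothetical lengths, which is how limited attention enters. The P_m label is not modeled, and when the chosen queue is empty the agent posts a new meme instead; these choices, the defaults, and the function name are illustrative, not the published model's settings.

```python
import random
from collections import Counter, deque


def simulate(followers, steps, p_new=0.1, p_screen=0.5,
             screen_len=10, memory_len=10, seed=0):
    """Toy meme-competition model with limited attention (illustrative sketch).

    followers: {agent: [agents who follow it]}; every agent must be a key.
    Each step a random agent posts: a new meme with probability p_new, otherwise
    an existing meme drawn from its screen (prob. p_screen) or its memory.
    The post enters the agent's memory and its followers' screens.
    Returns a Counter of how many times each meme was posted.
    """
    rng = random.Random(seed)
    agents = sorted(followers)
    screen = {a: deque(maxlen=screen_len) for a in agents}  # oldest items drop out
    memory = {a: deque(maxlen=memory_len) for a in agents}
    popularity = Counter()
    next_meme = 0
    for _ in range(steps):
        a = rng.choice(agents)
        source = screen[a] if rng.random() < p_screen else memory[a]
        if rng.random() < p_new or not source:
            meme = next_meme  # brand-new meme (also used when the queue is empty)
            next_meme += 1
        else:
            meme = rng.choice(source)
        popularity[meme] += 1
        memory[a].append(meme)
        for f in followers[a]:
            screen[f].append(meme)
    return popularity
```

Comparing the distribution of the returned counts across different screen and memory lengths is one way to probe the question posed on slide 36; this sketch makes no claim about what such runs would show.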

  38. Toy model predictions: role of the social network (panels a, b)

  39. Toy model predictions: role of limited attention (panels a, b). Nature Sci. Rep. 2012

  40. • Spread among agents with limited attention on a social network is sufficient to explain meme virality.
       • It is not necessary to invoke more complicated explanations based on intrinsic meme value or external factors.

  41. Do the best ideas win? [Figure: (a) mean popularity vs. fitness for µ = 0.1, 0.2, 0.4, 0.6, 0.8, 1.0; (b) average popularity vs. fitness for α = 1 to 10, with µ = 0.1; memes labeled m1 to m9]

  42. Efficiency vs. diversity [Figure: panels (a), (b), (c) at parameter values 0.01, 0.2, 0.9; axis label: intensity of competition]

  43. Limited attention [Figure: efficiency τ vs. diversity H for α = 1 to 10]
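Slides 41 to 43 relate meme fitness to popularity and plot efficiency (τ) against diversity (H). Reading the axis labels, a plausible way to compute both from a list of meme fitness values and their popularity counts is sketched below in Python. Taking H as the Shannon entropy of the popularity distribution and τ as Kendall's rank correlation between fitness and popularity is our assumption, not something the slides state.

```python
import math
from itertools import combinations


def entropy(popularity):
    """Shannon entropy H (natural log) of the share distribution over memes."""
    total = sum(popularity)
    return -sum((c / total) * math.log(c / total) for c in popularity if c > 0)


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length sequences (n >= 2).

    +1 per concordant pair, -1 per discordant pair, 0 for pairs tied in
    either sequence, divided by the number of pairs.
    """
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)
```

Under this reading, τ = 1 means popularity ranks memes exactly by fitness (an efficient system), while a high H means attention is spread over many memes (a diverse one).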

  44. • Structural, temporal, content, and user features can be used to detect astroturf and social bots.
       • Social media and traditional media work together to spread misinformation; gullible people are exposed through gullible connections.
       • Social network structure and limited attention may amplify our natural biases and make us more vulnerable to misinformation.

  45. cnets.indiana.edu Marcella Tambuscio Thanks!
