You Won’t Believe It: Exploring the Advertising Ecosystem of Fake News Websites


  1. You Won’t Believe It: Exploring the Advertising Ecosystem of Fake News Websites
     Catherine Han, Allen Tong
     CS 261N

  2. Misinformation Flavors: Deception

  3. Misinformation Flavors: Junk Science

  4. Misinformation Flavors: Unreliable Clickbait

  5. Misinformation Related Work
     ● Detection
     ● Fact-checking
     ● Prevention
     ● Propagation modeling

  6. Online Advertisement Problem Space
     ● Big revenue stream for “free” services
       ○ For news publishers too
     ● Impact on user experience and privacy
       ○ Annoyance
       ○ Performance
       ○ Trackers
       ○ Fear

  7. Online Advertisement Problem Space, cont’d
     ● Targeting algorithms
       ○ Algorithmic bias
     ● RTB ecosystem

  8. Real-Time Bidding (RTB) Simplified
     [Diagram. Labels: User Browser; Publisher; Ad Exchange Server; Advertisers A, B, C, D; “PII + request”; “user profile”; “request for bids”; bids $A, $B, $C, $D; “winning bid ($A), embed creative (A)”; AD SLOT A.]
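The auction step on the RTB slide can be sketched in a few lines. This is a minimal illustration, not the exchange logic of any real ad platform: the `Bid` type, the `run_auction` function, the dollar amounts, and the "highest bid wins" rule are all assumptions made for the example. The slide only shows that one bid ($A) wins and that its creative is embedded in the ad slot.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bid:
    advertiser: str  # hypothetical advertiser label, e.g. "A"
    amount: float    # bid price in dollars (made-up values below)
    creative: str    # markup the advertiser wants embedded in the ad slot


def run_auction(bids: List[Bid]) -> Optional[Bid]:
    """Return the highest bid, or None when no advertiser responds."""
    if not bids:
        return None
    # Assumption: the highest bid wins. max() returns the first maximal
    # element, so ties go to the bid that arrived earliest.
    return max(bids, key=lambda b: b.amount)


# The exchange asks advertisers A to D for bids on one ad slot.
bids = [
    Bid("A", 2.50, "<img src='creative-a.png'>"),
    Bid("B", 1.10, "<img src='creative-b.png'>"),
    Bid("C", 0.75, "<img src='creative-c.png'>"),
    Bid("D", 1.90, "<img src='creative-d.png'>"),
]

winner = run_auction(bids)
if winner is not None:
    # The publisher page would embed the winning creative in the slot.
    print(f"Advertiser {winner.advertiser} wins with ${winner.amount:.2f}")
```

Real exchanges differ in pricing rules (for example first-price versus second-price) and in how the user profile influences each bid; none of that is modeled here.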

  9. Misinformation and Ad Revenue

  10. Misinformation and Ad Revenue, cont’d

  11. Misinformation and Ad Revenue, cont’d

  12. Research Problem
      ● Understanding the ecosystem of online ads on fake news sites
        ○ Identifying third-party mediators
        ○ Identifying advertisers
        ○ Categorizing ads (e.g., medical, health)
      ● Comparative analysis with popular benign news sites

  13. Construction of Site Corpus
      ● Fake News
        ○ Zimdars’ False, Misleading, Clickbait-y and/or Satirical “News” Sources List
          ■ Domain, “About Us”, source, style, aesthetic, social media analysis
      ● “Real” (Benign) News
        ○ Alexa Top Sites (News category)

  14. Methodology - Collecting Ads
      [Diagram. Tree rooted at foo.com with child pages foo_1, foo_2, foo_3; each child foo_i has children foo_i_1, foo_i_2, foo_i_3. Output label: “rendered HTML for foo.com”.]
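The tree on this slide reads like a bounded crawl: start at the site's front page, follow a few links, then a few links from each of those. The sketch below illustrates that reading only. The depth of 2, the limit of 3 links per page, the `crawl` function, and the `get_links` callback are assumptions inferred from the diagram labels, not parameters confirmed by the slides. A toy in-memory link graph stands in for real page fetches.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(root: str,
          get_links: Callable[[str], Iterable[str]],
          max_depth: int = 2,
          max_links: int = 3) -> List[str]:
    """Breadth-first crawl; returns pages in the order they were discovered."""
    seen = {root}
    order = [root]
    queue = deque([(root, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand pages at the depth limit
        followed = 0
        for link in get_links(page):
            if followed == max_links:
                break
            if link in seen:
                continue  # skip pages that were already discovered
            seen.add(link)
            order.append(link)
            queue.append((link, depth + 1))
            followed += 1
    return order


# Toy link graph mirroring the slide's labels (hypothetical data).
SITE = {"foo.com": ["foo_1", "foo_2", "foo_3"]}
for i in (1, 2, 3):
    SITE[f"foo_{i}"] = [f"foo_{i}_{j}" for j in (1, 2, 3)]

pages = crawl("foo.com", lambda page: SITE.get(page, []))
print(len(pages), "pages visited")
```

In a real crawl, `get_links` would render each page in a browser and extract its links, which is where the headless-browsing and dynamic-content issues listed under the challenges would surface.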

  15. Methodology - Collecting Ads
      [Diagram. Labels: “rendered HTML for foo.com”; “foo.com links” (<a href=”...”> elements); filter lists: EasyList, Malware Domains, + 12 UBlock Lists; output: “ad URLs”.]
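The filtering step on this slide (links from the rendered HTML checked against blocklists to obtain ad URLs) can be sketched as follows. This is a simplified illustration under stated assumptions: it handles only the `||domain^` domain-anchor form of Adblock-style filter rules, whereas real lists such as EasyList contain many other rule types. The function names, the sample rules, and the `.example` domains are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def parse_domain_rules(filter_text):
    """Keep only rules of the form ||domain^ and return the domains."""
    domains = set()
    for line in filter_text.splitlines():
        line = line.strip()
        if line.startswith("||") and line.endswith("^"):
            domains.add(line[2:-1].lower())
    return domains


def is_ad_url(url, blocked_domains):
    """True when the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def find_ad_urls(html, filter_text):
    parser = LinkExtractor()
    parser.feed(html)
    blocked = parse_domain_rules(filter_text)
    return [url for url in parser.links if is_ad_url(url, blocked)]


# Hypothetical filter list and page content.
RULES = "! comment line\n||ads.example^\n||tracker.example^\n"
HTML = (
    '<a href="https://foo.com/story">story</a>'
    '<a href="https://cdn.ads.example/click?id=1">sponsored</a>'
    '<a href="/about">about</a>'
)

ad_urls = find_ad_urls(HTML, RULES)
print(ad_urls)
```

Relative links such as `/about` have no hostname and are never flagged, which matches the goal of isolating third-party ad destinations.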

  16. Methodology - Categorizing Ads
      [Diagram. Labels: “ad URLs”; “track URL redirects”; “final ad landing pages”; “LDA”; “ad topics”; “dictionary of ad words”.]
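The "track URL redirects" step on this slide amounts to following each ad URL through its redirect chain until a final landing page is reached. The sketch below shows that idea on a hypothetical in-memory redirect table rather than live HTTP requests. The `resolve_landing_page` function, the hop limit, and all URLs are assumptions for illustration. The LDA topic-modeling step is not shown.

```python
def resolve_landing_page(url, redirects, max_hops=10):
    """Follow redirects from url and return the full chain of URLs visited.

    `redirects` maps a URL to the URL it redirects to; a URL that is
    absent from the mapping is treated as a final landing page.
    """
    chain = [url]
    while url in redirects:
        if len(chain) > max_hops:
            raise ValueError("too many redirects")
        url = redirects[url]
        if url in chain:
            raise ValueError("redirect loop detected")
        chain.append(url)
    return chain


# Hypothetical redirect table: ad click -> tracker -> landing page.
REDIRECTS = {
    "https://ads.example/click?id=1": "https://track.example/r?u=42",
    "https://track.example/r?u=42": "https://shop.example/landing",
}

chain = resolve_landing_page("https://ads.example/click?id=1", REDIRECTS)
print(" -> ".join(chain))
```

Keeping the whole chain, not just the last URL, preserves the intermediate tracker hosts, which is useful when the goal is to identify third-party mediators as well as advertisers.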

  17. Methodology - Challenges & Lessons Learned
      ● Choosing the corpus of fake news sites
      ● Web crawling woes
        ○ Headless browsing detection
        ○ Dynamically loaded content (ads)
        ○ Asynchronicity in JS
      ● Categorization of websites
        ○ Amazon Web Information Service API

  18. Limitations & Ethics
      ● Inability to account for fingerprinting
      ● Corpus size and robustness
      ● Gathering (consistent) corpus metadata
      ● Clicking on ads

  19. Results - Third-Party Ad Servers
      ● Third-party ads were found on 246 fake news sites and 716 benign sites
      … (truncated)

  20. Results - Site Ranking and Third-Party Ad Servers

  21. Results - Advertisers
      [Figure panels: Misinfo, Benign.]

  22. Results - Site Ranking and Advertisers

  23. Results - Products Advertised

      Advertisement Category    Unique Fake News Sites (N=246)    Unique Benign News Sites (N=716)
      Technology                235 (96%)                         403 (56%)
      Medicine & Health         133 (57%)                         383 (53%)
      Coronavirus*              36 (15%)                          151 (21%)
      Finance                   84 (34%)                          329 (46%)
      Politics                  24 (10%)                          40 (6%)
      Cannabis and CBD          23 (10%)                          46 (6%)

      * Proper subset of Medicine & Health.

  24. Future Work
      ● Larger corpus
      ● Categorization of ads
      ● Longitudinal crawling
      ● Motivations of users navigating to such sites
        ○ Browser extension + user study with MTurk
