You Won’t Believe It: Exploring the Advertising Ecosystem of Fake News Websites



SLIDE 1

You Won’t Believe It: Exploring the Advertising Ecosystem of Fake News Websites

Catherine Han, Allen Tong
CS 261N
SLIDE 2

Misinformation Flavors: Deception

SLIDE 3

Misinformation Flavors: Junk Science

SLIDE 4

Misinformation Flavors: Unreliable Clickbait

SLIDE 5

Misinformation Related Work

  • Detection
  • Fact-checking
  • Prevention
  • Propagation modeling

SLIDE 6

Online Advertisement Problem Space

  • Big revenue stream for “free” services
    ○ For news publishers too
  • Impact on user experience and privacy
    ○ Annoyance
    ○ Performance
    ○ Trackers
    ○ Fear

SLIDE 7

Online Advertisement Problem Space, cont’d

  • Targeting algorithms
    ○ Algorithmic bias
  • RTB ecosystem

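SLIDE 8

Real-Time Bidding (RTB) Simplified

[Diagram: the User Browser loads a page from the Publisher Server that contains an AD SLOT. A request carrying PII and the user profile reaches the Ad Exchange, which sends a request for bids to Advertiser A, Advertiser B, Advertiser C, and Advertiser D. They answer with bids $A, $B, $C, and $D. The winning bid ($A) and its creative (A) are returned, and creative A is embedded in the ad slot.]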
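The bidding step in this diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say how the exchange picks a winner, so the sketch assumes the highest bid wins and ties go to the earliest bid. The `Bid` class, `run_auction`, `fill_ad_slot`, and the `{{AD_SLOT}}` placeholder are hypothetical names, not part of any real exchange API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bid:
    advertiser: str  # label such as "A", as in the diagram
    price: float     # bid amount offered for this ad slot
    creative: str    # ad markup to embed if this bid wins


def run_auction(bids: List[Bid]) -> Optional[Bid]:
    """Return the highest bid, or None when no advertiser bid.

    Assumption: highest price wins; max() keeps the earliest bid on ties.
    """
    if not bids:
        return None
    return max(bids, key=lambda b: b.price)


def fill_ad_slot(page_html: str, winner: Optional[Bid]) -> str:
    """Embed the winning creative in the (hypothetical) ad slot placeholder."""
    creative = winner.creative if winner else ""
    return page_html.replace("{{AD_SLOT}}", creative)


# Made-up bids from the four advertisers in the diagram.
bids = [
    Bid("A", 2.50, "<div>ad A</div>"),
    Bid("B", 1.75, "<div>ad B</div>"),
    Bid("C", 0.90, "<div>ad C</div>"),
    Bid("D", 2.10, "<div>ad D</div>"),
]
winner = run_auction(bids)
page = fill_ad_slot("<body>{{AD_SLOT}}</body>", winner)
```

Real exchanges add pricing rules, timeouts, and user-profile matching that this sketch leaves out.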

SLIDE 9

Misinformation and Ad Revenue

SLIDE 10

Misinformation and Ad Revenue, cont’d

SLIDE 11

Misinformation and Ad Revenue, cont’d

SLIDE 12

Research Problem

  • Understanding the ecosystem of online ads on fake news sites
    ○ Identifying third-party mediators
    ○ Identifying advertisers
    ○ Categorizing ads (e.g., medical, health, etc.)
  • Comparative analysis with popular benign news sites

SLIDE 13

Construction of Site Corpus

  • Fake News
    ○ Zimdars’ False, Misleading, Clickbait-y and/or Satirical “News” Sources List
      ■ Domain, “About Us”, source, style, aesthetic, social media analysis
  • “Real” (Benign) News
    ○ Alexa Top Sites (News category)

SLIDE 14

Methodology - Collecting Ads

[Diagram: the rendered HTML for foo.com is the root of a crawl tree. foo.com leads to foo_1, foo_2, and foo_3; these lead in turn to foo_1_1, foo_1_2, foo_1_3, foo_2_1, foo_2_2, foo_2_3, foo_3_1, foo_3_2, and foo_3_3.]
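The crawl tree on this slide (one root page, three child pages, nine grandchild pages) can be sketched as a bounded breadth-first crawl. This is a minimal sketch, not the authors' crawler: the slides do not say how the three links per page were chosen, so the sketch assumes the first three unseen same-site links. `get_links` is a hypothetical callable standing in for "render the page and return its links".

```python
from collections import deque
from typing import Callable, List
from urllib.parse import urlparse


def crawl(root: str,
          get_links: Callable[[str], List[str]],
          fanout: int = 3,
          max_depth: int = 2) -> List[str]:
    """Visit root, then up to `fanout` same-site links per page, to max_depth."""
    site = urlparse(root).netloc
    visited = [root]
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # pages at the last level are collected but not expanded
        picked = 0
        for link in get_links(url):
            if picked == fanout:
                break
            # Assumption: skip off-site links and pages already collected.
            if urlparse(link).netloc != site or link in seen:
                continue
            seen.add(link)
            visited.append(link)
            queue.append((link, depth + 1))
            picked += 1
    return visited


def fake_links(url: str) -> List[str]:
    """Stand-in for a real fetch: every page 'links' to five sub-pages."""
    return [f"{url.rstrip('/')}/{i}" for i in range(1, 6)]


pages = crawl("https://foo.com/", fake_links)
```

With a fanout of 3 and a depth of 2, the crawl collects 1 + 3 + 9 = 13 pages, which matches the root plus 12 links shown in the diagram.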

SLIDE 15

Methodology - Collecting Ads

[Diagram: the rendered HTML for foo.com plus 12 foo.com links is scanned for <a href="..."> elements, which are checked against EasyList, Malware Domains, and uBlock lists to produce ad URLs.]
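The matching step on this slide can be approximated as follows. This is a simplified sketch: real EasyList and uBlock lists use a richer rule syntax, whereas the sketch assumes the lists have already been reduced to a plain set of domains and matches on domain suffix only. `HrefCollector`, `is_listed`, `find_ad_urls`, and the `.test` domains are illustrative names, not taken from the slides.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urlparse


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def is_listed(url: str, blocked_domains: Set[str]) -> bool:
    """True if the URL's host, or any parent domain of it, is in the set."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    return any(".".join(parts[i:]) in blocked_domains
               for i in range(len(parts)))


def find_ad_urls(html: str, blocked_domains: Set[str]) -> List[str]:
    """Return the hrefs in `html` whose domain appears on the blocklist."""
    collector = HrefCollector()
    collector.feed(html)
    return [h for h in collector.hrefs if is_listed(h, blocked_domains)]


# Made-up page and blocklist for illustration.
html = (
    '<a href="https://ads.example-adserver.test/click?id=1">Sponsored</a>'
    '<a href="https://foo.com/story">Story</a>'
)
blocked = {"example-adserver.test"}
ad_urls = find_ad_urls(html, blocked)
```

Suffix matching catches subdomains such as `ads.example-adserver.test` without listing each one, at the cost of ignoring path-based and element-hiding rules.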

SLIDE 16

Methodology - Categorizing Ads

[Diagram labels, in order: ad URLs; track URL redirects; final ad landing pages; LDA; dictionary of ad words; ad topics]
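The first stages of this pipeline (following each ad URL through its redirects, then turning landing-page text into word counts) can be sketched with the standard library alone. This is an illustration under assumptions, not the authors' code: `next_hop` is a hypothetical callable that returns the next redirect target or `None`, the stopword list is a toy one, and the sketch stops short of LDA itself, which would be run by a topic-modeling library on counts like these.

```python
import re
from collections import Counter
from typing import Callable, List, Optional


def follow_redirects(url: str,
                     next_hop: Callable[[str], Optional[str]],
                     max_hops: int = 10) -> List[str]:
    """Return the redirect chain from `url`; the last item is the landing page."""
    chain = [url]
    while len(chain) <= max_hops:
        target = next_hop(chain[-1])
        # Stop at a page with no redirect, or when a loop is detected.
        if target is None or target in chain:
            break
        chain.append(target)
    return chain


# Toy stopword list (assumption); a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "and", "of", "to", "for", "your"}


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphabetic tokens, dropping stopwords and short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)


# Made-up redirect table standing in for real HTTP responses.
redirects = {
    "https://tracker.test/c?1": "https://hop.test/r",
    "https://hop.test/r": "https://shop.test/landing",
}
chain = follow_redirects("https://tracker.test/c?1", redirects.get)
landing_text = "Save on the best vitamins and supplements for your health"
words = bag_of_words(landing_text)
```

Passing the redirect lookup in as a callable keeps the sketch testable offline; a live crawler would replace it with an HTTP client that reads the redirect target from each response.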

SLIDE 17

Methodology - Challenges & Lessons Learned

  • Choosing the corpus of fake news sites
  • Web crawling woes
    ○ Headless browsing detection
    ○ Dynamically loaded content (ads)
    ○ Asynchronicity in JS
  • Categorization of websites
    ○ Amazon Web Information Service API

SLIDE 18

Limitations & Ethics

  • Inability to account for fingerprinting
  • Corpus size and robustness
  • Gathering (consistent) corpus metadata
  • Clicking on ads

SLIDE 19

Results - Third-Party Ad Servers

… (truncated)

  • Third-party ads were found on 246 fake news sites and 716 benign sites

… (truncated)

SLIDE 20

Results - Site Ranking and Third-Party Ad Servers

SLIDE 21

Results - Advertisers

[Figure panels: Misinfo, Benign]

SLIDE 22

Results - Site Ranking and Advertisers

SLIDE 23

Results - Products Advertised

Advertisement Category                              Unique Fake News Sites (N=246)   Unique Benign News Sites (N=716)
Technology                                          235 (96%)                        403 (56%)
Medicine & Health                                   133 (57%)                        383 (53%)
Coronavirus (proper subset of Medicine & Health)    36 (15%)                         151 (21%)
Finance                                             84 (34%)                         329 (46%)
Politics                                            24 (10%)                         40 (6%)
Cannabis and CBD                                    23 (10%)                         46 (6%)
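The percentages in this table are site counts divided by the corpus sizes (N=246 fake news sites, N=716 benign sites). A small sketch of that arithmetic for three of the rows follows; rounding to the nearest whole percent is an assumption about how the slide's figures were produced, and `share` is an illustrative helper name.

```python
def share(count: int, total: int) -> int:
    """Percentage of sites showing a category, rounded to a whole percent."""
    return round(100 * count / total)


N_FAKE, N_BENIGN = 246, 716

# (fake news site count, benign site count) copied from the table.
counts = {
    "Technology": (235, 403),
    "Finance": (84, 329),
    "Politics": (24, 40),
}

shares = {
    category: (share(fake, N_FAKE), share(benign, N_BENIGN))
    for category, (fake, benign) in counts.items()
}
```

For these rows the recomputed shares are Technology 96% and 56%, Finance 34% and 46%, and Politics 10% and 6%.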

SLIDE 24

Future Work

  • Larger corpus
  • Categorization of ads
  • Longitudinal crawling
  • Motivations of users navigating to such sites
    ○ Browser extension + user study with MTurk