A Stylometric Inquiry into Hyperpartisan and Fake News

A Stylometric Inquiry into Hyperpartisan and Fake News. Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, Benno Stein. Leipzig University, Bauhaus-Universität Weimar. webis.de. ACL, July 16th, 2018.


SLIDE 1

A Stylometric Inquiry into Hyperpartisan and Fake News

Martin Potthast∗, Johannes Kiesel†, Kevin Reinartz†, Janek Bevendorff†, Benno Stein†

∗Leipzig University, †Bauhaus-Universität Weimar

webis.de ACL, July 16th, 2018

1 @KieselJohannes

SLIDE 6

What Is Fake News?

Disinformation displayed as news articles

Image: Claire Wardle, First Draft

SLIDE 7

A Stylometric Inquiry into Hyperpartisan “News” and “News” in False Context and/or with Content that is Impostered, Manipulated, and/or Fabricated

Martin Potthast∗, Johannes Kiesel†, Kevin Reinartz†, Janek Bevendorff†, Benno Stein†

∗Leipzig University, †Bauhaus-Universität Weimar

webis.de ACL, July 16th, 2018

SLIDE 11

The Political Spectrum

The left-right political spectrum is a system of classifying political positions, ideologies and parties. Left-wing politics and right-wing politics are often presented as opposed, although either may adopt stances from the other side. [Wikipedia]

[Figure: political spectrum labeled Left, Center, Right, Alt-left, Alt-right, Liberal, Conservative, and Partisan and Hyperpartisan (each shown twice)]

Partisan: someone with a psychological identification with one major party. [Wikipedia]

News media reporting on politics can be aligned on this spectrum as well.
We are observing an increasing number of hyperpartisan news publishers.

SLIDE 12

Fake News and Hyperpartisan News

SLIDE 14

Why Is Fake News Published by Hyperpartisan Pages?

Image: Claire Wardle, First Draft

SLIDE 15

Fake News Detection

Knowledge-based

❑ Requires political knowledge base
❑ Unavailable ahead of time
❑ We cannot trust the web

Context-based

❑ Limited to social media platforms
❑ Part of damage already done

Style-based

❑ Allows for pre-posting check
❑ Real-time reaction possible
❑ Hard to mask
❑ But are style differences sufficient?

Taxonomy of Approaches

[Figure: taxonomy tree of fake news detection approaches, annotated with related work]

Fake news detection
  Knowledge-based (also called fact checking): information retrieval, semantic web / LOD
  Context-based: social network analysis
  Style-based: text categorization, deception detection

Related work shown in the figure: Long et al., 2017; Mocanu et al., 2015; Acemoglu et al., 2010; Kwon et al., 2013; Ma et al., 2017; Volkova et al., 2017; Budak et al., 2011; Nguyen et al., 2012; Derczynski et al., 2017; Tambuscio et al., 2015; Afroz et al., 2012; Badaskar et al., 2008; Rubin et al., 2016; Yang et al., 2017; Rashkin et al., 2017; Horne and Adali, 2017; Pérez-Rosas et al., 2017; Wei et al., 2013; Chen et al., 2015; Rubin et al., 2015; Wang et al., 2017; Bourgonje et al., 2017; Wu et al., 2014; Ciampaglia et al., 2015; Shi and Weninger, 2016; Etzioni et al., 2018; Magdy and Wanas, 2010; Ginsca et al., 2015

SLIDE 17

Fake News and Hyperpartisan News

Corpus Construction

Fact-checking results by orientation and publisher ("-" marks a cell left empty on the slide):

Orientation / Publisher    true    mix  false    n/a      Σ
Center                      806      8      -     12    826
  ABC News                   90      2      -      3     95
  CNN                       295      4      -      8    307
  Politico                  421      2      -      1    424
Left-wing                   182     51     15      8    256
  Addicting Info             95     25      8      7    135
  Occupy Democrats           59     25      7      -     91
  The Other 98%              28      1      -      1     30
Right-wing                  276    153     72     44    545
  Eagle Rising              106     47     25     36    214
  Freedom Daily              49     24     22      4     99
  Right Wing News           121     82     25      4    232
Σ                          1264    212     87     64   1627

Annotations provided by journalists at BuzzFeed
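
To make the corpus table easier to read, the per-orientation totals can be turned into shares of each fact-checking rating. The following is only a small sketch: the counts are the orientation rows of the table, the empty "false" cell for Center is assumed to mean zero (the row sum of 826 is consistent with that), and the names `CORPUS` and `rating_shares` are made up for this illustration.

```python
# Fact-checking counts per orientation, copied from the corpus table.
# Tuple order: (true, mix, false, n/a).
# Assumption: the empty "false" cell in the Center row is read as 0.
CORPUS = {
    "Center": (806, 8, 0, 12),
    "Left-wing": (182, 51, 15, 8),
    "Right-wing": (276, 153, 72, 44),
}

RATINGS = ("true", "mix", "false", "n/a")


def rating_shares(counts):
    """Return the fraction of articles per fact-checking rating."""
    total = sum(counts)
    return {rating: n / total for rating, n in zip(RATINGS, counts)}


for orientation, counts in CORPUS.items():
    shares = rating_shares(counts)
    print(
        f"{orientation:<10} n={sum(counts):4d}  "
        f"mix={shares['mix']:.1%}  false={shares['false']:.1%}"
    )
```

Comparing the "mix" and "false" shares across the three rows is one way to check the later claim that hyperpartisan pages produce relatively many fake news articles.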

SLIDE 18

Fake News and Hyperpartisan News

Selected Results

Fake News Detection: precision ≈ 42%, recall ≈ 41%

SLIDE 19

Fake News and Hyperpartisan News

Selected Results

Orientation Detection: precision ≈ 21% and ≈ 56%, recall ≈ 20% and ≈ 59%

SLIDE 20

Fake News and Hyperpartisan News

Selected Results

Hyperpartisanship Detection: precision ≈ 69%, recall ≈ 89%
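
For a single-number comparison of the tasks, precision and recall can be combined into the F1 score, F1 = 2PR / (P + R). The slides do not report F1; the sketch below only applies the standard formula to the approximate precision and recall values shown for fake news detection and hyperpartisanship detection, so the results inherit their rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate (precision, recall) values as shown on the slides.
RESULTS = {
    "Fake news detection": (0.42, 0.41),
    "Hyperpartisanship detection": (0.69, 0.89),
}

for task, (precision, recall) in RESULTS.items():
    print(f"{task}: F1 ≈ {f1_score(precision, recall):.2f}")
```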

SLIDE 23

Fake News and Hyperpartisan News

How can it be that the alt left and the alt right cannot be distinguished from the mainstream, when both together (hyperpartisan news) can be?

[Figure: political spectrum labeled Left, Center, Right, Alt-left, Alt-right, Partisan, Hyperpartisan]

The horseshoe theory asserts that the alt left and the alt right, rather than being at opposite and opposing ends of a linear political continuum, in fact closely resemble one another, much like the ends of a horseshoe. [Wikipedia]

SLIDE 27

Horseshoe Validation Experiment I

Leave-out Classification

left-wing center right-wing

74% | 26%

❑ Classifier is trained to distinguish left-wing and center articles
❑ Right-wing articles are used for testing
❑ Majority of right-wing articles are classified as left-wing rather than center

left-wing center right-wing

34% | 66%
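
The leave-out setup can be sketched in a few lines: train on two classes, then see where articles of the withheld third class end up. This is an illustration under stated assumptions, not the experiment itself: a nearest-centroid classifier stands in for the learner used in the talk, the two-dimensional feature vectors are invented toy data, and `leave_out_classification` is a hypothetical helper name.

```python
from statistics import mean


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(column) for column in zip(*vectors)]


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def leave_out_classification(train, held_out):
    """Train on the classes in `train`, then label every held-out vector.

    train:    dict mapping class name -> list of feature vectors
    held_out: feature vectors of the class withheld from training
    Returns the fraction of held-out vectors assigned to each training class.
    """
    centroids = {label: centroid(vectors) for label, vectors in train.items()}
    votes = {label: 0 for label in train}
    for vector in held_out:
        nearest = min(
            centroids,
            key=lambda label: squared_distance(vector, centroids[label]),
        )
        votes[nearest] += 1
    return {label: count / len(held_out) for label, count in votes.items()}


# Hypothetical style features, invented for illustration only.
toy_train = {
    "left-wing": [[0.9, 0.2], [0.8, 0.3], [0.7, 0.1]],
    "center": [[0.2, 0.8], [0.1, 0.9], [0.3, 0.7]],
}
toy_right_wing = [[0.8, 0.4], [0.6, 0.3], [0.9, 0.1], [0.3, 0.9]]

print(leave_out_classification(toy_train, toy_right_wing))
```

Swapping the roles of the left-wing and right-wing data gives the mirrored run; the toy numbers say nothing about the percentages on the slide.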

SLIDE 28

Horseshoe Validation Experiment II

Unmasking

[Koppel/Schler 2004]

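[Figure: are texts A and B by the same author (A = B)? Both texts are split into chunks (a, b), and each chunk is represented as a numeric feature vector.]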

SLIDE 38

Horseshoe Validation Experiment II

Unmasking

[Koppel/Schler 2004]

Typical learning characteristic for . . .

[Plot: % correct classifications (50 to 100) over # eliminated features (6 to 30); outcomes marked Decision: "same" and Decision: "different"]

different authors (A ≠ B)
same author (A = B)

The typical learning characteristic can be learned. ➜ “Meta Learning”
We apply Unmasking to distinguish style genres.
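
The curve on this slide comes from a simple loop: measure how well chunks of A and B can be told apart, eliminate the most discriminating features, and repeat. The sketch below is a rough stand-in rather than the method of Koppel and Schler as published: it assumes a nearest-centroid classifier with leave-one-out evaluation and ranks features by the difference of class centroids, the toy chunk vectors are invented, and the meta-learning step that classifies the resulting curves is left out. Each text needs at least two chunks.

```python
from statistics import mean


def centroid(vectors, features):
    """Mean value per active feature index."""
    return {f: mean(v[f] for v in vectors) for f in features}


def loo_accuracy(chunks_a, chunks_b, features):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    data = [(v, "A") for v in chunks_a] + [(v, "B") for v in chunks_b]
    correct = 0
    for i, (vector, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        cents = {
            name: centroid([v for v, l in rest if l == name], features)
            for name in ("A", "B")
        }

        def dist(name):
            return sum((vector[f] - cents[name][f]) ** 2 for f in features)

        correct += min(("A", "B"), key=dist) == label
    return correct / len(data)


def unmasking_curve(chunks_a, chunks_b, rounds, drop_per_round):
    """Accuracy per round while the strongest features are eliminated."""
    features = list(range(len(chunks_a[0])))
    curve = []
    for _ in range(rounds):
        if not features:
            break
        curve.append(loo_accuracy(chunks_a, chunks_b, features))
        ca, cb = centroid(chunks_a, features), centroid(chunks_b, features)
        ranked = sorted(features, key=lambda f: abs(ca[f] - cb[f]), reverse=True)
        dropped = set(ranked[:drop_per_round])
        features = [f for f in features if f not in dropped]
    return curve


# Invented toy chunks: the "different" pair differs in every feature,
# the "same" pair differs only in feature 0.
different_a = [[0.9, 0.8, 0.9, 0.8], [0.8, 0.9, 0.8, 0.9], [0.9, 0.9, 0.8, 0.8]]
different_b = [[0.1, 0.2, 0.1, 0.2], [0.2, 0.1, 0.2, 0.1], [0.1, 0.1, 0.2, 0.2]]
same_a = [[0.9, 0.5, 0.4, 0.6], [0.8, 0.4, 0.6, 0.5], [0.9, 0.6, 0.5, 0.4]]
same_b = [[0.1, 0.5, 0.6, 0.4], [0.2, 0.6, 0.4, 0.5], [0.1, 0.4, 0.5, 0.6]]

print("different:", unmasking_curve(different_a, different_b, rounds=3, drop_per_round=1))
print("same:     ", unmasking_curve(same_a, same_b, rounds=2, drop_per_round=1))
```

In the toy data the "different" curve stays high while the "same" curve collapses once its single distinguishing feature is gone, which is the qualitative pattern the plot describes.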

SLIDE 39

Horseshoe Validation Experiment II

Unmasking

[Koppel/Schler 2004]

[Plot: unmasking curves for left vs right, mainstream vs left, and mainstream vs right; y-axis: normalized accuracy (0.0 to 0.6); x-axis: iterations (3 to 15)]

SLIDE 40

Summary and Outlook

❑ Hyperpartisan news pages produce relatively many fake news articles
❑ Hyperpartisan news can be distinguished quite well based on style
❑ Style-based detection allows for real-time detection

➜ Political extremism in news can be ousted or at least flagged

❑ The style of alt left and alt right news is very similar
❑ Linguistic evidence for the horseshoe theory of the political spectrum?

➜ Large-scale analysis required

SLIDE 42

webis.de/events/semeval-19

SLIDE 43

Style Model

Features

❑ n-grams with n ∈ [1, 3] of characters, stop words, parts-of-speech
❑ 10 readability scores
❑ Dictionary features based on General Inquirer
❑ Ratios of quoted words, external links, number of paragraphs, and their average length

Feature selection

❑ Discard word features (n-gram features) occurring in less than 2.5% (10%) of documents

Training set

❑ Balancing using oversampling
❑ Publishers are not represented in both training and test set
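
The two training-set constraints can be sketched as follows. How the publishers were actually assigned to folds is not stated on the slide, so the held-out publisher set below is an arbitrary choice, the article records are invented, and `split_by_publisher` and `oversample` are hypothetical helper names. Oversampling is applied to the training part only, which is an assumption of this sketch.

```python
import random
from collections import defaultdict


def split_by_publisher(articles, test_publishers):
    """Split so that no publisher appears in both training and test set."""
    train = [a for a in articles if a["publisher"] not in test_publishers]
    test = [a for a in articles if a["publisher"] in test_publishers]
    return train, test


def oversample(articles, label_key="label", seed=0):
    """Balance classes by re-drawing minority-class articles with replacement."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for article in articles:
        by_label[article[label_key]].append(article)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw extra copies until this class reaches the majority size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Invented toy records; only the publisher names come from the corpus table.
toy_articles = (
    [{"publisher": "CNN", "label": "mainstream"}] * 4
    + [{"publisher": "Politico", "label": "mainstream"}] * 2
    + [{"publisher": "Eagle Rising", "label": "hyperpartisan"}] * 1
    + [{"publisher": "Freedom Daily", "label": "hyperpartisan"}] * 2
)

train_set, test_set = split_by_publisher(toy_articles, {"Politico", "Freedom Daily"})
balanced_train = oversample(train_set)
print(len(train_set), len(balanced_train), len(test_set))
```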

Learning algorithm

❑ WEKA’s random forest with default parameters
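
As a small illustration of the feature definition and feature selection steps, the sketch below counts character n-grams for n from 1 to 3 and then discards features that occur in too few documents. It covers only the character n-grams, not stop words, parts-of-speech, readability, or dictionary features; the function names are hypothetical, and the toy call uses a much higher threshold than the 10% from the slide because three documents are too few for that value to filter anything.

```python
from collections import Counter


def char_ngrams(text, n_values=(1, 2, 3)):
    """Count the character n-grams of a text for every n in n_values."""
    counts = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def select_by_document_frequency(docs_features, min_share):
    """Keep features that occur in at least min_share of the documents."""
    doc_freq = Counter()
    for features in docs_features:
        doc_freq.update(features.keys())  # count each feature once per document
    threshold = min_share * len(docs_features)
    return {feature for feature, df in doc_freq.items() if df >= threshold}


# Toy documents, invented for illustration.
toy_docs = ["aba", "abc", "xyz"]
toy_features = [char_ngrams(doc) for doc in toy_docs]

# Toy threshold of 60%; the slide uses 10% for n-gram features.
kept = select_by_document_frequency(toy_features, min_share=0.6)
print(sorted(kept))
```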
