A Stylometric Inquiry into Hyperpartisan and Fake News
Martin Potthast∗, Johannes Kiesel†, Kevin Reinartz†, Janek Bevendorff†, Benno Stein†
∗Leipzig University, †Bauhaus-Universität Weimar
webis.de ACL, July 16th, 2018
1 @KieselJohannes
Fake news detection
Knowledge-based (also called fact checking): information retrieval, semantic web / LOD
❑ Requires political knowledge base ❑ Unavailable ahead of time ❑ We cannot trust the web
Context-based: social network analysis
❑ Limited to social media platforms ❑ Part of damage already done
Style-based: text categorization, deception detection
❑ Allows for pre-posting check ❑ Real-time reaction possible ❑ Hard to mask ❑ But are style differences sufficient?
Related work: Long et al., 2017; Mocanu et al., 2015; Acemoglu et al., 2010; Kwon et al., 2013; Ma et al., 2017; Volkova et al., 2017; Budak et al., 2011; Nguyen et al., 2012; Derczynski et al., 2017; Tambuscio et al., 2015; Afroz et al., 2012; Badaskar et al., 2008; Rubin et al., 2016; Yang et al., 2017; Rashkin et al., 2017; Horne and Adali, 2017; Pérez-Rosas et al., 2017; Wei et al., 2013; Chen et al., 2015; Rubin et al., 2015; Wang et al., 2017; Bourgonje et al., 2017; Wu et al., 2014; Ciampaglia et al., 2015; Shi and Weninger, 2016; Etzioni et al., 2018; Magdy and Wanas, 2010; Ginsca et al., 2015
❑ Classifier is trained to distinguish left-wing and center articles ❑ Right-wing articles are used for testing ❑ Majority of right-wing articles are classified as left-wing rather than center
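The experiment in the bullets above (train on left-wing vs. center, test on right-wing) can be sketched as follows. This is a minimal illustration only: a nearest-centroid classifier stands in for whatever classifier was actually used, and the two-dimensional feature vectors are hand-made toy values, not real article features or results.

```python
from collections import Counter
from math import dist


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(labeled):
    """labeled: dict mapping a class label to its list of feature vectors."""
    return {label: centroid(vectors) for label, vectors in labeled.items()}


def predict(model, vector):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: dist(model[label], vector))


def label_shares(model, vectors):
    """Share of the held-out vectors assigned to each trained label."""
    counts = Counter(predict(model, v) for v in vectors)
    return {label: counts[label] / len(vectors) for label in model}


# Hand-made toy vectors (hypothetical style features), NOT real data.
train_set = {
    "left": [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1]],
    "center": [[0.1, 0.9], [0.2, 0.8], [0.1, 0.7]],
}
held_out_right = [[0.8, 0.3], [0.6, 0.2], [0.3, 0.8], [0.9, 0.2]]

model = train(train_set)  # the model never sees a right-wing article
shares = label_shares(model, held_out_right)
print(shares)
```

The point of the protocol is that the held-out orientation has no label of its own at training time, so the shares show which of the two known classes it resembles more in the chosen feature space.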
[Koppel/Schler 2004]
[Figure: % correct classifications (50 to 100) plotted against # eliminated features (6 to 30); the resulting curves lead to the decision "same" or "different"]
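The plots cited to [Koppel/Schler 2004], with "% correct classifications" against "# eliminated features", appear to show the unmasking procedure from that work: repeatedly measure how well two sets of text chunks can be told apart, then remove the most discriminative features and measure again. The sketch below is a simplified stand-in, not the original method: it assumes a nearest-centroid classifier with leave-one-out evaluation and ranks features by centroid difference instead of linear SVM weights, and the names `loo_accuracy` and `unmasking_curve` are hypothetical. The final "same"/"different" decision on the curve is not implemented.

```python
def centroid(rows, features):
    """Mean value of each active feature over a list of feature dicts."""
    return {f: sum(r[f] for r in rows) / len(rows) for f in features}


def loo_accuracy(a_rows, b_rows, features):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Each set needs at least two rows so a centroid remains after holding one out.
    """
    data = [(r, "a") for r in a_rows] + [(r, "b") for r in b_rows]
    correct = 0
    for i, (row, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        cents = {
            lab: centroid([r for r, l in rest if l == lab], features)
            for lab in ("a", "b")
        }
        guess = min(
            cents,
            key=lambda lab: sum((row[f] - cents[lab][f]) ** 2 for f in features),
        )
        correct += guess == label
    return correct / len(data)


def unmasking_curve(a_rows, b_rows, rounds, drop_per_round):
    """Accuracy per round while the most discriminative features are removed."""
    features = sorted(a_rows[0])
    curve = []
    for _ in range(rounds):
        if not features:
            break
        curve.append(loo_accuracy(a_rows, b_rows, features))
        ca, cb = centroid(a_rows, features), centroid(b_rows, features)
        # Assumption: centroid difference approximates feature importance.
        ranked = sorted(features, key=lambda f: abs(ca[f] - cb[f]), reverse=True)
        dropped = set(ranked[:drop_per_round])
        features = [f for f in features if f not in dropped]
    return curve


# Toy chunks (hypothetical): only feature "x" separates the two sets.
set_a = [{"x": 1.0, "y": 0.5, "z": 0.5}, {"x": 0.9, "y": 0.5, "z": 0.5}]
set_b = [{"x": 0.0, "y": 0.5, "z": 0.5}, {"x": 0.1, "y": 0.5, "z": 0.5}]
curve = unmasking_curve(set_a, set_b, rounds=2, drop_per_round=1)
print(curve)
```

In this toy case accuracy collapses as soon as the single separating feature is removed. A curve that drops this quickly suggests that the two sets differ only superficially, whereas a curve that stays high indicates deeper differences.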
❑ Hyperpartisan news pages produce relatively many fake news articles ❑ Hyperpartisan news can be distinguished quite well based on style ❑ Style-based detection allows for real-time detection
❑ The style of alt left and alt right news is very similar ❑ Linguistic evidence for the horseshoe theory of the political spectrum?
❑ n-grams with n ∈ [1, 3] of characters, stop words, parts-of-speech ❑ 10 readability scores ❑ Dictionary features based on General Inquirer ❑ Ratios of quoted words, external links, number of paragraphs, and their
❑ Discard word features (n-gram features) occurring in less than 2.5% (10%) of
❑ Balancing using oversampling ❑ Publishers are not represented in both training and test set
❑ WEKA’s random forest with default parameters
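To make the feature bullets above more concrete, here is a minimal sketch of character n-gram extraction with n ∈ [1, 3], followed by a frequency cut-off. It assumes the percentage thresholds refer to the share of documents containing a feature, which the truncated bullet does not confirm. The function names and the relative-frequency weighting are illustrative choices, not taken from the slides.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count all character n-grams with n_min <= n <= n_max."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def frequent_features(docs, min_doc_share):
    """Keep n-grams that occur in at least min_doc_share of the documents.

    Assumption: the threshold is a document-frequency share, e.g. 0.10.
    """
    per_doc = [char_ngrams(d) for d in docs]
    doc_freq = Counter()
    for counts in per_doc:
        doc_freq.update(counts.keys())  # each document counts once per n-gram
    keep = {g for g, df in doc_freq.items() if df / len(docs) >= min_doc_share}
    return sorted(keep), per_doc


def vectorize(counts, vocabulary):
    """Relative frequency of each vocabulary n-gram within one document."""
    total = sum(counts.values()) or 1
    return [counts[g] / total for g in vocabulary]


# Tiny toy corpus, chosen only to keep the example readable.
docs = ["aba", "abc", "xyz"]
vocabulary, per_doc = frequent_features(docs, min_doc_share=0.5)
vectors = [vectorize(counts, vocabulary) for counts in per_doc]
print(vocabulary)
```

The same pattern would apply to stop-word and part-of-speech n-grams once the text has been mapped to the corresponding token sequence.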
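The two data preparation points in this section (balancing using oversampling, and publishers not being represented in both training and test set) could look roughly like this. It is a sketch under assumptions: the `publisher` and `label` keys are hypothetical, oversampling is done by random duplication, and it is applied to the training portion only, none of which is specified on the slides.

```python
import random


def split_by_publisher(articles, test_publishers):
    """Split so that no publisher contributes to both training and test set."""
    train = [a for a in articles if a["publisher"] not in test_publishers]
    test = [a for a in articles if a["publisher"] in test_publishers]
    return train, test


def oversample(articles, seed=0):
    """Duplicate random minority-class articles until all classes are equal in size."""
    rng = random.Random(seed)
    by_label = {}
    for article in articles:
        by_label.setdefault(article["label"], []).append(article)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Toy records with hypothetical publisher names and labels.
articles = [
    {"publisher": "p1", "label": "hyperpartisan"},
    {"publisher": "p1", "label": "hyperpartisan"},
    {"publisher": "p2", "label": "mainstream"},
    {"publisher": "p3", "label": "hyperpartisan"},
]
train_set, test_set = split_by_publisher(articles, test_publishers={"p3"})
balanced_train = oversample(train_set)
print(len(train_set), len(balanced_train), len(test_set))
```

Splitting by publisher before balancing keeps duplicated articles from leaking into the test set, and it prevents a classifier from scoring well merely by recognizing a publisher's house style.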