DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning



SLIDE 1

“DeClarE” Debunking Fake News and False Claims using Evidence-Aware Deep Learning

EMNLP-2018 Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, Gerhard Weikum

SLIDE 2

MOTIVATION

• “Rapid spread of misinformation online” – one of the top 10 challenges as per the World Economic Forum
• Many truth-checking websites manually verify/falsify claims

1 https://www.aljazeera.com/news/2018/10/bolsonaro-continues-lead-polls-fake-news-scandal-181019220347524.html
2 https://www.nytimes.com/2018/10/17/opinion/deep-fake-technology-democracy.html

SLIDE 3

RELATED WORK & LIMITATIONS

• Truth Finding
  - Conflict resolution amongst multi-source structured data
  - Joint inference of source reliability and truth
• Communities & Social Media
  - Probabilistic graphical models
  - Social network analysis
• Natural Language Claims
  - Supervised approaches

Limitations:
  - Limited to structured data
  - Focused only on communities
  - Community-specific features
  - No external evidence
  - Substantial feature modeling
  - No linguistic cues

SLIDE 4

PROBLEM STATEMENT

• Assess the credibility (true/false) of textual claims
• Present interpretable evidence supporting the assessment

[Diagram labels: Textual Claim, DeClarE*, World Wide Web, True, False, Evidence]

*DeClarE: Debunking Claims with Interpretable Evidence

SLIDE 5

TEXTUAL CLAIMS

• Tucker Carlson: “Far more children died last year drowning in their bathtubs than were killed accidentally by guns.”
• conservativeflashnews.com: “President Obama ordered a life-sized bronze statue of himself to be permanently installed at the White House”
• Coca-Cola’s original diet cola drink, TaB, took its name from an acronym for “totally artificial beverage”.

SLIDE 6

EVIDENCE

conservativeflashnews.com: “President Obama ordered a life-sized bronze statue of himself to be permanently installed at the White House”

• ABC News: An article making the rounds on Facebook falsely says that a bronze statue of former President Barack Obama will soon be in the entryway of the White House. But you won't be seeing it any time soon -- or any time at all. The story is false.
• The Florida Times: The emails have made their way across the internet. But reports that Obama ordered a $200,000 life-size bronze statue of himself to be “permanently installed in the White House” are totally false.

SLIDE 7

OUTLINE

• Motivation
• Problem Statement
• Key Contributors
• Network Architecture & Approach
• Experiments & Results
• Conclusion

SLIDE 8

KEY CONTRIBUTORS

• Evidence – Search Engine
• Language style and semantics of evidence – biLSTM
• Interaction between claim and external evidence – Attention Mechanism
• Trustworthiness of underlying sources – Claim and Evidence Source Embeddings

SLIDE 9

INPUT REPRESENTATIONS

• Claim and article: sequences of word embeddings
• Claim source and article source: source embeddings

Claim words $d_m \in S^{e_x}$, article words $[b_l] \in S^{e_x}$; claim source $d_t \in S^{e_t}$, article source $b_t \in S^{e_t}$
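
The four inputs above can be pictured as plain table lookups. The following is a minimal sketch under stated assumptions: the vocabulary, the source names used as keys, the dimensions, and the helpers `make_table`, `embed_words`, and `embed_source` are all hypothetical and chosen only for illustration; the slides do not specify them.

```python
import random

random.seed(0)

WORD_DIM = 4    # hypothetical word-embedding size
SOURCE_DIM = 2  # hypothetical source-embedding size


def make_table(keys, dim):
    """Build a toy embedding table of small random vectors (illustrative only)."""
    return {key: [random.uniform(-1.0, 1.0) for _ in range(dim)] for key in keys}


word_table = make_table(["obama", "ordered", "statue", "false", "<unk>"], WORD_DIM)
source_table = make_table(["conservativeflashnews.com", "ABC News", "<unk>"], SOURCE_DIM)


def embed_words(tokens):
    """Map a token sequence to a sequence of word embeddings."""
    return [word_table.get(tok, word_table["<unk>"]) for tok in tokens]


def embed_source(name):
    """Map a claim source or article source to its source embedding."""
    return source_table.get(name, source_table["<unk>"])


claim = embed_words(["obama", "ordered", "statue"])
article = embed_words(["statue", "story", "false"])  # "story" falls back to <unk>
claim_source = embed_source("conservativeflashnews.com")
article_source = embed_source("ABC News")
```

In a trained model these tables would be learned or pre-trained parameters rather than random vectors.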

SLIDE 10

ARTICLE REPRESENTATION

• Language-style-aware article representation
  - A biLSTM produces a hidden state output $h_l$ for each word $b_l$ in the evidence

[Diagram labels: $[h_l]$, $[b_l]$]
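
To make the "one hidden state per word, read in both directions" idea concrete, here is a minimal sketch. It is not an LSTM: a plain tanh recurrence stands in for the LSTM cell to keep the example short, and the weights are random. The helper names (`rand_matrix`, `step`, `run`, `bidirectional_states`) and all sizes are hypothetical.

```python
import math
import random

random.seed(0)


def rand_matrix(rows, cols):
    """Random weight matrix (illustrative stand-in for learned parameters)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def step(w_in, w_rec, x, h_prev):
    """One tanh recurrent step: h = tanh(W_in x + W_rec h_prev)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[i], x))
            + sum(w * hi for w, hi in zip(w_rec[i], h_prev))
        )
        for i in range(len(w_in))
    ]


def run(w_in, w_rec, sequence):
    """Run the recurrence over a sequence and return the state at every position."""
    h = [0.0] * len(w_in)
    states = []
    for x in sequence:
        h = step(w_in, w_rec, x, h)
        states.append(h)
    return states


def bidirectional_states(sequence, hidden, dim):
    """Concatenate forward and backward states for each word position."""
    fwd = run(rand_matrix(hidden, dim), rand_matrix(hidden, hidden), sequence)
    # Run on the reversed sequence, then flip back so positions line up.
    bwd = run(rand_matrix(hidden, dim), rand_matrix(hidden, hidden), sequence[::-1])[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


article = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.4, 0.4, 0.2], [-0.3, 0.1, 0.0]]
states = bidirectional_states(article, hidden=2, dim=3)
```

Each entry of `states` has twice the hidden size, because the forward and backward states are concatenated per word.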

SLIDE 11

CLAIM SPECIFIC ATTENTION (1/2)

• Importance of each word in the article text w.r.t. the claim
• Overall claim representation: $\bar{d} = \frac{1}{m} \sum_{m} d_m$
• Append $\bar{d}$ to each article term: $\hat{b}_l = b_l \oplus \bar{d}$

[Diagram labels: $[b_l]$, $[\hat{b}_l]$, $d_m$, $[h_l]$]
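
The two steps on this slide, averaging the claim word vectors and concatenating the result onto every article term, can be sketched as follows. The function names and the toy vectors are hypothetical.

```python
def mean_vector(vectors):
    """Average a list of equal-length vectors component-wise."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def append_claim(article_terms, claim_mean):
    """Concatenate the overall claim representation onto each article term."""
    return [term + claim_mean for term in article_terms]


claim_words = [[1.0, 2.0], [3.0, 4.0]]
article_terms = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]

claim_mean = mean_vector(claim_words)
augmented = append_claim(article_terms, claim_mean)

print(claim_mean)    # [2.0, 3.0]
print(augmented[0])  # [0.5, 0.5, 2.0, 3.0]
```

Every augmented term carries the same claim summary, so a later layer can compare each article word against the claim.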

SLIDE 12

CLAIM SPECIFIC ATTENTION (2/2)

• Claim specific attention weights: $\beta_l$, one per article term

[Diagram labels: $[h_l]$, $[b_l]$, $d_m$, $[\hat{b}_l]$, $[\beta_l]$]
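
The formula for the weights did not survive on this slide, so the following is only a sketch under an assumption: each claim-augmented article term is scored by a single dense unit with a tanh activation, and the scores are normalized with a softmax over the article terms. The weight vector, the bias, and the function name are hypothetical.

```python
import math


def attention_weights(augmented_terms, weights, bias):
    """Score each augmented term with tanh(w . x + b), then softmax over terms."""
    scores = [
        math.tanh(sum(w * x for w, x in zip(weights, term)) + bias)
        for term in augmented_terms
    ]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


augmented_terms = [
    [0.5, 0.5, 2.0, 3.0],
    [1.0, 0.0, 2.0, 3.0],
    [0.0, 1.0, 2.0, 3.0],
]
weights = [1.0, -1.0, 0.1, -0.1]  # hypothetical learned parameters
beta = attention_weights(augmented_terms, weights, bias=0.0)
```

The weights are positive and sum to one, so they can be read as the relative importance of each article word with respect to the claim.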

SLIDE 13

ATTENTION FOCUSED ARTICLE REPRESENTATION

• Attention focused article representation

[Diagram labels: $[h_l]$, $[b_l]$, $d_m$, $[\hat{b}_l]$, $[\beta_l]$]
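
One way to combine the attention weights with the biLSTM hidden states is an attention-weighted average. This is a sketch only: the slide text does not show the formula, so the division by the number of terms is an assumption, and the function name and numbers are hypothetical.

```python
def attention_focused(hidden_states, beta):
    """Attention-weighted average of hidden states (normalization is an assumption)."""
    dim = len(hidden_states[0])
    count = len(hidden_states)
    return [
        sum(b * h[i] for b, h in zip(beta, hidden_states)) / count
        for i in range(dim)
    ]


hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
beta = [0.2, 0.3, 0.5]
article_rep = attention_focused(hidden_states, beta)
```

Words with larger attention weights contribute more to the final article vector, which is what makes the weights usable as interpretable evidence.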

SLIDE 14

CREDIBILITY SCORE

• Per-article credibility score
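
A per-article score can be sketched as a function of the article representation and the two source embeddings. The details below are assumptions, not taken from the slide: a single linear layer followed by a sigmoid replaces whatever dense layers the model uses, and the claim-level score is taken as the mean over articles. All names and numbers are hypothetical.

```python
import math


def article_score(article_rep, claim_source, article_source, weights, bias):
    """Sigmoid of a linear score over the concatenated features (simplified)."""
    features = article_rep + claim_source + article_source
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def claim_score(scores):
    """Aggregate per-article scores into one claim-level score (assumed: mean)."""
    return sum(scores) / len(scores)


weights = [0.5, -0.5, 1.0, -1.0]  # hypothetical learned parameters
scores = [
    article_score([0.2, 0.4], [0.1], [0.9], weights, 0.0),
    article_score([0.6, 0.1], [0.1], [0.2], weights, 0.0),
]
overall = claim_score(scores)
```

Because the source embeddings enter the score directly, two articles with similar text can still receive different scores depending on where they were published.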

SLIDE 15

CREDIBILITY SCORE

• Per-article credibility score

SLIDE 16

EXPERIMENTS

• Case Studies
  - Snopes (SN) – classification (~4300 claims)
  - PolitiFact (PF) – classification (~3500 claims)
  - NewsTrust (NT) – regression (~5344 news headlines)
  - SemEval-2017 Task (SE) – classification (~250 tweets)
• Analysis
  - Source embeddings
  - Attention weights

SLIDE 17

EXPERIMENTAL SETUP

• Evaluation:
  - 10% of the data for parameter tuning
  - 10-fold cross-validation on the remaining 90% of the data
• Keras with TensorFlow backend
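
The evaluation protocol above can be sketched with the standard library alone. The shuffling seed, the function names, and the use of integer IDs as stand-ins for claims are hypothetical; the slides do not say how the split was drawn.

```python
import random


def split_for_tuning(items, tuning_fraction=0.1, seed=0):
    """Hold out a fraction of the data for parameter tuning."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * tuning_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_folds(items, k=10):
    """Yield (train, test) pairs for k-fold cross-validation."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


claims = list(range(100))  # stand-in for claim IDs
tuning, rest = split_for_tuning(claims)
splits = list(k_folds(rest))
```

Keeping the tuning set disjoint from the cross-validation data avoids choosing hyperparameters on examples that are later used for testing.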

SLIDE 18

CASE STUDY: SNOPES & POLITIFACT

• Snopes (~4300 claims)
  - Verifies Internet rumors, hoaxes, and other claims
• PolitiFact (~3500 claims)
  - Verifies political claims made by politicians in the USA
• Extracted ~30 top search results as evidence

Example claims:
  - “The use of solar panels drains the sun of energy.”
  - “Entering your PIN in reverse at any ATM will automatically summon the police”
  - Hillary Clinton: “The gun epidemic is the leading cause of death of young African-American men, more than the next nine causes put together.”

SLIDE 19

EVALUATION

• Baselines
  - LSTM-Text (Rashkin et al., 2017) – no usage of evidence
  - CNN-Text (Wang, 2017) – no usage of evidence
  - DistantSup (Popat et al., 2017)
  - DeClarE – our approach
• Performance measures
  - Per-class accuracies, macro F1, AUC
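
Two of the measures listed above, per-class accuracy and macro F1, can be computed as in the following sketch. The labels and predictions are made-up toy data, and the function names are hypothetical.

```python
def per_class_accuracy(y_true, y_pred, label):
    """Fraction of examples of the given class that were predicted correctly."""
    indices = [i for i, y in enumerate(y_true) if y == label]
    return sum(1 for i in indices if y_pred[i] == label) / len(indices)


def f1_for_label(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true))
    return sum(f1_for_label(y_true, y_pred, label) for label in labels) / len(labels)


y_true = [True, True, True, False, False, False]
y_pred = [True, True, False, False, False, True]

true_acc = per_class_accuracy(y_true, y_pred, True)
false_acc = per_class_accuracy(y_true, y_pred, False)
score = macro_f1(y_true, y_pred)
```

Macro averaging gives the true and false classes equal weight, which matters when one class is much more frequent than the other.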

SLIDE 20

RESULTS: SNOPES & POLITIFACT

Dataset     Configuration  Macro-F1  AUC
Snopes      LSTM-Text      0.66      0.70
Snopes      CNN-Text       0.66      0.72
Snopes      DistantSup     0.82      0.88
Snopes      DeClarE        0.79      0.86
PolitiFact  LSTM-Text      0.63      0.66
PolitiFact  CNN-Text       0.64      0.67
PolitiFact  DistantSup     0.62      0.68
PolitiFact  DeClarE        0.68      0.75

SLIDE 21

CASE STUDY: NEWSTRUST

• News review community – members review news articles
• Each story: article, article source, user reviews and ratings (scale 1 to 5)
  - Title of the article – claim
  - Article source – claim source
  - User reviews – evidence
  - User ids – evidence sources
• Regression task – predict the credibility score

SLIDE 22

RESULTS: NEWSTRUST

• Additional baseline:
  - CCRF+SVR (Mukherjee and Weikum, 2015)
• Performance measure – Mean Squared Error (MSE)

Configuration  MSE
CNN-Text       0.53
CCRF+SVR       0.36
LSTM-Text      0.35
DistantSup     0.35
DeClarE        0.29
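
For reference, the MSE measure used in the table is the average squared difference between predicted and actual scores. A minimal sketch with made-up ratings (the numbers are not from the NewsTrust data):

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


ratings = [4.0, 2.5, 3.0]    # hypothetical community ratings on the 1-to-5 scale
predicted = [3.5, 3.0, 3.0]  # hypothetical model outputs
mse = mean_squared_error(ratings, predicted)  # (0.25 + 0.25 + 0.0) / 3
```

Lower is better, so the ordering in the table runs from the weakest configuration to the strongest.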

SLIDE 23

ANALYZING ARTICLE SOURCE EMBEDDINGS

[Figure labels: Authentic Sources, Fake Sources]

SLIDE 24

ANALYZING CLAIM SOURCE EMBEDDINGS

[Figure labels: Republicans, Democrats]

SLIDE 25

ANALYZING ATTENTION WEIGHTS

SLIDE 26

CONCLUSION

• Proposed an end-to-end neural network model
  - No feature modeling
  - Provides interpretable evidence
• Experiments on real-world claims demonstrate the effectiveness of our approach
  - Considering external evidence helps!
  - Datasets: https://www.mpi-inf.mpg.de/dl-cred-analysis/

SLIDE 27

Thank You!