Recent Advances in Automated Fact Checking Immanuel Trummer - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Recent Advances in Automated Fact Checking

Immanuel Trummer
Cornell University

slide-2
SLIDE 2

Automation & Fact Checking

slide-3
SLIDE 3

Automation & Fact Checking

Claim

Identifying Check-Worthy Claims

slide-4
SLIDE 4

Automation & Fact Checking

Claim
Check

Identifying Check-Worthy Claims
Matching Claims to Checks

slide-5
SLIDE 5

Automation & Fact Checking

Claim
Check

Identifying Check-Worthy Claims
Matching Claims to Checks
Verification

Talk Focus

slide-6
SLIDE 6

Data-Driven Fact Checking

Claim Verified/Refuted Data Formula

slide-7
SLIDE 7

Data-Driven Fact Checking

Claim Verified/Refuted Data Formula

(aka. SQL Query)

slide-8
SLIDE 8

Data-Driven Fact Checking

Claim Verified/Refuted Data Formula

slide-9
SLIDE 9

Data-Driven Fact Checking

Claim Verified/Refuted Data Formula

(In Natural Language)

slide-10
SLIDE 10

Data-Driven Fact Checking

Claim Verified/Refuted Data Formula

Which data?

(In Natural Language)

slide-11
SLIDE 11

Data-Driven Fact Checking

Claim Verified/Refuted Data Formula

Which data? Which formula?

(In Natural Language)

slide-12
SLIDE 12

International Energy Agency

slide-13
SLIDE 13

International Energy Agency

  • Paris-based intergovernmental organization
  • Established in 1974, 30 member countries
  • Mission: serve statistics on energy sector

slide-14
SLIDE 14

International Energy Agency

  • Paris-based intergovernmental organization
  • Established in 1974, 30 member countries
  • Mission: serve statistics on energy sector

slide-15
SLIDE 15
slide-16
SLIDE 16

Claims Marked Up

slide-17
SLIDE 17

Claims Marked Up

Hundreds of pages, thousands of claims ...

slide-18
SLIDE 18

Claims Marked Up

Hundreds of pages, thousands of claims ...

Verification Takes Weeks!

slide-19
SLIDE 19

Fact Checking @ IEA

Claim Verified/Refuted Data Formula

slide-20
SLIDE 20

Fact Checking @ IEA

Claim Verified/Refuted Data Formula

Claim: In 2017, global electricity demand grew by 3% ...

slide-21
SLIDE 21

Fact Checking @ IEA

Claim Verified/Refuted Data Formula

Claim: In 2017, global electricity demand grew by 3% ...
Data: Electricity/Global/2017, Electricity/Global/2016

slide-22
SLIDE 22

Fact Checking @ IEA

Claim Verified/Refuted Data Formula

Claim: In 2017, global electricity demand grew by 3% ...
Data: Electricity/Global/2017, Electricity/Global/2016
Formula: D17/D16=1.03

slide-23
SLIDE 23

Fact Checking @ IEA

Claim Verified/Refuted Data Formula

Claim: In 2017, global electricity demand grew by 3% ...
Data: Electricity/Global/2017, Electricity/Global/2016
Formula: D17/D16=1.03

slide-24
SLIDE 24

Fact Checking @ IEA

Claim Verified/Refuted Data Formula

Claim: In 2017, global electricity demand grew by 3% ...
Data: Electricity/Global/2017, Electricity/Global/2016
Formula: D17/D16=1.03
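
The check on this slide can be sketched in a few lines of Python. This is only an illustration: the function name, the tolerance parameter, and the two demand values are hypothetical placeholders, not IEA statistics and not the tool's actual implementation.

```python
def check_growth_claim(current, previous, claimed_pct, tolerance_pct=0.5):
    """Return True if observed growth matches the claimed percentage.

    tolerance_pct is an assumed rounding margin: a text saying "3%"
    is accepted if the data shows growth between 2.5% and 3.5%.
    """
    observed_pct = (current / previous - 1.0) * 100.0
    return abs(observed_pct - claimed_pct) <= tolerance_pct


# Hypothetical placeholder values standing in for
# Electricity/Global/2016 and Electricity/Global/2017.
d16 = 1000.0
d17 = 1030.0

# Formula from the slide: D17/D16 = 1.03, i.e. 3% growth.
print(check_growth_claim(d17, d16, claimed_pct=3.0))
```

The tolerance makes the rounding assumption explicit; a stricter or looser margin changes which claims count as verified.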

slide-25
SLIDE 25

The "Infodemic"

slide-26
SLIDE 26

Fighting the Infodemic

slide-27
SLIDE 27

Fighting the Infodemic

slide-28
SLIDE 28

Fighting the Infodemic

slide-29
SLIDE 29

Fighting the Infodemic

Claim Verified/Refuted Data Formula

Claim: France has more cases than US
Data: CDC/Confirmed/France, CDC/Confirmed/US
Formula: F>U

slide-30
SLIDE 30

Fighting the Infodemic

Claim Verified/Refuted Data Formula

Claim: France has more cases than US
Data: CDC/Confirmed/France, CDC/Confirmed/US
Formula: F>U

slide-31
SLIDE 31

Fighting the Infodemic

Claim Verified/Refuted Data Formula

Claim: France has more cases than US
Data: CDC/Confirmed/France, CDC/Confirmed/US
Formula: F>U
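
Since the formula is described earlier as a SQL query, the comparison F>U can be sketched with Python's built-in sqlite3 module. The table name, column names, and case counts below are hypothetical placeholders chosen for illustration; they are not CDC figures and not the schema of any real tool.

```python
import sqlite3

# In-memory database with a hypothetical table of confirmed cases.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE confirmed (country TEXT PRIMARY KEY, cases INTEGER)")
# Placeholder counts, NOT real statistics.
conn.executemany(
    "INSERT INTO confirmed VALUES (?, ?)",
    [("France", 100), ("US", 500)],
)


def claim_holds(conn, a, b):
    """Evaluate the formula 'cases(a) > cases(b)' as a SQL query."""
    row = conn.execute(
        "SELECT (SELECT cases FROM confirmed WHERE country = ?) > "
        "(SELECT cases FROM confirmed WHERE country = ?)",
        (a, b),
    ).fetchone()
    # SQLite returns 1/0 for comparisons (NULL if a country is missing).
    return bool(row[0])


# With the placeholder data, "France has more cases than US" is refuted.
print(claim_holds(conn, "France", "US"))
```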

slide-32
SLIDE 32

Other Use Cases

  • Data journalism
  • Business reports
  • Scientific papers
  • ...
slide-33
SLIDE 33

(Demo)

S. Jo, I. Trummer, W. Yu, X. Wang, C. Yu, D. Liu, N. Mehta. Verifying text summaries of relational data sets. SIGMOD 2019.
slide-34
SLIDE 34

Challenges

slide-35
SLIDE 35

Challenges

Text-Data Inconsistency

... ... American ...

slide-36
SLIDE 36

Challenges

  • Multi-Claim Sentences
  • Text-Data Inconsistency

... ... American ...

slide-37
SLIDE 37

Challenges

  • Context
  • Multi-Claim Sentences
  • Text-Data Inconsistency

... ... American ...

slide-38
SLIDE 38

Fully Automated Checking

Claim Translation Formula Evaluation

slide-39
SLIDE 39

Fully Automated Checking

Claim Translation Formula Evaluation

May go Wrong!

slide-40
SLIDE 40

Semi-Automated Checking

Claim Translation Formula Evaluation

Probability
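
One way to read the semi-automated setting: each candidate translation gets a probability, and the most likely candidates are shown to a human checker. A minimal sketch, assuming unnormalized scores per candidate formula are already available; the function name, the scores, and the formula strings are hypothetical and not taken from the presented systems.

```python
def top_k_candidates(scores, k=3):
    """Normalize candidate scores to probabilities and return the k best.

    scores maps a candidate formula (here just a string) to a
    non-negative, unnormalized score.
    """
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(formula, score / total) for formula, score in ranked[:k]]


# Hypothetical candidate translations of one claim.
candidates = {"D17/D16": 3.0, "D17-D16": 1.0}
for formula, probability in top_k_candidates(candidates, k=2):
    print(formula, probability)
```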

slide-41
SLIDE 41

Analyze Data Structure

country   beer_servings   spirit_servings   ...
...       ...             ...               ...
Germany   346             117               ...
...       ...             ...               ...
USA       249             158               ...
...       ...             ...               ...

slide-42
SLIDE 42

Analyze Data Structure

country   beer_servings   spirit_servings   ...
...       ...             ...               ...
Germany   346             117               ...
...       ...             ...               ...
USA       249             158               ...
...       ...             ...               ...

slide-43
SLIDE 43

Analyze Data Structure

country   beer_servings   spirit_servings   ...
...       ...             ...               ...
Germany   346             117               ...
...       ...             ...               ...
USA       249             158               ...
...       ...             ...               ...

Keywords: Country, Germany, USA, Beer, Serving, Spirit, ...

slide-44
SLIDE 44

Analyze Data Structure

country   beer_servings   spirit_servings   ...
...       ...             ...               ...
Germany   346             117               ...
...       ...             ...               ...
USA       249             158               ...
...       ...             ...               ...

Keywords: Country, Germany, USA, Beer, Serving, Spirit, ...
Synonyms: United States, America, U.S., ...

slide-45
SLIDE 45

Analyze Data Structure

country   beer_servings   spirit_servings   ...
...       ...             ...               ...
Germany   346             117               ...
...       ...             ...               ...
USA       249             158               ...
...       ...             ...               ...

Keywords: Country, Germany, USA, Beer, Serving, Spirit, ...
Synonyms: United States, America, U.S., ...

Matches
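
The keyword, synonym, and match steps can be sketched as follows. This is an illustrative simplification, not the tool's implementation: the function names, the tiny synonym table, the example claim, and the exact-phrase matching are all assumptions; only the column names and the two visible rows come from the slide.

```python
import re

# Column names and visible rows from the slide's example table.
COLUMNS = ["country", "beer_servings", "spirit_servings"]
ROWS = [("Germany", 346, 117), ("USA", 249, 158)]
# Hypothetical synonym table; a real system would use a lexical resource.
SYNONYMS = {"usa": ["united states", "america", "u.s."]}


def build_keyword_index(columns, rows, synonyms):
    """Map lowercase keywords to the columns or values they refer to."""
    index = {}
    for col in columns:
        for word in col.split("_"):
            index.setdefault(word.lower(), set()).add(("column", col))
    for row in rows:
        value = str(row[0])  # first column holds the country name
        index.setdefault(value.lower(), set()).add(("value", value))
    for keyword, alternatives in synonyms.items():
        for alt in alternatives:
            index.setdefault(alt, set()).update(index.get(keyword, set()))
    return index


def match_claim(claim, index):
    """Return all data elements whose keywords occur in the claim text."""
    text = claim.lower()
    matches = set()
    for keyword, targets in index.items():
        # Match whole words or phrases only.
        if re.search(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)", text):
            matches |= targets
    return matches


index = build_keyword_index(COLUMNS, ROWS, SYNONYMS)
print(match_claim("Beer is more popular in Germany than in the United States.", index))
```

Exact matching misses inflected forms such as "Americans"; stemming or fuzzy matching would be needed for those.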

slide-46
SLIDE 46

Analyze Sentence Structure

slide-47
SLIDE 47

Analyze Sentence Structure

slide-48
SLIDE 48

Analyze Sentence Structure

slide-49
SLIDE 49

Analyze Sentence Structure

slide-50
SLIDE 50

Consider Text Structure

Claim Sentence

slide-51
SLIDE 51

Paragraph

Consider Text Structure

Claim Sentence

slide-52
SLIDE 52

Section Paragraph

Consider Text Structure

Claim Sentence

slide-53
SLIDE 53

Chapter Section Paragraph

Consider Text Structure

Claim Sentence

slide-54
SLIDE 54

Chapter Section Paragraph

Consider Text Structure

Claim Sentence

Integrate Surrounding Keywords
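
One plausible way to integrate surrounding keywords is to weight them by how far they sit from the claim sentence in the document hierarchy. The sketch below assumes this weighting scheme; the weights, the function name, and the example text are hypothetical and not taken from the presented system.

```python
from collections import defaultdict

# Hypothetical weights: keywords closer to the claim count more.
LEVEL_WEIGHTS = {
    "sentence": 1.0,
    "paragraph": 0.5,
    "section": 0.25,
    "chapter": 0.125,
}


def weighted_keywords(context):
    """Assign each word the weight of the closest level it appears in.

    context maps a level name ("sentence", "paragraph", ...) to its text.
    """
    scores = defaultdict(float)
    for level, text in context.items():
        weight = LEVEL_WEIGHTS[level]
        for word in set(text.lower().split()):
            scores[word] = max(scores[word], weight)
    return dict(scores)


# Hypothetical claim sentence and section headline.
print(weighted_keywords({
    "sentence": "consumption grew",
    "section": "beer consumption",
}))
```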

slide-55
SLIDE 55

Understand the Author

Translate Text

slide-56
SLIDE 56

Understand the Author

Translate Text Infer Topic

Claim Translation Hypothesis

slide-57
SLIDE 57

Understand the Author

Translate Text Infer Topic

Document Topic Hypothesis
Claim Translation Hypothesis

slide-58
SLIDE 58

(Demo)

S. Jo, I. Trummer, W. Yu, X. Wang, C. Yu, D. Liu, N. Mehta. Verifying text summaries of relational data sets. SIGMOD 2019.
slide-59
SLIDE 59

System Overview

Data Analysis
Text Analysis
Topic Analysis

slide-60
SLIDE 60

Automated Accuracy

[Chart: Correctness Chance (axis ticks 25 to 100) vs. Nr. Proposed Formulas (1 to 10)]

slide-61
SLIDE 61

Automated Accuracy

[Chart: Correctness Chance (axis ticks 25 to 100) vs. Nr. Proposed Formulas (1 to 10)]

(Billions of possible formulas)

slide-62
SLIDE 62

Automated Accuracy

[Chart: Correctness Chance (axis ticks 25 to 100) vs. Nr. Proposed Formulas (1 to 10)]

(Billions of possible formulas)

Want Auto-Suggestions

slide-63
SLIDE 63

Automated Accuracy

[Chart: Correctness Chance (axis ticks 25 to 100) vs. Nr. Proposed Formulas (1 to 10)]

(Billions of possible formulas)

Need Human Feedback

Want Auto-Suggestions

slide-64
SLIDE 64

User Study Results

[Chart: Claims per Minute (axis ticks 0.4 to 1.6) by Verification Method: With Tool vs. Without Tool]

6x Speedup!

slide-65
SLIDE 65

Mistakes Discovered

  • 11% of claims were incorrect
  • 7% average error of claim value

Analyzed 50 articles from major data journalism venues

slide-66
SLIDE 66

Scaling It Up

AggChecker

slide-67
SLIDE 67

Scaling It Up

AggChecker

slide-68
SLIDE 68

Scaling It Up

AggChecker

slide-69
SLIDE 69

Scaling It Up

AggChecker

slide-70
SLIDE 70

Scaling It Up

Scrutinizer

slide-71
SLIDE 71

Scaling It Up


slide-72
SLIDE 72

Scaling It Up


slide-73
SLIDE 73

Scaling It Up

Learning Data Mapper

slide-74
SLIDE 74

Scaling It Up

Ask! Learn

Learning Data Mapper

slide-75
SLIDE 75

Scaling It Up

Learning Data Mapper

slide-76
SLIDE 76

Scaling It Up

Learning Data Mapper

slide-77
SLIDE 77

Interface Optimization

slide-78
SLIDE 78

Interface Optimization

Which Question?

slide-79
SLIDE 79

Interface Optimization

Which Question? Which Options?

slide-80
SLIDE 80

Interface Optimization

Which Question? Which Options?

slide-81
SLIDE 81

Interface Optimization

Which Question? Which Options? Which Order?

slide-82
SLIDE 82

Scaling It Up

Learning Data Mapper
Interface Optimizer

slide-83
SLIDE 83

Scaling It Up

Learning Data Mapper
Interface Optimizer

slide-84
SLIDE 84

Scaling It Up

Learning Data Mapper
Interface Optimizer

slide-85
SLIDE 85
slide-86
SLIDE 86

Verify Cheapest Claims First?

slide-87
SLIDE 87

Verify Cheapest Claims First?

Verify Interesting Claims First?

slide-88
SLIDE 88

Verify Cheapest Claims First?

Verify Interesting Claims First?

Consider Text Structure?
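
The slide poses claim ordering as an open question. One simple heuristic that balances "cheapest first" against "most interesting first" is to sort claims by interest per unit of cost. The sketch below assumes per-claim cost and interest estimates exist; the class, field names, and numbers are hypothetical, and this is not necessarily how the presented system orders claims.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    cost: float      # estimated verification effort (hypothetical units)
    interest: float  # estimated value of checking this claim


def order_claims(claims):
    """Sort claims by interest-to-cost ratio, highest ratio first."""
    return sorted(claims, key=lambda c: c.interest / c.cost, reverse=True)


# Hypothetical claims with made-up estimates.
claims = [
    Claim("claim A", cost=1.0, interest=1.0),
    Claim("claim B", cost=4.0, interest=8.0),
    Claim("claim C", cost=2.0, interest=1.0),
]
print([c.text for c in order_claims(claims)])
```

Text structure could enter such a scheme through the cost estimate, for example if neighboring claims in one paragraph tend to share data.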

slide-89
SLIDE 89

Scaling It Up

Learning Data Mapper
Interface Optimizer
Claim Ordering

slide-90
SLIDE 90

User Study @ IEA

[Chart: Claims per Minute (axis ticks 0.3 to 1.2) by Verification Method: With Tool vs. Without Tool]

> 2x Speedup!

slide-91
SLIDE 91

(Demo)

G. Karagiannis, M. Saeed, P. Papotti, I. Trummer. Scrutinizer: fact checking statistical claims. VLDB 2020.

slide-92
SLIDE 92

CoronaCheck Impact

slide-93
SLIDE 93

CoronaCheck Impact

12,000 Users

slide-94
SLIDE 94

Team

  • I. Trummer (Cornell Faculty)
  • S. Jo (PhD @ Cornell)
  • G. Karagiannis (PhD @ Cornell)
  • N. Mehta (Ugrad @ Cornell)
  • D. Liu (Ugrad @ Cornell)
  • W. Yu (Ugrad @ Cornell)
  • C. Yu (Scientist @ Google)
  • X. Wang (Scientist @ Google)
  • P. Papotti (Eurecom Faculty)
  • M. Saeed (PhD @ Eurecom)

slide-95
SLIDE 95

Conclusion

  • Data-driven fact checking
  • Various use cases
  • Presented two tools:
      • AggChecker
        Verifying text summaries of relational data sets. SIGMOD 2019 (arXiv 2018).
      • Scrutinizer
        Scrutinizer: fact checking statistical claims. VLDB 2020.

slide-96
SLIDE 96

Questions?

www.itrummer.org
itrummer@cornell.edu