SLIDE 1
An Interpretable Joint Graphical Model for Fact-Checking from Crowds
An T. Nguyen¹, Aditya Kharosekar¹, Matthew Lease¹, Byron C. Wallace²
¹University of Texas at Austin, ²Northeastern University
SLIDE 8
Problems
Given a claim: “Facebook Shut Down an AI Experiment Because Chatbots Developed Their Own Language.”
and relevant article headlines, e.g.: “No, Facebook Did Not Panic and Shut Down an AI Program That Was Getting Dangerously Smart.” (source: gizmodo.com)
Predict each headline's stance: For / Against / Observing
Predict the claim's veracity: False / True / Unknown
Our motivation:
◮ Make sense of general claims, including scientific, historical, ...
◮ Not just “fake news”.
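A minimal sketch of the task's inputs and outputs as Python data types, assuming one claim paired with a list of article headlines. The names `Stance`, `Veracity`, `Article`, and `Claim` are hypothetical (not from the authors' code); only the two label sets come from the slide.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Stance(Enum):
    """Stance of an article headline toward a claim (labels from the slide)."""
    FOR = "for"
    AGAINST = "against"
    OBSERVING = "observing"


class Veracity(Enum):
    """Veracity label of a claim (labels from the slide)."""
    FALSE = "false"
    TRUE = "true"
    UNKNOWN = "unknown"


@dataclass
class Article:
    headline: str
    source: str
    stance: Optional[Stance] = None  # unknown until predicted or labeled


@dataclass
class Claim:
    text: str
    articles: List[Article] = field(default_factory=list)
    veracity: Optional[Veracity] = None  # unknown until predicted


claim = Claim(
    text="Facebook Shut Down an AI Experiment Because Chatbots Developed Their Own Language.",
    articles=[
        Article(
            headline="No, Facebook Did Not Panic and Shut Down an AI Program "
                     "That Was Getting Dangerously Smart.",
            source="gizmodo.com",
        )
    ],
)
```

A stance model would then fill in, e.g., `claim.articles[0].stance = Stance.AGAINST`, and a veracity model would set `claim.veracity`.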
SLIDE 15
Solutions
Previous work:
◮ Predict stance from text features (Ferreira & Vlachos 2016).
◮ Predict veracity from stance + source features (Popat et al. 2017).
We propose:
◮ Crowdsource stance labels.
  ◮ Hybrid human-AI approach.
  ◮ Available in near real time.
◮ A joint graphical model of stance, veracity, and annotators.
  ◮ Models interactions between variables.
  ◮ Interpretable.
SLIDE 19
Model
◮ Text features T
◮ Stance S
◮ Reputation R
◮ True stance S
◮ Annotator competence A
[Figure: plate diagram of the joint model with nodes V, S, T, W, U, L, A, B, R and plates over n claims, m sources, and c labelers]
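To make the annotator part of the model concrete, here is a toy generative sketch of how crowd labels could arise from a true stance and an annotator's competence. It assumes a simple "one-coin" model (a labeler reports the true stance with probability equal to their competence, otherwise picks another stance uniformly); this is an illustrative assumption, not necessarily the paper's parameterization, and all names are hypothetical.

```python
import random
from typing import List, Sequence

# Stance classes from the Problems slide.
STANCES = ("for", "against", "observing")


def sample_crowd_label(true_stance: str, competence: float, rng: random.Random) -> str:
    """Toy one-coin annotator: with probability `competence` report the true
    stance, otherwise pick one of the other stances uniformly at random."""
    if rng.random() < competence:
        return true_stance
    others = [s for s in STANCES if s != true_stance]
    return rng.choice(others)


def sample_labels(true_stances: Sequence[str], competences: Sequence[float],
                  seed: int = 0) -> List[List[str]]:
    """Return labels[j][k]: the label annotator k gives to article j."""
    rng = random.Random(seed)
    return [[sample_crowd_label(s, a, rng) for a in competences]
            for s in true_stances]


# Two articles, three annotators with competences 0.9, 0.6, and 1.0.
labels = sample_labels(["against", "for"], competences=[0.9, 0.6, 1.0])
```

An annotator with competence 1.0 always reproduces the true stance; lower competence spreads their labels over the other classes, which is what an inference procedure must undo.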
SLIDE 23
Inference & Learning
Inference:
◮ Gibbs sampling: asymptotically exact but slow.
◮ Variational inference: fast but approximate (biased).
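As a rough illustration of Gibbs sampling, the sketch below applies it only to the crowd-label part of the problem (true stances and annotator competences under a toy one-coin model), not to the full joint model. The priors (uniform over stances, Beta(alpha, beta) on competence), default settings, and names are assumptions. It alternates between resampling each item's true stance and each worker's competence, then averages the post-burn-in stance samples.

```python
import random
from typing import Dict, List, Tuple

STANCES = ("for", "against", "observing")


def gibbs_one_coin(labels: Dict[Tuple[int, int], str], n_items: int, n_workers: int,
                   n_sweeps: int = 500, burn_in: int = 100,
                   alpha: float = 2.0, beta: float = 1.0, seed: int = 0) -> List[List[float]]:
    """Gibbs sampler for a toy one-coin crowd model.

    labels maps (item, worker) -> observed stance. Returns Monte Carlo estimates
    of P(true stance of item i = STANCES[s])."""
    rng = random.Random(seed)
    K = len(STANCES)
    by_item: Dict[int, List[Tuple[int, int]]] = {i: [] for i in range(n_items)}
    for (i, k), lab in labels.items():
        by_item[i].append((k, STANCES.index(lab)))
    z = [0] * n_items                  # current true-stance assignment per item
    comp = [0.7] * n_workers           # current competence per worker
    counts = [[0] * K for _ in range(n_items)]
    for sweep in range(n_sweeps):
        # Step 1: resample each true stance z_i given the current competences.
        for i in range(n_items):
            w = []
            for s in range(K):
                p = 1.0
                for k, l in by_item[i]:
                    p *= comp[k] if l == s else (1.0 - comp[k]) / (K - 1)
                w.append(p)
            if sum(w) == 0.0:          # guard against floating-point underflow
                w = [1.0] * K
            z[i] = rng.choices(range(K), weights=w)[0]
        # Step 2: resample each competence given z (Beta-Bernoulli conjugacy).
        right = [0] * n_workers
        total = [0] * n_workers
        for (i, k), lab in labels.items():
            total[k] += 1
            if STANCES.index(lab) == z[i]:
                right[k] += 1
        comp = [rng.betavariate(alpha + r, beta + t - r) for r, t in zip(right, total)]
        # Accumulate stance samples after burn-in.
        if sweep >= burn_in:
            for i in range(n_items):
                counts[i][z[i]] += 1
    n_kept = n_sweeps - burn_in
    return [[c / n_kept for c in row] for row in counts]
```

Raising `n_sweeps` tightens the Monte Carlo estimates at a proportional cost in time, which is the accuracy/speed trade-off noted above.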
Learning: Expectation-Maximization (EM). Details in the paper.
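Likewise, a minimal EM sketch for the same toy one-coin crowd model (a simplified Dawid-Skene-style estimator), not the paper's full joint model. The E-step computes posteriors over each item's true stance given the current competences; the M-step re-estimates competences and the class prior from those posteriors. Initial values, clipping constants, and names are assumptions.

```python
import math
from typing import Dict, List, Tuple

STANCES = ("for", "against", "observing")


def em_one_coin(labels: Dict[Tuple[int, int], str], n_items: int, n_workers: int,
                n_iter: int = 50) -> Tuple[List[List[float]], List[float]]:
    """EM for a toy one-coin crowd model.

    labels maps (item, worker) -> observed stance. Returns (posteriors, competence):
    posteriors[i][s] = P(true stance of item i = STANCES[s] | labels) and
    competence[k] = estimated P(worker k reports the true stance)."""
    K = len(STANCES)
    competence = [0.7] * n_workers     # initial guess: better than chance
    prior = [1.0 / K] * K              # class prior over true stances
    by_item: Dict[int, List[Tuple[int, int]]] = {i: [] for i in range(n_items)}
    for (i, k), lab in labels.items():
        by_item[i].append((k, STANCES.index(lab)))
    post: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: posterior over each item's true stance, computed in log space.
        post = []
        for i in range(n_items):
            logp = [math.log(prior[s]) for s in range(K)]
            for k, l in by_item[i]:
                a = competence[k]
                for s in range(K):
                    logp[s] += math.log(a if l == s else (1.0 - a) / (K - 1))
            m = max(logp)
            w = [math.exp(x - m) for x in logp]
            z = sum(w)
            post.append([x / z for x in w])
        # M-step: competence = expected fraction of a worker's labels that are correct.
        correct = [0.0] * n_workers
        count = [0] * n_workers
        for (i, k), lab in labels.items():
            correct[k] += post[i][STANCES.index(lab)]
            count[k] += 1
        competence = [min(max(c / n, 1e-3), 1.0 - 1e-3) if n else 0.7
                      for c, n in zip(correct, count)]
        # M-step: re-estimate the class prior, smoothed so it stays positive.
        mass = [sum(p[s] for p in post) + 1e-6 for s in range(K)]
        prior = [x / sum(mass) for x in mass]
    return post, competence
```

Unlike the sampler, EM returns point estimates of competence rather than samples, so it is typically much faster but can settle in a local optimum depending on the initial guess.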
SLIDE 27
Evaluation
Data: Emergent (Ferreira and Vlachos 2016)
◮ 300 claims.
◮ 2,595 articles with stance labels.
◮ We collected crowd stance labels via Mechanical Turk.
Baseline: separate models for stance, veracity, and crowd labels.
Metric: Brier score, which reflects both accuracy and probability calibration.
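The Brier score for a K-way prediction can be computed as below, using one common convention: the squared error between the predicted probability vector and the one-hot true label, summed over classes and averaged over examples (lower is better). Some sources also divide by K; the slide does not state which normalization the paper uses. The function name is hypothetical.

```python
from typing import Sequence


def brier_score(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean over examples of sum_k (p_k - y_k)^2, with y the one-hot true label."""
    if len(probs) != len(labels) or not labels:
        raise ValueError("need one probability vector per label, and at least one example")
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


# Two claims, classes ordered (False, True, Unknown); true labels are False and True.
score = brier_score([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0, 1])
```

A confident correct prediction scores 0, while a uniform guess over three classes scores 2/3, so the metric penalizes both wrong answers and poorly calibrated probabilities.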
SLIDE 28
Results
SLIDE 32
User study
Interface: users enter claims and see predictions.
A/B testing:
◮ A: users see only veracity predictions.
◮ B: users also see explanations (source reputation, headline stances).
SLIDE 34
User study: results
SLIDE 41
Conclusion
Takeaway:
◮ Stance and veracity prediction are hard.
◮ We contribute: crowdsourcing + joint modeling.
Paper: also reports experiments on Snopes.
Demo: fcweb.pythonanywhere.com
We share code + data.
Acknowledgments: crowd annotators, reviewers, NSF.
Questions?