Citation Detective: A Public Dataset to Improve and Quantify Wikipedia Citation Quality at Scale




SLIDE 1

Ai-Jou Chou, National Chiao Tung University
Guilherme Gonçalves, Google
Sam Walton, Wikimedia Foundation
Miriam Redi, Wikimedia Foundation

Citation Detective:

A Public Dataset to Improve and Quantify Wikipedia Citation Quality at Scale

https://github.com/AikoChou/citationdetective

SLIDE 2

An end-to-end system that periodically releases a public, usable dataset exposing sentences classified as missing citations.

Citation Detective

Use Case: Integration into Citation Hunt

Pipeline: MediaWiki API → Text Processing → Citation Need Prediction [1] → Data dump
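The pipeline stages named on this slide can be sketched roughly as follows. This is a minimal illustration, not the Citation Detective implementation: the MediaWiki API fetch is replaced by an in-memory dict of simplified wikitext, `citation_need_score` is a hypothetical keyword heuristic standing in for the Citation Need model [1], and the function names, the 0.5 threshold, and the CSV columns are all assumptions.

```python
import csv
import io
import re

# A sentence ends at ., ! or ? and may be followed directly by <ref>...</ref> tags.
# Assumption: simplified wikitext; real text processing is more involved.
SENTENCE_RE = re.compile(r"\s*([^.!?]+[.!?])((?:<ref[^>]*>.*?</ref>)*)", re.S)


def process_text(wikitext):
    """Yield (sentence, has_citation) pairs from simplified wikitext."""
    for match in SENTENCE_RE.finditer(wikitext):
        yield match.group(1).strip(), bool(match.group(2))


def citation_need_score(sentence):
    """Hypothetical placeholder for the Citation Need model [1].

    Counts a few cue phrases; the real model is a trained classifier.
    """
    cues = ("according to", "study", "reported", "estimated")
    hits = sum(cue in sentence.lower() for cue in cues)
    return min(1.0, 0.5 * hits)


def run_pipeline(articles, threshold=0.5):
    """Return rows for uncited sentences predicted to need a citation."""
    rows = []
    for title, wikitext in articles.items():
        for sentence, has_citation in process_text(wikitext):
            score = citation_need_score(sentence)
            if not has_citation and score >= threshold:
                rows.append({"title": title, "sentence": sentence, "score": score})
    return rows


def dump_csv(rows):
    """Serialize rows as CSV text (stand-in for the periodic data dump)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "sentence", "score"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative input; in the real system the text would come from the MediaWiki API.
articles = {
    "Example City": (
        "The city was founded in 1850.<ref>Smith 2001</ref> "
        "According to one study, its population doubled. It is nice."
    )
}
print(dump_csv(run_pipeline(articles)))
```

Only the second sentence is emitted here: the first already carries a citation, and the third scores below the assumed threshold.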

[1] Miriam Redi, Besnik Fetahu, Jonathan Morgan, Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability.
[2] Citation Hunt (https://meta.wikimedia.org/wiki/Citation_Hunt)

SLIDE 3

Citation Quality: the proportion of “well sourced” sentences in an article.
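As a rough illustration of this definition, the sketch below computes a citation quality score for one article. It rests on assumptions the slide does not spell out: a sentence counts as "well sourced" when it is predicted to need a citation and already has one, the proportion is taken over the sentences predicted to need a citation, and a score of 0.5 is the cut-off. The function name and input layout are hypothetical.

```python
def citation_quality(sentences, threshold=0.5):
    """Proportion of citation-needing sentences that already have a citation.

    sentences: iterable of (citation_need_score, has_citation) pairs.
    Returns None when no sentence is predicted to need a citation.
    """
    needing = [has_citation for score, has_citation in sentences if score >= threshold]
    if not needing:
        return None  # undefined under this assumed definition
    return sum(needing) / len(needing)


# Four sentences: three are predicted to need a citation, two of those have one.
article = [(0.9, True), (0.8, False), (0.2, False), (0.7, True)]
print(round(citation_quality(article), 2))  # 0.67
```

Averaging this per-article score over all articles in a topic would give a per-topic breakdown, again under the same assumed definition.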

Quantify Citation Quality

Breakdown of Citation Quality by Topic

Topics shown: Biography, Sports, Biology, Medicine & Health