WikiConv
A Corpus of the Complete Conversational History of a Large Online Collaborative Community
Yiqing Hua Cristian Danescu-Niculescu-Mizil Dario Taraborelli Nithum Thain Jeffery Sorensen Lucas Dixon
Conversations on Wikipedia
http://www.boerenopeenkruispunt.be/Tips/tabid/105/ArticleID/245/Default.aspx
Can I add this image to the page?
Antisocial Behavior, Disputes, Conversational Behaviors
[Wulczyn et al. 2017] [Zhang et al. 2018] [Wang and Cardie 2014a] [Wang and Cardie 2014b] [Kittur et al. 2007] [Danescu-Niculescu-Mizil 2012] [Bender et al. 2011] [Kittur et al. 2008] [Halfaker et al. 2009]
http://thebluepaper.com/article/guest-commentary-grinder-pump-attacks-get-personal/
http://www.boerenopeenkruispunt.be/Tips/tabid/105/ArticleID/245/Default.aspx
https://it.depositphotos.com/12630133/stock-photo-3d-talking-concept-over-white.html
Prior Work
Reconstructing data from snapshots.
Revision 1, Revision 2
=== Image ===
I made this image last May and I want to add it to the page.
This image fits the other page more.
All right.
missed.
Addition: === Image ===
Addition: I made this image ...
Addition: This image fits the other ...
Addition: All right.
Addition: This image is stupid ...
Deletion: This image is stupid ...
+ Captures evolution of conversations.
+ Scalable to entire Wikipedia.
+ Works for multiple languages.
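To make the action log above concrete, the following minimal Python sketch replays a list of additions and deletions to recover what is visible on the page at any point. The `Action` class, its field names, and the `replay` function are hypothetical names chosen for illustration; they are not the schema of the released dataset.

```python
from dataclasses import dataclass


@dataclass
class Action:
    # Hypothetical record layout, not the WikiConv schema.
    action_id: str
    kind: str          # "Addition" or "Deletion"
    content: str = ""  # text that was added (additions only)
    target: str = ""   # id of the addition being removed (deletions only)


def replay(actions):
    """Return the comments visible on the page after applying actions in order."""
    visible = {}  # dicts preserve insertion order, so page order is kept
    for act in actions:
        if act.kind == "Addition":
            visible[act.action_id] = act.content
        elif act.kind == "Deletion":
            visible.pop(act.target, None)
    return list(visible.values())


log = [
    Action("a1", "Addition", "=== Image ==="),
    Action("a2", "Addition", "I made this image ..."),
    Action("a3", "Addition", "This image fits the other ..."),
    Action("a4", "Addition", "All right."),
    Action("a5", "Addition", "This image is stupid ..."),
    Action("a6", "Deletion", target="a5"),
]
```

Replaying only the first five actions with `replay(log[:5])` still shows the comment that was later deleted, which is exactly the information a snapshot taken after the deletion would no longer contain.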
What’s the boundary of each comment?
Parsing ambiguity.
Is this action a deletion or a modification?
Action ambiguity.
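One way to picture the action ambiguity is as a similarity threshold: if the new text is close to the old text, call it a modification, otherwise treat it as a deletion followed by an addition. The sketch below is only an illustration of that idea, built on Python's standard `difflib`. The function name `classify_rewrite` and the 0.5 threshold are assumptions, not the heuristics used in the WikiConv codebase.

```python
from difflib import SequenceMatcher


def classify_rewrite(old, new, threshold=0.5):
    """Guess which action turned `old` into `new` (hypothetical rule)."""
    if not new.strip():
        # Nothing is left of the comment.
        return "Deletion"
    # ratio() is 1.0 for identical strings and 0.0 for nothing in common.
    ratio = SequenceMatcher(a=old, b=new, autojunk=False).ratio()
    if ratio >= threshold:
        return "Modification"
    # Assumption: too little shared text means the comment was replaced.
    return "Deletion + Addition"
```

With this rule, changing "Let's discuss how to write this article!" to "Let's discuss how to improve this article!" counts as a modification, while replacing a comment with unrelated text counts as a deletion plus an addition. Any fixed threshold will misclassify some edits, which is why this case is an ambiguity and not a lookup.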
The Wikipedia data dump with the full edit history is larger than 10 TB (English alone).
Designed heuristics (details in the paper and codebase).
Conversational actions that capture the nature of the interaction.
Computing pipeline on Google Dataflow.
== Improving the Article ==
== Improving the Article == Let’s discuss how to write this article!
== Improving the Article == Let’s discuss how to improve this article!
== Improving the Article ==
== Improving the Article == Let’s discuss how to improve this article!
4.3M Users, 24M Talk Pages, 120M Revisions, 91M Conversations, 241M Actions
Page State (after Revision X-1)
Records the offsets of the comments that are present on the page
Compute Diff (diff-match-patch)
Decompose into actions
Revision X - 1, Revision X, Revision X + 1
Page State after Revision X
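The diff-and-decompose steps can be sketched in a few lines of Python. This is a simplified illustration and not the pipeline from the codebase: it uses the standard library's `difflib` in place of diff-match-patch, works on whole lines instead of character offsets, and does not track page state. The function name `diff_to_actions` and the rule that lines replaced in place are modifications are both assumptions.

```python
from difflib import SequenceMatcher


def diff_to_actions(prev_text, curr_text):
    """Decompose the line-level diff between two revisions into actions.

    Returns a list of (action_type, text) tuples.
    """
    prev_lines = prev_text.splitlines()
    curr_lines = curr_text.splitlines()
    matcher = SequenceMatcher(a=prev_lines, b=curr_lines, autojunk=False)

    actions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines produce no action
        old, new = prev_lines[i1:i2], curr_lines[j1:j2]
        # Assumption: lines replaced in place are modifications, paired in order.
        paired = min(len(old), len(new)) if tag == "replace" else 0
        actions += [("Modification", text) for text in new[:paired]]
        actions += [("Deletion", text) for text in old[paired:]]
        actions += [("Addition", text) for text in new[paired:]]
    return actions
```

Running it on consecutive revisions of a talk page section yields one addition per new comment line, a deletion when a comment disappears, and a modification when a line is rewritten in place. A character-level diff with stored comment offsets, as described above, would additionally let an action be attributed to the specific earlier comment it touches.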
Percentage of content being deleted
4.3M Users, 24M Talk Pages, 120M Revisions, 91M Conversations, 241M Conversational Actions (statistics of the English dataset)
English, Chinese, German, Russian, Greek
Captures the evolution of the conversations.
https://github.com/conversationai/wikidetox/tree/master/wikiconv
https://console.cloud.google.com/storage/browser/wikidetox-wikiconv-public-dataset