
WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community



  1. WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community. Yiqing Hua, Cristian Danescu-Niculescu-Mizil, Dario Taraborelli, Nithum Thain, Jeffery Sorensen, Lucas Dixon

  2. Conversations on Wikipedia. Example comment: "Can I add this image to the page?" (Image: http://www.boerenopeenkruispunt.be/Tips/tabid/105/ArticleID/245/Default.aspx)

  3. Conversations on Wikipedia. Example comment: "Can I add this image to the page?" Talk pages are technically the same as article pages on Wikipedia. (Image: http://www.boerenopeenkruispunt.be/Tips/tabid/105/ArticleID/245/Default.aspx)

  4. Research interest on Wikipedia talk pages
     - Topics: Disputes, Conversational Behaviors, Antisocial Behavior.
     - Cited work: [Wulczyn et al. 2017], [Wang and Cardie 2014a], [Danescu-Niculescu-Mizil 2012], [Zhang et al. 2018], [Wang and Cardie 2014b], [Bender et al. 2011], [Kittur et al. 2007], [Kittur et al. 2008], [Halfaker et al. 2009].
     - Images: http://thebluepaper.com/article/guest-commentary-grinder-pump-attacks-get-personal/ , http://www.boerenopeenkruispunt.be/Tips/tabid/105/ArticleID/245/Default.aspx , https://it.depositphotos.com/12630133/stock-photo-3d-talking-concept-over-white.html

  5. Conversation Snapshot vs. History. Prior work: reconstructing data from snapshots.
     - Revision 1: "=== Image ===", "I made this image last May and I want to add it to the page."
     - Revision 2: "This image fits the other page more.", "All right."

  6. Conversation Snapshot vs. History. Prior work: reconstructing data from snapshots. (Figure: Revision 1 and Revision 2.)
     - Evolution of conversations is missed.
     - Does not scale.

  7. WikiConv: History of User Interactions
     - Revision 1: "=== Image ===" (Addition); "I made this image ..." (Addition); "This image fits the other ..." (Addition); "All right." (Addition); "This image is stupid ..." (Addition).
     - Revision 2: "This image is stupid ..." (Deletion).
     + Captures evolution of conversations.
     + Scalable to entire Wikipedia.
     + Works for multiple languages.

  8. Reconstruction Challenges. Parsing ambiguity: what’s the boundary of each comment? (Figure: Revision 1 and Revision 2.)

  9. Reconstruction Challenges. Parsing ambiguity. Action ambiguity: is this action a deletion or a modification of another comment? (Figure: Revision 1 and Revision 2.)

  10. Reconstruction Challenges. Parsing ambiguity. Action ambiguity. Scale of Wikipedia: the Wikipedia data dump with all edit history is > 10TB (English alone). (Figure: Revision 1 and Revision 2.)

  11. Reconstruction Challenges
      - Parsing ambiguity: carefully designed heuristics (details in the paper and codebase).
      - Action ambiguity: better-defined conversational actions that capture the nature of the interaction.
      - Scale of Wikipedia: a distributed computing pipeline on Google Dataflow.

  12. Conversational Actions

  13. Conversational Actions
      - Creation: the start of a conversation thread, based on a markup section heading being added.
      - Addition: the addition of a new comment to a thread.
      - Modification: the modification of an existing comment.
      - Deletion: the removal of a comment or heading.
      - Restoration: a revert, which specifies the deletion action being undone.
      Example thread: "== Improving the Article ==".

  14. Conversational Actions (build of slide 13, same definitions). Example thread: "== Improving the Article ==", followed by the comment "Let’s discuss how to write this article!"

  15. Conversational Actions (build of slide 13, same definitions). Example thread: "== Improving the Article ==", followed by the comment "Let’s discuss how to improve this article!"
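The five action types defined on slide 13 can be illustrated with a small data model. This is a minimal sketch, not the WikiConv schema: the names `ActionType`, `Action`, `kind`, `parent_id`, and `replay` are hypothetical, and the assumption that a Restoration points at the Deletion it undoes follows only from the slide's wording.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class ActionType(Enum):
    CREATION = "creation"          # a section heading is added
    ADDITION = "addition"          # a new comment is added to a thread
    MODIFICATION = "modification"  # an existing comment is edited
    DELETION = "deletion"          # a comment or heading is removed
    RESTORATION = "restoration"    # a revert undoes a deletion


@dataclass
class Action:
    # Hypothetical fields; the real dataset schema may differ.
    action_id: str
    kind: ActionType
    content: str = ""
    # For MODIFICATION and DELETION: id of the action that introduced the text.
    # For RESTORATION: id of the DELETION being undone.
    parent_id: Optional[str] = None


def replay(actions: List[Action]) -> Dict[str, str]:
    """Replay actions in order and return the visible text per originating action."""
    visible: Dict[str, str] = {}
    removed: Dict[str, Tuple[str, str]] = {}  # deletion id -> (target id, text)
    for a in actions:
        if a.kind in (ActionType.CREATION, ActionType.ADDITION):
            visible[a.action_id] = a.content
        elif a.kind is ActionType.MODIFICATION:
            visible[a.parent_id] = a.content
        elif a.kind is ActionType.DELETION:
            removed[a.action_id] = (a.parent_id, visible.pop(a.parent_id))
        elif a.kind is ActionType.RESTORATION:
            target_id, text = removed.pop(a.parent_id)
            visible[target_id] = text
    return visible


# The example thread from the slides, expressed as a sequence of actions.
history = [
    Action("a1", ActionType.CREATION, "== Improving the Article =="),
    Action("a2", ActionType.ADDITION, "Let's discuss how to write this article!"),
    Action("a3", ActionType.MODIFICATION,
           "Let's discuss how to improve this article!", parent_id="a2"),
    Action("a4", ActionType.DELETION, parent_id="a2"),
    Action("a5", ActionType.RESTORATION, parent_id="a4"),
]
final_state = replay(history)
```

Replaying only the first four actions leaves just the heading visible; replaying all five brings the modified comment back.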

  16. Conversational Actions (build of slide 13, same definitions). Example thread: "== Improving the Article ==" (heading only).

  17. Conversational Actions (build of slide 13, same definitions). Example thread: "== Improving the Article ==", followed by the comment "Let’s discuss how to improve this article!"

  18. Conversational Actions (build of slide 13, same definitions). Example thread: "== Improving the Article ==", followed by the comment "Let’s discuss how to improve this article!"

  19. Resulting Dataset Statistics
      - 4.3M Users
      - 24M Talk Pages
      - 91M Conversations
      - 120M Revisions
      - 241M Actions
      (Figure: the example thread annotated with actions: "=== Image ===" (Addition); "I made this image ..." (Addition); "This image fits the other ..." (Addition); "All right." (Addition); "This image is stupid ..." (Addition); "This image is stupid ..." (Deletion).)

  20. Reconstruction Pipeline
      - Input: consecutive revisions (Revision X - 1, Revision X, Revision X + 1).
      - Compute the diff between consecutive revisions (diff-match-patch).
      - Decompose the diff into actions.
      - Update the page state (after Revision X - 1) to the page state after Revision X. The page state records the offsets of the comments that are present on the page.
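The diff-then-decompose step can be sketched in a few lines. This is an illustration under stated assumptions, not the WikiConv pipeline: Python's standard `difflib` stands in for diff-match-patch, the diff is taken line by line, any line starting with `==` is treated as a heading, and the function name `diff_to_actions` is hypothetical. It also sidesteps the action ambiguity raised earlier by reporting every changed line as a deletion plus an addition, never as a modification.

```python
import difflib
from typing import List, Tuple


def diff_to_actions(prev_page: str, curr_page: str) -> List[Tuple[str, str]]:
    """Return (action, text) pairs describing how prev_page became curr_page."""
    prev_lines = prev_page.splitlines()
    curr_lines = curr_page.splitlines()
    matcher = difflib.SequenceMatcher(a=prev_lines, b=curr_lines, autojunk=False)
    actions: List[Tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag in ("delete", "replace"):
            # Lines that disappeared from the previous revision.
            actions.extend(("DELETION", line) for line in prev_lines[i1:i2])
        if tag in ("insert", "replace"):
            # Lines that are new in the current revision.
            for line in curr_lines[j1:j2]:
                kind = "CREATION" if line.startswith("==") else "ADDITION"
                actions.append((kind, line))
    return actions


# Two revisions of the example talk page from the slides.
rev1 = ("=== Image ===\n"
        "I made this image last May and I want to add it to the page.")
rev2 = rev1 + "\nThis image fits the other page more.\nAll right."

for action in diff_to_actions("", rev1) + diff_to_actions(rev1, rev2):
    print(action)
```

Running the function on each pair of consecutive revisions, in order, yields the action stream for a page; a fuller version would also carry the comment offsets that the page state records.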

  21. Evaluation Results: WikiConv. Accuracy was manually evaluated on 100 randomly sampled actions from each category.
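Drawing the evaluation sample amounts to stratified sampling by action category. The sketch below shows one way to do it; the record layout (a dict with a `"type"` key), the function names, and the toy data are all hypothetical, and the accuracy helper simply averages manual correct/incorrect judgments.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def sample_per_category(actions: Sequence[dict], per_category: int = 100,
                        seed: int = 0) -> Dict[str, List[dict]]:
    """Draw up to `per_category` random actions from each action type."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_category: Dict[str, List[dict]] = defaultdict(list)
    for action in actions:
        by_category[action["type"]].append(action)
    return {
        category: rng.sample(items, min(per_category, len(items)))
        for category, items in by_category.items()
    }


def accuracy(judgments: Sequence[bool]) -> float:
    """Fraction of sampled actions that an annotator judged correct."""
    return sum(judgments) / len(judgments)


# Toy input: 250 additions and 40 deletions (made-up data, not the corpus).
toy_actions = (
    [{"id": i, "type": "ADDITION"} for i in range(250)]
    + [{"id": 250 + i, "type": "DELETION"} for i in range(40)]
)
samples = sample_per_category(toy_actions)
```

Categories with fewer than 100 actions are returned in full rather than raising an error, which is a design choice of this sketch.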

  22. Evaluation Results: WikiConv

  23. Research on WikiConv: Moderation of Toxic Behavior
      - Toxic behavior: comments in the discussion that might discourage others from participating in the conversation.
      - Tool: Perspective API, a CNN-based API service that scores the toxicity of a comment and was trained on Wikipedia data.

  24. Moderation of Toxic Behavior. Content from Addition and Creation actions is labeled by Perspective API as severely toxic, toxic, or non-toxic. We measure how quickly this content is deleted. (Figure: percentage of content being deleted.)

  25. Moderation of Toxic Behavior. Content from Addition and Creation actions is labeled by Perspective API as severely toxic, toxic, or non-toxic. We measure how quickly this content is deleted. 89% of the severely toxic content is removed from Wikipedia. (Figure: percentage of content being deleted.)

  26. Moderation of Toxic Behavior. 82% of the severely toxic content and 33% of the toxic content is deleted within a day. User interactions captured in our dataset show that toxic content is deleted quickly. (Figure: percentage of content being deleted.)
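The "deleted within a day" measurement reduces to comparing the timestamp of each Addition or Creation with the timestamp of its Deletion, if any. The following sketch uses a hypothetical record layout `(label, posted_at, deleted_at)`, hypothetical label strings, and made-up toy timestamps; it does not reproduce the figures reported on the slides.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional, Tuple

# (toxicity label, time posted, time deleted or None if never deleted)
Record = Tuple[str, datetime, Optional[datetime]]


def share_deleted_within(records: Iterable[Record],
                         window: timedelta) -> Dict[str, float]:
    """Percentage of content per label that was deleted within `window`."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for label, posted, deleted in records:
        totals[label] = totals.get(label, 0) + 1
        if deleted is not None and deleted - posted <= window:
            hits[label] = hits.get(label, 0) + 1
    return {label: 100.0 * hits.get(label, 0) / totals[label] for label in totals}


t0 = datetime(2018, 1, 1)
toy_records = [  # made-up data for illustration only
    ("severe_toxic", t0, t0 + timedelta(hours=2)),
    ("severe_toxic", t0, t0 + timedelta(days=3)),
    ("toxic", t0, t0 + timedelta(hours=20)),
    ("toxic", t0, None),
    ("toxic", t0, t0 + timedelta(days=10)),
    ("toxic", t0, None),
    ("non_toxic", t0, None),
]
share = share_deleted_within(toy_records, timedelta(days=1))
```

Passing a very large window gives the overall share of content that is ever deleted, which corresponds to the other quantity shown in the slides.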

  27. WikiConv: Wikipedia Talk Page reconstruction pipeline
      - Large Scale (statistics of English dataset): 4.3M Users, 24M Talk Pages, 120M Revisions, 91M Conversations, 241M Conversational Actions.
      - Multiple Languages: English, Chinese, German, Russian, Greek.
      - Complete: captures the evolution of the conversations.
      - Codebase: https://github.com/conversationai/wikidetox/tree/master/wikiconv
      - Dataset: https://console.cloud.google.com/storage/browser/wikidetox-wikiconv-public-dataset
