WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community

SLIDE 1

WikiConv

A Corpus of the Complete Conversational History of a Large Online Collaborative Community

Yiqing Hua, Cristian Danescu-Niculescu-Mizil, Dario Taraborelli, Nithum Thain, Jeffery Sorensen, Lucas Dixon

SLIDE 2

Conversations on Wikipedia

http://www.boerenopeenkruispunt.be/Tips/tabid/105/ArticleID/245/Default.aspx

Can I add this image to the page?

SLIDE 3

Conversations on Wikipedia

Talk pages are technically the same as article pages on Wikipedia.

SLIDE 4

Research interest on Wikipedia talk pages

  • Antisocial behavior
  • Disputes
  • Conversational behaviors

[Wulczyn et al. 2017] [Zhang et al. 2018] [Wang and Cardie 2014a] [Wang and Cardie 2014b] [Kittur et al. 2007] [Danescu-Niculescu-Mizil 2012] [Bender et al. 2011] [Kittur et al. 2008] [Halfaker et al. 2009]

http://thebluepaper.com/article/guest-commentary-grinder-pump-attacks-get-personal/
http://www.boerenopeenkruispunt.be/Tips/tabid/105/ArticleID/245/Default.aspx
https://it.depositphotos.com/12630133/stock-photo-3d-talking-concept-over-white.html

SLIDE 5

Conversation Snapshot Vs. History

Prior work: reconstructing data from snapshots.

[Figure: Revision 1, Revision 2]

=== Image ===
I made this image last May and I want to add it to the page.
This image fits the other page more.
All right.

SLIDE 6

Conversation Snapshot Vs. History

Prior work: reconstructing data from snapshots.

[Figure: Revision 1, Revision 2]

  • The evolution of conversations is missed.
  • Does not scale.

SLIDE 7

WikiConv: History of User Interactions

[Figure: Revision 1, Revision 2]

WikiConv:

  • Addition: === Image ===
  • Addition: I made this image ...
  • Addition: This image fits the other ...
  • Addition: All right.
  • Addition: This image is stupid ...
  • Deletion: This image is stupid ...

+ Captures the evolution of conversations.
+ Scalable to the entire Wikipedia.
+ Works for multiple languages.
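To make the contrast with snapshots concrete, here is a minimal sketch that replays the example actions from this slide. The tuple layout and the `replay` helper are illustrative assumptions, not the WikiConv schema or codebase.

```python
# Hypothetical action log mirroring the example on this slide.
# Each entry is (action_type, content); this layout is an assumption.
ACTION_LOG = [
    ("Addition", "=== Image ==="),
    ("Addition", "I made this image ..."),
    ("Addition", "This image fits the other ..."),
    ("Addition", "All right."),
    ("Addition", "This image is stupid ..."),
    ("Deletion", "This image is stupid ..."),
]


def replay(log):
    """Replay the log in order and return the page contents at the end."""
    page = []
    for action, content in log:
        if action == "Addition":
            page.append(content)
        elif action == "Deletion":
            page.remove(content)  # drops the first matching entry
    return page


# What a final snapshot would show.
snapshot = replay(ACTION_LOG)

# What was posted at some point but is absent from the snapshot.
ever_posted = [content for action, content in ACTION_LOG if action == "Addition"]
missing_from_snapshot = [c for c in ever_posted if c not in snapshot]

print(snapshot)
print(missing_from_snapshot)
```

The deleted comment is absent from the snapshot but is still recoverable from the action history, which is the point the slide makes.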

SLIDE 8

Reconstruction Challenges

[Figure: Revision 1, Revision 2]

Parsing ambiguity.

What is the boundary of each comment?

SLIDE 9

Reconstruction Challenges

[Figure: Revision 1, Revision 2]

Parsing ambiguity.

Action ambiguity.

Is this action a deletion or a modification of another comment?

SLIDE 10

Reconstruction Challenges

[Figure: Revision 1, Revision 2]

Parsing ambiguity. Action ambiguity.

Scale of Wikipedia.

The Wikipedia data dump with the full edit history is > 10TB (English alone).

SLIDE 11

Reconstruction Challenges

[Figure: Revision 1, Revision 2]

  • Parsing ambiguity: carefully designed heuristics (details in the paper and codebase).
  • Action ambiguity: better-defined conversational actions that capture the nature of the interaction.
  • Scale of Wikipedia: a distributed computing pipeline on Google Dataflow.

SLIDE 12

Conversational Actions

SLIDE 13

Conversational Actions

== Improving the Article ==

  • Creation: the start of a conversation thread, marked by the addition of a markup section heading.
  • Addition: the addition of a new comment to a thread.
  • Modification: the modification of an existing comment.
  • Deletion: the removal of a comment or heading.
  • Restoration: a revert that restores deleted content; it specifies the deleted action being undone.

SLIDE 14

Conversational Actions

== Improving the Article ==
Let’s discuss how to write this article!

SLIDE 15

Conversational Actions

== Improving the Article ==
Let’s discuss how to improve this article!

SLIDE 16

Conversational Actions

== Improving the Article ==

SLIDE 17

Conversational Actions

== Improving the Article ==
Let’s discuss how to improve this article!

SLIDE 18

Conversational Actions

== Improving the Article ==
Let’s discuss how to improve this article!
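The five action types can be sketched as a small data model. The example below replays one plausible reading of the "Improving the Article" thread shown on the preceding slides (heading created, comment added, comment modified, comment deleted, comment restored). The `Action` fields, in particular `parent_id`, and the `page_after` helper are hypothetical names chosen for illustration; they are not taken from the WikiConv schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class ActionType(Enum):
    CREATION = "Creation"
    ADDITION = "Addition"
    MODIFICATION = "Modification"
    DELETION = "Deletion"
    RESTORATION = "Restoration"


@dataclass
class Action:
    id: str
    type: ActionType
    content: str = ""
    # Hypothetical field: the earlier action that this action operates on.
    parent_id: Optional[str] = None


def page_after(actions: List[Action]) -> List[str]:
    """Return the visible page contents after applying the actions in order."""
    slot_of: Dict[str, str] = {}  # action id -> id of the action that opened the slot
    text: Dict[str, str] = {}     # slot -> latest text
    shown: Dict[str, bool] = {}   # slot -> currently visible (keeps page order)
    for a in actions:
        if a.type in (ActionType.CREATION, ActionType.ADDITION):
            slot = a.id
            text[slot] = a.content
            shown[slot] = True
        else:
            slot = slot_of[a.parent_id]
            if a.type is ActionType.MODIFICATION:
                text[slot] = a.content
            elif a.type is ActionType.DELETION:
                shown[slot] = False
            elif a.type is ActionType.RESTORATION:
                shown[slot] = True
        slot_of[a.id] = slot
    return [text[s] for s in shown if shown[s]]


HEADING = "== Improving the Article =="
thread = [
    Action("a1", ActionType.CREATION, HEADING),
    Action("a2", ActionType.ADDITION, "Let's discuss how to write this article!"),
    Action("a3", ActionType.MODIFICATION, "Let's discuss how to improve this article!", "a2"),
    Action("a4", ActionType.DELETION, parent_id="a3"),
    Action("a5", ActionType.RESTORATION, parent_id="a4"),
]

# Print the page state after each successive action.
for n in range(1, len(thread) + 1):
    print(n, page_after(thread[:n]))
```

Linking each action to the one it operates on is what lets a restoration name the specific deletion it undoes.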

SLIDE 19

Resulting Dataset Statistics

  • 4.3M users
  • 24M talk pages
  • 120M revisions
  • 91M conversations
  • 241M actions

  • Addition: === Image ===
  • Addition: I made this image ...
  • Addition: This image fits the other ...
  • Addition: All right.
  • Addition: This image is stupid ...
  • Deletion: This image is stupid ...

SLIDE 20

Reconstruction Pipeline

Revisions: Revision X - 1, Revision X, Revision X + 1

  • Page state (after Revision X - 1): records the offsets of the comments that are present on the page.
  • Compute diff (diff-match-patch).
  • Decompose into actions.
  • Page state after Revision X.
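The diff-and-decompose step can be sketched as follows. This is a simplified illustration under stated assumptions: it uses Python's standard-library `difflib` as a stand-in for diff-match-patch, treats each line as one comment, and reports a changed line as a deletion plus an addition, so it does not resolve the action ambiguity described earlier. The `page_state` and `decompose` helpers are hypothetical names.

```python
import difflib


def page_state(lines):
    """Record the offset (here, the line index) of each comment on the page."""
    return {offset: line for offset, line in enumerate(lines)}


def decompose(prev_lines, new_lines):
    """Diff two revisions line by line and emit (action, text) pairs."""
    actions = []
    matcher = difflib.SequenceMatcher(a=prev_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            actions.extend(("Deletion", line) for line in prev_lines[i1:i2])
        if tag in ("insert", "replace"):
            actions.extend(("Addition", line) for line in new_lines[j1:j2])
    return actions


rev_a = ["=== Image ===", "I made this image ...", "This image fits the other ..."]
rev_b = rev_a + ["All right.", "This image is stupid ..."]
rev_c = rev_a + ["All right."]

# Walk consecutive revision pairs, carrying the page state forward.
state = page_state(rev_a)
history = []
for prev, new in [(rev_a, rev_b), (rev_b, rev_c)]:
    history.extend(decompose(prev, new))
    state = page_state(new)

print(history)
print(state)
```

Only consecutive revisions are compared, so each step needs just the previous page state and the next revision.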

SLIDE 21

Evaluation Result -- WikiConv

Manually evaluated accuracy on 100 randomly sampled actions from each category.

SLIDE 22

Evaluation Result -- WikiConv

SLIDE 23

Research on WikiConv

Moderation of Toxic Behavior

Toxic behavior: comments in the discussion that might discourage others from participating in the conversation.

Tool: Perspective API, a CNN-based API service that scores the toxicity of a comment and was trained on Wikipedia data.

SLIDE 24

Moderation of Toxic Behavior

Content from addition and creation actions is labeled by Perspective API as severely toxic, toxic, or non-toxic. We measure how quickly this content is deleted.

[Figure: percentage of content being deleted]

SLIDE 25

Moderation of Toxic Behavior

[Figure: percentage of content being deleted]

89% of the severely toxic content is removed from Wikipedia.

SLIDE 26

Moderation of Toxic Behavior

[Figure: percentage of content being deleted]

82% of the severely toxic content and 33% of the toxic content are deleted within a day.

User interactions captured in our dataset: toxic behavior is deleted quickly.
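The "deleted within a day" measurement can be sketched as a per-label fraction. The records below are made-up placeholders in a hypothetical `(label, seconds_until_deletion)` layout, so the output does not reproduce the percentages on the slide.

```python
from collections import defaultdict

ONE_DAY_SECONDS = 24 * 60 * 60

# Hypothetical records: (toxicity label, seconds until deletion).
# None means the content was never deleted. The values are invented.
records = [
    ("severe toxic", 3600),
    ("severe toxic", 200000),
    ("toxic", 600),
    ("toxic", None),
    ("toxic", 7200),
    ("toxic", None),
    ("non-toxic", None),
    ("non-toxic", None),
]


def deleted_within(records, window_seconds):
    """Fraction of content per label that was deleted within the window."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for label, seconds in records:
        totals[label] += 1
        if seconds is not None and seconds <= window_seconds:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}


rates = deleted_within(records, ONE_DAY_SECONDS)
for label, rate in rates.items():
    print(f"{label}: {rate:.0%} deleted within a day")
```

Computing this for a range of windows rather than a single day would give the speed-of-deletion curve the figure refers to.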

SLIDE 27

WikiConv

Large Scale

  • 4.3M users
  • 24M talk pages
  • 120M revisions
  • 91M conversations
  • 241M conversational actions

(statistics of the English dataset)

Multiple Languages

English, Chinese, German, Russian, Greek

Wikipedia Talk Page reconstruction pipeline

Complete

Captures the evolution of the conversations.

WikiConv Codebase:

https://github.com/conversationai/wikidetox/tree/master/wikiconv

Dataset:

https://console.cloud.google.com/storage/browser/wikidetox-wikiconv-public-dataset