
Ranking the annotators: An agreement study on argumentation structure - PowerPoint PPT Presentation

1. Ranking the annotators: An agreement study on argumentation structure
Andreas Peldszus, Manfred Stede
Applied Computational Linguistics, University of Potsdam
The 7th Linguistic Annotation Workshop & Interoperability with Discourse (ACL workshop), Sofia, August 8-9, 2013

2-4. Introduction

Classic reliability study:
• 2 or 3 annotators
• authors or field experts, at least motivated and experienced annotators
• measure agreement, identify sources of disagreement

Classroom annotation:
• 20-30 annotators
• students with different ability and motivation, obligatory participation
• do both: test reliability & identify and group characteristic annotation behaviour

Crowd-sourced corpus:
• 100 to x annotators
• the crowd
• bias correction [Snow et al., 2008]
• outlier identification, finding systematic differences [Bhardwaj et al., 2010]
• spammer detection [Raykar and Yu, 2012]

5. Outline
1 Introduction
2 Experiment
3 Evaluation
4 Ranking and clustering the annotators

6-9. Experiment - Task: Argumentation Structure
Scheme based on Freeman [1991, 2011]:
• node types = argumentative role
  - proponent (presents and defends claims)
  - opponent (critically questions)
• link types = argumentative function
  - support one's own claims (normal support, support by example)
  - attack the other's claims (rebut, undercut)

This annotation is tough!
• fully connected discourse structure
• unitizing ADUs (argumentative discourse units) from EDUs (elementary discourse units) is already a complex text-understanding task
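As a concrete illustration (not part of the original slides), the node and link types of this scheme can be sketched as a small Python data model. The names Role, Function and Segment and the letter codes are assumptions read off the label example on slide 14, not the authors' tooling, and the scheme's additional 'comb' level (see the results table below) is not modelled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    """Argumentative role of a segment (node type)."""
    PROPONENT = "P"  # presents and defends claims
    OPPONENT = "O"   # critically questions


class Function(Enum):
    """Argumentative function of a segment (link type)."""
    THESIS = "T"            # the central claim; has no outgoing link
    NORMAL_SUPPORT = "SN"   # normal support of another segment
    EXAMPLE_SUPPORT = "SE"  # support by example
    REBUT = "AR"            # attack the content of another segment
    UNDERCUT = "AU"         # attack the link between a segment and its target


@dataclass
class Segment:
    """One argumentative discourse unit (ADU) of a five-segment micro-text."""
    index: int                    # 1-based position in the text
    role: Role
    function: Function
    target: Optional[int] = None  # index of the supported/attacked segment
```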

10-11. Experiment - Data: Micro-Texts
Thus, we use micro-texts:
• 23 short, constructed German texts
• each text is exactly 5 segments long
• each segment is argumentatively relevant
• covering different argumentative configurations

A (translated) example:
[Energy-saving light bulbs contain a considerable amount of toxic substances.]1 [A customary lamp can for instance contain up to five milligrams of quicksilver.]2 [For this reason, they should be taken off the market,]3 [unless they are virtually unbreakable.]4 [This, however, is simply not the case.]5

12-13. Experiment - Setup: Classroom Annotation
Obligatory annotation in class with 26 undergraduate students:
• minimal training
  - 5 min. introduction
  - 30 min. reading the guidelines (6 pages)
  - very brief question answering
• 45 min. annotation

Annotation in three steps:
• identify the central claim / thesis
• decide on the argumentative role of each segment
• decide on the argumentative function of each segment

14. Evaluation: Preparation
Rewrite the graphs as a list of (relational) segment labels:
1:PSNS(3)  2:PSES(1)  3:PT()  4:OARS(3)  5:PARS(4)
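A hedged sketch of this flattening step, reusing the hypothetical Segment type from the sketch above: the label grammar, in particular the fixed 'S' in the position that I take to correspond to the scheme's 'comb' level, is guessed from the five example labels and may not match the authors' encoding exactly.

```python
def segment_label(seg: Segment, comb: str = "S") -> str:
    """Flatten one segment's annotation into a relational label string,
    e.g. '1:PSNS(3)' for segment 1 giving normal support to segment 3.

    `comb` stands in for the combination marker of the full scheme,
    which is not modelled here.
    """
    target = "" if seg.target is None else str(seg.target)
    marker = "" if seg.function is Function.THESIS else comb
    return f"{seg.index}:{seg.role.value}{seg.function.value}{marker}({target})"


# The example structure from this slide, re-encoded with the types above:
example = [
    Segment(1, Role.PROPONENT, Function.NORMAL_SUPPORT, target=3),
    Segment(2, Role.PROPONENT, Function.EXAMPLE_SUPPORT, target=1),
    Segment(3, Role.PROPONENT, Function.THESIS),
    Segment(4, Role.OPPONENT, Function.REBUT, target=3),
    Segment(5, Role.PROPONENT, Function.REBUT, target=4),
]
print(" ".join(segment_label(s) for s in example))
# -> 1:PSNS(3) 2:PSES(1) 3:PT() 4:OARS(3) 5:PARS(4)
```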

15-17. Evaluation: Results

level                   #cats   A_O    A_E    κ       D_O   D_E   α
role                    2       0.78   0.55   0.521   -     -     -
typegen                 3       0.72   0.33   0.579   -     -     -
type                    5       0.61   0.26   0.469   -     -     -
comb                    2       0.73   0.50   0.458   -     -     -
target                  (9)     0.58   0.17   0.490   -     -     -
role+typegen            5       0.66   0.25   0.541   -     -     -
role+type               9       0.56   0.20   0.450   -     -     -
role+type+comb          15      0.49   0.16   0.392   -     -     -
role+type+comb+target   (71)    0.44   0.08   0.384   -     -     -

Unweighted scores in κ [Fleiss, 1971]; weighted scores in α [Krippendorff, 1980].

• low agreement for the full task
• varying difficulty on the simple levels
• other complex levels: target identification has only a small impact
• hierarchically weighted IAA yields slightly better results
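The unweighted coefficient is Fleiss' multi-rater κ, which relates observed and expected agreement as κ = (A_O - A_E) / (1 - A_E). The generic sketch below (not the authors' evaluation code, and without the hierarchical weighting mentioned in the last bullet) computes it for categorical labels, one list of annotator labels per item.

```python
from collections import Counter


def fleiss_kappa(item_labels):
    """Fleiss' kappa for categorical labels.

    `item_labels` is a list of items; each item is the list of labels it
    received, one per annotator (all items need the same number of raters).
    """
    n_raters = len(item_labels[0])

    # Observed agreement: for each item, the fraction of agreeing rater pairs.
    a_obs = sum(
        sum(c * (c - 1) for c in Counter(labels).values())
        / (n_raters * (n_raters - 1))
        for labels in item_labels
    ) / len(item_labels)

    # Expected agreement from the pooled category distribution.
    totals = Counter(lab for labels in item_labels for lab in labels)
    n_total = sum(totals.values())
    a_exp = sum((count / n_total) ** 2 for count in totals.values())

    return (a_obs - a_exp) / (1 - a_exp)


# Toy usage: three segments, each labelled by four annotators on the role level.
print(round(fleiss_kappa([["P", "P", "P", "O"],
                          ["P", "O", "O", "O"],
                          ["P", "P", "P", "P"]]), 3))  # -> 0.25
```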
