Ranking the annotators: An agreement study on argumentation structure - PowerPoint PPT Presentation



Ranking the annotators: An agreement study on argumentation structure

Andreas Peldszus, Manfred Stede

Applied Computational Linguistics, University of Potsdam

The 7th Linguistic Annotation Workshop: Interoperability with Discourse. ACL Workshop, Sofia, August 8-9, 2013


Introduction

Classic reliability study:

  • 2 or 3 annotators
  • authors, field experts, or at least motivated and experienced annotators
  • measure agreement, identify sources of disagreement

Classroom annotation:

  • 20-30 annotators
  • students with different ability and motivation, obligatory participation
  • do both: test reliability & identify and group characteristic annotation behaviour

Crowd-sourced corpus:

  • 100+ annotators
  • crowd
  • bias correction [Snow et al., 2008], outlier identification and finding systematic differences [Bhardwaj et al., 2010], spammer detection [Raykar and Yu, 2012]

Outline

  1. Introduction
  2. Experiment
  3. Evaluation
  4. Ranking and clustering the annotators


Experiment Task: Argumentation Structure

Scheme based on Freeman [1991, 2011]:

  • node types = argumentative role: proponent (presents and defends claims), opponent (critically questions)
  • link types = argumentative function: support own claims (normal support, support by example), attack the other's claims (rebut, undercut)

This annotation is tough!

  • fully connected discourse structure
  • unitizing ADUs from EDUs is already a complex text-understanding task
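To make the category inventory concrete, here is a minimal sketch (ours, not the authors' code; all names are invented) of how the roles, functions and segments of the scheme could be encoded:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Role(Enum):
    """Node type = argumentative role."""
    PROPONENT = "P"   # presents and defends claims
    OPPONENT = "O"    # critically questions

class Function(Enum):
    """Link type = argumentative function."""
    SUPPORT_NORMAL = "SN"    # normal support of one's own claims
    SUPPORT_EXAMPLE = "SE"   # support by example
    ATTACK_REBUT = "AR"      # rebutting attack on the other's claims
    ATTACK_UNDERCUT = "AU"   # undercutting attack

@dataclass
class Segment:
    index: int                    # 1-based position in the text
    role: Role
    function: Optional[Function]  # None for the central claim (thesis)
    target: Optional[int]         # index of the segment this one relates to
```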

Experiment Data: Micro-Texts

Thus, we use micro-texts:

  • 23 short, constructed, German texts
  • each text exactly 5 segments long
  • each segment is argumentatively relevant
  • covering different argumentative configurations

A (translated) example:

[Energy-saving light bulbs contain a considerable amount of toxic substances.]1 [A customary lamp can for instance contain up to five milligrams of quicksilver.]2 [For this reason, they should be taken off the market,]3 [unless they are virtually unbreakable.]4 [This, however, is simply not the case.]5


Experiment Setup: Classroom Annotation

Obligatory annotation in class with 26 undergraduate students:

  • minimal training:
    • 5 min. introduction
    • 30 min. reading the guidelines (6 pp.)
    • very brief question answering
  • 45 min. annotation

Annotation in three steps:

  1. identify the central claim / thesis
  2. decide on the argumentative role of each segment
  3. decide on the argumentative function of each segment


Evaluation: Preparation

Rewrite the annotated graphs as lists of (relational) segment labels:

1:PSNS(3)  2:PSES(1)  3:PT()  4:OARS(3)  5:PARS(4)
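A sketch of this rewriting step, reusing the Segment encoding above. 'PT' marks the proponent's thesis and 'PSNS(3)' a proponent segment giving normal support to segment 3; we read the trailing 'S' as the 'comb' flag, which is our assumption, not something the slide spells out:

```python
def segment_labels(segments):
    """Serialize an argumentation graph as relational segment labels,
    e.g. '3:PT()' for the thesis and '1:PSNS(3)' for a proponent
    segment giving normal support to segment 3."""
    labels = []
    for seg in sorted(segments, key=lambda s: s.index):
        if seg.function is None:          # central claim / thesis
            labels.append(f"{seg.index}:{seg.role.value}T()")
        else:
            labels.append(f"{seg.index}:{seg.role.value}"
                          f"{seg.function.value}S({seg.target})")
    return " ".join(labels)
```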


Evaluation: Results

level                  #cats  κ      AO    AE    α      DO    DE
role                   2      0.521  0.78  0.55
typegen                3      0.579  0.72  0.33
type                   5      0.469  0.61  0.26
comb                   2      0.458  0.73  0.50
target                 (9)    0.490  0.58  0.17
role+typegen           5      0.541  0.66  0.25  0.534  0.28  0.60
role+type              9      0.450  0.56  0.20  0.500  0.33  0.67
role+type+comb         15     0.392  0.49  0.16  0.469  0.38  0.71
role+type+comb+target  (71)   0.384  0.44  0.08  0.425  0.45  0.79

Unweighted scores are given as κ [Fleiss, 1971], weighted scores as α [Krippendorff, 1980].

  • low agreement for the full task
  • varying difficulty on the simple levels
  • on the complex levels, target identification has only a small impact
  • hierarchically weighted IAA yields slightly better results
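The unweighted multi-rater agreement is Fleiss' κ; a self-contained sketch of that computation (illustrative, not the authors' code):

```python
def fleiss_kappa(table):
    """Fleiss' kappa [Fleiss, 1971]. table[i][j] is the number of
    annotators who assigned item (segment) i to category j; every
    item must be labeled by the same number of annotators."""
    n_raters = sum(table[0])
    n_items = len(table)
    # observed agreement: mean pairwise agreement per item
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_items
    # expected agreement from the marginal category proportions
    grand_total = n_items * n_raters
    p_exp = sum(
        (sum(row[j] for row in table) / grand_total) ** 2
        for j in range(len(table[0]))
    )
    return (p_obs - p_exp) / (1 - p_exp)
```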


Evaluation: Category confusions

  • studying all individual confusion matrices is not feasible: 26 annotators, 325 different pairs of annotators
  • Cinková et al. [2012]: sum up all confusion matrices and build a probabilistic confusion matrix

       PT     PSN    PSE    PAR    PAU    OSN    OSE    OAR    OAU    ?
PT     0.625  0.243  0.005  0.003  0.002  0.006  0.000  0.030  0.007  0.078
PSN    0.123  0.539  0.052  0.034  0.046  0.055  0.001  0.052  0.021  0.078
PSE    0.024  0.462  0.422  0.007  0.008  0.000  0.000  0.015  0.001  0.061
PAR    0.007  0.164  0.004  0.207  0.245  0.074  0.000  0.156  0.072  0.071
PAU    0.007  0.264  0.005  0.290  0.141  0.049  0.000  0.117  0.075  0.052
OSN    0.016  0.292  0.000  0.081  0.046  0.170  0.004  0.251  0.075  0.065
OSE    0.000  0.260  0.000  0.000  0.000  0.260  0.000  0.240  0.140  0.100
OAR    0.033  0.114  0.004  0.070  0.044  0.102  0.001  0.339  0.218  0.076
OAU    0.017  0.101  0.000  0.069  0.061  0.066  0.002  0.469  0.153  0.063
?      0.179  0.351  0.031  0.066  0.041  0.055  0.001  0.157  0.061  0.057

For the 'role+type' level; '?' = missing annotations.
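A sketch of this aggregation as we understand Cinková et al. [2012]: sum the confusion counts over all annotator pairs and row-normalize, so each cell estimates the probability of the column label given the row label. Names and details below are ours:

```python
from itertools import combinations

def probabilistic_confusion(annotations, categories):
    """annotations: dict annotator -> list of labels (same segment order).
    Returns a row-normalized confusion matrix summed over all pairs."""
    counts = {a: {b: 0 for b in categories} for a in categories}
    for x, y in combinations(sorted(annotations), 2):
        for lx, ly in zip(annotations[x], annotations[y]):
            counts[lx][ly] += 1   # count the pair in both directions,
            counts[ly][lx] += 1   # since annotator order is arbitrary
    for a in categories:
        total = sum(counts[a].values())
        if total:
            counts[a] = {b: counts[a][b] / total for b in categories}
    return counts
```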



Evaluation: Comparison with Gold-Data

Distribution of the annotators' F1 scores per level, macro-averaged over categories:

[Boxplots of per-annotator F1 for the levels role, typegen, type, comb, target, role+typegen, role+type, ro+ty+co, ro+ty+co+ta and central claim; F1 axis from 0.0 to 1.0]
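The per-annotator score against the gold annotation is a macro-averaged F1 over categories. For illustration, a sketch using scikit-learn (the tooling is our assumption, not the authors' code):

```python
from sklearn.metrics import f1_score

def annotator_macro_f1(gold, annotations):
    """gold: list of gold labels; annotations: dict annotator -> labels.
    Macro-averaging gives every category equal weight, so rare
    categories count as much as frequent ones."""
    return {
        name: f1_score(gold, labels, average="macro", zero_division=0)
        for name, labels in annotations.items()
    }
```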


Ranking and clustering the annotators

Questions:

  • What range of agreement is possible in this group of annotators?
  • How to give structure to this inhomogeneous group of annotators?
  • How to identify subgroups of good annotators, and how to sort out bad ones, without too much gold data?

First approach: ranking by thesis F1.


Ranking the annotators: by central claim F1

Agreement for the n-best annotators, ordered by central claim F1:

[Line plot: κ over the n best annotators (n from 5 to 25 on the x-axis), one curve per level: role, typegen, type, comb, target, role+typegen, role+type, role+type+comb, role+type+comb+target; κ axis from 0.3 to 1.0]
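One way such a curve can be computed, reusing the fleiss_kappa sketch from above (the subset construction is ours):

```python
def nbest_agreement_curve(ranked, annotations, categories):
    """ranked: annotator names, best first (by central claim F1).
    Returns {n: kappa among the n best annotators}."""
    n_items = len(annotations[ranked[0]])
    curve = {}
    for n in range(2, len(ranked) + 1):
        # item-by-category count table for the n best annotators
        table = [[0] * len(categories) for _ in range(n_items)]
        for name in ranked[:n]:
            for i, label in enumerate(annotations[name]):
                table[i][categories.index(label)] += 1
        curve[n] = fleiss_kappa(table)
    return curve
```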


Ranking the annotators: by ∆∅ category distributions

Deviation from the average category distribution reveals characteristic outliers: annotators marking no attacks at all (only support), annotators marking no proponent attacks, and annotators with many missing annotations.

Per-annotator label counts (over the categories PT, PSN, PSE, PAR, PAU, OSN, OSE, OAR, OAU and '?'; counts listed in sequence, zero cells omitted), followed by ∆gold and ∆∅:

anno  counts                        ∆gold  ∆∅
A01   23 40 5 13 6 24 4             17     15.6
A02   22 33 7 8 11 3 23 1 7         17     16.9
A03   23 40 6 4 12 5 16 9           7      11.8
A04   21 52 6 1 14 11 10            25     20.5
A05   23 42 5 15 2 5 20 3           10     14.2
A06   24 39 6 6 9 7 15 9            7      10.9
A07   22 41 1 12 8 5 13 8 5         13     9.4
A08   23 35 6 6 14 6 1 17 7         9      13.3
A09   23 43 2 6 7 7 15 12           9      10.8
A10   23 51 3 3 4 8 8 15            21     21.2
A11   21 41 3 2 1 1 22 9 15         21     16.6
A12   23 42 6 15 5 3 13 4 4         13     11.7
A13   23 40 4 16 7 17 8             14     13.3
A14   19 33 6 10 4 4 11 8 20        26     20.2
A15   19 37 2 6 7 3 18 3 20         20     16.9
A16   20 31 4 7 10 7 14 5 17        22     16.9
A17   22 53 2 4 3 20 6 5            17     15.1
A18   23 51 5 34 1 1                39     40.4
A19   24 41 7 13 2 5 20 3           10     14.5
A20   21 41 4 1 2 31 5 10           22     18.2
A21   16 40 1 20 1 37               52     44.8
A22   22 34 7 5 10 6 17 9 5         12     10.3
A23   23 52 1 32 6 1                24     27.1
A24   23 41 6 6 9 5 22 3            4      11.8
A25   23 38 4 5 15 7 23             24     27.1
A26   23 44 5 8 4 4 21 3 3          9      10.2
∅     22.0 41.3 4.3 6.7 5.3 5.9 0.1 16.5 6.6 6.3
gold  23 42 6 6 8 5 19 6
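A sketch of the ∆∅ ranking: compare each annotator's category count vector with the group average. The slides report the resulting values but not the distance metric, so the L1 distance below is an assumption:

```python
import numpy as np

def delta_avg(count_matrix):
    """count_matrix: array of shape (annotators, categories) holding
    each annotator's label counts (115 labeled segments per annotator
    here). Returns one deviation score per annotator."""
    mean = count_matrix.mean(axis=0)                 # the ∅ row of the table
    return np.abs(count_matrix - mean).sum(axis=1)   # L1 distance (assumed)
```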




Clustering the annotators

Agglomerative hierarchical clustering (a runnable sketch follows after the simulation examples below):

  • initialize clusters as singletons, one for each annotator
  • while |clusters| > 1:
    • calculate κ for all pairs of clusters
    • merge the cluster pair with the highest agreement

[Dendrogram of a simulation with 20 artificial annotators: the ten N-# annotators and the ten F-# annotators separate into two clear clusters; κ axis from 1.0 down to 0.6]

Simulation: noise and systematic differences.

[Dendrogram of a simulation with 20 artificial annotators N-#00 to N-#19; κ axis from 1.0 down to 0.6]

Simulation: noise but no systematic differences.
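As announced above, a runnable sketch of the clustering loop. How agreement between two multi-annotator clusters is computed is delegated to a callback, since the slides leave it unspecified; pooling all members of both clusters as raters of the same items is one option:

```python
def cluster_by_agreement(annotators, group_kappa):
    """Agglomerative hierarchical clustering of annotators.
    group_kappa(cluster_a, cluster_b) must return the agreement
    (kappa) between two groups of annotators (an assumption, see the
    lead-in above)."""
    clusters = [[a] for a in annotators]   # singletons
    merges = []                            # the dendrogram, bottom-up
    while len(clusters) > 1:
        # find the cluster pair with the highest agreement
        (i, j), kappa = max(
            (((i, j), group_kappa(clusters[i], clusters[j]))
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda pair: pair[1],
        )
        merges.append((clusters[i], clusters[j], kappa))
        clusters[i] = clusters[i] + clusters[j]   # merge j into i
        del clusters[j]
    return merges
```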

Clustering the annotators: Results for 'role+type'

  • linear growth, no strong clusters
  • range from κ=0.45 to κ=0.84
  • conforms with the central claim ranking in picking out the same set of reliable and good annotators
  • conforms with both rankings in picking out similar sets of worst annotators

[Dendrogram: agglomerative clustering of the 26 annotators on the 'role+type' level; κ axis from 1.0 down to 0.4]


Clustering the annotators: Results for all levels

[Eight dendrograms, one per level: role, typegen, type, comb, target, role+type, ro+ty+co, ro+ty+co+ta; κ axis from 1.0 down to 0.4 in each panel]

Ranking and clustering the annotators

Ranking by thesis F1:

  • still requires some gold data
  • identifies bad annotators
  • identifies good annotators

Ranking by ∆∅ category distribution:

  • no gold data required
  • identifies outliers
  • but beware: outliers could also be above-average, good annotators

Clustering by agreement:

  • no gold data required
  • identifies subgroups with characteristic annotation behaviour
  • identifies good & bad annotators
  • but beware: high agreement ≠ best annotators

Clustering the annotators: And then?

For 'strong' cluster pairs, investigate what makes them so different:

  • compare their category distributions
  • compare their typical confusions
  • compare their Krippendorff diagnostics
  • ...

[Dendrogram of the two-cluster simulation (N-# vs. F-# annotators), repeated from above; κ axis from 1.0 down to 0.6]

Clustering the annotators: And then?

For 'steadily growing' clusters:

  • derive a partial order on the annotators from the path leading from the best annotator to the maximal cluster
  • investigate the confusion rate along the growing cluster path:

conf(c1, c2) = |c1 ∘ c2| / (|c1 ∘ c1| + |c1 ∘ c2| + |c2 ∘ c2|)

[Dendrogram: agglomerative clustering of the 26 annotators on the 'role+type' level, repeated from above; κ axis from 1.0 down to 0.4]

[Line plot: confusion rates along the growing cluster path (cluster sizes 2 to 26) for the category pairs PAR+PAU, OAR+OAU, PT+PSN, PSN+PAU, PSN+PSE and OAU+OSN; rate axis from 0.00 to 0.50]
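Reading c1 ∘ c2 as the number of times the two categories were confused across annotator pairs, and c ∘ c as agreement on c, the rate can be computed directly from the summed, unnormalized confusion counts. This reading of the notation is our interpretation of the slide:

```python
def confusion_rate(counts, c1, c2):
    """counts[x][y]: symmetric co-assignment counts summed over all
    annotator pairs (as in probabilistic_confusion above, before the
    row-normalization step)."""
    conf = counts[c1][c2]                       # |c1 ∘ c2|
    agree = counts[c1][c1] + counts[c2][c2]     # |c1 ∘ c1| + |c2 ∘ c2|
    return conf / (agree + conf)
```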

Conclusions

  • analyse the possible interpretations of the guidelines in a fine-grained manner by using more annotators
  • learn about the difficulty of the task
  • identify subgroups of good & reliable annotators, even if overall agreement is unsatisfactory

Thank You!


References

Vikas Bhardwaj, Rebecca J. Passonneau, Ansaf Salleb-Aouissi, and Nancy Ide. Anveshan: A framework for analysis of multiple annotators' labeling behavior. In Proceedings of the Fourth Linguistic Annotation Workshop (LAW IV '10), pages 47-55, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.

Silvie Cinková, Martin Holub, and Vincent Kríž. Managing uncertainty in semantic tagging. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL '12), pages 840-850, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics.

Joseph L. Fleiss. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378-382, 1971.

James B. Freeman. Dialectics and the Macrostructure of Argument. Foris, Berlin, 1991.

James B. Freeman. Argument Structure: Representation and Theory. Argumentation Library (18). Springer, 2011.

Klaus Krippendorff. Content Analysis: An Introduction to Its Methodology. Sage Publications, Beverly Hills, CA, 1980.

Vikas C. Raykar and Shipeng Yu. Eliminating spammers and ranking annotators for crowdsourced labeling tasks. Journal of Machine Learning Research, 13:491-518, 2012.

Rion Snow, Brendan O'Connor, Daniel Jurafsky, and Andrew Y. Ng. Cheap and fast—but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08), pages 254-263, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics.

Evaluation: Krippendorff's Category Definition Test

Krippendorff [1980] diagnostics:

  • systematically compare agreement on the original tagset with agreement on a reduced tagset
  • category definition test: one category of interest against the rest
  • compare the resulting ∆κ values to see which category is distinguished better from the rest

category  ∆κ      AO    AE
PT        +0.265  0.91  0.69
PSE       +0.128  0.97  0.93
PSN       +0.082  0.79  0.54
OAR       -0.027  0.86  0.75
PAR       -0.148  0.92  0.89
OSN       -0.198  0.93  0.90
OAU       -0.229  0.92  0.89
PAU       -0.240  0.93  0.91

Level 'role+type'; base κ=0.45.
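Both diagnostics boil down to relabeling with a collapsed tagset and re-measuring agreement. A sketch (ours, for illustration), reusing the fleiss_kappa function from above:

```python
def collapsed_kappa(annotations, mapping):
    """Re-compute kappa after collapsing categories. mapping maps each
    original category to a (possibly merged) new category."""
    new_cats = sorted(set(mapping.values()))
    names = sorted(annotations)
    n_items = len(annotations[names[0]])
    table = [[0] * len(new_cats) for _ in range(n_items)]
    for name in names:
        for i, label in enumerate(annotations[name]):
            table[i][new_cats.index(mapping[label])] += 1
    return fleiss_kappa(table)

categories = ["PT", "PSN", "PSE", "PAR", "PAU", "OSN", "OSE", "OAR", "OAU"]

# Category definition test: one category of interest against the rest.
definition = {c: (c if c == "PT" else "REST") for c in categories}

# Category distinction test: collapse just one pair, e.g. OAR and OAU.
distinction = {c: ("OAR+OAU" if c in ("OAR", "OAU") else c) for c in categories}

# delta_kappa = collapsed_kappa(annotations, definition) - base_kappa
```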

Evaluation: Krippendorff's Category Distinction Test

Krippendorff [1980] diagnostics:

  • systematically compare agreement on the original tagset with agreement on a reduced tagset
  • category distinction test: only collapse one pair of categories
  • ∆κ tells you how much you lose due to confusions between those two categories

category pair  ∆κ      AO    AE
OAR+OAU        +0.048  0.61  0.22
PAR+PAU        +0.026  0.59  0.21
OAR+OSN        +0.018  0.58  0.22
PSN+PSE        +0.012  0.59  0.23
OAR+PAR        +0.007  0.58  0.22
PSN+OSN        +0.007  0.59  0.24
PAR+OSN        +0.005  0.57  0.21
...

Level 'role+type'; base κ=0.45.

Evaluation: Text-specific agreement

κ for the full task ('role+type+comb+target')

Scores for the 6 best annotators

       role+type  ro+ty+co+ta
∅F1    0.76       0.67
κ      0.74       0.69
α      0.83       0.73

       PT     PSN    PSE    PAR    PAU    OSN    OSE    OAR    OAU    ?
PT     0.915  0.044  0.028  0.006  0.008  0.000  0.000  0.000  0.000  0.000
PSN    0.024  0.843  0.015  0.008  0.061  0.012  0.002  0.020  0.003  0.012
PSE    0.100  0.100  0.800  0.000  0.000  0.000  0.000  0.000  0.000  0.000
PAR    0.010  0.024  0.000  0.432  0.437  0.015  0.000  0.058  0.019  0.005
PAU    0.016  0.216  0.000  0.486  0.189  0.005  0.000  0.049  0.038  0.000
OSN    0.000  0.092  0.000  0.034  0.011  0.667  0.034  0.161  0.000  0.000
OSE    0.000  0.200  0.000  0.000  0.000  0.600  0.000  0.200  0.000  0.000
OAR    0.000  0.038  0.000  0.035  0.027  0.041  0.003  0.593  0.230  0.032
OAU    0.000  0.017  0.000  0.034  0.059  0.000  0.000  0.661  0.229  0.000
?      0.000  0.400  0.000  0.050  0.000  0.000  0.000  0.550  0.000  0.000

Probabilistic confusion matrix for the 'role+type' level.