Supersense Tagging for Arabic: The MT-in-the-Middle Attack

Nathan Schneider, Behrang Mohit, Chris Dyer, Kemal Oflazer, Noah A. Smith

Gameplan
• Supersense Tagging
• Baselines
• MT-in-the-Middle
• Analysis
• Outlook

Supersense Tagging
• A coarse form of word sense disambiguation (partitioning of WordNet synsets)
• Generalizes NER beyond proper names; 26 noun categories (Ciaramita & Johnson 2003)
  Example: Pierre Vinken/PERSON , 61 years/TIME old , will join/SOCIAL the board/GROUP as a nonexecutive director/PERSON N
• Categories broadly applicable across domains
• Scheme suitable for direct annotation (Schneider et al. 2012)

Supersense Tagging
• English resources
  ‣ WordNet (Fellbaum 1998)
  ‣ Tagger trained on English SemCor (Ciaramita & Altun 2006): 77% F1 in-domain
• Arabic resources
  ‣ Arabic WordNet (El Kateb et al. 2006)
  ‣ Named entities in OntoNotes (Hovy et al. 2006)
  ‣ Supersense-tagged Wikipedia corpus (Schneider et al. 2012): 65k words, 1/6 the size of SemCor

Baselines
• Heuristic matching of Arabic WordNet entries + OntoNotes NEs
  ‣ only covers 33% of nouns in our corpus
          P    R    F1
  Ann-A   32   16   21.6
  Ann-B   29   15   19.4
• Unsupervised sequence model
  ‣ feature-rich (Berg-Kirkpatrick et al. 2010)
          P    R    F1
  Ann-A   20   16   17.5
  Ann-B   14   10   11.6
[evaluating on Arabic Wikipedia test set: 18 articles, 40k words]
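The lexicon-matching baseline and its coverage figure can be illustrated with a minimal Python sketch. Everything here is hypothetical: the two lexicon entries, their supersense labels, and the exact-string lookup are stand-ins chosen for illustration, not the authors' matching heuristics or real Arabic WordNet content.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon (word form -> supersense); not real Arabic WordNet data.
TOY_LEXICON: Dict[str, str] = {
    "سحابة": "PHENOMENON",  # "cloud"
    "الوسط": "LOCATION",    # "the center"
}


def tag_nouns(nouns: List[str], lexicon: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Assign each noun its lexicon supersense, or None when it is not listed."""
    return [(noun, lexicon.get(noun)) for noun in nouns]


def coverage(tagged: List[Tuple[str, Optional[str]]]) -> float:
    """Fraction of nouns that received a tag from the lexicon."""
    if not tagged:
        return 0.0
    return sum(tag is not None for _, tag in tagged) / len(tagged)


# Nouns from an example sentence (assumed to be already identified as nouns).
nouns = ["الذرة", "سحابة", "الشحنات", "الإلكترونات", "نواة", "الوسط"]
tagged = tag_nouns(nouns, TOY_LEXICON)
print(f"lexicon coverage: {coverage(tagged):.0%}")
```

A lookup like this can only tag what the lexicon lists, which is why low coverage caps recall for this baseline.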

MT-in-the-Middle (cf. Zitouni & Florian 2008; Rahman & Ng 2012)
تتكون الذرة من سحابة من الشحنات السالبة (الإلكترونات) تحوم حول نواة موجبة الشحنة صغيرة جدا في الوسط.
cdec GWord NIST 2012

MT-in-the-Middle
The corn is composed of negative shipments ( electronics )
  PLANT ARTIFACT COGNITION
cloud hovering over the nucleus of a very small positive
  BODY
shipment in the center .
  ARTIFACT LOCATION

MT-in-the-Middle
  COGNITION ARTIFACT PLANT
The corn is composed of negative shipments ( electronics )
cloud hovering over the nucleus of a very small positive
  BODY
shipment in the center .
  ARTIFACT LOCATION
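The projection step of the example above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the Arabic tokenization, the word-alignment links, and the `project_tags` helper are all hypothetical, and the first-link-wins conflict rule is a simplification rather than the authors' procedure. Only the English tags (PLANT, ARTIFACT, COGNITION) come from the slide.

```python
from typing import Dict, List, Tuple

# First clause of the example; tokenization is an assumption.
arabic = ["تتكون", "الذرة", "من", "سحابة", "من", "الشحنات", "السالبة", "(", "الإلكترونات", ")"]
english = ["The", "corn", "is", "composed", "of", "negative", "shipments", "(", "electronics", ")"]

# Supersense tags predicted on the English MT output (index -> tag), as on the slide.
english_tags: Dict[int, str] = {1: "PLANT", 6: "ARTIFACT", 8: "COGNITION"}

# Hypothetical word alignment as (arabic_index, english_index) links.
alignment: List[Tuple[int, int]] = [
    (0, 2), (0, 3),  # تتكون -> "is composed"
    (1, 1),          # الذرة -> "corn"
    (5, 6),          # الشحنات -> "shipments"
    (6, 5),          # السالبة -> "negative"
    (8, 8),          # الإلكترونات -> "electronics"
]


def project_tags(target_tags: Dict[int, str], links: List[Tuple[int, int]]) -> Dict[int, str]:
    """Copy each tagged target word's tag to the source word linked to it.

    If several tagged target words link to one source word, the first link
    in sorted order wins (a simplifying assumption).
    """
    projected: Dict[int, str] = {}
    for src_idx, tgt_idx in sorted(links):
        tag = target_tags.get(tgt_idx)
        if tag is not None and src_idx not in projected:
            projected[src_idx] = tag
    return projected


projected = project_tags(english_tags, alignment)
for idx, tag in sorted(projected.items()):
    print(arabic[idx], tag)
```

Note that a translation error (corn for atom) passes straight through: the Arabic word for atom inherits PLANT.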

MT-in-the-Middle
• Heuristic lexicon matching:
          P    R    F1
  Ann-A   32   16   21.6
  Ann-B   29   15   19.4
• MT-in-the-Middle:
          P    R    F1
  Ann-A   37   31   33.8
  Ann-B   38   32   34.6

MT-in-the-Middle
• MT-in-the-Middle:
          P    R    F1
  Ann-A   37   31   33.8
  Ann-B   38   32   34.6
• Hybrid:
          P    R    F1
  Ann-A   35   36   35.5
  Ann-B   36   36   36.0
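The F1 columns are the harmonic mean of precision and recall. A minimal Python check, using the rounded P and R values from the hybrid results; for some other rows the recomputed value differs from the slide by a tenth or two, presumably because the slide's F1 was computed from unrounded P and R (an assumption, not something the slides state).

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hybrid system, rounded values from the table.
hybrid_ann_a = f1(35, 36)
hybrid_ann_b = f1(36, 36)
print(round(hybrid_ann_a, 1), round(hybrid_ann_b, 1))
```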

Analysis
• Pipeline has many places for noise: MT, English supersense tagging, and projection
• We focus on the impact of translation

Analysis
• Compare cdec vs. an off-the-shelf Arabic-English system from QCRI
• Translation quality:
          BLEU    METEOR   TER
  QCRI    32.86   32.10    0.46
  cdec    28.84   31.38    0.49
• ...but for MTiTM supersense tagging, cdec is consistently better (by 2–4 points). Why?

Analysis
• Observation: overall MT scores do not necessarily measure preservation of coarse lexical semantics
  ‣ We really care about (rough) semantic adequacy for noun phrases
  ‣ We elicited lexical translation acceptability judgments for a sample of sentences (cf. Carpuat 2013: SSSST)

Analysis
• Lexical acceptability rates: 91.9% for QCRI, 90.0% for cdec
• Example errors
  ‣ corn, maize for atom
  ‣ shipments for charges
  ‣ electronics for electrons
  ‣ transliteration: IMAX for EMACS, genoa lynx for GNU Linux

Analysis
• So lexical translation is mostly OK, and QCRI does slightly better at it
• cdec's strength: providing better input to projection
  ‣ It produces word alignments, whereas QCRI gives phrase alignments
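One plausible reading of why word alignments are better input to projection can be sketched in Python: a word link identifies a single source token for a tagged English noun, while a phrase link only identifies a span, leaving the choice of token to a further heuristic. The link data, the half-open span encoding, and both helper functions are hypothetical illustrations, not the output format of either system.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) token span

# English "negative shipments" = tokens 5..6; Arabic "الشحنات السالبة" = tokens 5..6.
# Arabic adjectives follow the noun, so the word order is swapped.
word_links: List[Tuple[int, int]] = [(5, 6), (6, 5)]           # (arabic_idx, english_idx)
phrase_links: List[Tuple[Span, Span]] = [((5, 7), (5, 7))]     # (arabic_span, english_span)


def candidates_from_words(tgt_idx: int, links: List[Tuple[int, int]]) -> List[int]:
    """Source tokens linked directly to the tagged target word."""
    return sorted({src for src, tgt in links if tgt == tgt_idx})


def candidates_from_phrases(tgt_idx: int, links: List[Tuple[Span, Span]]) -> List[int]:
    """All source tokens in the phrase whose target span contains the word."""
    for (s0, s1), (t0, t1) in links:
        if t0 <= tgt_idx < t1:
            return list(range(s0, s1))
    return []


shipments = 6  # index of the tagged English noun
word_candidates = candidates_from_words(shipments, word_links)
phrase_candidates = candidates_from_phrases(shipments, phrase_links)
print(word_candidates, phrase_candidates)
```

With the phrase link, the ARTIFACT tag could land on either the noun or the adjective; with the word link there is only one candidate.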

Outlook
• Supersense tagging can be accomplished (noisily) for a language so long as it can be automatically translated to English
• Further gains should come from:
  ‣ better MT: lexical translations and word alignments
  ‣ better English supersense tagging
  ‣ better lexicon & corpus resources

Thanks
• Francisco Guzman & Preslav Nakov @ QCRI
• Wajdi Zaghouani
• Waleed Ammar
• QNRF
• All of you for listening!
