Arabic POS Tagging - PowerPoint PPT Presentation

Arabic POS Tagging
Emad Mohamed, Sandra Kübler
Indiana University

The Structure of Arabic Words
◮ An Arabic word may consist of several segments.
◮ Possible segments: inflectional affixes, the stem, clitics
◮ Example: wsyktbwnhA (Engl.: "and they will write it"):
  ◮ conjunction: w
  ◮ future particle: s
  ◮ 3rd person imperfect verb prefix: y
  ◮ imperfect verb: ktb
  ◮ 3rd person masculine plural imperfect verb suffix: wn
  ◮ 3rd person feminine singular object pronoun: hA
◮ POS tag: [CONJ+FUTURE PARTICLE+IMPERFECT VERB PREFIX+IMPERFECT VERB+IMPERFECT VERB SUFFIX MASC PLURAL 3RD PERSON+OBJECT PRONOUN FEM SINGULAR]

Tagging Approaches
◮ whole word tagging: assign complex tag to complete word (993 tags)
  wsyktbwnhA: CONJ+FUT+IV3MS+IV+IVSUFF_SUBJ:MP_MOOD:I+IVSUFF_DO:3FS
◮ segment-based tagging: segment first; then assign tags to segments (139 tags)
  ◮ w: CONJ
  ◮ s: FUT
  ◮ y: IV3MS
  ◮ ktb: IV
  ◮ wn: IVSUFF_SUBJ:MP_MOOD:I
  ◮ hA: IVSUFF_DO:3FS
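The relation between the two tag inventories can be sketched in a few lines of Python. This is only an illustration: the segment forms and tags are copied from the example, the underscores inside a tag are an assumption about the treebank notation, and the names `EXAMPLE` and `whole_word` are hypothetical.

```python
from typing import List, Tuple

# (segment form, segment tag) pairs for the example word.
# Underscores inside a tag are an assumed notation.
EXAMPLE: List[Tuple[str, str]] = [
    ("w", "CONJ"),
    ("s", "FUT"),
    ("y", "IV3MS"),
    ("ktb", "IV"),
    ("wn", "IVSUFF_SUBJ:MP_MOOD:I"),
    ("hA", "IVSUFF_DO:3FS"),
]


def whole_word(segments: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Concatenate segment forms and join segment tags with '+'."""
    word = "".join(form for form, _ in segments)
    tag = "+".join(seg_tag for _, seg_tag in segments)
    return word, tag


word, tag = whole_word(EXAMPLE)
print(word)
print(tag)
```

Because a complex tag is a concatenation of segment tags, the whole-word inventory (993 tags) is much larger than the segment inventory (139 tags).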

Data Set & Experimental Setup
◮ Penn Arabic Treebank (after-treebank POS files)
  ◮ P1V3 + P3V1: ca. 500 000 words
  ◮ non-vocalized version
◮ reattached conjunctions, prepositions, pronouns, etc. to get text as written
◮ remove null elements: {i$otaraY+(null) / PV+PVSUFF_SUBJ:3MS ⇒ {i$otaraY / PV
◮ 5-fold cross validation
◮ evaluation: per-segment accuracy (SAR) + per-word accuracy (WAR)
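A minimal sketch of how the two evaluation measures could be computed. The slides do not define them formally, so two assumptions are made here: a word counts as correct only if every one of its segment tags is correct, and a word whose predicted segmentation has a different number of segments counts as entirely wrong. The function name `sar_war` and the toy tag sequences are hypothetical.

```python
from typing import List, Tuple


def sar_war(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float]:
    """Return (per-segment accuracy, per-word accuracy).

    gold and pred hold one list of segment tags per word.
    """
    seg_total = 0
    seg_correct = 0
    word_correct = 0
    for g, p in zip(gold, pred):
        seg_total += len(g)
        if len(g) == len(p):
            matches = sum(a == b for a, b in zip(g, p))
        else:
            # Assumption: a segmentation mismatch makes all segments wrong.
            matches = 0
        seg_correct += matches
        if matches == len(g):
            word_correct += 1
    return seg_correct / seg_total, word_correct / len(gold)


# Toy data: three words, one tagging error in the first word.
gold = [["CONJ", "NOUN"], ["PV"], ["PREP", "NOUN"]]
pred = [["CONJ", "ADJ"], ["PV"], ["PREP", "NOUN"]]
sar, war = sar_war(gold, pred)
print(f"SAR = {sar:.2%}, WAR = {war:.2%}")
```

One wrong segment lowers SAR by a single segment but makes the whole word wrong, so WAR is never higher than SAR under these assumptions.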

Memory-Based Segmentation
◮ per character classification: segment-end, no-segment-end
◮ memory-based learning: TiMBL
◮ features: focus character, previous 5 characters, and following 5 characters, POS tag for word based on whole word tagging
◮ TiMBL parameters: IB1, overlap metric, gain ratio weighting, nearest neighbors k = 1
◮ two rounds: in second round include class from first round
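The per-character instances described above could be built roughly as follows. This sketch only produces feature vectors and class labels; it does not call TiMBL. The padding symbol, the label names `END` / `NO_END`, the choice to label the last character of a word as a segment end, and the function name `char_instances` are all assumptions.

```python
from typing import List, Tuple


def char_instances(
    segments: List[str], word_tag: str, window: int = 5, pad: str = "_"
) -> List[Tuple[List[str], str]]:
    """One (features, label) pair per character of the word.

    Features: `window` previous characters, the focus character,
    `window` following characters, and the whole-word POS tag.
    """
    word = "".join(segments)
    # Indices of characters that close a segment.
    ends = set()
    position = 0
    for seg in segments:
        position += len(seg)
        ends.add(position - 1)

    padded = pad * window + word + pad * window
    rows = []
    for i, ch in enumerate(word):
        j = i + window  # index of the focus character in the padded string
        left = list(padded[j - window:j])
        right = list(padded[j + 1:j + 1 + window])
        label = "END" if i in ends else "NO_END"
        rows.append((left + [ch] + right + [word_tag], label))
    return rows


rows = char_instances(
    ["w", "s", "y", "ktb", "wn", "hA"],
    "CONJ+FUT+IV3MS+IV+IVSUFF_SUBJ:MP_MOOD:I+IVSUFF_DO:3FS",
)
for features, label in rows:
    print(" ".join(features[:11]), label)
```

For the second round, the class predicted in the first round would be appended to each feature vector as an additional feature.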

Segmentation Results
all words:      98.23%
known words:    99.75%
unknown words:  82.22%
proper noun errors: 33.87% of all errors
% unknown words in data: 8.5%

POS Tagging
◮ memory-based tagger: MBT
◮ parameters: Modified Value Difference metric, k = 25
◮ for known words: IGTree, 2 words to left, their POS tags, focus word, its ambitag, 1 right context word, its ambitag
◮ for unknown words: IB1, focus word, first 5 + last 3 characters, 1 left context word + its POS tag, 1 right context word + its ambitag
◮ previous decisions are included
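Two of the feature types above can be illustrated with a small sketch. An ambitag is taken here to be the set of tags a word form was seen with in training, joined into one symbol; the separator, the sort order, and the padding are assumptions and may differ from what MBT actually produces. The helper names and the toy lexicon are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_ambitags(tagged: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each word form to the joined set of tags seen in training."""
    seen = defaultdict(set)
    for word, tag in tagged:
        seen[word].add(tag)
    return {word: "-".join(sorted(tags)) for word, tags in seen.items()}


def unknown_word_chars(
    word: str, n_first: int = 5, n_last: int = 3, pad: str = "_"
) -> List[str]:
    """First 5 and last 3 characters of a word, padded for short words."""
    first = list(word[:n_first].ljust(n_first, pad))
    last = list(word[-n_last:].rjust(n_last, pad))
    return first + last


# Toy training data: ktb is ambiguous between noun and perfect verb.
training = [("ktb", "NOUN"), ("ktb", "PV"), ("fy", "PREP")]
ambitags = build_ambitags(training)
print(ambitags["ktb"])
print(unknown_word_chars("wsyktbwnhA"))
```

The ambitag lets the tagger use the right context before that context has been disambiguated, while the character features give unknown words some access to affix and clitic information.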

POS Tagging Results
gold standard seg.   segmentation-based   whole words
SAR      WAR         SAR      WAR         WAR
96.72%   94.91%      94.70%   93.47%      94.74%

Discussion
◮ gold standard segmentation: upper bound
  ◮ gives best results
◮ no gold standard segmentation available: whole words better than automatic segmentation
  ◮ segmentation → more ambiguity per segment
  ◮ small percentage of unknown words
◮ in segmentation-based tagging, 28% of all errors are results of wrong segmentation

Known vs. Unknown Words
                 gold std. seg.   seg.-based   whole words
known words      95.90%           95.57%       96.61%
unknown words    84.25%           71.06%       74.64%
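As a rough consistency check, the known and unknown word accuracies can be combined with the 8.5% unknown word rate reported with the segmentation results and compared to the overall WAR figures. The sketch assumes that the table reports per-word accuracy and that 8.5% applies to every setting; `mix` and `ROWS` are hypothetical names.

```python
UNKNOWN_RATE = 0.085  # share of unknown words in the data

# setting: (known words %, unknown words %, reported overall WAR %)
ROWS = {
    "gold std. seg.": (95.90, 84.25, 94.91),
    "seg.-based": (95.57, 71.06, 93.47),
    "whole words": (96.61, 74.64, 94.74),
}


def mix(known: float, unknown: float, unknown_rate: float = UNKNOWN_RATE) -> float:
    """Overall accuracy as a weighted average of the two word groups."""
    return (1.0 - unknown_rate) * known + unknown_rate * unknown


for name, (known, unknown, reported) in ROWS.items():
    estimate = mix(known, unknown)
    print(f"{name}: estimated {estimate:.2f}%, reported {reported:.2f}%")
```

Under these assumptions the estimates fall within about 0.02 percentage points of the reported WAR values, which suggests that the whole-word tagger's advantage comes mainly from the large known-word group.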

Error Analysis
confusion sets:
gold          tagger        % of errors
noun          adjective     7.88%
adjective     noun          7.75%
proper noun   noun          9.10%
noun          proper noun   2.51%
◮ no clear distinction between nouns and adjectives in Arabic: adjectives behave morphologically like nouns and can be used as nouns
◮ proper nouns are normally standard nouns, and are not marked specifically
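Confusion sets of this kind can be derived from aligned gold and predicted tag sequences. The sketch below is a generic way to compute each confusion pair's share of all errors, not necessarily the procedure used for the table; the function name `confusion_shares` and the toy sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def confusion_shares(gold: List[str], pred: List[str]) -> Dict[Tuple[str, str], float]:
    """Percentage of all errors for each (gold tag, predicted tag) pair."""
    errors = Counter((g, p) for g, p in zip(gold, pred) if g != p)
    total = sum(errors.values())
    return {pair: 100.0 * count / total for pair, count in errors.most_common()}


# Toy data: four errors in five tokens.
gold = ["NOUN", "ADJ", "NOUN_PROP", "NOUN", "NOUN"]
pred = ["ADJ", "NOUN", "NOUN", "NOUN", "ADJ"]
shares = confusion_shares(gold, pred)
for (gold_tag, pred_tag), share in shares.items():
    print(f"{gold_tag} -> {pred_tag}: {share:.2f}% of errors")
```

Keeping the direction of the confusion (gold versus tagger) separate matters here, since proper nouns are tagged as nouns far more often than the reverse.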

Comparison to Habash & Rambow
◮ whole word tagging
◮ then convert to Habash & Rambow tokenization + reduced tagset: 15 tags
              H&R ATB1   H&R ATB2   whole word tagger
Token. acc.   99.1       –          99.33
POS acc.      98.1       96.5       96.41

Conclusion & Future Work
◮ whole word tagging has higher accuracy than segmentation-based tagging
  ◮ no preprocessing necessary
  ◮ but Penn Arabic Treebank has low percentage of unknown words
◮ segmentation quality is bottleneck for improving segmentation-based tagger
  ◮ need to find more reliable segmentation
  ◮ will integrate vocalization with segmentation
