Better Arabic Parsing: Baselines, Evaluations, and Analysis
Spence Green and Christopher D. Manning
Stanford University
August 27, 2010


SLIDE 1

Better Arabic Parsing
Baselines, Evaluations, and Analysis

Spence Green and Christopher D. Manning
Stanford University
August 27, 2010

SLIDE 6

Outline: Motivation · Syntax and Annotation · Grammar Development · Experiments (Multilingual Parsing, Arabic)

Common Multilingual Parsing Questions...

Is language X “harder” to parse than language Y?
◮ Morphologically rich X

Is treebank X “better/worse” than treebank Y?

Does feature Z “help more” for language X than Y?
◮ Lexicalization
◮ Morphological annotations
◮ Markovization
◮ etc.

SLIDE 7

“Underperformance” Relative to English

Evalb F1, all sentence lengths (Petrov, 2009):

  English              90.1
  Chinese              83.7
  Bulgarian            81.6
  German               80.1
  French               77.9
  Italian              75.6
  Arabic               75.8
  Arabic (this paper)  81.1
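The figures above are labeled-bracket F1 scores. As a rough sketch of what Evalb computes (the official tool adds parameterized exceptions for punctuation, root labels, and preterminals, which are omitted here), assuming trees encoded as nested `(label, children...)` tuples with string leaves:

```python
from collections import Counter

def brackets(tree, start=0):
    """Return (end, spans): labeled spans (label, start, end) of a tuple tree."""
    if isinstance(tree, str):
        return start + 1, []
    end, spans = start, []
    for child in tree[1:]:
        end, sub = brackets(child, end)
        spans.extend(sub)
    spans.append((tree[0], start, end))
    return end, spans

def bracket_f1(gold, test):
    """Labeled bracket F1 over matched spans (multiset intersection)."""
    g = Counter(brackets(gold)[1])
    t = Counter(brackets(test)[1])
    match = sum((g & t).values())
    p, r = match / sum(t.values()), match / sum(g.values())
    return 2 * p * r / (p + r) if p + r else 0.0

gold = ("S", ("NP", "I"), ("VP", "saw", ("NP", "it")))
test = ("S", ("NP", "I"), ("VP", "saw"), ("NP", "it"))  # wrong attachment
print(bracket_f1(gold, test))  # 3 of 4 brackets match on each side: 0.75
```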

SLIDE 8

Why Arabic / Penn Arabic Treebank (ATB)?

◮ Annotation style similar to PTB
◮ Relatively little segmentation (cf. Chinese)
◮ Richer morphology (cf. English)
◮ More syntactic ambiguity (unvocalized)

SLIDE 9

ATB Details

◮ Parts 1–3 (not including part 3, v3.2)
◮ Newswire only: Agence France Presse, Al-Hayat, Al-Nahar
◮ Corpus/experimental characteristics:
  ◮ 23k trees
  ◮ 740k tokens
  ◮ Shortened “Bies” POS tags
  ◮ Split: 2005 JHU workshop

SLIDE 15

Arabic Preliminaries

◮ Diglossia: “Arabic” → MSA
◮ Typology: VSO; VOS, SVO, VO also possible
◮ Devocalization
◮ Segmentation: an analyst’s choice!
  ◮ ATB uses clitic segmentation
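Clitic segmentation splits conjunction, preposition, and pronoun clitics off their host word into separate tokens. A deliberately naive illustration over Buckwalter transliteration; real ATB-style segmentation (e.g., with MADA) uses full morphological analysis plus context to decide whether a leading letter is actually a clitic:

```python
# Illustrative only: blind leading-letter matching over-splits badly in practice.
CONJ = {"w", "f"}        # wa- "and", fa- "so"
PREP = {"b", "l", "k"}   # bi- "with/by", li- "to", ka- "as"

def split_proclitics(token):
    """Greedily strip at most one conjunction and one preposition
    proclitic from a Buckwalter-transliterated token."""
    segments = []
    if token[:1] in CONJ and len(token) > 3:
        segments.append(token[0] + "+")
        token = token[1:]
    if token[:1] in PREP and len(token) > 3:
        segments.append(token[0] + "+")
        token = token[1:]
    segments.append(token)
    return segments

print(split_proclitics("wAlktAb"))  # ['w+', 'AlktAb'], "and the book"
```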

SLIDE 16

Syntactic Ambiguity in Arabic

This talk:
◮ Devocalization
◮ Discourse-level coordination ambiguity

Many other types:
◮ Adjectives / adjective phrases
◮ Process nominals (maSdar)
◮ Attachment in annexation constructs (Gabbard and Kulick, 2008)

SLIDE 17

Devocalization: Inna and her Sisters

  Particle   Gloss      POS   Head of
  inna       “indeed”   VBP   VP
  anna       “that”     IN    SBAR
  in         “if”       IN    SBAR
  an         “to”       IN    SBAR

All four particles are orthographically identical once devocalized.

SLIDE 19

Devocalization: Inna and her Sisters

[Figure: two parse trees for a sentence beginning ﺖﻓﺎﺿﺍ “she added ...”]

Reference: the complement of VBD “she added” is an S clause in which ﻥﺍ (“indeed”) is tagged VBP, with NP subject (NN “Saddam”) ...

Stanford: the complement is instead an SBAR in which the same devocalized ﻥﺍ is tagged IN.

SLIDE 20

Discourse-level Coordination Ambiguity

[Figure: two analyses of sentence-initial coordination with CC “and”: clauses coordinated as S dominating S, vs. coordination inside NP]

◮ S < S in 27.0% of dev set trees
◮ NP < CC in 38.7% of dev set trees
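In the Tgrep-style notation above, X < Y means a node labeled X immediately dominates a node labeled Y. A sketch of counting such configuration rates over dev-set trees, again using tuple-encoded trees (the helper names are illustrative, not from the paper):

```python
def has_config(tree, parent_label, child_label):
    """True if a node labeled parent_label immediately dominates a node
    labeled child_label anywhere in a (label, children...) tuple tree."""
    if isinstance(tree, str):
        return False
    if tree[0] == parent_label and any(
            not isinstance(c, str) and c[0] == child_label for c in tree[1:]):
        return True
    return any(has_config(c, parent_label, child_label)
               for c in tree[1:] if not isinstance(c, str))

def config_rate(trees, parent_label, child_label):
    """Fraction of trees containing the configuration at least once."""
    return sum(has_config(t, parent_label, child_label) for t in trees) / len(trees)

t1 = ("S", ("S", ("NP", "a"), ("VP", "b")), ("CC", "and"), ("S", ("VP", "c")))
t2 = ("S", ("NP", "a"), ("VP", "b"))
print(config_rate([t1, t2], "S", "S"))  # 0.5
```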

SLIDE 21

Discourse-level Coordination Ambiguity

Leaf Ancestor metric reference chains (Berkeley):
◮ score ∈ [0, 1]

  Score   # Gold   Chain
  0.696   34       S < S < VP < NP < PRP
  0.756   170      S < VP < NP < CC
  0.768   31       S < S < VP < S < VP < PP < IN
  0.796   86       S < S < VP < SBAR < IN
  0.804   52       S < S < NP < NN
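The Leaf Ancestor metric (Sampson, 2000) scores, for each terminal, the similarity of its root-to-leaf label chain (lineage) in the gold versus the test tree. A sketch over tuple-encoded trees, substituting `difflib`'s similarity ratio for the metric's exact edit-distance scheme:

```python
from difflib import SequenceMatcher

def lineages(tree, path=()):
    """Yield the root-to-leaf label chain for each terminal, left to right."""
    if isinstance(tree, str):
        yield path
    else:
        for child in tree[1:]:
            yield from lineages(child, path + (tree[0],))

def leaf_ancestor_scores(gold, test):
    """Per-leaf similarity in [0, 1] between gold and test lineages.
    Approximation: difflib ratio instead of the published edit distance."""
    return [SequenceMatcher(None, g, t).ratio()
            for g, t in zip(lineages(gold), lineages(test))]

gold = ("S", ("NP", "I"), ("VP", "saw"))
bad = ("S", ("NP", "I"), ("NP", "saw"))
print(leaf_ancestor_scores(gold, bad))  # [1.0, 0.5]
```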

SLIDE 23

Treebank Comparison

Compared ATB gross corpus statistics to:
◮ Chinese: CTB6
◮ English: WSJ sect. 2–23
◮ German: Negra

The ATB isn’t that unusual!

SLIDE 25

Corpus Features in Favor of the ATB

                                 ATB     CTB6    Negra   WSJ
  Non-terminal / terminal ratio  1.04    1.18    0.46    0.82
  OOV rate                       16.8%   22.2%   30.5%   13.2%
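Both statistics are cheap to compute from a treebank. A sketch over tuple-encoded trees; counting only phrasal nonterminals (excluding POS preterminals) is an assumption here, chosen because it is the only convention consistent with the sub-1.0 WSJ and Negra ratios above:

```python
def leaves(tree):
    """Yield the terminal strings of a (label, children...) tuple tree."""
    if isinstance(tree, str):
        yield tree
    else:
        for child in tree[1:]:
            yield from leaves(child)

def count_nodes(tree):
    """Return (phrasal nonterminal count, terminal count);
    POS preterminals are counted in neither."""
    if isinstance(tree, str):
        return 0, 1
    if len(tree) == 2 and isinstance(tree[1], str):  # preterminal (POS tag)
        return 0, 1
    n, t = 1, 0
    for child in tree[1:]:
        cn, ct = count_nodes(child)
        n, t = n + cn, t + ct
    return n, t

def nt_terminal_ratio(trees):
    counts = [count_nodes(t) for t in trees]
    return sum(n for n, _ in counts) / sum(t for _, t in counts)

def oov_rate(train_trees, test_tokens):
    vocab = {w for tree in train_trees for w in leaves(tree)}
    return sum(w not in vocab for w in test_tokens) / len(test_tokens)
```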

SLIDE 27

Sentence Length Negatively Affects Parsing

Avg. sentence length:

  ATB    31.5
  CTB6   27.7
  Negra  17.2
  WSJ    23.8

A 40-word limit is not sufficient for evaluation!

SLIDE 30

Developing a Manually Annotated Grammar

Klein and Manning (2003)-style state splits:
◮ Human-interpretable
◮ Features can inform treebank revision

[Figure: NP rewrites over NN and DTNN daughters, with construct-state NPs relabeled NP-idafa]

Alternative: automatic splits (Berkeley parser)

SLIDE 32

Feature: markContainsVerb

[Figure: tree in which S, SBAR, and VP nodes dominating a verb are relabeled S-hasVerb, SBAR-hasVerb, VP-hasVerb]

◮ +1.18 F1 dev set improvement
◮ 16.1% of dev set trees lack verbs
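This feature can be implemented as a single bottom-up pass over each tree. A sketch over tuple-encoded trees; the verbal tag set is an illustrative subset of the shortened Bies tags, not necessarily the paper's exact equivalence class:

```python
VERB_TAGS = {"VB", "VBD", "VBP", "VBN"}  # illustrative subset of Bies tags

def mark_contains_verb(tree):
    """Bottom-up pass appending '-hasVerb' to phrasal nodes dominating a verb.
    Returns (new_tree, dominates_verb); preterminal labels are left unchanged."""
    if isinstance(tree, str):
        return tree, False
    if len(tree) == 2 and isinstance(tree[1], str):      # preterminal
        return tree, tree[0] in VERB_TAGS
    children, has_verb = [], False
    for child in tree[1:]:
        new_child, child_has = mark_contains_verb(child)
        children.append(new_child)
        has_verb = has_verb or child_has
    label = tree[0] + "-hasVerb" if has_verb else tree[0]
    return (label,) + tuple(children), has_verb

t = ("S", ("NP-SBJ", ("NN", "x")), ("VP", ("VBD", "y")))
print(mark_contains_verb(t)[0])
# ('S-hasVerb', ('NP-SBJ', ('NN', 'x')), ('VP-hasVerb', ('VBD', 'y')))
```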

SLIDE 34

Feature: splitCC

[Figure: CC under S coordination relabeled CC-S; CC under NP relabeled CC-noun]

◮ +0.21 F1 dev set improvement
◮ POS equivalence classes for verb, noun, adjective
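splitCC can likewise be written as a one-pass relabeling of CC preterminals by the category they coordinate. The parent-to-class mapping below is illustrative; the slide names verb/noun/adjective equivalence classes but does not list their members:

```python
CC_CLASS = {"S": "S", "NP": "noun", "ADJP": "adj", "VP": "verb"}  # illustrative

def split_cc(tree):
    """Relabel CC children by their parent's coarse category:
    CC under S becomes CC-S, CC under NP becomes CC-noun, etc."""
    if isinstance(tree, str) or (len(tree) == 2 and isinstance(tree[1], str)):
        return tree
    cls = CC_CLASS.get(tree[0])
    children = []
    for child in tree[1:]:
        if not isinstance(child, str) and child[0] == "CC" and cls:
            children.append(("CC-" + cls,) + tuple(child[1:]))
        else:
            children.append(split_cc(child))
    return (tree[0],) + tuple(children)

np = ("NP", ("NN", "a"), ("CC", "w"), ("NN", "b"))
print(split_cc(np))  # ('NP', ('NN', 'a'), ('CC-noun', 'w'), ('NN', 'b'))
```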

SLIDE 35

Gold Experimental Setup

◮ Evaluation on lengths ≤ 70
◮ Pre-processing makes a huge difference
  ◮ Maintained non-terminal / terminal ratio vs. prior work
◮ Models:
  ◮ Berkeley
  ◮ Bikel, with pre-tagged input
  ◮ Stanford, with the manual grammar

SLIDE 36

Learning Curves

[Figure: Evalb F1 on the development set vs. number of training trees (5,000 to 15,000) for the Berkeley, Stanford, and Bikel parsers; the F1 axis spans 75 to 85]

SLIDE 37

Model Comparison

Evalb F1:

             up to 70   All
  Berkeley   82.0       81.1
  Bikel      77.5       76.5
  Stanford   79.9       78.3

  Petrov (2009): 75.9

SLIDE 41

Raw Text Experimental Setup

◮ Pipeline: MADA + Stanford parser
◮ Lattice parsing, effective for Hebrew (Goldberg and Tsarfaty, 2008)
◮ Evaluation on gold (segmented) lengths ≤ 70
◮ Metric: Evalb without whitespace
  ◮ Requires exact character yield (Tsarfaty, 2006)
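When predicted segmentation differs from gold, token indices no longer align, so brackets have to be compared over character offsets of the whitespace-free yield, and the two trees must cover the same character string. A sketch of that idea (helper names are illustrative):

```python
def char_spans(tree, start=0):
    """Return (end, spans): spans are (label, char_start, char_end) over the
    whitespace-free character yield of a (label, children...) tuple tree."""
    if isinstance(tree, str):
        return start + len(tree), []
    end, spans = start, []
    for child in tree[1:]:
        end, sub = char_spans(child, end)
        spans.extend(sub)
    spans.append((tree[0], start, end))
    return end, spans

def same_yield(gold, test):
    """Evalb-style comparison is only defined when both trees cover the
    same character string, whitespace/segmentation ignored."""
    def chars(t):
        return t if isinstance(t, str) else "".join(chars(c) for c in t[1:])
    return chars(gold) == chars(test)

gold = ("NP", ("NN", "ktAb"), ("PRP", "hm"))   # segmented: "book" + "their"
test = ("NP", ("NN", "ktAbhm"))                # unsegmented
print(same_yield(gold, test))                  # True: both yield "ktAbhm"
# The NP span (0, 6) matches across trees despite different token counts.
```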

SLIDE 44

Lattice Parsing: “ABCD EFG”

[Figure: segmentation lattice over the input “ABCD EFG”; candidate arcs for the first token include A, BC, BCD, D, ABC, and ABCD, and for the second token EF, G, and EFG]
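A segmentation lattice packs all candidate analyses of each token into one graph over character positions, and the parser then searches over parse and segmentation jointly (Goldberg and Tsarfaty, 2008). A sketch of arc construction from hypothesized analyses; in a real system the analyses would come from a morphological analyzer, and the function name is illustrative:

```python
def lattice_arcs(token, analyses):
    """Return sorted (start, end, segment) arcs over character positions of
    `token`, given candidate segmentations that each cover the whole token."""
    arcs = set()
    for segmentation in analyses:
        pos = 0
        for seg in segmentation:
            arcs.add((pos, pos + len(seg), seg))
            pos += len(seg)
        if pos != len(token):
            raise ValueError("analysis does not cover the token")
    return sorted(arcs)

arcs = lattice_arcs("ABCD", [["ABCD"], ["A", "BCD"], ["ABC", "D"], ["A", "BC", "D"]])
print(arcs)
# [(0, 1, 'A'), (0, 3, 'ABC'), (0, 4, 'ABCD'), (1, 3, 'BC'), (1, 4, 'BCD'), (3, 4, 'D')]
```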

SLIDE 45

Parsing and Segmentation Results

Evalb F1 (dev set):

  Gold        81.1
  Pipeline    79.2
  Lattice     74.3
  Lattice+LM  76.0

SLIDE 47

Parsing and Segmentation Results

Segmentation accuracy (dev set):

  Gold        100.0
  Pipeline    97.7
  Lattice     94.1
  Lattice+LM  96.3

Comparable segmentation without the effort!

SLIDE 48

Conclusion

Evalb F1, all sentence lengths (Petrov, 2009):

  English              90.1
  Chinese              83.7
  Bulgarian            81.6
  German               80.1
  French               77.9
  Italian              75.6
  Arabic               75.8
  Arabic (this paper)  81.1

SLIDE 49

Thanks.

http://nlp.stanford.edu/projects/arabic.shtml

SLIDE 50

Mixture of Manual and Automatic Annotation

Evalb F1 (dev):

             up to 70   All
  Basic      82.4       80.9
  Annotated  82.2       80.8

SLIDE 51

Frequency Matched Strata

              Arabic               English
  Stratum     µ freq.   % nuclei   µ freq.   % nuclei
  2           2.00      46%        2.00      34%
  [3, 4]      3.37      26%        3.37      27%
  [5, 9]      6.43      16%        6.49      20%
  [10, 49]    18.6      10%        19.0      16%
  [50, 500]   110.3     2%         113       3%

SLIDE 52

Annotation Consistency Evaluation (Again!)

            Nuclei per tree   Sample n-grams   Error % (type)   Error % (n-gram)
  WSJ 2–21  0.565             750              16.0%            4.10%
  ATB       0.830             658              26.0%            4.10%

95% confidence intervals (type level):
◮ Arabic: [17.4%, 34.6%]
◮ English: [8.79%, 23.3%]
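The slide does not state how the intervals were constructed. A plain normal-approximation (Wald) binomial interval over a sample of 100 nuclei types, where both the interval choice and the sample size of 100 are assumptions here, reproduces the Arabic interval to rounding:

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion;
    z = 1.96 gives a 95% interval."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

lo, hi = proportion_ci(26, 100)    # ATB type-level error rate of 26.0%
print(round(lo, 3), round(hi, 3))  # 0.174 0.346, i.e. [17.4%, 34.6%]
```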