Better Arabic Parsing
Baselines, Evaluations, and Analysis
Spence Green and Christopher D. Manning
Stanford University, August 27, 2010
Motivation Syntax and Annotation Grammar Development Experiments Multilingual Parsing Arabic
Is language X “harder” to parse than language Y?
  ◮ Morphologically rich X
Is treebank X “better/worse” than treebank Y?
Does feature Z “help more” for language X than Y?
  ◮ Lexicalization
  ◮ Morphological annotations
  ◮ Markovization
  ◮ etc.
Evalb F1, all sentence lengths (Petrov, 2009):
  English              90.1
  Chinese              83.7
  Bulgarian            81.6
  German               80.1
  French               77.9
  Italian              75.6
  Arabic               75.8
  Arabic (this paper)  81.1
◮ Annotation style similar to the PTB
◮ Relatively little segmentation (cf. Chinese)
◮ Richer morphology (cf. English)
◮ More syntactic ambiguity (unvocalized text)
Penn Arabic Treebank (ATB): Parts 1–3 (not including part 3, v3.2), newswire only
  ◮ Agence France Presse, Al-Hayat, Al-Nahar
Corpus/experimental characteristics:
  ◮ 23k trees
  ◮ 740k tokens
  ◮ Shortened “Bies” POS tags
  ◮ Split: 2005 JHU workshop
Diglossia: “Arabic” → MSA (Modern Standard Arabic)
Typology: basic order VSO; VOS, SVO, and VO also possible
Devocalization
  ◮ ATB uses clitic segmentation
Motivation Syntax and Annotation Grammar Development Experiments Syntactic Ambiguity Analysis and Evaluation
This talk:
  ◮ Devocalization
  ◮ Discourse-level coordination ambiguity
Many other types:
  ◮ Adjectives / adjective phrases
  ◮ Process nominals (maSdar)
  ◮ Attachment in annexation constructs (Gabbard and Kulick, 2008)
[Parse tree figure: after (VBD ﺖﻓﺎﺿﺍ “she added”), the reference analysis attaches the quoted material as S, with ﻥﺍ “Indeed” tagged VBP, while the Stanford parser produces SBAR, with ﻥﺍ tagged IN, before (NP (NN ﻡﺍﺪﺻ “Saddam”)) . . .]
◮ S < S in 27.0% of dev set trees
◮ NP < CC in 38.7% of dev set trees
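Configuration counts like these come from a simple tree traversal. A minimal sketch on a toy treebank, assuming trees are nested (label, children) tuples rather than the ATB's actual file format:

```python
# Count trees containing an S immediately dominating an S, or an NP
# immediately dominating a CC, in a toy treebank. Trees are
# (label, children) tuples; leaves are plain strings.

def has_config(tree, parent, child):
    """True if any node labeled `parent` has a direct child labeled `child`."""
    if isinstance(tree, str):
        return False
    label, children = tree
    if label == parent and any(
        not isinstance(c, str) and c[0] == child for c in children
    ):
        return True
    return any(has_config(c, parent, child) for c in children)

def config_rate(treebank, parent, child):
    """Fraction of trees containing the parent < child configuration."""
    hits = sum(has_config(t, parent, child) for t in treebank)
    return hits / len(treebank)

toy = [
    ("S", [("S", [("NP", ["he"]), ("VP", ["left"])]),
           ("CC", ["and"]),
           ("S", [("NP", ["she"]), ("VP", ["stayed"])])]),
    ("S", [("NP", [("NN", ["cat"]), ("CC", ["and"]), ("NN", ["dog"])]),
           ("VP", ["sleep"])]),
]

print(config_rate(toy, "S", "S"))    # only the first toy tree matches
print(config_rate(toy, "NP", "CC"))  # only the second toy tree matches
```

The `<` notation above is the usual tgrep-style "immediately dominates" relation, which is what `has_config` tests.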
Leaf Ancestor metric: compares each word’s chain of ancestor labels against the reference chains (Berkeley parser output)
  ◮ score ∈ [0, 1]
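A sketch of the idea behind the metric: score each word by the edit-distance similarity of its ancestor-label chain in gold vs. parser output. This follows one common formulation; the full metric also inserts bracket markers into the lineages, which this toy version omits.

```python
# Leaf-ancestor sketch: score each word by comparing its chain of
# ancestor labels in the gold tree vs. the parser output.
# Trees are (label, children) tuples; leaves are strings.

def lineages(tree, ancestors=()):
    """Yield (word, chain of ancestor labels from the root) per leaf."""
    if isinstance(tree, str):
        yield tree, ancestors
        return
    label, children = tree
    for c in children:
        yield from lineages(c, ancestors + (label,))

def edit_distance(a, b):
    """Standard Levenshtein distance over two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def leaf_ancestor(gold, guess):
    """Mean per-leaf lineage similarity, in [0, 1]."""
    scores = []
    for (_, g), (_, t) in zip(lineages(gold), lineages(guess)):
        scores.append(1 - edit_distance(g, t) / (len(g) + len(t)))
    return sum(scores) / len(scores)

gold  = ("S", [("NP", ["he"]), ("VP", ["left"])])
guess = ("S", [("NP", ["he"]), ("NP", ["left"])])
print(leaf_ancestor(gold, gold))   # identical trees score 1.0
print(leaf_ancestor(gold, guess))  # one wrong ancestor lowers the mean
```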
Compared ATB gross corpus statistics to:
  ◮ Chinese: CTB6
  ◮ English: WSJ sections 2–23
  ◮ German: Negra
The ATB isn’t that unusual!
                               ATB     CTB6    Negra   WSJ
  Nonterminal/terminal ratio   1.04    1.18    0.46    0.82
  OOV rate                     16.8%   22.2%   30.5%   13.2%
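Both statistics are cheap to compute. A minimal sketch on toy data, assuming the same (label, children) tuple encoding of trees as above and a whole-token notion of OOV:

```python
# Gross corpus statistics of the kind compared above, from toy data.
# NT/T ratio: internal nodes per token. OOV rate: fraction of dev-set
# tokens whose word type was never seen in training.

def count_nodes(tree):
    """Return (nonterminals, terminals) for one tree."""
    if isinstance(tree, str):
        return 0, 1
    nts, ts = 1, 0
    for c in tree[1]:
        n, t = count_nodes(c)
        nts += n
        ts += t
    return nts, ts

def nt_t_ratio(treebank):
    nts = ts = 0
    for tree in treebank:
        n, t = count_nodes(tree)
        nts += n
        ts += t
    return nts / ts

def oov_rate(train_tokens, dev_tokens):
    vocab = set(train_tokens)
    return sum(w not in vocab for w in dev_tokens) / len(dev_tokens)

toy = [("S", [("NP", ["he"]), ("VP", ["left"])])]
print(nt_t_ratio(toy))  # 3 nonterminals over 2 terminals
print(oov_rate(["he", "left"], ["he", "left", "ran", "far"]))
```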
40 words is not a sufficient limit for evaluation!
Motivation Syntax and Annotation Grammar Development Experiments Features
Klein and Manning (2003)–style state splits
  ◮ Human-interpretable
  ◮ Features can inform treebank revision
[Tree figure: category splits S—hasVerb, SBAR—hasVerb, and VP—hasVerb propagated through a tree containing NP-SBJ, NN, IN, NP-TPC, and VB nodes]
  ◮ +1.18 F1 dev set improvement
  ◮ 16.1% of dev set trees lack verbs
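A minimal sketch of this style of split: relabel clausal categories by whether their subtree dominates a verbal POS tag. The `VERB_TAGS` class, the `SPLIT` set, and the toy labels are illustrative, not the paper's exact feature definition:

```python
# Sketch of a Klein-and-Manning-style state split: annotate clausal
# labels with whether the subtree dominates a verbal POS tag.

VERB_TAGS = {"VB", "VBD", "VBP"}   # toy POS equivalence class for "verb"
SPLIT = {"S", "SBAR", "VP"}        # categories that receive the split

def mark_has_verb(tree):
    """Return (new_tree, dominates_verb), relabeling e.g. S -> S-hasVerb."""
    if isinstance(tree, str):
        return tree, False
    label, children = tree
    new_children, has_verb = [], label in VERB_TAGS
    for c in children:
        new_c, hv = mark_has_verb(c)
        new_children.append(new_c)
        has_verb = has_verb or hv
    if label in SPLIT and has_verb:
        label = label + "-hasVerb"
    return (label, new_children), has_verb

tree = ("S", [("NP-SBJ", [("NN", ["summit"])]),
              ("VP", [("VBD", ["ended"])])])
split_tree, _ = mark_has_verb(tree)
print(split_tree[0])  # the root S is relabeled because VP dominates VBD
```

The second slide's CC—S / CC—noun feature is the same mechanism applied to conjunctions: relabel CC by the equivalence class of what it coordinates.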
[Tree figure: coordination splits CC—S (S-level conjunction) and CC—noun (nominal conjunction)]
  ◮ +0.21 F1 dev set improvement
  ◮ POS equivalence classes for verb, noun, adjective
Motivation Syntax and Annotation Grammar Development Experiments Gold Segmentation Raw Text
Evaluation on lengths ≤ 70
Pre-processing makes a huge difference
  ◮ Maintained the nonterminal/terminal ratio vs. prior work
Models:
  ◮ Berkeley
  ◮ Bikel, with pre-tagged input
  ◮ Stanford, with the manual grammar
[Learning curve: development-set Evalb F1 (roughly 75–85) vs. number of training trees (5,000–15,000) for the Berkeley, Stanford, and Bikel parsers]
[Results table: Evalb F1 for sentence lengths ≤ 70 and for all lengths]
Pipeline: MADA + Stanford parser
Lattice parsing, effective for Hebrew (Goldberg and Tsarfaty, 2008)
Evaluation on gold (segmented) lengths ≤ 70
Metric: Evalb without whitespace
  ◮ Requires exact character yield (Tsarfaty, 2006)
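The exact-character-yield requirement means gold and predicted trees must cover the same character string once token boundaries are discarded, even when the segmenter split clitics differently. A sketch of that precondition check (not the full modified Evalb); the Buckwalter-style tokens are illustrative:

```python
# Precondition for whitespace-free Evalb: gold and predicted trees must
# have identical character yields, ignoring token boundaries.
# Trees are (label, children) tuples; leaves are strings.

def leaves(tree):
    if isinstance(tree, str):
        return [tree]
    out = []
    for c in tree[1]:
        out.extend(leaves(c))
    return out

def char_yield(tree):
    """Concatenated leaf characters, with token boundaries discarded."""
    return "".join(leaves(tree))

gold = ("S", [("PRT", ["w"]), ("NP", ["ktAb"])])  # clitic split: w + ktAb
pred = ("S", [("NP", ["wktAb"])])                 # left unsegmented
print(char_yield(gold) == char_yield(pred))       # same characters: True
```

Under this condition, brackets can be scored as character spans rather than token spans, so differing segmentations remain comparable.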
[Segmentation lattice figures: alternative analyses of raw tokens, e.g. ABCD segmented as ABCD, ABC + D, A + BCD, or A + BC + D, and EFG as EFG or EF + G]
[Results: comparison of the Gold, Pipeline, Lattice, and Lattice+LM segmentation conditions]
Recap of the Petrov (2009) comparison: over all sentence lengths, Arabic Evalb F1 improves from 75.8 to 81.1 in this paper.
[Results table: Evalb F1 for sentence lengths ≤ 70 and for all lengths]
95% confidence intervals (type level):
  ◮ Arabic: [17.4%, 34.6%]
  ◮ English: [8.79%, 23.3%]
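The slide does not say how these type-level intervals were computed. A generic percentile-bootstrap sketch for a proportion over word types, with made-up per-type outcomes; purely illustrative, not the paper's procedure:

```python
# Percentile bootstrap for a type-level proportion. The 0/1 outcomes
# (one per word type) and the 95% interval construction are generic
# illustrations, not taken from the paper.
import random

def bootstrap_ci(outcomes, reps=2000, alpha=0.05, seed=0):
    """95% percentile-bootstrap interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    means = sorted(
        sum(rng.choice(outcomes) for _ in range(n)) / n
        for _ in range(reps)
    )
    lo = means[int((alpha / 2) * reps)]
    hi = means[int((1 - alpha / 2) * reps) - 1]
    return lo, hi

# 100 word types, 26 of which show the behavior under study (made up).
outcomes = [1] * 26 + [0] * 74
lo, hi = bootstrap_ci(outcomes)
print(round(lo, 2), round(hi, 2))
```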