Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation
Roy Schwartz1, Omri Abend1, Roi Reichart2 and Ari Rappoport1
1The Hebrew University, 2MIT
In proceedings of ACL 2011
Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation @ Schwartz et al. 2
Dependency Parsing
we want to play ROOT
Related Work
Supervised Dependency Parsing
– McDonald et al., 2005
– Nivre et al., 2006
– Smith and Eisner, 2008
– Zhang and Clark, 2008
– Martins et al., 2009
– Goldberg and Elhadad, 2010
– inter alia
Unsupervised Dependency Parsing
– Klein and Manning, 2004
– Cohen and Smith, 2009
– Headden et al., 2009
– Blunsom and Cohn, 2010
– Spitkovsky et al., 2010
– inter alia
Unsupervised Dependency Parsing Evaluation
– Ratio of correct directed edges
Unsupervised Dependency Parsing Evaluation
– Gold Std: PRP VBP TO VB ROOT
(we) (want) (to) (play)
– Induced parse: PRP VBP TO VB ROOT
(we) (want) (to) (play)
– Score: 2/4
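The directed score on this slide can be reproduced with a minimal sketch. The two head assignments below are assumptions: the slide's arcs did not survive extraction, so the trees were chosen only because they yield 2/4. The function name `attachment_score` is likewise hypothetical.

```python
from typing import Dict

# Each tree maps a token to its head; "ROOT" is the artificial root node.
# Assumption: both trees are illustrative, picked to reproduce the 2/4 score.
# Keying by word only works because no word repeats in this sentence.
GOLD: Dict[str, str] = {"we": "want", "want": "ROOT", "to": "play", "play": "want"}
INDUCED: Dict[str, str] = {"we": "want", "want": "ROOT", "to": "want", "play": "to"}


def attachment_score(gold: Dict[str, str], induced: Dict[str, str]) -> float:
    """Ratio of tokens whose induced head is exactly their gold head."""
    correct = sum(1 for token, head in induced.items() if gold[token] == head)
    return correct / len(gold)


print(attachment_score(GOLD, INDUCED))  # 0.5, i.e. 2/4
```

Only "we" and "want" keep their gold heads here; "to" and "play" are both counted as errors even though the induced parse differs from the gold one only inside the "to play" structure.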
The gold standard annotation of some structures is Linguistically Problematic
– I.e., not under consensus
Examples
– Infinitive Verbs
to play
(Collins, 1999) (Bosco and Lombardo, 2004) (Johansson and Nugues, 2007)
– Prepositional Phrases
in Rome
(Johansson and Nugues, 2007) (Yamada and Matsumoto, 2003)
– Confined to 2–3 words only
– Often, alternative annotations differ in the direction of some edge
– The controversy only relates to the internal structure
Example: "to play" in "want to play chess"
– 42.9% of the tokens in PTB WSJ participate in at least one problematic structure
The gold standard is converted from constituency parsing using head percolation rules
Three sets of rules are currently in use for the same task
– (Collins, 1999): used in e.g., (Berg‐Kirkpatrick et al., 2010; Spitkovsky et al., 2010)
– (Yamada and Matsumoto, 2003): used in e.g., (Cohen and Smith, 2009; Gillenwater et al., 2010)
– (Johansson and Nugues, 2007): used in e.g., the CoNLL shared task 2007, (Blunsom and Cohn, 2010)
14.4% Diff.
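One simple way to quantify how far two conversion schemes diverge is the share of tokens that receive different heads. The sketch below illustrates that idea only; it is not necessarily how the 14.4% figure was computed. The sentence, the two annotations and the name `head_disagreement` are all hypothetical.

```python
from typing import Dict

# Two hypothetical annotations of "we play in Rome" that differ only in the
# internal structure of the prepositional phrase "in Rome".
SCHEME_A: Dict[str, str] = {"we": "play", "play": "ROOT", "in": "play", "Rome": "in"}
SCHEME_B: Dict[str, str] = {"we": "play", "play": "ROOT", "in": "Rome", "Rome": "play"}


def head_disagreement(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Fraction of tokens whose head differs between two annotations."""
    differing = sum(1 for token in a if a[token] != b[token])
    return differing / len(a)


print(head_disagreement(SCHEME_A, SCHEME_B))  # 0.5
```

A single disputed two-word structure changes the heads of two tokens, which is why a handful of such structures can add up to a sizeable corpus-level difference.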
[Figure: Induced Parameters and Modified Parameters (< 1%) for the structure "to play", compared with the Gold Standard; X 3 leading Parsers]
Model   Original   Modified   Modified ‐ Original
km04    34.3       43.6        9.3
cs09    39.7       54.4       14.7
saj10   41.3       54.0       12.7
km04: Klein and Manning, 2004
cs09: Cohen and Smith, 2009
saj10: Spitkovsky et al., 2010
Undirected Evaluation
– Frequently, alternative annotations of problematic structures differ in the direction of some edge
How about undirected evaluation?
Undirected Evaluation
PRP VBP TO VB ROOT
(we) (want) (to) (play)
Induced parse, with a flipped edge
PRP VBP TO VB ROOT
(we) (want) (to) (play)
No head, two heads: flipping one edge alone leaves one word with no head and another with two heads
Undirected Evaluation
PRP VBP TO VB ROOT
(we) (want) (to) (play)
Induced parse, with a flipped edge (this is the minimal modification!)
PRP VBP TO VB ROOT
(we) (want) (to) (play)
Undirected score: 3/4 (75%)
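The 3/4 undirected score can be checked with a small sketch. As an assumption, the gold tree attaches "to" under "play", and the induced tree flips that edge and re-attaches "to" under "want" so the result is still a tree; the slide's arcs were lost in extraction. The function name `undirected_score` is hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Token -> head maps; "ROOT" is the artificial root node.
# Assumption: illustrative trees chosen to match the 3/4 score on the slide.
GOLD: Dict[str, str] = {"we": "want", "want": "ROOT", "to": "play", "play": "want"}
FLIPPED: Dict[str, str] = {"we": "want", "want": "ROOT", "to": "want", "play": "to"}


def undirected_score(gold: Dict[str, str], induced: Dict[str, str]) -> float:
    """Ratio of induced edges that appear in the gold tree in either direction."""
    gold_edges: Set[FrozenSet[str]] = {
        frozenset((token, head)) for token, head in gold.items()
    }
    correct = sum(
        1 for token, head in induced.items() if frozenset((token, head)) in gold_edges
    )
    return correct / len(gold)


print(undirected_score(GOLD, FLIPPED))  # 0.75, i.e. 3/4
```

Ignoring direction rescues the flipped "to"/"play" edge, but the re-attachment of "to" under "want" is still counted as an error, so the undirected measure does not fully neutralize the flip.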
Neutral Edge Direction (NED)
– A simple extension of the undirected evaluation measure
– Ignores edge direction flips
[Figure: the Gold Standard parse of "want to play" next to three induced parses: Induced parse I (agrees with gold std.), Induced parse II (linguistically plausible), Induced parse III (linguistically implausible)]
– X is a correct parent of Y if:
[Figure: Gold Standard and a linguistically plausible parse of "want to play", scored under Attachment and Undirected evaluation]
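The conditions for a correct parent did not survive on this slide. The sketch below assumes the NED criterion as it is usually stated for this measure: an induced parent counts as correct if it is the token's gold parent, gold child, or gold grandparent. Treat that reading, the example trees and the name `ned_score` as assumptions rather than the authors' implementation (their software is linked at the end of the deck).

```python
from typing import Dict

# Token -> head maps; "ROOT" is the artificial root node.
# Assumption: all three trees are illustrative, not taken from the slides.
GOLD: Dict[str, str] = {"we": "want", "want": "ROOT", "to": "play", "play": "want"}
PLAUSIBLE: Dict[str, str] = {"we": "want", "want": "ROOT", "to": "want", "play": "to"}
IMPLAUSIBLE: Dict[str, str] = {"we": "play", "want": "ROOT", "to": "we", "play": "want"}


def ned_score(gold: Dict[str, str], induced: Dict[str, str]) -> float:
    """Ratio of tokens whose induced parent is their gold parent,
    gold child, or gold grandparent (assumed NED criterion)."""
    correct = 0
    for token, head in induced.items():
        gold_parent = gold[token]
        is_parent = head == gold_parent
        is_child = gold.get(head) == token  # edge flipped relative to gold
        is_grandparent = gold.get(gold_parent) == head
        if is_parent or is_child or is_grandparent:
            correct += 1
    return correct / len(gold)


print(ned_score(GOLD, PLAUSIBLE))    # 1.0
print(ned_score(GOLD, IMPLAUSIBLE))  # 0.5
```

Under these assumptions the plausible alternative for "to play" is not penalized at all, while a parse that attaches "we" and "to" to unrelated words still loses credit.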
Difference Between Gold Standards
[Bar chart: difference between gold standards under each measure (Attach., Undir., NED)]
Sensitivity to Parameter Modification
[Bar chart: Attach., Undir. and NED for km04, cs09 and saj10]
– Choosing the “wrong” (plausible) annotation should not be considered an error
– Use NED!
– Supervised parsers get the correct annotation as training input
– Better suited than using the undirected evaluation measure
Find a more fine‐grained measure
– Evaluating Dependency Parsing: Robust and Heuristics‐Free Cross‐Annotation Evaluation (Tsarfaty et al., to appear in EMNLP 2011)
– Gold Standards: 3 used (~15% difference between them)
– Current Parsers: very sensitive to alternative (plausible) annotations
– Minor modifications result in ~9–15% performance “gain”
– Undirected Evaluation: does not solve this problem
NED
– Simple and intuitive
– Reduces difference between different gold standards to ~5%
– Reduces undesired performance “gain” (~1–4%)
– Still indicative of quality difference
… used attachment score
http://www.cs.huji.ac.il/~roys02/software/ned.html
Many thanks to Phil Blunsom