LEXICALIZED PARSING FOR DIFFERENT DOMAINS
Laura Rimell and Stephen Clark
10.12.2009
Daniel C. Müller
Hypothesis
Parser adaptation in the context of lexicalized grammar, applied to two different domains
Biomedical domain
Questions from question answering
POS tagging based on the Penn Treebank
Combinatory Categorial Grammar
POS tagging based on the Penn Treebank
POS Tag:
50 grammatical labels indicating part of speech
Each word
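POS tagging based on the Penn Treebank
Combinatory Categorial Grammar
lexical categorization (supertagger)
425 categories
Each word
Containing subcategorization information
A complex category such as (S\NP)/NP means: take an NP to the right, then an NP to the left, and yield a sentence S (the category of a transitive verb)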
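To make the reading of a complex category concrete, here is a minimal sketch that splits a category string at its main slash. The class and function names (Category, parse_category, strip_outer) are hypothetical and are not taken from any CCG toolkit; the only assumption about the notation is that slashes associate to the left and that parentheses group sub-categories.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Category:
    """Hypothetical container: atomic (NP) or complex (result, slash, argument)."""
    atom: Optional[str] = None
    result: Optional["Category"] = None
    slash: Optional[str] = None
    arg: Optional["Category"] = None

    def __str__(self) -> str:
        if self.atom is not None:
            return self.atom
        return f"({self.result}{self.slash}{self.arg})"


def strip_outer(text: str) -> str:
    """Remove outer parentheses only when they enclose the whole string."""
    while text.startswith("(") and text.endswith(")"):
        depth = 0
        for i, ch in enumerate(text):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0 and i < len(text) - 1:
                    return text  # the first parenthesis closes early
        text = text[1:-1]
    return text


def parse_category(text: str) -> Category:
    """Split at the rightmost top-level slash, which is the main operator."""
    text = strip_outer(text.strip())
    depth = 0
    for i in range(len(text) - 1, -1, -1):
        ch = text[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
        elif ch in "/\\" and depth == 0:
            return Category(
                result=parse_category(text[:i]),
                slash=ch,
                arg=parse_category(text[i + 1:]),
            )
    return Category(atom=text)


verb = parse_category(r"(S\NP)/NP")
# "/" means the argument is sought to the right, "\" to the left.
print("first argument:", verb.arg, "via", verb.slash)
print("then:", verb.result.arg, "via", verb.result.slash)
print("finally yields:", verb.result.result)
```

Reading the printed lines top to bottom gives the order in which the category consumes its arguments.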
Example
Biomedical domain
POS tags: Talin|NN perhaps|RB acts|VBZ as|IN a|DT linkage|NN protein|NN .|.
Lexical categories: NP (S\NP)/(S\NP) (S[dcl]\NP)/PP PP/NP NP[nb]/N N/N N .
Question domain
POS tags: What|WDT king|NN signed|VBD the|DT Magna|NNP Carta|NNP ?|.
Lexical categories: (S[wq]/(S[dcl]\NP))/N N (S[dcl]\NP)/NP NP[nb]/N N/N N .
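The two annotation layers in the example line up token by token. As a small illustration, the following sketch pairs each word|POS token with the lexical category in the same position. The helper name read_tagged is hypothetical, and the whitespace-separated input format is assumed from the example rather than taken from a documented file format.

```python
def read_tagged(pos_line, category_line):
    """Pair each word|POS token with the lexical category at the same position."""
    # rsplit keeps tokens such as ".|." and "?|." intact
    tokens = [tok.rsplit("|", 1) for tok in pos_line.split()]
    categories = category_line.split()
    if len(tokens) != len(categories):
        raise ValueError("token and category counts differ")
    return [(word, pos, cat) for (word, pos), cat in zip(tokens, categories)]


rows = read_tagged(
    "What|WDT king|NN signed|VBD the|DT Magna|NNP Carta|NNP ?|.",
    r"(S[wq]/(S[dcl]\NP))/N N (S[dcl]\NP)/NP NP[nb]/N N/N N .",
)
for word, pos, cat in rows:
    print(f"{word:8}{pos:6}{cat}")
```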
POS tagging based on the Penn Treebank
Combinatory Categorial Grammar
lexical categorization (supertagger)
derivation (hierarchy)
Lexicalized categories + combinatory rules
Viterbi
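How lexical categories and combinatory rules yield a derivation can be sketched with a toy chart parser. This is not the parser discussed in the talk: it uses only forward and backward application, ignores features such as [dcl] and [nb] when matching arguments, leaves out punctuation, and the log-scores attached to the lexical categories are invented for illustration. All function names are hypothetical. Each chart cell keeps the best score per category, which is the Viterbi idea.

```python
import re
from itertools import product


def strip_outer(cat):
    """Remove outer parentheses only when they enclose the whole category."""
    while cat.startswith("(") and cat.endswith(")"):
        depth = 0
        for i, ch in enumerate(cat):
            depth += ch == "("
            depth -= ch == ")"
            if depth == 0 and i < len(cat) - 1:
                return cat
        cat = cat[1:-1]
    return cat


def split_main(cat):
    """Return (result, slash, argument), or None for an atomic category."""
    cat = strip_outer(cat)
    depth = 0
    for i in range(len(cat) - 1, -1, -1):
        ch = cat[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
        elif ch in "/\\" and depth == 0:
            return strip_outer(cat[:i]), ch, strip_outer(cat[i + 1:])
    return None


def plain(cat):
    """Drop features such as [dcl] so that NP[nb] matches NP (a simplification)."""
    return re.sub(r"\[[^\]]*\]", "", strip_outer(cat))


def combine(left, right):
    """Forward application X/Y Y => X and backward application Y X\\Y => X."""
    results = []
    l, r = split_main(left), split_main(right)
    if l and l[1] == "/" and plain(l[2]) == plain(right):
        results.append(l[0])
    if r and r[1] == "\\" and plain(r[2]) == plain(left):
        results.append(r[0])
    return results


def viterbi_parse(words):
    """words: one dict per token mapping lexical category -> log-score."""
    n = len(words)
    chart = {(i, i + 1): dict(cats) for i, cats in enumerate(words)}
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end, cell = start + width, {}
            for mid in range(start + 1, end):
                pairs = product(chart[start, mid].items(), chart[mid, end].items())
                for (lc, ls), (rc, rs) in pairs:
                    for res in combine(lc, rc):
                        # keep only the best-scoring way to build each category
                        cell[res] = max(cell.get(res, float("-inf")), ls + rs)
            chart[start, end] = cell
    return chart[0, n]


# "What king signed the Magna Carta", with invented scores
sentence = [
    {r"(S[wq]/(S[dcl]\NP))/N": -0.1},
    {"N": -0.1},
    {r"(S[dcl]\NP)/NP": -0.3, r"S[dcl]\NP": -1.5},
    {"NP[nb]/N": -0.1},
    {"N/N": -0.4, "N": -1.0},
    {"N": -0.2},
]
print(viterbi_parse(sentence))
```

The root cell contains a single category for the whole question, built from the transitive reading of "signed"; the intransitive alternative only covers a prefix of the sentence.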
Example
creating new training data
better POS tagging
reduce annotation overhead
Training resources
Baseline
Wall Street Journal: Sections 02-21 of CCGbank
Biomedical domain
POS tagger: gold-standard POS tags from GENIA
Lexical categories: first 1,000 sentences of GENIA
Parser evaluation: BioInfer
Evaluation set: Pyysalo et al. (2007b)
Question domain
Questions beginning with the word What, from the TREC 9-12 competitions:
manually POS tagged & annotated with lexical categories
Results
POS-Tagger
Results
Supertagger
Results
Parser evaluation
Comparing to WSJ:
Biomedical domain:
Question domain:
POS tagging
Biomedical domain:
nouns and adjectives (801 NN + 268 JJ errors)
Question domain:
wh-determiners (129 errors)
Syntactic differences
Unknown POS n-gram rate
Syntactic differences
Unknown POS n-gram rate
Number of 20 most frequent POS n-grams
Syntactic differences
Unknown POS n-gram rate
Number of 20 most frequent POS n-grams
POS Trigrams
Biomedical domain:
Domination of NPs and PPs
Question domain:
Beginning with WP VBZ, like "What is"
Ending with VB .
Syntactic differences
Unknown POS n-gram rate
Number of 20 most frequent POS n-grams
POS Trigrams
Number of rare or unseen lexical categories
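The n-gram measures listed above can be sketched as follows. The exact definitions are assumptions, since the slides only name them: the unknown rate is taken here to be the share of POS n-gram tokens in the target corpus that never occur in the source corpus, and the second measure counts how many of the k most frequent target n-grams are unseen in the source. The function names and the tiny corpora are hypothetical.

```python
from collections import Counter


def ngrams(tags, n):
    """All contiguous POS n-grams of one tagged sentence."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def unknown_ngram_rate(source, target, n):
    """Share of target n-gram tokens never seen in the source corpus."""
    seen = {g for sent in source for g in ngrams(sent, n)}
    target_grams = [g for sent in target for g in ngrams(sent, n)]
    if not target_grams:
        return 0.0
    return sum(g not in seen for g in target_grams) / len(target_grams)


def unseen_among_top(source, target, n, k=20):
    """How many of the k most frequent target n-grams are unseen in the source."""
    seen = {g for sent in source for g in ngrams(sent, n)}
    counts = Counter(g for sent in target for g in ngrams(sent, n))
    return sum(g not in seen for g, _ in counts.most_common(k))


# Toy corpora: one newswire-like sentence and the question from the example
source = [["DT", "NN", "VBD", "DT", "NN", "."]]
target = [["WDT", "NN", "VBD", "DT", "NNP", "NNP", "."]]

print(unknown_ngram_rate(source, target, n=2))
print(unseen_among_top(source, target, n=2))
```

With real data, source would hold the POS sequences of WSJ Sections 02-21 and target those of the biomedical or question corpus.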
Biomedical domain
need for accurate parsing
long and difficult sentences
many POS tag errors
Question domain
uniform sentences
less related syntax
Laura Rimell, Stephen Clark. 2008. Adapting a
Julia Hockenmaier. 2007. Expressive Grammar
Julia Hockenmaier. 2005. CCGbank User's Manual