Bilexical Grammars & O(n³) Probabilistic Parser
Jason Eisner, University of Pennsylvania. IWPT 1997.


  1. Title slide. Soft selection: the verb "doff" prefers some objects over others ("doff a cap / a hat / a sombrero / a shirt / # a sink"), and adjuncts are selected softly too ("Jason Eisner doffed his cap to her / about her / at her / for her ...").
     [Image: monkeys doffing their hats.]

     Lexicalized Grammars
     - Rules are specialized for individual words, or are implicit in lexical entries. E.g., for "doff":
       doff: ___ NP (a subcategorization frame)
       doff: (S\NP)/NP (a categorial-grammar category)
       S → NP doff NP (a lexicalized context-free rule)
       an elementary tree S(NP↓, VP(doff, NP↓)) (with substitution nodes, as in TAG)
       a dependency entry taking a subj NP to its left and an obj NP to its right: L(np) doff R(np)

     From lexical to bilexical
     (Lafferty et al. 92, Charniak 95, Alshawi 96, Collins 96, Eisner 96, Goodman 97; also see Magerman 94, Ratnaparkhi 97, etc.)
     - Rules mention two words. E.g., each verb can have its own distribution of arguments (a small illustration follows this section).
     - Goal: no parsing performance penalty. Alas, with a standard chart parser: nonlexical O(n³); lexical O(n⁵), though other methods give O(n⁴) or O(n³); bilexical O(n⁵).

     Simplified Formalism (1)
     [Figure: a dependency tree for "The cat in the hat wore a striped stovepipe to our house today.", drawn once with its gewgaws (role labels such as agent, patient, goal, tmp-mod, det, mod, obj, plus sense and category marks such as wore₁/Sent, cat₂, house₁/Noun, striped₁/Adj; save these for later) and once as bare word-to-word links: ROOT → wore; wore → cat, stovepipe, to, today; cat → The, in; in → hat; hat → the; stovepipe → a, striped; to → house; house → our.]
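     A minimal way to picture "rules mention two words": the weight of an attachment depends jointly on the head word and the dependent word. The table and all numbers below are invented for illustration; they are not the talk's grammar.

     # Toy bilexical scoring: the score of an arc depends on BOTH words.
     BILEXICAL_WEIGHTS = {
         # (head, dependent, direction): weight (values invented for illustration)
         ("doff", "cap", "right"): 7,     # "doff a cap" is natural
         ("doff", "hat", "right"): 6,
         ("doff", "sink", "right"): -5,   # "doff the sink" is odd
         ("wore", "cat", "left"): 5,      # "cat" is a fine subject for "wore"
         ("wore", "stovepipe", "right"): 4,
     }

     def arc_weight(head, dependent, direction):
         """Weight of the dependency arc head -> dependent; higher is better."""
         return BILEXICAL_WEIGHTS.get((head, dependent, direction), -1.0)

     print(arc_weight("doff", "cap", "right"))   # 7
     print(arc_weight("doff", "sink", "right"))  # -5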

  2. Simplified Formalism (2)
     [Figure: the same sentence with bare word-to-word links only: ROOT → wore; wore → cat, stovepipe, to, today; cat → The, in; in → hat; hat → the; stovepipe → a, striped; to → house; house → our.]

     Weighting the Grammar
     - Transitive verb "doff": its right DFA accepts Noun Adv*.
       Likes: hat, nicely, now (e.g., "[Bentley] doffed [his hat] [nicely] [just now]").
       Hates: sink, countably (e.g., # "Bentley doffed [the sink] [countably]").
     - Need a flexible mechanism to score the possible sequences of dependents.
     - Every lexical entry (doff, wore, ...) lists 2 idiosyncratic DFAs, a left DFA and a right DFA. These accept the dependent sequences the word likes. (A toy version of such a weighted DFA is sketched after this section.)
     [Figure: doff's right DFA with weighted arcs such as hat(8), nicely(2), now(2), sink(1), countably(0), in(-5), plus stop weights; e.g., "hat nicely now" scores 8+2+2+3 = 15, while "sink countably" scores 1+0+3 = 4.]

     Why CKY is slow
     - Compare: 1. "visiting relatives is boring"  2. "visiting relatives wear funny hats"  3. "visiting relatives, we got bored and stole their funny hats"
     - "visiting relatives" has analyses NP(visiting), NP(relatives), AdvP, ...
     - A CFG says that all NPs are interchangeable, so we only have to use the generic or best NP. But a bilexical grammar disagrees: e.g., NP(visiting) is a poor subject for "wear".

     Generic Chart Parsing (1)
     - Interchangeable analyses have the same signature; an "analysis" is a tree or dotted tree or ...
     [Figure: the substring "[cap spending at $300 million]" with several competing analyses (NP, VP, ...) carrying different scores.]
     - If there are ≤ S signatures, we keep ≤ S analyses per substring; we must try combining each analysis with its context.

     Generic Chart Parsing (2)
     - For each of the O(n²) substrings, for each of the O(n) ways of splitting it, for each of the ≤ S analyses of the first half, for each of the ≤ S analyses of the second half, for each of the ≤ c ways of combining them: combine, and add the result to the chart if it is the best so far.
     - E.g., [cap spending] + [at $300 million] = [[cap spending] [at $300 million]]: ≤ S analyses times ≤ S analyses give ≤ cS² results, of which we keep ≤ S.
     - Total: Θ(n³ S² c).

     Headed constituents ... have too many signatures
     - How bad is Θ(n³ S² c)? For unheaded constituents, S is constant (NP, VP, ...; similarly for dotted trees), so the total is Θ(n³).
     - But when different heads mean different signatures, the average substring has Θ(n) possible heads and S = Θ(n) possible signatures, so the total is Θ(n⁵).
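     The "Weighting the Grammar" slide can be turned into a tiny runnable example. The two-state automaton layout below is my own guess at a right DFA accepting Noun Adv*; the dependent weights (hat 8, nicely 2, now 2, sink 1, countably 0) and the stop weight 3 are taken from the slide's worked sums.

     # A toy weighted "right DFA" for the verb "doff".
     # State 0 = no object seen yet; state 1 = object seen, adverbs may follow.
     TRANSITIONS = {
         (0, "hat"): (1, 8),
         (0, "sink"): (1, 1),       # grammatical but disliked
         (1, "nicely"): (1, 2),
         (1, "now"): (1, 2),
         (1, "countably"): (1, 0),  # grammatical but disliked
     }
     STOP_WEIGHT = {1: 3}           # may only halt after seeing an object

     def dependent_sequence_weight(deps):
         """Total weight of a sequence of right dependents, or None if rejected."""
         state, total = 0, 0
         for d in deps:
             if (state, d) not in TRANSITIONS:
                 return None                    # DFA rejects the sequence
             state, w = TRANSITIONS[(state, d)]
             total += w
         if state not in STOP_WEIGHT:
             return None                        # DFA cannot halt here
         return total + STOP_WEIGHT[state]

     print(dependent_sequence_weight(["hat", "nicely", "now"]))   # 8+2+2+3 = 15
     print(dependent_sequence_weight(["sink", "countably"]))      # 1+0+3 = 4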

  3. Forget heads - think hats!
     - Solution: don't assemble the parse from constituents; assemble it from spans instead.

     Spans vs. constituents: two kinds of substring of "The cat in the hat wore a stovepipe. ROOT".
     » A constituent of the tree links to the rest of the parse only through its head.
     » A span of the tree links to the rest of the parse only through its endwords.

     Decomposing a tree into spans
     [Figure: "The cat in the hat wore a stovepipe. ROOT" splits into "The cat" + "cat in the hat wore a stovepipe. ROOT"; the latter into "cat in the hat wore" + "wore a stovepipe. ROOT"; then "cat in" + "in the hat wore"; then "in the hat" + "hat wore"; and so on.]

     Maintaining weights
     - Seed the chart with word pairs ⟨x, y⟩, linked by an arc in either direction or by no arc.
     - Step of the algorithm: combine adjacent spans a...b and b...c, which share the endword b, into a...b...c. We can add an arc only if a and c are both still parentless. (A simplified runnable version of this step appears after this section.)
     - weight(a...b...c) = weight(a...b) + weight(b...c) + weight of the arc to c from a's current right-DFA state + weights of stopping in b's left and right DFA states.

     Analysis
     - The algorithm is O(n³S²) time and O(n²S) space. What is S?
     - In the step a...b + b...c = a...b...c:
       » b gets a parent from exactly one side;
       » neither a nor c previously had a parent;
       » a's right DFA accepts c; b's DFAs can halt.
     - The signature of a...b has to specify the parental status and DFA state of a and b, so S = O(t²), where t is the maximum number of states of any DFA.
     - S is independent of n because all of a substring's analyses are headed in the same place: at the ends!

     Improvement
     - Can reduce S from O(t²) to O(t).
     - The halt-weight for each half is independent of the other half: the state of b's left automaton tells us the weight of halting, and likewise for b's right automaton.
     - So add every span to both a left chart and a right chart; above, a...b is drawn from the left chart and b...c from the right chart. The copy of a...b in the left chart has the halt weight for b already added, so its signature needn't mention the state of b's automaton.
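     The span-assembly step above is often presented in a streamlined "complete/incomplete span" form. The sketch below uses that form, with a single bilexical arc score standing in for the left/right DFAs and halt weights, so it is a simplified stand-in for the talk's algorithm rather than a transcription of it; the scoring function and example weights are invented.

     # Simplified O(n^3) span-based projective dependency parsing (best score only).
     # A "complete" span has its head at one end and a fully attached interior;
     # an "incomplete" span has an arc between its two endwords, with the inner
     # endword still waiting for the rest of its dependents.
     NEG_INF = float("-inf")

     def parse_score(words, score):
         """Best total arc score of a projective parse; position 0 is ROOT."""
         n = len(words)
         c_l = [[0.0] * n for _ in range(n)]      # complete, head at right end
         c_r = [[0.0] * n for _ in range(n)]      # complete, head at left end
         i_l = [[NEG_INF] * n for _ in range(n)]  # incomplete, arc t -> s
         i_r = [[NEG_INF] * n for _ in range(n)]  # incomplete, arc s -> t

         for span in range(1, n):
             for s in range(0, n - span):
                 t = s + span
                 # add an arc between the endwords s and t
                 best = max(c_r[s][r] + c_l[r + 1][t] for r in range(s, t))
                 i_l[s][t] = best + score(words[t], words[s])   # arc t -> s
                 i_r[s][t] = best + score(words[s], words[t])   # arc s -> t
                 # absorb an incomplete span into a larger complete one
                 c_l[s][t] = max(c_l[s][r] + i_l[r][t] for r in range(s, t))
                 c_r[s][t] = max(i_r[s][r] + c_r[r][t] for r in range(s + 1, t + 1))
         return c_r[0][n - 1]                     # everything hangs under ROOT

     def toy_score(head, dep):
         prefs = {("ROOT", "wore"): 10, ("wore", "cat"): 5, ("wore", "stovepipe"): 4,
                  ("cat", "The"): 2, ("stovepipe", "a"): 2}
         return prefs.get((head, dep), -1.0)

     print(parse_score(["ROOT", "The", "cat", "wore", "a", "stovepipe"], toy_score))  # 23.0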

  4. Embellishments
     - More detailed parses (a toy illustration of the label and tag factors follows this section):
       » labeled edges
       » tags (part of speech, word sense, ...)
       » nonterminals
     - How to encode probability models.

     More detailed parses (1): labeled arcs
     - Grammar: the DFAs must accept strings of word-role pairs, e.g., (cat, agent) or (hat, obj).
     - Parser: when we stick two spans together, consider covering them with: nothing, agent, obj, ... etc.
     - Time penalty: O(m), where m is the number of label types.
     [Figure: the example tree with labeled arcs such as agent, det, adj, obj.]

     More detailed parses (2): optimize tagging & parsing at once
     - Grammar: every input token denotes a confusion set, e.g., cat = {cat₁, cat₂, cat₃, cat₄}; the choice of cat₃ adds a certain weight to the parse (cf. Collins 96).
     - Parser: more possibilities for seeding the chart; the tags of b must match when combining a...b + b...c; the signature of a...b must specify the tags of a and b.
     - Time penalty: O(g⁴), where g is the max number of tags per input word, since S goes up by a factor of g²; O(g³) by considering only the appropriate spans b...c.
     [Figure: "The₁ cat₃ in₆ the₁ hat₂", a sense-tagged seeding of the chart.]

     Nonterminals
     [Figure: an articulated phrase projected by head a (nested nonterminals A, B, C) corresponds one-to-one to a flat dependency phrase headed by a^{A,B,C} with dependents b, b', c, c'.]
     - The bilexical DFAs for a^{A,B,C} insist that its kids come in the order A, B, C.
     - Use the fast bilexical algorithm, then convert the result to a nonterminal tree.
     - Want a small (and finite) set of tags like a^{A,B,C}. (Guaranteed by X-bar theory: doff = {doff^{V,V,VP}, doff^{V,V,VP,S}}.)

     Using the weights
     - Seed the chart with word pairs, e.g., ⟨doff, hat⟩, ⟨doff, nicely⟩, ⟨doff, now⟩, ⟨doff, in⟩, ⟨doff, sink⟩.
     - Deterministic grammar: all weights are 0 or -∞.
     - Generative model: weight = log Pr(next kid = nicely | doff in state 2).
     - Comprehension model: weight = log Pr(next kid = nicely | doff in state 2, nicely present).
     - Eisner 1996 compared several such models and found significant differences.

     String-local constraints
     - We can choose to exclude some of the seed pairs ⟨x, y⟩.
     - Example: k-gram tagging (here k = 3). Tag each word with a part-of-speech trigram, e.g., in "one cat in the hat" the word "the" can carry the trigram N P Det, with weight = log [ Pr(the | Det) · Pr(Det | N, P) ].
     - Excluded bigram: two adjacent words whose trigrams disagree on the overlap, e.g., "in" tagged Det V P next to "the" tagged N P Det: the 2 words disagree on the tag for "cat".
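     As a rough illustration of where the m and g factors in the time penalties above come from, the sketch below scores a single covering arc by maximizing over label types and over the senses of the two endwords. The labels, sense inventories, and weights are all invented for illustration; this is not the talk's model.

     # Toy labeled, sense-aware arc scoring: trying every label costs a factor m,
     # and the sense choices at the two endwords cost a factor of roughly g^2.
     ARC_LABEL_WEIGHTS = {
         # (head sense, dependent sense, label): weight (invented)
         (("wore", 1), ("cat", 1), "agent"): 5,
         (("wore", 1), ("cat", 2), "agent"): 1,   # a worse sense of "cat"
         (("wore", 1), ("stovepipe", 1), "patient"): 4,
     }
     LABELS = ["agent", "patient", "obj", "det"]               # m label types
     SENSES = {"cat": [1, 2], "wore": [1], "stovepipe": [1]}   # up to g senses per word

     def best_labeled_arc(head, dep):
         """Maximize over the m labels and the senses of each endword."""
         best = None
         for hs in SENSES.get(head, [1]):
             for ds in SENSES.get(dep, [1]):
                 for label in LABELS:
                     w = ARC_LABEL_WEIGHTS.get(((head, hs), (dep, ds), label))
                     if w is not None and (best is None or w > best[0]):
                         best = (w, hs, ds, label)
         return best

     print(best_labeled_arc("wore", "cat"))   # (5, 1, 1, 'agent')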

  5. Conclusions
     - Bilexical grammar formalism: how much do 2 words want to relate?
       Flexible: encode your favorite representation.
       Flexible: encode your favorite probability model.
     - Fast parsing algorithm: assemble spans, not constituents.
       O(n³), not O(n⁵). Precisely, O(n³ t² g³ m), where t = max DFA size, g = max senses/word, m = # label types. These grammar factors are typically small.
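     A back-of-envelope reading of "these grammar factors are typically small", using illustrative sizes (t = 4, g = 2, m = 3; these values are mine, not the talk's): the grammar part of O(n³ t² g³ m) is a constant, so the work still grows cubically in sentence length.

     # The grammar factor in O(n^3 t^2 g^3 m) does not depend on n.
     t, g, m = 4, 2, 3                      # illustrative sizes, not from the talk
     print(t**2 * g**3 * m)                 # grammar factor = 384, independent of n
     for n in (10, 20, 40):
         # doubling n multiplies the n^3 term by 8 but the n^5 term by 32
         print(n, n**3, n**5)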
