

  1. CSE 447/547 Natural Language Processing Winter 2018 Parsing (Trees) Yejin Choi - University of Washington [Slides from Dan Klein, Michael Collins, Luke Zettlemoyer and Ray Mooney]

  2. Ambiguities

  3. I shot [an elephant] [in my pajamas] Examples from J&M

  4. Syntactic Ambiguities I
  § Prepositional phrases: They cooked the beans in the pot on the stove with handles.
  § Particle vs. preposition: The puppy tore up the staircase.
  § Complement structures: The tourists objected to the guide that they couldn't hear. She knows you like the back of her hand.
  § Gerund vs. participial adjective: Visiting relatives can be boring. Changing schedules frequently confused passengers.

  5. Syntactic Ambiguities II
  § Modifier scope within NPs: impractical design requirements; plastic cup holder
  § Multiple gap constructions: The chicken is ready to eat. The contractors are rich enough to sue.
  § Coordination scope: Small rats and mice can squeeze into holes or cracks in the wall.

  6. Dark Ambiguities
  § Dark ambiguities: most analyses are shockingly bad (meaning, they don't have an interpretation you can get your mind around). The analysis shown corresponds to the correct parse of "This will panic buyers!"
  § Unknown words and new usages
  § Solution: we need mechanisms to focus attention on the best analyses; probabilistic techniques do this

  7. Probabilistic Context-Free Grammars

  8. Probabilistic Context-Free Grammars
  § A context-free grammar is a tuple <N, Σ, S, R>
  § N: the set of non-terminals
    § Phrasal categories: S, NP, VP, ADJP, etc.
    § Parts-of-speech (pre-terminals): NN, JJ, DT, VB, etc.
  § Σ: the set of terminals (the words)
  § S: the start symbol
    § Often written as ROOT or TOP
    § Not usually the sentence non-terminal S
  § R: the set of rules
    § Of the form X → Y₁ Y₂ … Yₙ, with X ∈ N, n ≥ 0, Yᵢ ∈ (N ∪ Σ)
    § Examples: S → NP VP, VP → VP CC VP
  § A PCFG adds a distribution q: a probability q(r) for each r ∈ R, such that for all X ∈ N:
    ∑_{α → β ∈ R : α = X} q(α → β) = 1
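To make the definition concrete, here is a minimal sketch of a PCFG held as plain Python data, together with a check of the per-non-terminal normalization constraint above; the class name and layout are illustrative choices, not from the slides.

```python
# A minimal sketch: a PCFG as a map lhs -> [(rhs, probability), ...].
# The class name and layout are illustrative, not from the slides.
from collections import defaultdict

class PCFG:
    def __init__(self, rules):
        # rules: iterable of (lhs, rhs_tuple, prob) triples
        self.rules = defaultdict(list)
        for lhs, rhs, prob in rules:
            self.rules[lhs].append((rhs, prob))

    def check_normalized(self, tol=1e-9):
        # For every non-terminal X, q over all rules X -> beta must sum to 1.
        for lhs, expansions in self.rules.items():
            total = sum(prob for _, prob in expansions)
            assert abs(total - 1.0) < tol, f"{lhs} rules sum to {total}"
```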

  9. PCFG Example
  § Grammar rules with probabilities q:
    S → NP VP 1.0
    VP → Vi 0.4
    VP → Vt NP 0.4
    VP → VP PP 0.2
    NP → DT NN 0.3
    NP → NP PP 0.7
    PP → IN NP 1.0
    Vi → sleeps 1.0
    Vt → saw 1.0
    NN → man 0.7
    NN → woman 0.2
    NN → telescope 0.1
    DT → the 1.0
    IN → with 0.5
    IN → in 0.5
  § The probability of a tree t with rules α₁ → β₁, α₂ → β₂, …, αₙ → βₙ is
    p(t) = ∏ᵢ₌₁ⁿ q(αᵢ → βᵢ)
    where q(α → β) is the probability of rule α → β.

  10. PCFG Example
  § t₁ = the parse of "The man sleeps", built from the rules
    S → NP VP (1.0), NP → DT NN (0.3), DT → the (1.0), NN → man (0.7), VP → Vi (0.4), Vi → sleeps (1.0)
    p(t₁) = 1.0 × 0.3 × 1.0 × 0.7 × 0.4 × 1.0 = 0.084
  § t₂ = the parse of "The man saw the woman with the telescope", with the PP attached to the VP, built from the rules
    S → NP VP (1.0), NP → DT NN (0.3), DT → the (1.0), NN → man (0.7), VP → VP PP (0.2), VP → Vt NP (0.4), Vt → saw (1.0), NP → DT NN (0.3), DT → the (1.0), NN → woman (0.2), PP → IN NP (1.0), IN → with (0.5), NP → DT NN (0.3), DT → the (1.0), NN → telescope (0.1)
    p(t₂) = 1.0 × 0.3 × 1.0 × 0.7 × 0.2 × 0.4 × 1.0 × 0.3 × 1.0 × 0.2 × 1.0 × 0.5 × 0.3 × 1.0 × 0.1 ≈ 1.5 × 10⁻⁵
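These tree probabilities are easy to reproduce mechanically. A small sketch, assuming trees are encoded as nested tuples (an encoding chosen here for illustration, not taken from the slides):

```python
import math

# Rule probabilities q from the example grammar, keyed by (lhs, rhs).
q = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("Vi",)): 0.4, ("VP", ("Vt", "NP")): 0.4, ("VP", ("VP", "PP")): 0.2,
    ("NP", ("DT", "NN")): 0.3, ("NP", ("NP", "PP")): 0.7,
    ("PP", ("IN", "NP")): 1.0,
    ("Vi", ("sleeps",)): 1.0, ("Vt", ("saw",)): 1.0,
    ("NN", ("man",)): 0.7, ("NN", ("woman",)): 0.2, ("NN", ("telescope",)): 0.1,
    ("DT", ("the",)): 1.0, ("IN", ("with",)): 0.5, ("IN", ("in",)): 0.5,
}

def tree_prob(tree):
    """p(t) = product of q(rule) over all rules used in the tree.
    A tree is (label, child, ...) where a pre-terminal's child is a word."""
    label, *children = tree
    if isinstance(children[0], str):                 # pre-terminal -> word
        return q[(label, tuple(children))]
    rhs = tuple(child[0] for child in children)      # labels of the children
    return q[(label, rhs)] * math.prod(tree_prob(c) for c in children)

t1 = ("S", ("NP", ("DT", "the"), ("NN", "man")),
           ("VP", ("Vi", "sleeps")))
print(tree_prob(t1))    # 1.0 * 0.3 * 1.0 * 0.7 * 0.4 * 1.0 = 0.084
```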

  11. PCFGs: Learning and Inference
  § Model: the probability of a tree t with n rules αᵢ → βᵢ, i = 1..n, is
    p(t) = ∏ᵢ₌₁ⁿ q(αᵢ → βᵢ)
  § Learning: read the rules off of labeled sentences, use ML estimates for the probabilities
    q_ML(α → β) = Count(α → β) / Count(α)
    and use all of our standard smoothing tricks!
  § Inference: for an input sentence s, define T(s) to be the set of trees whose yield is s (whose leaves, read left to right, match the words in s); then
    t*(s) = argmax_{t ∈ T(s)} p(t)
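The ML estimate amounts to counting rule occurrences in the treebank. A sketch over the nested-tuple trees used earlier (smoothing omitted):

```python
from collections import Counter

def ml_estimate(trees):
    """q_ML(a -> b) = Count(a -> b) / Count(a), read off a list of trees
    in the nested-tuple encoding used above (no smoothing here)."""
    rule_count, lhs_count = Counter(), Counter()

    def visit(tree):
        label, *children = tree
        if isinstance(children[0], str):             # pre-terminal rule
            rhs = tuple(children)
        else:
            rhs = tuple(child[0] for child in children)
            for child in children:
                visit(child)
        rule_count[(label, rhs)] += 1
        lhs_count[label] += 1

    for t in trees:
        visit(t)
    return {rule: n / lhs_count[rule[0]] for rule, n in rule_count.items()}
```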

  12. Dynamic Programming
  § We will store: π(i, j, X) = score of the max parse of x_i to x_j with root non-terminal X
  § So we can compute the most likely parse: π(1, n, S) = max_{t ∈ T(s)} p(t)
  § Via the recursion:
    π(i, j, X) = max_{X → Y Z ∈ R, s ∈ {i … (j−1)}} ( q(X → Y Z) × π(i, s, Y) × π(s+1, j, Z) )
  § With base case:
    π(i, i, X) = q(X → x_i) if X → x_i ∈ R, 0 otherwise

  13. The CKY Algorithm
  § Input: a sentence s = x₁ … xₙ and a PCFG <N, Σ, S, R, q>
  § Initialization: for i = 1 … n and all X in N,
    π(i, i, X) = q(X → x_i) if X → x_i ∈ R, 0 otherwise
  § For l = 1 … (n−1) [iterate all phrase lengths]
    § For i = 1 … (n−l) and j = i+l [iterate all phrases of length l]
      § For all X in N [iterate all non-terminals]
        π(i, j, X) = max_{X → Y Z ∈ R, s ∈ {i … (j−1)}} ( q(X → Y Z) × π(i, s, Y) × π(s+1, j, Z) )
      § Also store back pointers:
        bp(i, j, X) = argmax_{X → Y Z ∈ R, s ∈ {i … (j−1)}} ( q(X → Y Z) × π(i, s, Y) × π(s+1, j, Z) )
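A compact sketch of the algorithm for a grammar in Chomsky normal form (binary rules plus pre-terminal rules); the dictionary layout and 0-indexed spans are implementation choices made here, not prescribed by the slides.

```python
from collections import defaultdict

def cky(words, binary_rules, lexical_rules, start="S"):
    """binary_rules:  {(X, Y, Z): q(X -> Y Z)}
    lexical_rules: {(X, word): q(X -> word)}
    Returns the best score for `start` over the whole sentence, plus
    back pointers (i, j, X) -> (Y, Z, s) for recovering the tree."""
    n = len(words)
    pi = defaultdict(float)                     # (i, j, X) -> best score
    bp = {}

    for i, w in enumerate(words):               # base case: length-1 spans
        for (X, word), q in lexical_rules.items():
            if word == w:
                pi[(i, i, X)] = q

    for length in range(1, n):                  # all longer spans, bottom-up
        for i in range(n - length):
            j = i + length
            for (X, Y, Z), q in binary_rules.items():
                for s in range(i, j):           # all split points
                    score = q * pi[(i, s, Y)] * pi[(s + 1, j, Z)]
                    if score > pi[(i, j, X)]:
                        pi[(i, j, X)] = score
                        bp[(i, j, X)] = (Y, Z, s)
    return pi[(0, n - 1, start)], bp
```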

  14. Probabilistic CKY Parser
  § Example grammar (probabilities transcribed from the slide):
    S → NP VP 0.8
    S → X1 VP 0.1
    X1 → Aux NP 1.0
    S → book | include | prefer 0.01 | 0.004 | 0.006
    S → Verb NP 0.05
    S → VP PP 0.03
    NP → I | he | she | me 0.1 | 0.02 | 0.02 | 0.06
    NP → Houston | NWA 0.16 | 0.04
    NP → Det Nominal 0.6
    Det → the | a | an 0.6 | 0.1 | 0.05
    Nominal → book | flight | meal | money 0.03 | 0.15 | 0.06 | 0.06
    Nominal → Nominal Nominal 0.2
    Nominal → Nominal PP 0.5
    Verb → book | include | prefer 0.5 | 0.04 | 0.06
    VP → Verb NP 0.5
    VP → VP PP 0.3
    Prep → through | to | from 0.2 | 0.3 | 0.3
    PP → Prep NP 1.0
  § [Chart for "Book the flight through Houston", filled bottom-up: Verb .5, Nominal .03, S .01 over "Book"; Det .6 over "the"; Nominal .15 over "flight"; Prep .2 over "through"; NP .16 over "Houston"; NP = .6×.6×.15 = .054 over "the flight"; PP = 1.0×.2×.16 = .032 over "through Houston"; VP = .5×.5×.054 = .0135 and S = .05×.5×.054 = .00135 over "Book the flight"; Nominal = .5×.15×.032 = .0024 over "flight through Houston"; NP = .6×.6×.0024 = .000864 over "the flight through Houston"; S = .05×.5×.000864 = .0000216 and S = .03×.0135×.032 = .00001296 over the whole sentence.]

  15. Probabilistic CKY Parser: Parse Tree #1
  § Pick the most probable parse, i.e. take the max to combine probabilities of multiple derivations of each constituent in each cell.
  § [Same chart as above, highlighting the best root: S = .05×.5×.000864 = .0000216, via S → Verb NP.]

  16. Probabilistic CKY Parser: Parse Tree #2
  § [Same chart, highlighting the runner-up root: S = .03×.0135×.032 = .00001296, via S → VP PP.]
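Feeding this grammar into the cky sketch above reproduces the chart values; a usage sketch, transcribing only the rules that can fire on this sentence:

```python
binary = {
    ("S", "Verb", "NP"): 0.05, ("S", "VP", "PP"): 0.03,
    ("NP", "Det", "Nominal"): 0.6,
    ("Nominal", "Nominal", "PP"): 0.5,
    ("VP", "Verb", "NP"): 0.5, ("VP", "VP", "PP"): 0.3,
    ("PP", "Prep", "NP"): 1.0,
}
lexical = {
    ("S", "book"): 0.01, ("Verb", "book"): 0.5, ("Nominal", "book"): 0.03,
    ("Det", "the"): 0.6, ("Nominal", "flight"): 0.15,
    ("Prep", "through"): 0.2, ("NP", "Houston"): 0.16,
}
score, bp = cky("book the flight through Houston".split(), binary, lexical)
print(score)    # 0.0000216, i.e. parse #1 via S -> Verb NP
```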

  17. Memory
  § How much memory does this require?
    § Have to store the score cache
    § Cache size: |symbols| × n²
  § Pruning: Beam Search
    § score[X][i][j] can get too large (when?)
    § Can keep beams (truncated maps score[i][j]) which only store the best K scores for the span [i, j] (see the sketch below)
  § Pruning: Coarse-to-Fine
    § Use a smaller grammar to rule out most X[i,j]
    § Much more on this later…
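A minimal sketch of the beam step, assuming each chart cell is a {label: score} map (a hypothetical layout, not from the slides):

```python
import heapq

def prune_to_beam(cell_scores, k=5):
    """Keep only the K highest-scoring non-terminals in one chart cell,
    i.e. the truncated score[i][j] map described above."""
    return dict(heapq.nlargest(k, cell_scores.items(), key=lambda kv: kv[1]))
```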

  18. Time: Theory
  § How much time will it take to parse?
    § For each diff (:= j − i) (≤ n)
    § For each i (≤ n)
    § For each rule X → Y Z
    § For each split point k: do constant work
    [diagram: X spans i..j, split at k into Y over i..k and Z over k+1..j]
  § Total time: |rules| × n³
  § Something like 5 sec for an unoptimized parse of a 20-word sentence

  19. Time: Practice
  § Parsing with the vanilla treebank grammar (~20K rules; not an optimized parser!): observed exponent 3.6
  § Why's it worse in practice?
    § Longer sentences "unlock" more of the grammar
    § All kinds of systems issues don't scale

  20. Other Dynamic Programs
  Can also compute other quantities:
  § Best Inside: score of the max parse of w_i to w_j with root non-terminal X
  § Best Outside: score of the max parse of w_0 to w_n with a gap from w_i to w_j rooted with non-terminal X
    § See the notes for the derivation; it is a bit more complicated
  § Sum Inside/Outside: do sums instead of maxes (a sketch of the sum-inside variant follows)
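The sum-inside variant really is just the CKY recursion with max replaced by sum; a sketch reusing the data layout of the cky function above, so that pi[(i, j, X)] accumulates the total probability of all parses of the span:

```python
from collections import defaultdict

def inside_sums(words, binary_rules, lexical_rules):
    """Inside probabilities: like cky above, but summing over derivations
    instead of maximizing, so pi[(0, n-1, "S")] is the total probability
    that the grammar assigns to the sentence."""
    n = len(words)
    pi = defaultdict(float)
    for i, w in enumerate(words):
        for (X, word), q in lexical_rules.items():
            if word == w:
                pi[(i, i, X)] += q
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            for (X, Y, Z), q in binary_rules.items():
                for s in range(i, j):
                    pi[(i, j, X)] += q * pi[(i, s, Y)] * pi[(s + 1, j, Z)]
    return pi
```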

  21. Why Chomsky Normal Form?
  § Inference:
    § Can we keep N-ary (N > 2) rules and still do dynamic programming?
    § Can we keep unary rules and still do dynamic programming?
  § Learning:
    § Can we reconstruct the original trees?
  [The slide repeats the "Book the flight through Houston" chart from slide 14.]
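One standard answer to the learning question: binarize n-ary rules with fresh intermediate symbols, parse, then splice the intermediate symbols back out to reconstruct the original trees. A sketch of right-binarization (the @-symbol naming scheme is a common convention, chosen here for illustration):

```python
def binarize(lhs, rhs, q):
    """Turn one n-ary rule X -> Y1 Y2 ... Yn (prob q) into an equivalent
    chain of binary rules, introducing @-symbols that expand
    deterministically (prob 1.0), so the tree probability is unchanged
    and the original tree can be recovered by deleting the @-nodes."""
    rules = []
    while len(rhs) > 2:
        rest = rhs[1:]
        new_sym = "@" + lhs + "_" + "_".join(rest)
        rules.append((lhs, (rhs[0], new_sym), q))
        lhs, rhs, q = new_sym, rest, 1.0
    rules.append((lhs, tuple(rhs), q))
    return rules

# binarize("VP", ("V", "NP", "PP"), 0.2) ->
#   [("VP", ("V", "@VP_NP_PP"), 0.2), ("@VP_NP_PP", ("NP", "PP"), 1.0)]
```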

  22. Treebanks
