Transforming Projective Bilexical Dependency Grammars into efficiently-parsable CFGs with Unfold-Fold
Mark Johnson (Microsoft Research / Brown University), ACL 2007

  1. Transforming Projective Bilexical Dependency Grammars into efficiently-parsable CFGs with Unfold-Fold. Mark Johnson, Microsoft Research / Brown University. ACL 2007.

  2. Motivation and summary
  ◮ What's the relationship between CKY parsing and the Eisner/Satta O(n³) PBDG parsing algorithm? (cf. McAllester 1999)
    ◮ split-head encoding: collect left and right dependents separately
    ◮ the unfold-fold transform reorganizes the grammar for efficient CKY parsing
  ◮ The approach generalizes to 2nd-order dependencies:
    ◮ predict an argument given its governor and sibling (McDonald 2006)
    ◮ predict an argument given its governor and the governor's governor
  ◮ In principle any CFG parsing or estimation algorithm can be used for PBDGs
    ◮ the transformed grammars are typically too large to enumerate
    ◮ my CKY implementations transform the grammar on the fly

  3. Outline
  ◮ Projective Bilexical Dependency Grammars
  ◮ Simple split-head encoding
  ◮ O(n³) split-head CFGs via Unfold-Fold
  ◮ Transformations capturing 2nd-order dependencies
  ◮ Conclusion

  4. Projective Bilexical Dependency Grammars
  ◮ A Projective Bilexical Dependency Grammar (PBDG) specifies which head→dependent pairs are permitted. Example dependencies (head→dependent, 0 is the root): 0→gave, gave→Sandy, gave→dog, dog→the, gave→bone, bone→a
  ◮ A dependency parse generated by this PBDG: 0 Sandy gave the dog a bone, with the arcs listed above
  ◮ Weights can be attached to dependencies (and are preserved in the CFG transforms)
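A projective dependency parse like the one on this slide can be stored as one head index per word; projectivity means no two arcs cross. The following sketch checks that property. The `heads` encoding and the function name are this sketch's own, not notation from the slides:

```python
def is_projective(heads):
    """Return True iff no two dependency arcs cross.

    heads[i] is the 1-based index of word (i+1)'s governor; 0 is the root.
    """
    # Represent each arc by its (left, right) endpoints.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # Two arcs cross iff one starts strictly inside the other
            # and ends strictly outside it.
            if l1 < l2 < r1 < r2:
                return False
    return True

# The slide's example "0 Sandy gave the dog a bone":
# Sandy->gave, gave->root, the->dog, dog->gave, a->bone, bone->gave
print(is_projective([2, 0, 4, 2, 6, 2]))  # True
```

A crossing configuration, e.g. arcs (1,3) and (2,4), makes the check fail.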

  5. A naive encoding of PBDGs as CFGs
  ◮ Production schemata:
    S → X_u where 0→u
    X_u → u
    X_u → X_v X_u where v is a left dependent of u
    X_u → X_u X_v where v is a right dependent of u
  ◮ [Naive CFG parse tree for "Sandy gave the dog a bone", with categories S, X_gave, X_Sandy, X_dog, X_the, X_a, X_bone]
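The schema above can be instantiated for a concrete PBDG. A minimal sketch, assuming rules are represented as (lhs, rhs-tuple) pairs and category names are spelled "X_word" (both choices are mine, not the slides'):

```python
def naive_cfg_rules(roots, left_deps, right_deps):
    """Instantiate the naive CFG schema for one PBDG.

    roots:      words u with 0 -> u
    left_deps:  pairs (v, u): v may be a left dependent of u
    right_deps: pairs (u, v): v may be a right dependent of u
    """
    words = set(roots)
    words |= {w for pair in left_deps for w in pair}
    words |= {w for pair in right_deps for w in pair}
    rules = [("S", ("X_" + u,)) for u in roots]          # S -> X_u where 0 -> u
    rules += [("X_" + u, (u,)) for u in sorted(words)]   # X_u -> u
    rules += [("X_" + u, ("X_" + v, "X_" + u)) for v, u in left_deps]
    rules += [("X_" + u, ("X_" + u, "X_" + v)) for u, v in right_deps]
    return rules

# The example PBDG from slide 4:
rules = naive_cfg_rules(
    roots={"gave"},
    left_deps={("Sandy", "gave"), ("the", "dog"), ("a", "bone")},
    right_deps={("gave", "dog"), ("gave", "bone")},
)
```

For the example this yields rules such as X_gave → X_Sandy X_gave and X_gave → X_gave X_bone.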

  6. Spurious ambiguity in the naive encoding
  ◮ The naive encoding allows dependencies on different sides of the head to be freely reordered ⇒ spurious ambiguity in the CFG parses (not present in the PBDG parses)
  ◮ [Two distinct naive CFG parse trees for "Sandy gave the dog a bone" that encode the same dependency structure]

  7. Parsing the naive CFG encoding takes O(n⁵) time
  ◮ A production schema such as X_u → X_u X_v involves 5 string positions (the span endpoints i and k, the split point j, and the head positions u and v), and so can match the input in O(n⁵) different ways

  8. Outline
  ◮ Projective Bilexical Dependency Grammars
  ◮ Simple split-head encoding
  ◮ O(n³) split-head CFGs via Unfold-Fold
  ◮ Transformations capturing 2nd-order dependencies
  ◮ Conclusion

  9. Simple split-head encoding
  ◮ Replace each input word u with a left variant u_l and a right variant u_r (can be avoided in practice with fancy book-keeping):
    Sandy gave the dog a bone ⇒ Sandy_l Sandy_r gave_l gave_r the_l the_r dog_l dog_r a_l a_r bone_l bone_r
  ◮ The CFG collects left dependencies and right dependencies separately:
    S → X_u where 0→u
    X_u → L_u u_R where u ∈ Σ
    L_u → u_l
    L_u → X_v L_u where v is a left dependent of u
    u_R → u_r
    u_R → u_R X_v where v is a right dependent of u
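The split-head schema can be instantiated the same way as the naive one. A sketch, again with my own (lhs, rhs-tuple) representation and spellings such as "L_gave" and "gave_R":

```python
def split_head_rules(roots, left_deps, right_deps):
    """Instantiate the simple split-head CFG schema for one PBDG."""
    words = set(roots)
    words |= {w for pair in left_deps for w in pair}
    words |= {w for pair in right_deps for w in pair}
    rules = [("S", ("X_" + u,)) for u in roots]
    for u in sorted(words):
        rules.append(("X_" + u, ("L_" + u, u + "_R")))  # X_u -> L_u u_R
        rules.append(("L_" + u, (u + "_l",)))           # L_u -> u_l
        rules.append((u + "_R", (u + "_r",)))           # u_R -> u_r
    rules += [("L_" + u, ("X_" + v, "L_" + u)) for v, u in left_deps]
    rules += [(u + "_R", (u + "_R", "X_" + v)) for u, v in right_deps]
    return rules

def split_input(words):
    """Split each input word u into the pair u_l u_r, as on this slide."""
    return [w + suffix for w in words for suffix in ("_l", "_r")]

print(split_input(["Sandy", "gave"]))  # ['Sandy_l', 'Sandy_r', 'gave_l', 'gave_r']
```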

  10. Simple split-head CFG parse
  ◮ [Split-head CFG parse tree for "Sandy_l Sandy_r gave_l gave_r the_l the_r dog_l dog_r a_l a_r bone_l bone_r", in which each X_u dominates an L_u and a u_R constituent]

  11. L_u and u_R heads are phrase-peripheral ⇒ O(n⁴)
  ◮ The head of L_u is always at its right edge, and the head of u_R is always at its left edge
  ◮ Schema (repeated): S → X_u where 0→u; X_u → L_u u_R where u ∈ Σ; L_u → u_l; L_u → X_v L_u where v←u; u_R → u_r; u_R → u_R X_v where u→v
  ◮ Productions X_u → L_u u_R take O(n³) time, but u_R → u_R X_v takes O(n⁴)
    ◮ in u_R → u_R X_v the head position coincides with the left span edge (i = u), leaving four free positions j, v, k plus i

  12. Outline
  ◮ Projective Bilexical Dependency Grammars
  ◮ Simple split-head encoding
  ◮ O(n³) split-head CFGs via Unfold-Fold
  ◮ Transformations capturing 2nd-order dependencies
  ◮ Conclusion

  13. The Unfold-Fold transform
  ◮ Unfold-fold was originally proposed for transforming recursive programs; here it is used to transform CFGs into new CFGs
  ◮ Unfolding a nonterminal replaces it on right-hand sides with its expansions:
    A → α B γ,  B → β₁,  B → β₂, …  ⇒  A → α β₁ γ,  A → α β₂ γ,  B → β₁,  B → β₂, …
  ◮ Folding is the inverse of unfolding (replace a RHS sequence with a nonterminal that expands to it):
    A → α β γ,  B → β, …  ⇒  A → α B γ,  B → β, …
  ◮ The transformed grammar generates the same language (Sato 1992)
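The unfold step on this slide is mechanical and easy to sketch over (lhs, rhs-tuple) rules. This is a minimal illustration (my own representation, handling only the first occurrence of the target per rule), not the paper's implementation:

```python
def unfold(rules, target):
    """Unfold nonterminal `target`: wherever it occurs on a right-hand
    side, splice in each of its expansions. `target`'s own rules remain."""
    expansions = [rhs for lhs, rhs in rules if lhs == target]
    out = []
    for lhs, rhs in rules:
        if target in rhs:
            i = rhs.index(target)  # unfold the first occurrence only
            for exp in expansions:
                out.append((lhs, rhs[:i] + exp + rhs[i + 1:]))
        else:
            out.append((lhs, rhs))
    return out

# The slide's schematic example: A -> a B c with B -> b1 | b2
rules = [("A", ("a", "B", "c")), ("B", ("b1",)), ("B", ("b2",))]
print(unfold(rules, "B"))
# A -> a b1 c and A -> a b2 c replace A -> a B c; B's rules are retained
```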

  14. Unfold-fold converts the O(n⁴) grammar to an O(n³) grammar
  ◮ Unfold the X_v responsible for the O(n⁴) parse time (using X_v → L_v v_R):
    L_u → u_l,  L_u → X_v L_u  ⇒  L_u → u_l,  L_u → L_v v_R L_u
  ◮ Introduce new nonterminals x_M_y (this doesn't change the language):
    x_M_y → x_R L_y
  ◮ Fold the last two children of L_u → L_v v_R L_u into v_M_u:
    L_u → u_l,  L_u → L_v v_R L_u  ⇒  L_u → u_l,  L_u → L_v v_M_u,  v_M_u → v_R L_u
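The fold step is the inverse operation, and the slide's transformation can be replayed on concrete rules. A sketch under the same assumed (lhs, rhs-tuple) representation; the literal category names "L_u", "v_R", "v_M_u" stand in for the schema variables:

```python
def fold(rules, new_lhs, body):
    """Fold: replace the symbol sequence `body` on a right-hand side by
    `new_lhs`, and add the defining rule new_lhs -> body.
    Handles at most one occurrence per rule (enough for this sketch)."""
    out = []
    for lhs, rhs in rules:
        for i in range(len(rhs) - len(body) + 1):
            if rhs[i:i + len(body)] == body:
                rhs = rhs[:i] + (new_lhs,) + rhs[i + len(body):]
                break
        out.append((lhs, rhs))
    out.append((new_lhs, body))
    return out

# The slide's step, after X_v has been unfolded inside L_u -> X_v L_u:
rules = [("L_u", ("u_l",)), ("L_u", ("L_v", "v_R", "L_u"))]
folded = fold(rules, "v_M_u", ("v_R", "L_u"))
# Result: L_u -> u_l, L_u -> L_v v_M_u, and v_M_u -> v_R L_u
```

All categories on the folded right-hand sides now have their head at a phrase edge, which is what buys the O(n³) parse time.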

  15. Transformed grammar collects left and right dependencies separately
  ◮ [Before/after tree schemata: the X_v and X_v′ subtrees under L_u and u_R are replaced by v_M_u and u_M_v′ constituents]
  ◮ Symmetrically on the right: u_R → u_M_v v_R with u_M_v → u_R L_v (see the parse on the next slide)
  ◮ X_v constituents (which caused the O(n⁴) parse time) are no longer used
  ◮ Head annotations are now all phrase-peripheral ⇒ O(n³) parse time
  ◮ Dependencies can be recovered from the parse tree
  ◮ Basically the same as the Eisner and Satta O(n³) algorithm
    ◮ explains why the inside-outside sanity check fails for Eisner/Satta: with two copies of each terminal, each terminal's outside probability is double the sentence's inside probability

  16. Parse using O ( n 3 ) transformed split-head grammar S L gave gave R L Sandy Sandy M gave gave M bone bone R Sandy R L gave gave R L bone Sandy ℓ Sandy r gave M dog dog R L a a M bone gave R L dog a R L bone gave ℓ gave r L the the M dog a ℓ a r bone ℓ bone r the R L dog the ℓ the r dog r dog ℓ 0 Sandy gave the dog a bone 16 / 22

  17. Parsing time of CFG encodings of the same PBDG

    CFG schemata                          sentences parsed / second
    Naive O(n⁵) CFG                                    45.4
    O(n⁴) simple split-head CFG                       406.2
    O(n³) transformed split-head CFG                 3580.0

  ◮ Weighted PBDG; all pairs of heads have some dependency weight
  ◮ Dependency weights are precomputed before parsing begins
  ◮ Timing results on a 3.6GHz Pentium 4 machine, parsing section 24 of the PTB
  ◮ CKY parsers with the grammars hard-coded in C (no rule lookup)
  ◮ Dependency accuracy of the Viterbi parses is 0.8918 for all grammars
  ◮ Feature extraction is much slower than even the naive CFG parser

  18. Outline
  ◮ Projective Bilexical Dependency Grammars
  ◮ Simple split-head encoding
  ◮ O(n³) split-head CFGs via Unfold-Fold
  ◮ Transformations capturing 2nd-order dependencies
  ◮ Conclusion

  19. Predict argument based on governor and sibling
  ◮ [Second-order split-head parse of "Sandy gave the dog a bone", in which the M categories record a governor together with the previously attached sibling]
  ◮ Very similar to the second-order algorithm of McDonald (2006)

  20. Predict argument based on governor and governor's governor
  ◮ [Second-order split-head parse of "Sandy gave the dog a bone", in which categories record both a word's governor and that governor's governor]
  ◮ Because left and right dependencies are assembled separately, this only captures 2nd-order dependencies in which one dependency is leftward and the other rightward

  21. Outline
  ◮ Projective Bilexical Dependency Grammars
  ◮ Simple split-head encoding
  ◮ O(n³) split-head CFGs via Unfold-Fold
  ◮ Transformations capturing 2nd-order dependencies
  ◮ Conclusion

  22. Conclusion and future work
  ◮ Presented a reduction from PBDGs to O(n³)-parsable CFGs
    ◮ split-head CFG representation of PBDGs
    ◮ unfold-fold transform
    ◮ the CKY algorithm on the resulting CFG simulates the Eisner/Satta algorithm on the original PBDG
  ◮ Makes CFG techniques applicable to PBDGs
    ◮ max marginal parsing (Goodman 1996) and other CFG parsing and estimation algorithms
  ◮ Can capture different dependencies, yielding different PDG models
    ◮ 2nd-order "horizontal" dependencies (McDonald 2006)
    ◮ what other combinations of dependencies can we capture (if we permit O(n⁴) parse time)?
    ◮ do any of these improve parsing accuracy?
