
Synchronous Context-Free Grammars and Optimal Linear Parsing Strategies. Daniel Gildea (University of Rochester), Giorgio Satta (Università di Padova)



  1. Synchronous Context-Free Grammars and Optimal Linear Parsing Strategies. Daniel Gildea (University of Rochester), Giorgio Satta (Università di Padova)

  2. Synchronous CFG. Context-free grammar: $X \to A\ B$. Synchronous context-free grammar (SCFG): $X \to A^{1} B^{2} C^{3} D^{4},\ C^{3} A^{1} D^{4} B^{2}$ and $C \to \text{Powell},\ \text{鲍威尔}$

  3. Synchronous CFG • Synchronous parsing: find tree from two strings – used to learn grammar from parallel text • This talk: parsing strategies for long rules • Results also apply to translation with n-gram language model

  4. Context-Free Grammar: A → B C [Figure: diagram of the rule with labels A, B, C]

  5. Binary SCFG: $A \to B^{1} C^{2},\ C^{2} B^{1}$ [Figure: diagram of the rule with labels A, B, C]

  6. SCFG with 4 nonterminals: $A \to B^{1} C^{2} D^{3} E^{4},\ C^{2} E^{4} B^{1} D^{3}$ [Figure: diagram of the rule with labels A, B, C, D, E]

  7. Fan-Out: number of spans in a nonterminal. CFG: fan-out 1. SCFG: fan-out 2. $\phi(G) = \max_{N \in G} \phi(N)$ (Rambow & Satta, 1999) [Figure: span diagrams for a CFG rule (A, B, C) and an SCFG rule (A, B, C, D, E)]

  8. Rank: number of nonterminals on the right-hand side of a rule. CFG: rank 2. SCFG: rank $r$. $\rho(G) = \max_{P \in G} \rho(P)$ [Figure: span diagrams for a CFG rule (A, B, C) and an SCFG rule (A, B, C, D, E)]

  9. Parsing Strategies: reduce rank. A → B C D E is parsed as X → B C, Y → X D, A → Y E [Figure: the rule A → B C D E and its binarized tree with intermediate nodes X and Y]

  10. Parsing Strategies: reducing rank may increase fan-out [Figure: SCFG rule with labels A, B, C, D, E, where the intermediate nonterminal X combines B and C]
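
The effect on fan-out can be checked mechanically. The following is a minimal sketch, not the authors' code: it assumes the rank-4 rule A → B C D E whose second side orders the nonterminals as C E B D, numbers the nonterminals 1 to 4 in first-side order, and counts fan-out as the number of contiguous spans a group of nonterminals covers on the first side plus the number on the second side. The names `count_runs`, `fan_out`, and `TARGET_ORDER` are hypothetical.

```python
def count_runs(positions):
    """Number of maximal runs of consecutive integers in a set of positions."""
    ordered = sorted(positions)
    return sum(1 for i, p in enumerate(ordered) if i == 0 or p != ordered[i - 1] + 1)


def fan_out(subset, target_order):
    """Spans covered by `subset` on the first side plus spans on the second side.

    Nonterminals are numbered 1..r in first-side order; `target_order` lists
    the same numbers in second-side order (an assumed encoding of the rule).
    """
    source_positions = set(subset)
    target_positions = {target_order.index(k) for k in subset}
    return count_runs(source_positions) + count_runs(target_positions)


# Assumed encoding of A -> B C D E with second side C E B D.
TARGET_ORDER = [2, 4, 1, 3]

print(fan_out({1}, TARGET_ORDER))           # B alone: 2
print(fan_out({1, 2}, TARGET_ORDER))        # X = B + C: 3
print(fan_out({1, 2, 3, 4}, TARGET_ORDER))  # the whole rule: 2
```

Under these assumptions, combining B and C first yields an intermediate nonterminal with fan-out 3, higher than the fan-out 2 of every nonterminal in the original grammar.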

  11. Rule Length in Synchronous CFG • Binary grammar (ITG): parsing is $O(n^6)$ (Wu, 1997) – Works in real MT (Zhang et al. 2006) • Many rules cannot be binarized without increasing fan-out (Aho and Ullman, 1972) • Fan-out affects space and time complexity

  12. Parsing Complexity. Space complexity: $O(n^{2\phi(A)})$. Time complexity: $O(n^{\phi(A)+\phi(B)+\phi(C)})$. CFG rule (A, B, C): $O(n^2)$ space, $O(n^3)$ time. Binary SCFG rule (A, B, C): $O(n^4)$ space, $O(n^6)$ time (Seki et al. 1991)
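
As a worked check of the slide's figures, the sketch below evaluates the two exponents for a binary step that builds A from B and C: twice the fan-out of A for space, and the sum of the three fan-outs for time. The function names are hypothetical, and the fan-out values are the ones the slides give for CFG (1) and binary SCFG (2).

```python
def space_exponent(phi_a):
    """Exponent of n in the space bound: two boundary positions per span of A."""
    return 2 * phi_a


def time_exponent(phi_a, phi_b, phi_c):
    """Exponent of n in the time bound for combining B and C into A."""
    return phi_a + phi_b + phi_c


# CFG: every nonterminal has fan-out 1.
print(space_exponent(1), time_exponent(1, 1, 1))  # 2 3

# Binary SCFG: every nonterminal has fan-out 2.
print(space_exponent(2), time_exponent(2, 2, 2))  # 4 6
```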

  13. SCFG Parsing Strategies. Naïve strategy: $O(n^{2r+2})$ time. Best strategy: $\Omega(n^{cr})$ time for some constant $c$ (Gildea and Štefankovič 2007) [Figure: SCFG rule with labels A, B, C, D, E and intermediate nonterminal X]

  14. This Talk • Finding optimal space complexity is NP-complete • Finding optimal time complexity ⇒ better algs for treewidth

  15. Example Rule [Figure: a rule with left-hand side A and eight right-hand-side nonterminals $B^{1}, \ldots, B^{8}$]

  16. Optimal Parsing Strategy [Figure: tree with internal nodes $n_1, \ldots, n_7$ combining the nonterminals $B^{1}, \ldots, B^{8}$]

  17. Carving Width [Figure: a graph G on vertices 1, 2, 3, 4 and a tree layout of G]. Carving width: the maximum number of edges of G routed through a single edge of the tree layout, minimized over all tree layouts
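
The width of one particular tree layout can be computed directly; the carving width is then the smallest such value over all layouts. The following is a minimal sketch under stated assumptions: the layout is written as nested tuples whose leaves are the graph's vertices (vertices themselves must not be tuples), the example graph is a hypothetical 4-cycle rather than the graph from the slide's figure, and `leaf_set` and `layout_width` are illustrative names.

```python
def leaf_set(node):
    """Collect the graph vertices stored at the leaves of a nested-tuple tree."""
    if not isinstance(node, tuple):
        return {node}
    result = set()
    for child in node:
        result |= leaf_set(child)
    return result


def layout_width(layout, edges):
    """Largest number of graph edges crossing the cut induced by any tree edge.

    Each subtree below the root corresponds to one tree edge; removing that
    edge separates the subtree's leaves from all other vertices.
    """
    widest = 0
    stack = list(layout) if isinstance(layout, tuple) else []
    while stack:
        node = stack.pop()
        inside = leaf_set(node)
        crossing = sum(1 for u, v in edges if (u in inside) != (v in inside))
        widest = max(widest, crossing)
        if isinstance(node, tuple):
            stack.extend(node)
    return widest


# Hypothetical example graph: the cycle 1-2-3-4-1 (parallel edges may be listed twice).
CYCLE = [(1, 2), (2, 3), (3, 4), (4, 1)]

print(layout_width(((1, 2), (3, 4)), CYCLE))  # 2
print(layout_width(((1, 3), (2, 4)), CYCLE))  # 4
```

Grouping adjacent vertices keeps every cut at two edges, while grouping opposite vertices forces all four edges through the middle of the layout.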

  18. Cyclic Permutation Multigraph: $A \to B^{1} B^{2} B^{3} B^{4} B^{5} B^{6} B^{7} B^{8},\ B^{5} B^{7} B^{3} B^{1} B^{8} B^{6} B^{2} B^{4}$ [Figure: multigraph on the vertices A, 1, 2, ..., 8]

  19. Carving Width = Space Complexity [Figure: tree layout with internal nodes $n_1, \ldots, n_7$ over the vertices A and $B^{1}, \ldots, B^{8}$]

  20. Our Reduction • Carving width instance: $(G, k)$ • Construct permutation multigraph $G'$, integer $k'$ • Carving width of $G$ ⇔ carving width of $G'$ ⇔ optimal parsing for SCFG

  21. Our Construction [Figure: a graph G on vertices 1, 2, 3, 4 with its tree layout, and constructed components labeled $X_1, \ldots, X_4$ and $G_1, \ldots, G_4$]

  22. [Figure: components labeled $X_1, X_2, X_3, X_4$ and $G_1, G_2, G_3, G_4$]

  23. Space Complexity Theorem 1: Finding the parsing strategy with optimal space complexity for an SCFG rule is NP-complete

  24. Treewidth [Figure: a graph on vertices A through S and a tree decomposition with bags ABC, BCD, CDE, DEF, EFG, FGH, GHI, HIJ, IJK, JKL, KLM, GHN, HNO, NOP, OPQ, PQR, QRS]

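  25. Dependency Graph. CFG rule: $A \to B\ C\ D\ E$ [Figure: dependency graph on $x_0, \ldots, x_4$]. SCFG rule: $S \to A^{1} B^{2} C^{3} D^{4},\ B^{2} D^{4} A^{1} C^{3}$ [Figure: dependency graph on $x_0, \ldots, x_4$ and $y_0, \ldots, y_4$]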

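  26. Treewidth = Time Complexity. A → B C D E is parsed as X → B C, Y → X D, A → Y E [Figure: dependency graph on $x_0, \ldots, x_4$ with tree decomposition bags $\{x_0, x_1, x_2\}$, $\{x_0, x_2, x_3\}$, $\{x_0, x_3, x_4\}$, alongside the binarized tree with intermediate nodes X and Y]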
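
The correspondence can be illustrated by checking that the three bags on this slide form a tree decomposition. The sketch below is an illustration under stated assumptions, not the construction from the talk: it assumes the dependency graph of the CFG rule A → B C D E joins the two boundary positions of each nonterminal (B: x0-x1, C: x1-x2, D: x2-x3, E: x3-x4, A: x0-x4), and the function names are hypothetical. Each bag then lists the three boundary positions touched by one binary step, X → B C, Y → X D, A → Y E.

```python
from collections import deque


def width(bags):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags) - 1


def is_valid_decomposition(vertices, edges, bags, tree_edges):
    """Check the three tree-decomposition conditions.

    `tree_edges` connects bag indices and is assumed to form a tree;
    this function does not verify that assumption.
    """
    # 1. Every vertex appears in at least one bag.
    if any(all(v not in bag for bag in bags) for v in vertices):
        return False
    # 2. Both endpoints of every edge share at least one bag.
    if any(all(not {u, v} <= bag for bag in bags) for u, v in edges):
        return False
    # 3. The bags containing any given vertex are connected in the tree.
    neighbors = {i: set() for i in range(len(bags))}
    for i, j in tree_edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    for v in vertices:
        holders = {i for i, bag in enumerate(bags) if v in bag}
        start = next(iter(holders))
        seen = {start}
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in neighbors[i] & holders:
                if j not in seen:
                    seen.add(j)
                    queue.append(j)
        if seen != holders:
            return False
    return True


VERTICES = ["x0", "x1", "x2", "x3", "x4"]
# Assumed dependency graph: one edge per nonterminal of A -> B C D E.
EDGES = [("x0", "x1"), ("x1", "x2"), ("x2", "x3"), ("x3", "x4"), ("x0", "x4")]
BAGS = [{"x0", "x1", "x2"}, {"x0", "x2", "x3"}, {"x0", "x3", "x4"}]
TREE_EDGES = [(0, 1), (1, 2)]

print(is_valid_decomposition(VERTICES, EDGES, BAGS, TREE_EDGES))  # True
print(width(BAGS))  # 2
```

Each bag holds three positions, matching the familiar $O(n^3)$ cost of a binary CFG step.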

  27. Our Reduction • Treewidth instance: $(G, k)$ • Construct dependency graph $G'$, integer $k'$ • Approximation of treewidth of $G$ ⇔ treewidth of $G'$ ⇔ optimal time complexity for SCFG

  28. Dependency Graph Construction

  29. Approximation Algorithm for Treewidth: $SOL < 8\,\Delta(G)\,(OPT + 1)$. $SOL$: solution using SCFG parsing strategy. $OPT$: optimal treewidth of input graph $G$. $\Delta(G)$: degree of $G$ (maximum number of edges touching one vertex)

  30. Time Complexity Theorem 2: Finding the parsing strategy with optimal time complexity for an SCFG rule implies a $\Delta(G)$-factor approximation algorithm for treewidth.

  31. Time Complexity Theorem 3: If finding the parsing strategy with optimal time complexity for an SCFG rule is NP-complete, then treewidth for graphs of degree 6 is NP-complete.

  32. Conclusion • Finding parsing strategy with best space complexity is NP-hard. • P-time alg for finding parsing strategy with best time complexity implies better approximation algs for treewidth • NP-hardness for time complexity implies NP-hardness for treewidth of graphs of degree six
