

  1. Feature-Rich Translation by Quasi-Synchronous Lattice Parsing
     Kevin Gimpel and Noah A. Smith

  2.–3. Introduction
     Two trends in machine translation research:
     - Many approaches to decoding: phrase-based, hierarchical phrase-based, tree-to-string, string-to-tree, tree-to-tree
     - Regardless of decoding approach, the addition of richer features can improve translation quality
     Decoding algorithms are strongly tied to the features they permit.

  4. Phrase-Based Decoding
     Source: konnten sie es übersetzen ?
     Target: could you translate it ?

  5.–7. Phrase-Based Decoding
     Source: konnten sie es übersetzen ?
     Target: could you translate it ?
     Phrase Table:
     1  konnten → could
     2  konnten sie → could you
     3  es übersetzen → translate it
     4  sie es übersetzen → you translate it
     5  es → it
     6  ? → ?
     ...
     [Diagram: search graph of partial hypotheses ("could", "could you", "could you translate it", ...) built by applying phrase-table entries 1-6 until "could you translate it ?" is complete]
     Model components scored during search: phrase pairs, N-gram language model, phrase distortion/reordering, coverage constraints (a toy hypothesis-expansion sketch follows below)
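To make the search on these slides concrete, here is a minimal Python sketch of phrase-based hypothesis expansion with a coverage set, using the slide's toy phrase table (written ASCII-only, so "übersetzen" appears as "uebersetzen"). It is an illustration under stated assumptions, not the decoder the talk describes, and it omits scoring and beam pruning.

```python
# Toy sketch of phrase-based hypothesis expansion (illustration only):
# each hypothesis records which source positions are covered and the
# target prefix built so far. Scoring and beam pruning are omitted.
from collections import namedtuple

Hypothesis = namedtuple("Hypothesis", ["covered", "target"])

# Phrase table from the slide ("uebersetzen" stands in for "übersetzen").
PHRASES = {
    ("konnten",): "could",
    ("konnten", "sie"): "could you",
    ("es", "uebersetzen"): "translate it",
    ("sie", "es", "uebersetzen"): "you translate it",
    ("es",): "it",
    ("?",): "?",
}

def expand(hyp, source):
    """Yield new hypotheses by translating any uncovered source span."""
    for i in range(len(source)):
        for j in range(i + 1, len(source) + 1):
            span = tuple(source[i:j])
            if span in PHRASES and not any(k in hyp.covered for k in range(i, j)):
                yield Hypothesis(hyp.covered | set(range(i, j)),
                                 hyp.target + (PHRASES[span],))

def decode(source):
    """Enumerate all translations that cover every source word."""
    complete, stack = set(), [Hypothesis(frozenset(), ())]
    while stack:
        hyp = stack.pop()
        if len(hyp.covered) == len(source):
            complete.add(" ".join(hyp.target))
        else:
            stack.extend(expand(hyp, source))
    return complete

print(decode("konnten sie es uebersetzen ?".split()))
# Includes "could you translate it ?" among other orderings; a real decoder
# scores candidates with the model components listed above and prunes.
```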

  8.–10. Hierarchical Phrase-Based Decoding
     Source: konnten sie es übersetzen ?  (fencepost positions 0 1 2 3 4 5)
     Target: could you translate it ?
     SCFG Rules:
     1  X → es übersetzen / translate it
     2  X → es / it
     3  X → übersetzen / translate
     4  X → konnten sie X ? / could you X ?
     5  X → konnten sie X₁ X₂ ? / could you X₂ X₁ ?
     ...
     [Diagram: CKY-style chart combining rules over source spans (2,3), (3,4), (2,4), and (0,5) to derive "could you translate it ?"]
     Model components scored during search: SCFG rules, N-gram language model, coverage constraints (a toy chart-filling sketch follows below)
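Here is a hand-rolled illustration of how the chart on these slides gets filled: target strings are stored per source span, and the two template rules combine them, with rule 5 swapping its gaps on the target side. The code hard-wires the slide's five rules rather than implementing general SCFG parsing.

```python
# Toy chart for the slide's SCFG rules (hard-wired for this example, not a
# general Hiero decoder): chart[(i, j)] = target strings derivable from
# source span (i, j) of "konnten sie es uebersetzen ?" (fenceposts 0..5).
chart = {
    (2, 4): {"translate it"},   # rule 1: X -> es uebersetzen / translate it
    (2, 3): {"it"},             # rule 2: X -> es / it
    (3, 4): {"translate"},      # rule 3: X -> uebersetzen / translate
}

full = set()
# Rule 4: X -> konnten sie X ? / could you X ?; the gap covers span (2, 4).
for x in chart[(2, 4)]:
    full.add("could you %s ?" % x)
# Rule 5: X -> konnten sie X1 X2 ? / could you X2 X1 ?; the gaps cover
# spans (2, 3) and (3, 4), and the target side swaps their order.
for x1 in chart[(2, 3)]:
    for x2 in chart[(3, 4)]:
        full.add("could you %s %s ?" % (x2, x1))

chart[(0, 5)] = full
print(chart[(0, 5)])   # {'could you translate it ?'}; both derivations agree
```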

  11. Our goal: an MT framework that allows as many features as possible without committing to any particular decoding approach.

  12. Overview
     - An initial step toward a "universal decoder" that can permit any feature of source and target words/trees/alignments
     - An experimental platform for comparing formalisms, feature sets, and training methods
     - Building blocks:
       - Quasi-synchronous grammar (Smith & Eisner 2006)
       - Generic approximate inference methods for non-local features (Chiang 2007; Gimpel & Smith 2009)

  13. Outline
     - Introduction
     - Model
     - Quasi-Synchronous Grammar
     - Training and Decoding
     - Experiments
     - Conclusions and Future Work

  14. ⟨t*, τ_t*, a*⟩ = argmax_{t, τ_t, a} p(t, τ_t, a | s, τ_s)
      where t = target words, τ_t = target tree, a = alignment of target tree nodes to source tree nodes, s = source words, τ_s = source tree

  15. Parameterization
      ⟨t*, τ_t*, a*⟩ = argmax_{t, τ_t, a} p(t, τ_t, a | s, τ_s)
      We use a single globally-normalized log-linear model:
      p(t, τ_t, a | s, τ_s) = exp{θ⊤ g(s, τ_s, a, t, τ_t)} / Σ_{t′, τ′_t, a′} exp{θ⊤ g(s, τ_s, a′, t′, τ′_t)}
      Features can look at any part of any structure. (A worked toy instance follows below.)
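As a worked toy instance of this equation: the feature names, weights, and the two hand-picked candidates below are invented for illustration, standing in for the exponentially large set of (t, τ_t, a) triples the real denominator sums over.

```python
# Toy instance of the globally normalized log-linear model. Feature names,
# weights, and candidates are invented; the real sum ranges over all
# (t, tau_t, a), which is why approximate inference is needed.
import math

theta = {"lex:es->it": 1.2, "lex:es->the": -0.3, "lm": 0.8}

def score(g):
    """theta^T g for a sparse feature vector g."""
    return sum(theta.get(f, 0.0) * v for f, v in g.items())

# Each candidate (t, tau_t, a) is reduced here to its feature vector
# g(s, tau_s, a, t, tau_t).
candidates = {
    "could you translate it ?": {"lex:es->it": 1.0, "lm": 2.0},
    "could you translate the ?": {"lex:es->the": 1.0, "lm": 1.0},
}

Z = sum(math.exp(score(g)) for g in candidates.values())
for t, g in candidates.items():
    print(t, math.exp(score(g)) / Z)   # normalized p(t, tau_t, a | s, tau_s)
```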

  16. Features
     - Log-linear models allow "arbitrary" features, but in practice inference algorithms must be developed to support each feature set
     - Many types of features appear in MT:
       - lexical word and phrase mappings
       - N-gram and syntactic language models
       - distortion/reordering
       - hierarchical phrase mappings
       - syntactic transfer rules
     - We want to use all of these!

  17. Outline
     - Introduction
     - Model
     - Quasi-Synchronous Grammar
     - Training and Decoding
     - Experiments
     - Conclusions and Future Work

  18.–20. Quasi-Synchronous Grammar (Smith & Eisner 2006)
     - A quasi-synchronous grammar (QG) is a model of p(t, τ_t, a | s, τ_s)
     - To model target trees, any monolingual formalism can be used; we use a quasi-synchronous dependency grammar (QDG)
     - Each node in the target tree is aligned to zero or more nodes in the source tree (for a QDG, nodes = words); one possible representation is sketched below
     - Placing hard constraints on the alignments yields a synchronous grammar; in a QG, departures from synchrony are instead penalized softly using features
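To make these objects concrete for the sketches that follow, here is one minimal way a QDG analysis could be represented in Python. The class, its field names, and the example head/alignment values are illustrative assumptions, not the paper's data structures.

```python
# One possible representation of a QDG analysis (a convention assumed for
# these sketches): target words, a target head array, and, per target word,
# the set of source indices it aligns to (possibly empty).
from dataclasses import dataclass

@dataclass
class QDGAnalysis:
    target: list   # target words t
    head: list     # head[i] = index of target word i's head (None = root $)
    align: list    # align[i] = set of aligned source indices (alignment a)

# "could you translate it ?" aligned to "$ konnten sie es uebersetzen ?"
# (source indices 0-5; the parse and alignment here are hypothetical):
analysis = QDGAnalysis(
    target=["could", "you", "translate", "it", "?"],
    head=[None, 0, 0, 2, 0],
    align=[{1}, {2}, {4}, {3}, {5}],
)
```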

  21. [Diagram: aligned dependency trees for source "$ konnten sie es übersetzen ?" and target "$ could you translate it ?"]

  22.–25. For every parent-child pair in the target sentence, what is the relationship of the source words they are linked to?
     [Diagram: source "$ konnten sie es übersetzen ?" and target "$ could you translate it ?"; a target parent-child pair whose aligned source words are themselves in a parent-child relation illustrates the "parent-child" configuration]
     If all configurations are "parent-child", the result is a synchronous dependency grammar.

  26. Many other configurations are possible:
     [Diagram: source "$ wo kann ich untergrundbahnkarten kaufen ?", target "$ where can i buy subway tickets ?"; a target parent-child pair whose words both align to the single source word "untergrundbahnkarten" illustrates the "same node" configuration]

  27. Many other configurations are possible:
     - Parent-child
     - Child-parent
     - Same node
     - Sibling
     - Grandparent/child
     - Grandchild/parent
     - C-Command
     - Parent null
     - Child null
     - Both null
     - Other
     (A classification sketch follows below.)
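Here is a sketch of how one target parent-child pair might be classified into these configurations, assuming the source tree is given as a head array. The helper function, its name, and the example parse are assumptions for illustration; the paper's feature templates are richer, and a few configurations (grandchild/parent, c-command) are folded into "other" for brevity.

```python
# Sketch of classifying the alignment configuration of one target
# parent-child pair (hypothetical helper, simplified).

def config(a_parent, a_child, src_head):
    """a_parent/a_child: source indices the target parent/child align to
    (None = unaligned); src_head[i] = head of source word i (None = root)."""
    if a_parent is None and a_child is None:
        return "both null"
    if a_parent is None:
        return "parent null"
    if a_child is None:
        return "child null"
    if a_parent == a_child:
        return "same node"
    if src_head[a_child] == a_parent:
        return "parent-child"
    if src_head[a_parent] == a_child:
        return "child-parent"
    if src_head[a_parent] is not None and src_head[a_parent] == src_head[a_child]:
        return "sibling"
    gp = src_head[a_child]
    if gp is not None and src_head[gp] == a_parent:
        return "grandparent/child"
    return "other"   # grandchild/parent, c-command, ... omitted for brevity

# Hypothetical source parse of "$ konnten sie es uebersetzen ?" (indices 0-5):
src_head = {0: None, 1: 0, 2: 1, 3: 4, 4: 1, 5: 1}
print(config(1, 2, src_head))   # "could"->konnten, "you"->sie: parent-child
print(config(4, 3, src_head))   # "translate"->uebersetzen, "it"->es: parent-child
```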

  28. Coverage Features
     - There are no hard constraints to ensure that all source words get translated
     - While QG has been used for several tasks, it has not previously been used for generation
     - We add coverage features and learn their weights

  29.–30. Coverage Features (learned weights)
     Coverage feature                                          Weight
     Word never translated                                     -2.21
     Word translated that was translated at least N times already:
       N = 0                                                    1.48
       N = 1                                                   -3.04
       N = 2                                                   -0.22
       N = 3                                                   -0.05
     [Plot: coverage score as a function of the number of times a word is translated (0 to 5); translating a word exactly once scores highest]
     (A scoring sketch follows below.)
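One way these features could fire, as a sketch: each translation event of a word that has already been translated N times fires feature N. The weights are the slide's, but this firing scheme and the cap at N = 3 are assumptions, not the paper's exact definitions.

```python
# Sketch of scoring coverage features with the slide's learned weights.
# The firing scheme (one event per translation of a word already translated
# N times, capped at N = 3) is an assumption made for this sketch.
WEIGHTS = {"never": -2.21, 0: 1.48, 1: -3.04, 2: -0.22, 3: -0.05}

def coverage_score(n_source_words, counts):
    """counts[i] = number of times source word i was translated."""
    total = 0.0
    for i in range(n_source_words):
        c = counts.get(i, 0)
        if c == 0:
            total += WEIGHTS["never"]         # word never translated
        else:
            for n in range(c):                # one event per translation
                total += WEIGHTS[min(n, 3)]   # "translated >= n times already"
    return total

print(coverage_score(1, {}))        # -2.21: leaving a word untranslated hurts
print(coverage_score(1, {0: 1}))    #  1.48: translating it once is rewarded
print(coverage_score(1, {0: 2}))    # -1.56: translating it twice is penalized
```

These numbers match the shape of the plot on the slide: the model learns to translate each source word about once, penalizing both untranslated and over-translated words.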

  31. Outline
     - Introduction
     - Model
     - Quasi-Synchronous Grammar
     - Training and Decoding
     - Experiments
     - Conclusions and Future Work

  32. Decoding
     - A QDG induces a monolingual grammar for a source sentence whose language consists of all possible translations
     - Decoding:
       - Build a weighted lattice encoding the language of this grammar
       - Perform lattice parsing with a dependency grammar, an extension of dependency parsing algorithms for strings (Eisner 1997)
       - Integrate non-local features via cube pruning/decoding (Chiang 2007; Gimpel & Smith 2009), sketched below
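The lattice-parsing machinery is beyond a short sketch, but the cube-pruning step can be illustrated generically: lazily enumerate good combinations of two score-sorted k-best lists instead of scoring every pair. This follows Chiang (2007) in spirit; the function and toy scores below are assumptions for illustration, not the talk's implementation, and with non-local features the enumeration is approximate rather than exact best-first.

```python
# Generic cube-pruning sketch (after Chiang 2007; illustration only):
# lazily enumerate good combinations of two score-sorted k-best lists
# instead of scoring all len(A) * len(B) pairs.
import heapq

def cube_prune(A, B, combine, k):
    """A, B: lists sorted by descending local score. combine(a, b) returns
    (score, item); the score may add non-local features (e.g. an n-gram LM
    across the join), which makes the search approximate, not exact."""
    seen, heap, out = {(0, 0)}, [], []
    s, item = combine(A[0], B[0])
    heapq.heappush(heap, (-s, 0, 0, item))
    while heap and len(out) < k:
        neg_s, i, j, item = heapq.heappop(heap)
        out.append((-neg_s, item))
        for ni, nj in ((i + 1, j), (i, j + 1)):   # grow the frontier
            if ni < len(A) and nj < len(B) and (ni, nj) not in seen:
                seen.add((ni, nj))
                s, it = combine(A[ni], B[nj])
                heapq.heappush(heap, (-s, ni, nj, it))
    return out

# Toy use: join partial translations, with a "non-local" bonus for "...it ?".
A = [(-0.2, "translate it"), (-0.9, "it translate")]
B = [(-0.1, "?")]
def combine(a, b):
    bonus = 0.5 if a[1].endswith("it") else 0.0
    return a[0] + b[0] + bonus, a[1] + " " + b[1]
print(cube_prune(A, B, combine, k=2))
# [(0.2, 'translate it ?'), (-1.0, 'it translate ?')]
```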

  33. [Diagram: the running example again, source "$ konnten sie es übersetzen ?" and target "could you translate it ?"]
