
Improving Trees and Alignments for Syntax-Based Machine Translation

Improving Trees and Alignments for Syntax-Based Machine Translation. Kevin Knight, USC/Information Sciences Institute. Joint work with Steven DeNeefe, Daniel Marcu, Wei Wang, and Jonathan May. SRI, July 12, 2007.


  1. Improving Trees and Alignments for Syntax-Based Machine Translation
      Kevin Knight, USC/Information Sciences Institute
      Joint work with Steven DeNeefe, Daniel Marcu, Wei Wang, and Jonathan May
      SRI, July 12, 2007

  2. Syntactic Approaches to MT
      • Use of syntactic information (noun, verb, etc.) in the translation process:
        – Manually constructed rule-based systems
        – Statistical systems
          • Wu & Wong, 1998
          • Yamada & Knight, 2001-2002
          • Galley et al, 2004
        – Contrast with phrase-based statistical approaches

  3. Phrase-Based Output: 枪手 被 警方 击毙 . → Decoder → "Gunman of police killed ." (Hypothesis #1)

  4. Phrase-Based Output: 枪手 被 警方 击毙 . → Decoder → "Gunman of police attack ." (Hypothesis #7)

  5. Phrase-Based Output: 枪手 被 警方 击毙 . → Decoder → "Gunman by police killed ." (Hypothesis #12)

  6. Phrase-Based Output: 枪手 被 警方 击毙 . → Decoder → "Killed gunman by police ." (Hypothesis #134)

  7. Phrase-Based Output: 枪手 被 警方 击毙 . → Decoder → "Gunman killed the police ." (Hypothesis #9,329)

  8. Phrase-Based Output: 枪手 被 警方 击毙 . → Decoder → "Gunman killed by police ." (Hypothesis #50,654)
      Problematic:
        – The output lacks an English auxiliary and determiner
        – Re-ordering relies on luck instead of on the Chinese passive marker

  9. Syntax-Based Output: 枪手 被 警方 击毙 . → Decoder → "The gunman killed by police ." (Hypothesis #1)
      Output tree: POS tags DT NN VBD IN NN under nodes NPB, PP, NP-C, VP, S

  10. Syntax-Based Output: 枪手 被 警方 击毙 . → Decoder → "Gunman by police shot ." (Hypothesis #16)
      Output tree: POS tags NN IN NN VBD under nodes NPB, PP, NP-C, VP, S

  11. Syntax-Based Output: 枪手 被 警方 击毙 . → Decoder → "The gunman was killed by police ." (Hypothesis #1923)
      Output tree: POS tags DT NN AUX VBN IN NN under nodes NPB, PP, NP-C, VP, S

  12. Why Might Syntax Help?
      • Phrase-based MT output is “n-grammatical”, not grammatical
        – Every sentence needs a subject and a verb
      • Re-ordering is poorly explained as “distortion”; it is better explained as syntactic transformation
        – Arabic to English, VSO → SVO
      • Function words have syntactic effects even if they are not themselves translated

  13-15. Why Might Syntax Hurt?
      • Less freedom to glue pieces of output together; the search space has fewer output strings
      • Search space is more difficult to navigate
      • Rule extraction from bilingual text has limitations (this talk)
      Figure labels: available phrase-based translations; available syntax-based translations

  16. Comparing Phrase-Based Extraction with Syntax-Based Extraction
      • Quantitatively compare:
        – A typical phrase-based bilingual extraction algorithm (ATS, Och & Ney 2004)
        – A typical syntax-based bilingual extraction algorithm (GHKM, Galley et al 2004)
        – These algorithms were picked from two good-scoring NIST-06 systems
      • Identify areas of improvement for syntax-based rule coverage

  17. Phrase-Based and Syntax-Based Pattern Extraction
      – ATS [Och & Ney, 2004]: estring, alignment, cstring → phrase pairs consistent with word alignment
      – GHKM [Galley et al 2004]: etree, alignment, cstring → syntax transformation rules consistent with word alignment

  18-21. ATS (Och & Ney, 2004)
      Sentence pair: i felt obliged to do my part / 我 有 责任 尽 一份 力
      PHRASE PAIRS ACQUIRED:
        – felt → 有
        – felt obliged → 有 责任
        – felt obliged to do → 有 责任 尽
        – obliged → 责任
        – obliged to do → 责任 尽
        – do → 尽
        – part → 一份
        – part → 一份 力
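The phrase pairs above can be reproduced with a minimal sketch of alignment-consistent phrase extraction. This is not the ATS implementation. The word alignment below is an assumption (the slides do not show it), chosen so that "to", "my" and 力 are unaligned, and the function name and `max_len` limit are hypothetical.

```python
def extract_phrase_pairs(e_words, c_words, alignment, max_len=4):
    """Collect (English phrase, Chinese phrase) pairs consistent with a word alignment.

    alignment is a set of (english_index, chinese_index) links.
    """
    aligned_c = {c for _, c in alignment}
    pairs = set()
    for e_start in range(len(e_words)):
        for e_end in range(e_start, min(e_start + max_len, len(e_words))):
            # Chinese positions linked to the English span.
            linked = [c for e, c in alignment if e_start <= e <= e_end]
            if not linked:
                continue
            c_min, c_max = min(linked), max(linked)
            # Consistency: no Chinese word inside the span links outside the English span.
            if any(c_min <= c <= c_max and not (e_start <= e <= e_end)
                   for e, c in alignment):
                continue
            # Also emit variants that absorb neighbouring unaligned Chinese words.
            lo = c_min
            while lo >= 0 and (lo == c_min or lo not in aligned_c):
                hi = c_max
                while hi < len(c_words) and (hi == c_max or hi not in aligned_c):
                    if hi - lo < max_len:
                        pairs.add((" ".join(e_words[e_start:e_end + 1]),
                                   " ".join(c_words[lo:hi + 1])))
                    hi += 1
                lo -= 1
    return pairs


e_words = "i felt obliged to do my part".split()
c_words = "我 有 责任 尽 一份 力".split()
# Assumed alignment (not given in the slides): i-我, felt-有, obliged-责任, do-尽, part-一份.
alignment = {(0, 0), (1, 1), (2, 2), (4, 3), (6, 4)}
pairs = extract_phrase_pairs(e_words, c_words, alignment)
for english, chinese in sorted(pairs):
    print(english, "->", chinese)
```

Under this assumed alignment the sketch also emits pairs the slide does not list, such as "to do" → 尽, because unaligned words can attach to either neighbour.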

  22-28. GHKM (Galley et al, 2004)
      English tree: S(NP-C(NPB(PRP(i))) VP(VBD(felt) VP-C(VBN(obliged) SG-C(VP(TO(to) VP-C(VB(do) NP-C(NPB(PRP$(my) NN(part)))))))))
      Chinese string: 我 有 责任 尽 一份 力
      RULES ACQUIRED:
        – VBD(felt) → 有
        – VBN(obliged) → 责任
        – VP(x0:VBD VP-C(x1:VBN x2:SG-C)) → x0 x1 x2
        – VP(VBD(felt) VP-C(VBN(obliged) x0:SG-C)) → 有 责任 x0
        – S(x0:NP-C x1:VP) → x0 x1
      Minimal rules tile the tree/string/alignment triple. Composed rules are made by combining those tiles.
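As an illustration of how composed rules are built from minimal ones, the sketch below plugs the two lexical rules VBD(felt) → 有 and VBN(obliged) → 责任 into the minimal VP rule and recovers the composed VP rule from the slides. The `Var`, `Node` and `Rule` classes and the `compose` helper are hypothetical names for this example, not part of any GHKM implementation, and `compose` assumes the inner rule has no variables of its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A rule variable such as x0:VBD."""
    index: int
    label: str


@dataclass(frozen=True)
class Node:
    """English tree fragment; children are Nodes, Vars, or English words."""
    label: str
    children: tuple


@dataclass(frozen=True)
class Rule:
    lhs: Node   # English tree fragment
    rhs: tuple  # Chinese side: words (str) and variable indices (int)


def substitute(tree, index, replacement):
    """Replace variable `index` in `tree` with the fragment `replacement`."""
    if isinstance(tree, Var):
        if tree.index != index:
            return tree
        if tree.label != replacement.label:
            raise ValueError("root label does not match the variable label")
        return replacement
    if isinstance(tree, str):
        return tree
    return Node(tree.label, tuple(substitute(c, index, replacement) for c in tree.children))


def variables(tree):
    """Variables of a fragment in left-to-right order."""
    if isinstance(tree, Var):
        return [tree]
    if isinstance(tree, str):
        return []
    return [v for child in tree.children for v in variables(child)]


def relabel(tree, mapping):
    if isinstance(tree, Var):
        return Var(mapping[tree.index], tree.label)
    if isinstance(tree, str):
        return tree
    return Node(tree.label, tuple(relabel(c, mapping) for c in tree.children))


def compose(outer, index, inner):
    """Plug the variable-free rule `inner` into variable `index` of `outer`."""
    lhs = substitute(outer.lhs, index, inner.lhs)
    rhs = []
    for item in outer.rhs:
        if isinstance(item, int) and item == index:
            rhs.extend(inner.rhs)
        else:
            rhs.append(item)
    # Renumber the remaining variables x0, x1, ... in left-to-right order.
    mapping = {v.index: i for i, v in enumerate(variables(lhs))}
    new_rhs = tuple(mapping[t] if isinstance(t, int) else t for t in rhs)
    return Rule(relabel(lhs, mapping), new_rhs)


def show_tree(tree):
    if isinstance(tree, Var):
        return f"x{tree.index}:{tree.label}"
    if isinstance(tree, str):
        return tree
    return f"{tree.label}({' '.join(show_tree(c) for c in tree.children)})"


def show(rule):
    rhs = " ".join(f"x{t}" if isinstance(t, int) else t for t in rule.rhs)
    return f"{show_tree(rule.lhs)} -> {rhs}"


felt = Rule(Node("VBD", ("felt",)), ("有",))
obliged = Rule(Node("VBN", ("obliged",)), ("责任",))
vp = Rule(Node("VP", (Var(0, "VBD"), Node("VP-C", (Var(1, "VBN"), Var(2, "SG-C"))))), (0, 1, 2))

# After plugging in `felt`, the VBN variable is renumbered to x0, so `obliged` fills index 0.
composed = compose(compose(vp, 0, felt), 0, obliged)
print(show(composed))
```

Restricting `compose` to variable-free inner rules keeps the renumbering step simple; composing two rules that both carry variables would need their indices to be kept apart first.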

  29-30. GHKM Syntax Rules
      – Phrasal Translation: VP(VBZ(is) VBG(singing)) → está, cantando
      – Non-constituent Phrases: S(PRO(there) VP(VB(are) NP)) → hay, NP
      – Non-contiguous Phrases: VP(VB(put) NP PRT(on)) → poner, NP
      – Context-Sensitive Word Insertion: NPB(DT(the) NNS) → NNS
      – Multilevel Re-Ordering: S(NP1 VP(VB NP2)) → VB, NP1, NP2
      – Lexicalized Re-Ordering: NP(NP2 PP(P(of) NP1)) → NP1, , NP2
