
Improving Trees and Alignments for Syntax-Based Machine Translation
Kevin Knight, USC/Information Sciences Institute
Joint work with Steven DeNeefe, Daniel Marcu, Wei Wang, and Jonathan May
SRI, July 12, 2007

Syntactic Approaches to MT
• Use of syntactic information (noun, verb, etc.) in the translation process:
  – Manually constructed rule-based systems
  – Statistical systems
    • Wu & Wong, 1998
    • Yamada & Knight, 2001-2002
    • Galley et al, 2004
  – Contrast with phrase-based statistical approaches

Phrase-Based Output
Input: 被枪手警方击毙 .
Decoder hypotheses:
• #1: Gunman of police killed .
• #7: Gunman of police attack .
• #12: Gunman by police killed .
• #134: Killed gunman by police .
• #9,329: Gunman killed the police .
• #50,654: Gunman killed by police .
Problematic:
– Output lacks English auxiliary and determiner
– Re-ordering relies on luck, instead of on Chinese passive marker

Syntax-Based Output
Input: 被枪手警方击毙 .
Decoder hypotheses (with the tree node labels shown on each slide):
• #1: The gunman killed by police . [DT NN VBD IN NN; NPB PP NP-C VP S]
• #16: Gunman by police shot . [NN IN NN VBD; NPB PP NP-C VP S]
• #1923: The gunman was killed by police . [DT NN AUX VBN IN NN; NPB PP NP-C VP S]

Why Might Syntax Help?
• Phrase-based MT output is “n-grammatical”, not grammatical
  – Every sentence needs a subject and a verb
• Re-ordering is poorly explained as “distortion” -- better explained as syntactic transformation
  – Arabic to English, VSO → SVO
• Function words have syntactic effects even if they are not themselves translated

Why Might Syntax Hurt?
• Less freedom to glue pieces of output together -- search space has fewer output strings
• Search space is more difficult to navigate
• Rule extraction from bilingual text has limitations (this talk)
[Figure: available phrase-based translations vs. available syntax-based translations]

Comparing Phrase-Based Extraction with Syntax-Based Extraction
• Quantitatively compare
  – A typical phrase-based bilingual extraction algorithm (ATS, Och & Ney 2004)
  – A typical syntax-based bilingual extraction algorithm (GHKM, Galley et al 2004)
  – These algorithms picked from two good-scoring NIST-06 systems
• Identify areas of improvement for syntax-based rule coverage

Phrase-Based and Syntax-Based Pattern Extraction
• ATS [Och & Ney, 2004]: estring + alignment + cstring → phrase pairs consistent with word alignment
• GHKM [Galley et al 2004]: etree + alignment + cstring → syntax transformation rules consistent with word alignment

ATS (Och & Ney, 2004)
Sentence pair: i felt obliged to do my part / 我有责任尽一份力
PHRASE PAIRS ACQUIRED:
felt → 有
felt obliged → 有责任
felt obliged to do → 有责任尽
obliged → 责任
obliged to do → 责任尽
do → 尽
part → 一份
part → 一份力
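The idea of acquiring phrase pairs "consistent with word alignment" can be sketched in a few lines of Python. This is a minimal illustration of the standard consistency check, not the ATS implementation itself. The function name `extract_phrase_pairs`, the `max_len` limit, the Chinese tokenization, and the alignment links are all assumptions made for this example; the slide's actual alignment is not recoverable from the text.

```python
def extract_phrase_pairs(e_words, c_words, links, max_len=4):
    """Return phrase pairs consistent with the word alignment.

    links is a set of (english_index, chinese_index) pairs.
    """
    aligned_c = {c for _, c in links}
    pairs = set()
    for e_lo in range(len(e_words)):
        for e_hi in range(e_lo, min(e_lo + max_len, len(e_words))):
            # Chinese positions linked to the English span.
            covered = [c for e, c in links if e_lo <= e <= e_hi]
            if not covered:
                continue
            c_lo, c_hi = min(covered), max(covered)
            # Consistency: no Chinese word inside the span may link
            # to an English word outside the span.
            if any(c_lo <= c <= c_hi and not (e_lo <= e <= e_hi)
                   for e, c in links):
                continue
            # Also allow unaligned Chinese words at either boundary.
            lo = c_lo
            while lo >= 0 and (lo == c_lo or lo not in aligned_c):
                hi = c_hi
                while hi < len(c_words) and (hi == c_hi or hi not in aligned_c):
                    if hi - lo < max_len:
                        pairs.add((" ".join(e_words[e_lo:e_hi + 1]),
                                   "".join(c_words[lo:hi + 1])))
                    hi += 1
                lo -= 1
    return pairs


# Assumed tokenization and alignment ("to", "my" and 力 left unaligned).
e_words = "i felt obliged to do my part".split()
c_words = ["我", "有", "责任", "尽", "一份", "力"]
links = {(0, 0), (1, 1), (2, 2), (4, 3), (6, 4)}

pairs = extract_phrase_pairs(e_words, c_words, links)
for english, chinese in sorted(pairs):
    print(english, "->", chinese)
```

Under the assumed alignment, the extracted set includes the eight pairs listed on the slide, along with further pairs that the slide does not show.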

GHKM (Galley et al, 2004)
Sentence pair: i felt obliged to do my part / 我有责任尽一份力
[Figure: English parse tree aligned to the Chinese string; node labels S, NP-C, NPB, PRP, VP, VBD, VP-C, VBN, SG-C, TO, VB, PRP$, NN]
RULES ACQUIRED:
VBD(felt) → 有
VBN(obliged) → 责任
VP(x0:VBD VP-C(x1:VBN x2:SG-C)) → x0 x1 x2
VP(VBD(felt) VP-C(VBN(obliged) x0:SG-C)) → 有责任 x0
S(x0:NP-C x1:VP) → x0 x1
Minimal rules tile the tree/string/alignment triple. Composed rules are made by combining those tiles.
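How composed rules arise from minimal rules can be illustrated with a small Python sketch. It is not the GHKM extractor: it only shows the substitution step, plugging one rule into a variable of another and renumbering the remaining variables. The data representation (`Var`, tuple-based trees) and the helper names `rewrite`, `rewrite_rhs`, `compose`, and `show` are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str   # e.g. "x0"
    label: str  # syntactic category, e.g. "SG-C"


# A tree is a Var, a word (str), or a (label, children) tuple.
# A rule is (lhs_tree, rhs), where rhs is a list of Vars and Chinese strings.

def variables(tree):
    """Variables of a tree in left-to-right order."""
    if isinstance(tree, Var):
        return [tree]
    if isinstance(tree, tuple):
        return [v for child in tree[1] for v in variables(child)]
    return []


def rewrite(tree, mapping):
    """Replace each Var found in mapping by its mapped value."""
    if isinstance(tree, Var):
        return mapping.get(tree, tree)
    if isinstance(tree, tuple):
        return (tree[0], tuple(rewrite(c, mapping) for c in tree[1]))
    return tree


def rewrite_rhs(rhs, mapping):
    out = []
    for item in rhs:
        repl = mapping.get(item, item) if isinstance(item, Var) else item
        out.extend(repl if isinstance(repl, list) else [repl])
    return out


def compose(parent, var_name, child):
    """Plug rule `child` into variable `var_name` of rule `parent`."""
    lhs, rhs = parent
    var = next(v for v in variables(lhs) if v.name == var_name)
    child_lhs, child_rhs = child
    if child_lhs[0] != var.label:
        raise ValueError("child root label must match the variable label")
    # Give the child's variables fresh names to avoid clashes.
    fresh = {v: Var(f"{var_name}.{v.name}", v.label) for v in variables(child_lhs)}
    child_lhs, child_rhs = rewrite(child_lhs, fresh), rewrite_rhs(child_rhs, fresh)
    lhs, rhs = rewrite(lhs, {var: child_lhs}), rewrite_rhs(rhs, {var: child_rhs})
    # Renumber the variables x0, x1, ... in tree order.
    renum = {v: Var(f"x{i}", v.label) for i, v in enumerate(variables(lhs))}
    return rewrite(lhs, renum), rewrite_rhs(rhs, renum)


def show(tree):
    if isinstance(tree, Var):
        return f"{tree.name}:{tree.label}"
    if isinstance(tree, tuple):
        return f"{tree[0]}({' '.join(show(c) for c in tree[1])})"
    return tree


x0, x1, x2 = Var("x0", "VBD"), Var("x1", "VBN"), Var("x2", "SG-C")
vp_rule = (("VP", (x0, ("VP-C", (x1, x2)))), [x0, x1, x2])
felt_rule = (("VBD", ("felt",)), ["有"])
obliged_rule = (("VBN", ("obliged",)), ["责任"])

step = compose(vp_rule, "x0", felt_rule)
composed = compose(step, "x0", obliged_rule)
print(show(composed[0]), "->",
      " ".join(i.name if isinstance(i, Var) else i for i in composed[1]))
```

Composing the three minimal rules in this way yields the lexicalized VP rule whose Chinese side is 有 责任 followed by a single SG-C variable.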

GHKM Syntax Rules
[Figure: six example rule types, each pairing an English tree fragment with a foreign string]
• Phrasal Translation: está cantando ↔ VP(VBZ(is) VBG(singing))
• Non-constituent Phrases: hay, NP ↔ there are NP
• Non-contiguous Phrases: poner, NP ↔ put NP on
• Context-Sensitive Word Insertion
• Multilevel Re-Ordering
• Lexicalized Re-Ordering
