
Statistical Machine Translation Lecture 5 Syntax-Based Models

Philipp Koehn

pkoehn@inf.ed.ac.uk

School of Informatics University of Edinburgh


Syntax-Based Statistical Machine Translation

Outline

– Reminder: modeling and decoding
– Why syntax?
– Yamada and Knight: translating into trees
– Wu: tree-based transfer
– Chiang: hierarchical transfer
– Koehn: clause structure
– Other approaches

Philipp Koehn, University of Edinburgh 2


Phrase-Based Translation Model

Example: Morgen fliege ich nach Kanada zur Konferenz → Tomorrow I will fly to the conference in Canada

Foreign input is segmented into phrases

– any sequence of words, not necessarily linguistically motivated

Each phrase is translated into English

Phrases are reordered
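To make the three steps concrete, here is a minimal Python sketch for the example sentence. The segmentation, the phrase table `PHRASE_TABLE` and the reordering are hypothetical, picked by hand to reproduce the slide's translation; a real system learns phrase translations from data and scores many competing segmentations and orders.

```python
# Hypothetical phrase table for the example sentence: German phrase -> English phrase.
PHRASE_TABLE = {
    "Morgen": "Tomorrow",
    "fliege": "will fly",
    "ich": "I",
    "nach Kanada": "in Canada",
    "zur Konferenz": "to the conference",
}


def translate(segments, order):
    """Translate each foreign phrase, then output the English phrases in `order`.

    segments: the foreign input, already split into phrases
    order:    order[i] is the index of the phrase placed at English position i
    """
    english = [PHRASE_TABLE[phrase] for phrase in segments]  # translate each phrase
    return " ".join(english[i] for i in order)               # reorder the phrases


segments = ["Morgen", "fliege", "ich", "nach Kanada", "zur Konferenz"]
print(translate(segments, [0, 2, 1, 4, 3]))
# Tomorrow I will fly to the conference in Canada
```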


Decoding

Example: Maria no dio una bofetada a la bruja verde → Mary did not slap the green witch

The decoding process builds the English translation left to right, by picking foreign phrases and translating them into English phrases


Search Space for Decoding: Too Big

[Figure: expansion of partial hypotheses for "Maria no dio una bofetada a la bruja verde", with many competing phrase translations (e.g. "Mary", "not", "did not", "give a slap", "slap", "to the", "the witch", "green witch"). Each hypothesis records its English output e, the foreign words covered so far f, and its probability p, e.g. e: Mary, f: *--------, p: .534; e: did not, f: **-------, p: .154; e: slap, f: *****----, p: .015; e: the, f: *******--, p: .004283; e: green witch, f: *********, p: .000271]

Explosion of search space ⇒ pruning, beam search
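The pruning idea can be sketched with a toy stack decoder. Everything here is an assumption for illustration: the phrase table and its log-probabilities are made up, a linear distortion penalty (`alpha` times the jump distance) stands in for a real reordering model, there is no language model, and hypothesis recombination and future-cost estimation are omitted. Hypotheses are grouped into stacks by the number of foreign words they cover, and each stack is cut to its `beam` best before it is expanded (histogram pruning).

```python
import heapq

# Hypothetical phrase table: foreign phrase -> [(English phrase, log-probability)].
TABLE = {
    ("Maria",): [("Mary", -0.1)],
    ("no",): [("did not", -0.3), ("no", -1.0)],
    ("dio", "una", "bofetada"): [("slap", -0.4)],
    ("a", "la"): [("the", -0.5)],
    ("bruja", "verde"): [("green witch", -0.2)],
    ("bruja",): [("witch", -0.3)],
    ("verde",): [("green", -0.3)],
}


def decode(foreign, table, beam=3, alpha=0.5):
    """Stack decoding: English is built left to right; stack k holds hypotheses
    covering k foreign words, and each stack is pruned to its `beam` best."""
    n = len(foreign)
    # hypothesis = (score, English so far, covered foreign positions, end of last phrase)
    stacks = [[] for _ in range(n + 1)]
    stacks[0].append((0.0, "", frozenset(), 0))
    for k in range(n):
        for score, english, covered, last_end in heapq.nlargest(
                beam, stacks[k], key=lambda h: h[0]):
            for start in range(n):
                for end in range(start + 1, n + 1):
                    span = frozenset(range(start, end))
                    if span & covered:
                        break  # longer spans from `start` overlap as well
                    for phrase, logp in table.get(tuple(foreign[start:end]), []):
                        # toy distortion cost: distance jumped from the previous phrase
                        new_score = score + logp - alpha * abs(start - last_end)
                        stacks[k + end - start].append(
                            (new_score, (english + " " + phrase).strip(),
                             covered | span, end))
    return max(stacks[n], key=lambda h: h[0])


best = decode("Maria no dio una bofetada a la bruja verde".split(), TABLE)
print(best[1], round(best[0], 2))  # Mary did not slap the green witch -1.5
```

Without future-cost estimates, a small beam compares hypotheses that covered easy and hard words alike, which is why real decoders add an estimate of the remaining cost before pruning.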


Word-Based Translation Model

Example: Mary did not slap the green witch

– fertility, e.g. n(3|slap): Mary not slap slap slap the green witch
– NULL insertion, p-null: Mary not slap slap slap NULL the green witch
– word translation, e.g. t(la|the): Maria no daba una bofetada a la verde bruja
– distortion, e.g. d(4|4): Maria no daba una bofetada a la bruja verde

The translation process is broken up into small steps:

word translation, reordering, duplication, insertion

Decoding can be done similarly to phrase-based decoding
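The generative steps in the example can be replayed in a few lines. This is a minimal sketch in the spirit of the IBM models, with every decision (fertilities, the NULL position, word choices, final order) supplied by hand instead of drawn from the probability tables n, p-null, t and d; the function name `generate` and its parameters are hypothetical.

```python
def generate(english, fertility, null_positions, choices, order):
    """Replay the word-based generative story with every model decision given."""
    # 1. Fertility (duplication): copy each English word phi times; phi = 0 drops it.
    tokens = [w for w in english for _ in range(fertility.get(w, 1))]
    # 2. NULL insertion: positions refer to the token list before any insertion.
    for pos in sorted(null_positions, reverse=True):
        tokens.insert(pos, "NULL")
    # 3. Word translation: each token takes the next foreign word chosen for it.
    queues = {word: iter(foreign) for word, foreign in choices.items()}
    foreign = [next(queues[t]) for t in tokens]
    # 4. Reordering (distortion): order[j] is the token placed at foreign position j.
    return [foreign[i] for i in order]


result = generate(
    english="Mary did not slap the green witch".split(),
    fertility={"did": 0, "slap": 3},
    null_positions=[5],
    choices={"Mary": ["Maria"], "not": ["no"], "slap": ["daba", "una", "bofetada"],
             "NULL": ["a"], "the": ["la"], "green": ["verde"], "witch": ["bruja"]},
    order=[0, 1, 2, 3, 4, 5, 6, 8, 7],
)
print(" ".join(result))  # Maria no daba una bofetada a la bruja verde
```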


The Challenge of Syntax

[Figure: the translation pyramid, rising from foreign words through foreign syntax and foreign semantics to an interlingua, then descending through English semantics and English syntax to English words]

The classical machine translation pyramid


Advantages of Syntax-Based Translation

Reordering for syntactic reasons

– e.g., move German object to end of sentence

Better explanation for function words

– e.g., prepositions, determiners

Conditioning on syntactically related words

– translation of verb may depend on subject or object

Use of syntactic language models


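Syntactic Language Model

Good syntax tree → good English

Allows for long-distance constraints

[Figure: two candidate translations with their parse trees. "the house of the man is small" receives a well-formed tree (S → NP VP, with a PP inside the subject NP); "the house is the man is small" does not form a well-formed sentence (S → NP ? VP VP)]

Left translation preferred by syntactic LM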


String to Tree Translation

[Figure: the translation pyramid again (foreign words, foreign syntax, foreign semantics, interlingua, English semantics, English syntax, English words); string-to-tree translation maps foreign words to English syntax]

Use of English syntax trees [Yamada and Knight, 2001]

– exploit rich resources on the English side
– obtained with statistical parser [Collins, 1997]
– flattened tree to allow more reorderings
– works well with syntactic language model


Yamada and Knight [2001]

[Figure: the English parse tree of "he adores listening to music" (VB → PRP VB1 VB2, VB2 → VB TO, TO → TO NN) is transformed step by step: the children of each node are reordered, Japanese function words (ha, no, ga, desu) are inserted, the English leaves are translated (he → kare, music → ongaku, to → wo, listening → kiku, adores → daisuki), and the leaves are read off]

reorder ⇒ insert ⇒ translate ⇒ take leaves

Kare ha ongaku wo kiku no ga daisuki desu
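A minimal sketch of the four operations on the example tree follows. The tree encoding, the function `transform`, and the decision tables `REORDER`, `INSERT` and `TRANSLATE` are hypothetical; they hold one hand-picked derivation for this sentence, whereas the actual model assigns probabilities to each choice. Each node gets at most one inserted word, on its left or right, conditioned here on the parent and node labels.

```python
# A node is (label, word) for a pre-terminal, or (label, [children]) otherwise.
TREE = ("VB", [("PRP", "he"),
               ("VB1", "adores"),
               ("VB2", [("VB", "listening"),
                        ("TO", [("TO", "to"), ("NN", "music")])])])

# Hypothetical decisions for this one example (not learned probabilities):
REORDER = {("VB", ("PRP", "VB1", "VB2")): (0, 2, 1),   # PRP VB2 VB1
           ("VB2", ("VB", "TO")): (1, 0),              # TO VB
           ("TO", ("TO", "NN")): (1, 0)}               # NN TO
INSERT = {("TOP", "VB"): ("right", "desu"),   # keyed by (parent label, node label)
          ("VB", "PRP"): ("right", "ha"),
          ("VB", "VB1"): ("left", "ga"),
          ("VB2", "VB"): ("right", "no")}
TRANSLATE = {"he": "kare", "adores": "daisuki", "listening": "kiku",
             "to": "wo", "music": "ongaku"}


def transform(node, parent="TOP"):
    """Reorder children, insert a function word, translate leaves; return the leaves."""
    label, body = node
    if isinstance(body, str):                       # translate the English word
        words = [TRANSLATE.get(body, body)]
    else:                                           # reorder the children, then recurse
        key = (label, tuple(child[0] for child in body))
        order = REORDER.get(key, range(len(body)))
        words = [w for i in order for w in transform(body[i], label)]
    if (parent, label) in INSERT:                   # insert to the left or right
        side, word = INSERT[(parent, label)]
        words = [word] + words if side == "left" else words + [word]
    return words


print(" ".join(transform(TREE)))  # kare ha ongaku wo kiku no ga daisuki desu
```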


Reordering Table

Original order    Reordering     p(reorder | original)
PRP VB1 VB2       PRP VB1 VB2    0.074
PRP VB1 VB2       PRP VB2 VB1    0.723
PRP VB1 VB2       VB1 PRP VB2    0.061
PRP VB1 VB2       VB1 VB2 PRP    0.037
PRP VB1 VB2       VB2 PRP VB1    0.083
PRP VB1 VB2       VB2 VB1 PRP    0.021
VB TO             VB TO          0.107
VB TO             TO VB          0.893
TO NN             TO NN          0.251
TO NN             NN TO          0.749
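As an illustration, the table can be queried directly. The sketch below (function names hypothetical) picks the most probable reordering for a node and multiplies the reordering probabilities used in the previous example; a full derivation probability would also include the insertion and translation probabilities, which are left out here.

```python
# Reordering probabilities p(reorder | original) from the table above.
R_TABLE = {
    ("PRP VB1 VB2", "PRP VB1 VB2"): 0.074,
    ("PRP VB1 VB2", "PRP VB2 VB1"): 0.723,
    ("PRP VB1 VB2", "VB1 PRP VB2"): 0.061,
    ("PRP VB1 VB2", "VB1 VB2 PRP"): 0.037,
    ("PRP VB1 VB2", "VB2 PRP VB1"): 0.083,
    ("PRP VB1 VB2", "VB2 VB1 PRP"): 0.021,
    ("VB TO", "VB TO"): 0.107,
    ("VB TO", "TO VB"): 0.893,
    ("TO NN", "TO NN"): 0.251,
    ("TO NN", "NN TO"): 0.749,
}


def best_reordering(original):
    """Most probable reordering of a node's children."""
    options = {r: p for (o, r), p in R_TABLE.items() if o == original}
    return max(options, key=options.get)


def reordering_prob(decisions):
    """Product of the reordering probabilities of independent per-node decisions."""
    prob = 1.0
    for original, reordering in decisions:
        prob *= R_TABLE[(original, reordering)]
    return prob


example = [("PRP VB1 VB2", "PRP VB2 VB1"), ("VB TO", "TO VB"), ("TO NN", "NN TO")]
print(best_reordering("PRP VB1 VB2"))        # PRP VB2 VB1
print(round(reordering_prob(example), 3))    # 0.484
```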


Decoding as Parsing

Chart Parsing

Input: kare ha ongaku wo kiku no ga daisuki desu

Chart: PRP (he)

Pick Japanese words

Translate into tree stumps


Decoding as Parsing

Chart Parsing

Input: kare ha ongaku wo kiku no ga daisuki desu

Chart: PRP (he), NN (music), TO (to)

Pick Japanese words

Translate into tree stumps


Decoding as Parsing

Input: kare ha ongaku wo kiku no ga daisuki desu

Chart: PRP (he), NN (music), TO (to), PP

Adding some more entries...


Decoding as Parsing

Input: kare ha ongaku wo kiku no ga daisuki desu

Chart: PRP (he), NN (music), TO (to), PP, VB (listening)

Combine entries


Decoding as Parsing

Input: kare ha ongaku wo kiku no ga daisuki desu

Chart: PRP (he), NN (music), TO (to), PP, VB (listening), VB2


Decoding as Parsing

Input: kare ha ongaku wo kiku no ga daisuki desu

Chart: PRP (he), NN (music), TO (to), PP, VB (listening), VB2, VB1 (adores)


Decoding as Parsing

Input: kare ha ongaku wo kiku no ga daisuki desu

Chart: PRP (he), NN (music), TO (to), PP, VB (listening), VB2, VB1 (adores), VB

Finished when all foreign words are covered


Yamada and Knight: Training

Parsing of the English side

– using Collins statistical parser

EM training

– translation model is used to map training sentence pairs
– EM training finds low-perplexity model

→ unity of training and decoding, as in the IBM models


Is the Model Realistic?

Do English trees match foreign strings?

Crossings between French and English [Fox, 2002]

– 0.29 to 6.27 per sentence, depending on how they are measured

Can be reduced by

– flattening the tree, as done by [Yamada and Knight, 2001]
– detecting phrasal translation
– special treatment for a small number of constructions

Most coherence is found between dependency structures


Inversion Transduction Grammars

Generation of both English and foreign trees [Wu, 1997]

Rules (binary and unary):

$A \rightarrow A_1 A_2 \parallel A_1 A_2$

$A \rightarrow A_1 A_2 \parallel A_2 A_1$

$A \rightarrow e \parallel f$

$A \rightarrow e \parallel \epsilon$

$A \rightarrow \epsilon \parallel f$

⇒ Common binary tree required

– limits the complexity of reorderings
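The restriction can be made concrete by asking which word orders a binary ITG can produce. The sketch below is an illustration, not Wu's parsing algorithm: it checks a permutation with a shift-reduce pass that merges two adjacent blocks whenever their foreign positions form one contiguous range (a straight or an inverted rule). For four words it accepts 22 of the 24 orders; the two it rejects are the patterns 2413 and 3142. The function name `itg_reachable` is hypothetical.

```python
from itertools import permutations


def itg_reachable(order):
    """True if the word order can be produced by binary straight/inverted rules.

    order[i] is the foreign position of the i-th English word. A shift-reduce pass
    merges the top two blocks whenever their position ranges are adjacent.
    """
    stack = []  # each block is (lowest position, highest position)
    for pos in order:
        stack.append((pos, pos))
        while len(stack) >= 2:
            (lo1, hi1), (lo2, hi2) = stack[-2], stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:  # straight or inverted merge
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) == 1


reachable = sum(itg_reachable(p) for p in permutations(range(4)))
print(reachable, "of 24")              # 22 of 24
print(itg_reachable((1, 3, 0, 2)))     # False
print(itg_reachable((2, 0, 3, 1)))     # False
```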


Syntax Trees

[Figure: English binary tree over "Mary did not slap the green witch"]

English binary tree


Syntax Trees (2)

[Figure: Spanish binary tree over "Maria no daba una bofetada a la bruja verde"]

Spanish binary tree


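Syntax Trees (3)

[Figure: combined tree whose leaves pair English and Spanish words: Mary/Maria, did/*, not/no, slap/daba, */una, */bofetada, */a, the/la, green/verde, witch/bruja, where * marks an empty string]

Combined tree with reordering of Spanish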


Inversion Transduction Grammars

Decoding by parsing (as before)

Variations

– may use real syntax on either side or both
– may use multi-word units at leaf nodes

Reordering constraints of ITG used in phrase-based systems


Chiang: Hierarchical Phrase Model

Chiang [ACL, 2005] (best paper award!)

– context-free bi-grammar
– one non-terminal symbol
– right-hand side of rule may include non-terminals and terminals

Competitive with phrase-based models in the 2005 DARPA/NIST evaluation


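Types of Rules

Word translation

– $X \rightarrow \text{maison} \parallel \text{house}$

Phrasal translation

– $X \rightarrow \text{daba una bofetada} \parallel \text{slap}$

Mixed non-terminal / terminal

– $X \rightarrow X\ \text{bleue} \parallel \text{blue}\ X$
– $X \rightarrow \text{ne}\ X\ \text{pas} \parallel \text{not}\ X$
– $X \rightarrow X_1 X_2 \parallel X_2\ \text{of}\ X_1$

Technical rules

– $S \rightarrow S\ X \parallel S\ X$
– $S \rightarrow X \parallel X$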
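How such rules combine can be shown with a small sketch of synchronous derivation: applying a rule rewrites both sides at once, and each co-indexed non-terminal is filled with the same sub-derivation on both sides. The function `realize` is hypothetical, the single X of the slide's rules is written X1 to mark co-indexing, and the rule `GO` is invented for the example; a decoder would additionally search over derivations and score them.

```python
def realize(rule, children):
    """Apply a synchronous rule, substituting each child's (source, target) yield
    for the co-indexed non-terminals X1, X2, ... on both sides."""
    source, target = rule
    yields = [realize(child_rule, grandkids) for child_rule, grandkids in children]

    def fill(side, which):  # which = 0 for the source side, 1 for the target side
        out = []
        for token in side:
            if token.startswith("X") and token[1:].isdigit():
                out.extend(yields[int(token[1:]) - 1][which])
            else:
                out.append(token)
        return out

    return fill(source, 0), fill(target, 1)


# Rules from the slide as (foreign side, English side); X is written X1 for co-indexing.
HOUSE = (["maison"], ["house"])
BLUE = (["X1", "bleue"], ["blue", "X1"])
NOT = (["ne", "X1", "pas"], ["not", "X1"])
GO = (["va"], ["go"])  # hypothetical word rule, not from the slide

print(realize(BLUE, [(HOUSE, [])]))  # (['maison', 'bleue'], ['blue', 'house'])
print(realize(NOT, [(GO, [])]))      # (['ne', 'va', 'pas'], ['not', 'go'])
```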


Learning Hierarchical Rules

[Figure: word alignment between "Maria no daba una bofetada a la bruja verde" and "Mary did not slap the green witch"]

$X \rightarrow X\ \text{verde} \parallel \text{green}\ X$


Learning Hierarchical Rules

[Figure: word alignment between "Maria no daba una bofetada a la bruja verde" and "Mary did not slap the green witch"]

$X \rightarrow \text{a la}\ X \parallel \text{the}\ X$
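The subtraction step behind these rules can be sketched as follows: take an extracted phrase pair, find a smaller phrase pair inside it, and replace it by X on both sides. This is a simplification under stated assumptions: the phrase pairs are given (in practice they come from the word alignment), and the inner pair is located by string matching rather than by aligned word positions. The function name `subtract` is hypothetical.

```python
def subtract(outer, inner):
    """Turn a phrase pair into a hierarchical rule by replacing a smaller phrase
    pair inside it with the non-terminal X (on both sides)."""

    def replace(words, sub):
        for i in range(len(words) - len(sub) + 1):
            if words[i:i + len(sub)] == sub:
                return words[:i] + ["X"] + words[i + len(sub):]
        raise ValueError(f"{sub} does not occur in {words}")

    (outer_f, outer_e), (inner_f, inner_e) = outer, inner
    return replace(outer_f, inner_f), replace(outer_e, inner_e)


print(subtract((["bruja", "verde"], ["green", "witch"]),
               (["bruja"], ["witch"])))
# (['X', 'verde'], ['green', 'X'])
print(subtract((["a", "la", "bruja", "verde"], ["the", "green", "witch"]),
               (["bruja", "verde"], ["green", "witch"])))
# (['a', 'la', 'X'], ['the', 'X'])
```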


Details

Too many rules → filtering of rules necessary

Efficient parse decoding possible

– hypothesis stack for each span of foreign words
– only one non-terminal → hypotheses comparable
– length limit for spans that do not start at the beginning


Syntax-Aided Phrase-Based MT [Koehn]

Approach:

– stick with phrase-based system
– special treatment for special syntactic problems

Noun phrase translation

Clause level restructuring


Clause Level Restructuring

Why clause structure?

– languages differ vastly in their clause structure (English: SVO, Arabic: VSO, German: fairly free order; many details differ: position of adverbs, subordinate clauses, etc.)
– large-scale restructuring is a problem for phrase models

Restructuring

– reordering of constituents (main focus)
– adding, dropping, or changing function words

ACL 2005 paper [Collins, Koehn, Kucerova]


Clause Structure

[Figure: German parse tree (TIGER labels such as PPER-SB, VAFIN-HD, VP-OC, NP-OA, S-MO) for "Ich werde Ihnen die entsprechenden Anmerkungen aushaendigen, damit Sie das eventuell bei der Abstimmung uebernehmen koennen.", divided into a MAIN CLAUSE and a SUBORDINATE CLAUSE, with the word-by-word gloss "I will you the corresponding comments pass on, so that you that perhaps in the vote include can."]

Syntax tree from German parser

– statistical parser by Amit Dubey, trained on the TIGER treebank


Reordering When Translating

[Figure: the same German sentence and gloss with the tree flattened: the VP-OC nodes are removed, so constituents such as PPER-DA Ihnen, NP-OA die entsprechenden Anmerkungen and VVFIN aushaendigen attach directly to their clause]

Reordering when translating into English

– tree is flattened
– clause level constituents line up


Clause Level Reordering

[Figure: clause-level constituents labeled with their position in the English translation.
Main clause: Ich (1) werde (2) Ihnen (4) die entsprechenden Anmerkungen (5) aushaendigen (3)
Subordinate clause: damit (1) Sie (2) das (6) eventuell (4) bei der Abstimmung (7) uebernehmen (5) koennen (3)]

Clause level reordering is a well-defined task

– label German constituents with their English order
– we did this for 300 sentences, with two annotators and high agreement
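Once the labels are assigned, producing the English constituent order is just a sort. The sketch below (function name hypothetical) applies the labels from the example to the German constituents; the output is still German words, now in English clause order, which a phrase-based system would then translate.

```python
def reorder_clause(constituents):
    """Put German clause-level constituents into the annotated English order.

    constituents: (German text, English position label) pairs, in German order.
    """
    return [text for text, _ in sorted(constituents, key=lambda c: c[1])]


main = [("Ich", 1), ("werde", 2), ("Ihnen", 4),
        ("die entsprechenden Anmerkungen", 5), ("aushaendigen", 3)]
sub = [("damit", 1), ("Sie", 2), ("das", 6), ("eventuell", 4),
       ("bei der Abstimmung", 7), ("uebernehmen", 5), ("koennen", 3)]

print(" ".join(reorder_clause(main)))
# Ich werde aushaendigen Ihnen die entsprechenden Anmerkungen
print(" ".join(reorder_clause(sub)))
# damit Sie koennen eventuell uebernehmen das bei der Abstimmung
```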


Systematic Reordering German → English

Many types of reorderings are systematic

– move verb group together
– subject - verb - object
– move negation in front of verb

⇒ Write rules by hand

– apply rules to test and training data
– train standard phrase-based SMT system

System              BLEU
baseline system     25.2%
with manual rules   26.8%
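Two such rules can be sketched on the example clauses. The tag set (`SB`, `VFIN`, `VINF`, `other`) and both rules are simplified, hypothetical stand-ins, not the rule set used in the experiments: the main-clause rule moves clause-final non-finite verbs up to the finite verb, and the subordinate-clause rule moves the verb group, finite verb first, to just after the subject.

```python
# Tokens are (text, tag); the tags are hypothetical simplifications:
# SB = subject, VFIN = finite verb, VINF = non-finite verb, other = anything else.

def main_clause_rule(clause):
    """Move clause-final non-finite verbs up to directly after the finite verb."""
    rest = [t for t in clause if t[1] != "VINF"]
    nonfinite = [t for t in clause if t[1] == "VINF"]
    fin = next(i for i, t in enumerate(rest) if t[1] == "VFIN")
    return rest[:fin + 1] + nonfinite + rest[fin + 1:]


def subordinate_clause_rule(clause):
    """Move the verb group (finite verb first) to directly after the subject."""
    rest = [t for t in clause if t[1] not in ("VFIN", "VINF")]
    verbs = [t for t in clause if t[1] == "VFIN"] + [t for t in clause if t[1] == "VINF"]
    subj = next(i for i, t in enumerate(rest) if t[1] == "SB")
    return rest[:subj + 1] + verbs + rest[subj + 1:]


def words(clause):
    return " ".join(text for text, _ in clause)


main = [("Ich", "SB"), ("werde", "VFIN"), ("Ihnen", "other"),
        ("die entsprechenden Anmerkungen", "other"), ("aushaendigen", "VINF")]
sub = [("damit", "other"), ("Sie", "SB"), ("das", "other"), ("eventuell", "other"),
       ("bei der Abstimmung", "other"), ("uebernehmen", "VINF"), ("koennen", "VFIN")]

print(words(main_clause_rule(main)))
# Ich werde aushaendigen Ihnen die entsprechenden Anmerkungen
print(words(subordinate_clause_rule(sub)))
# damit Sie koennen uebernehmen das eventuell bei der Abstimmung
```

The subordinate clause comes out as "damit Sie koennen uebernehmen das eventuell bei der Abstimmung", which differs from the hand-annotated order ("koennen eventuell uebernehmen das ..."): simple rules like these cover the systematic movements but not every detail.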


Improved Translations

we must also this criticism should be taken seriously .
→ we must also take this criticism seriously .

i am with him that it is necessary , the institutional balance by means of a political revaluation of both the commission and the council to maintain .
→ i agree with him in this , that it is necessary to maintain the institutional balance by means of a political revaluation of both the commission and the council .

thirdly , we believe that the principle of differentiation of negotiations note .
→ thirdly , we maintain the principle of differentiation of negotiations .

perhaps it would be a constructive dialog between the government and opposition parties , social representative a positive impetus in the right direction .
→ perhaps a constructive dialog between government and opposition parties and social representative could give a positive impetus in the right direction .


Other Syntax-Based Approaches

ISI: extending work of Yamada/Knight

– more complex rules
– performance approaching phrase-based

Prague: Translation via dependency structures

– parallel Czech–English dependency treebank
– tecto-grammatical translation model [EACL 2003]

U. Alberta/Microsoft: treelet translation

– translating from English into foreign languages
– using dependency parser in English
– project dependency tree into foreign language for training
– map parts of the dependency tree ("treelets") into foreign languages


Other Syntax-Based Approaches (2)

Reranking phrase-based SMT output with syntactic features

– create n-best list with phrase-based system
– POS tag and parse candidate translations
– rerank with syntactic features
– see [Koehn, 2003] and JHU Workshop [Och et al., 2003]

JHU Summer Workshop 2005

– final presentations this week
– tools for syntax-based SMT
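The reranking step described above is small once features are computed. The sketch below assumes each candidate carries a dictionary of feature values and sorts by a weighted sum; the feature names, values and weights are made up for illustration (the two candidate strings are borrowed from the earlier example outputs), and in practice the weights would be tuned on held-out data.

```python
def rerank(nbest, weights):
    """Sort candidates by a weighted sum of their feature values, best first."""
    def score(candidate):
        return sum(weights[name] * value for name, value in candidate["features"].items())
    return sorted(nbest, key=score, reverse=True)


# Hypothetical n-best list: phrase-based model score plus a parser log-probability.
nbest = [
    {"text": "we must also this criticism should be taken seriously .",
     "features": {"phrase_based": -10.0, "parse_logprob": -30.0}},
    {"text": "we must also take this criticism seriously .",
     "features": {"phrase_based": -10.5, "parse_logprob": -22.0}},
]

baseline = rerank(nbest, {"phrase_based": 1.0, "parse_logprob": 0.0})
with_syntax = rerank(nbest, {"phrase_based": 1.0, "parse_logprob": 0.5})
print(baseline[0]["text"])     # we must also this criticism should be taken seriously .
print(with_syntax[0]["text"])  # we must also take this criticism seriously .
```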


Syntax: Does it help?

Not yet

– best systems are still phrase-based and treat words as tokens

Well, maybe...

– work on reordering German
– automatically trained tree transfer systems are promising

Why not yet?

– if we use real syntax, we need good parsers: are they good enough?
– syntactic annotations add a level of complexity

→ difficult to handle, slow to train and decode

– few researchers are good at statistical modeling and also understand syntactic theories
