Winter School

Day 3: Decoding / Phrase-based models, MT Marathon, 28 January 2009

MT Marathon Spring School, Lecture 3 28 January 2009


Statistical Machine Translation

  • Components: Translation model, language model, decoder

[diagram: foreign/English parallel text → statistical analysis → Translation Model; English text → statistical analysis → Language Model; both feed into the Decoding Algorithm]


Phrase-Based Translation

Morgen fliege ich nach Kanada zur Konferenz
Tomorrow I will fly to the conference in Canada

  • Foreign input is segmented into phrases

– any sequence of words, not necessarily linguistically motivated

  • Each phrase is translated into English
  • Phrases are reordered


Phrase Translation Table

  • Phrase translations for “den Vorschlag”:

English           φ(e|f)     English           φ(e|f)
the proposal      0.6227     the suggestions   0.0114
’s proposal       0.1068     the proposed      0.0114
a proposal        0.0341     the motion        0.0091
the idea          0.0250     the idea of       0.0091
this proposal     0.0227     the proposal ,    0.0068
proposal          0.0205     its proposal      0.0068
of the proposal   0.0159     it                0.0068
the proposals     0.0159     ...               ...


Decoding Process

[diagram: untranslated input “Maria no dio una bofetada a la bruja verde”]

  • Build translation left to right

– select foreign words to be translated


Decoding Process

[diagram: input with “Maria” selected for translation; output so far: “Mary”]

  • Build translation left to right

– select foreign words to be translated
– find English phrase translation
– add English phrase to end of partial translation


Decoding Process

[diagram: “Maria” marked as translated; output so far: “Mary”]

  • Build translation left to right

– select foreign words to be translated
– find English phrase translation
– add English phrase to end of partial translation
– mark foreign words as translated
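The four steps above can be sketched as a single expansion function. This is a minimal sketch: the toy phrase table `TOY_PHRASES`, the function name `expand`, and the hypothesis representation are all invented for illustration, not Moses internals.

```python
# Toy sketch of one decoding step (phrase table and names are illustrative).
TOY_PHRASES = {("Maria",): "Mary", ("no",): "did not"}  # foreign phrase -> English

def expand(hypothesis, span, source):
    """Expand a partial translation by translating one untranslated span."""
    english, covered = hypothesis
    phrase = tuple(source[span[0]:span[1]])
    assert all(i not in covered for i in range(*span)), "span already translated"
    new_english = english + [TOY_PHRASES[phrase]]   # add English phrase at the end
    new_covered = covered | set(range(*span))       # mark foreign words as translated
    return (new_english, new_covered)

source = ["Maria", "no"]
h0 = ([], set())                    # empty hypothesis
h1 = expand(h0, (0, 1), source)     # select and translate "Maria"
h2 = expand(h1, (1, 2), source)     # select and translate "no"
print(" ".join(h2[0]))              # -> Mary did not
```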


Decoding Process

[diagram: “no” translated as “did not”; output so far: “Mary did not”]

  • One to many translation


Decoding Process

[diagram: “dio una bofetada” translated as “slap”; output so far: “Mary did not slap”]

  • Many to one translation


Decoding Process

[diagram: “a la” translated as “the”; output so far: “Mary did not slap the”]

  • Many to one translation


Decoding Process

[diagram: “verde” translated as “green”, reordered ahead of “bruja”; output so far: “Mary did not slap the green”]

  • Reordering


Decoding Process

[diagram: “bruja” translated as “witch”; final output: “Mary did not slap the green witch”]

  • Translation finished


Translation Options

[diagram: chart of translation options for each span of “Maria no dio una bofetada a la bruja verde”, e.g. Mary; not / did not; give a slap / a slap / slap; to the / the / to; green; witch / green witch / the witch; ...]

  • Look up possible phrase translations

– many different ways to segment words into phrases
– many different ways to translate each phrase


Hypothesis Expansion

[diagram: translation options chart with the initial empty hypothesis e: –, f: ---------, p: 1]

  • Start with empty hypothesis

– e: no English words
– f: no foreign words covered
– p: probability 1


Hypothesis Expansion

[diagram: the empty hypothesis (f: ---------, p: 1) expanded with the option “Mary” into (e: Mary, f: *--------, p: .534)]

  • Pick translation option
  • Create hypothesis

– e: add English phrase Mary
– f: first foreign word covered
– p: probability 0.534


A Quick Word on Probabilities

  • Not going into detail here, but...
  • Translation Model

– phrase translation probability p(Mary|Maria)
– reordering costs
– phrase/word count costs
– ...

  • Language Model

– uses trigrams:
– p(Mary did not) = p(Mary|START) × p(did|START,Mary) × p(not|Mary,did)
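A minimal sketch of how the translation model and trigram language model multiply together. All probabilities below are made up for illustration; a real model has many more feature functions.

```python
from math import exp, log

# Illustrative probabilities (invented for this sketch, not from a real model)
tm = {("Maria", "Mary"): 0.9, ("no", "did not"): 0.6}
lm = {("<s>", "<s>", "Mary"): 0.1, ("<s>", "Mary", "did"): 0.3, ("Mary", "did", "not"): 0.4}

def score(pairs):
    """log p = sum of phrase-translation log-probs + trigram LM log-probs."""
    logp = 0.0
    words = []
    for f, e in pairs:
        logp += log(tm[(f, e)])      # translation model
        words += e.split()
    context = ["<s>", "<s>"]
    for w in words:                  # language model, trigram by trigram
        logp += log(lm[(context[-2], context[-1], w)])
        context.append(w)
    return logp

p = exp(score([("Maria", "Mary"), ("no", "did not")]))
print(round(p, 6))  # TM 0.9*0.6 times LM 0.1*0.3*0.4 = 0.00648
```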


Hypothesis Expansion

[diagram: a second expansion of the empty hypothesis: (e: witch, f: -------*-, p: .182)]

  • Add another hypothesis


Hypothesis Expansion

[diagram: further expansion, e.g. (e: ... slap, f: *-***----, p: .043)]

  • Further hypothesis expansion


Hypothesis Expansion

[diagram: expansion continues over the options chart: (e: did not, f: **-------, p: .154), (e: slap, f: *****----, p: .015), (e: the, f: *******--, p: .004283), (e: green witch, f: *********, p: .000271), ...]

  • ... until all foreign words covered

– find best hypothesis that covers all foreign words
– backtrack to read off translation


Hypothesis Expansion

[diagram: the expansion graph grows rapidly as more hypotheses are added]

  • Adding more hypotheses

⇒ Explosion of search space


Explosion of Search Space

  • Number of hypotheses is exponential with respect to sentence length

⇒ Decoding is NP-complete [Knight, 1999]
⇒ Need to reduce the search space
– risk-free: hypothesis recombination
– risky: histogram/threshold pruning


Hypothesis Recombination

[diagram: two search paths from the empty hypothesis (p=1) arriving at the same partial translation “Mary did not give” (p=0.092 vs p=0.044)]

  • Different paths to the same partial translation


Hypothesis Recombination

[diagram: after recombination, only the better path (p=0.092) to “Mary did not give” remains]

  • Different paths to the same partial translation

⇒ Combine paths
– drop weaker path
– keep pointer from weaker path (for lattice generation)


Hypothesis Recombination

[diagram: paths ending in “Mary did not give” (p=0.092) and “Joe did not give” (p=0.017) share their last two English words and coverage vector]

  • Recombined hypotheses do not have to match completely
  • No matter what is added next, the weaker path can be dropped if:

– the last two English words match (matters for the language model)
– the foreign word coverage vectors match (affects future paths)
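These two conditions can be captured as a recombination key: hypotheses with equal keys are interchangeable for all future expansions. A sketch, with an illustrative hypothesis layout:

```python
# Two hypotheses may be recombined when their last two English words
# and their coverage vectors match (field layout is illustrative).
def recombination_key(english_words, coverage):
    return (tuple(english_words[-2:]), frozenset(coverage))

a = recombination_key("Mary did not give".split(), {0, 1, 2})
b = recombination_key("Joe did not give".split(), {0, 1, 2})
print(a == b)  # True: same key, so the weaker hypothesis can be dropped
```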


Hypothesis Recombination

[diagram: the weaker path “Joe did not give” is dropped after recombination]

  • Recombined hypotheses do not have to match completely
  • No matter what is added next, the weaker path can be dropped if:

– the last two English words match (matters for the language model)
– the foreign word coverage vectors match (affects future paths)
⇒ Combine paths


Pruning

  • Hypothesis recombination is not sufficient

⇒ Heuristically discard weak hypotheses early

  • Organize hypotheses in stacks, e.g. by

– same foreign words covered
– same number of foreign words covered

  • Compare hypotheses in stacks, discard bad ones

– histogram pruning: keep top n hypotheses in each stack (e.g., n = 100)
– threshold pruning: keep only hypotheses within a factor α of the best hypothesis in the stack (e.g., α = 0.001)
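A sketch of both pruning styles applied to one stack. It is stated over probabilities, so the threshold keeps hypotheses whose probability is at least α times the best; the stack representation is illustrative.

```python
# Histogram and threshold pruning on one stack of (prob, hypothesis) pairs.
def prune(stack, n=100, alpha=0.001):
    stack = sorted(stack, reverse=True)       # best probability first
    best = stack[0][0]
    return [(p, h) for p, h in stack[:n]      # histogram: keep top n
            if p >= alpha * best]             # threshold: within factor alpha of best

stack = [(0.5, "a"), (0.4, "b"), (0.0001, "c")]
print(prune(stack, n=2))  # [(0.5, 'a'), (0.4, 'b')]
```

Note that histogram pruning bounds the stack size directly, while threshold pruning adapts to how peaked the probabilities are; Moses-style decoders typically apply both.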


Hypothesis Stacks

[diagram: hypothesis stacks 1–6, indexed by the number of foreign words covered]

  • Organization of hypotheses into stacks

– here: based on number of foreign words translated
– during translation, all hypotheses from one stack are expanded
– expanded hypotheses are placed into the appropriate stacks


Comparing Hypotheses

  • Comparing hypotheses with same number of foreign words covered

[diagram: hypotheses (e: Mary did not, f: **-------, p: 0.154) and (e: the, f: -----**--, p: 0.354); the better-scoring partial translation covers the easier part of the sentence]

  • The hypothesis that covers the easy (lower-cost) part of the sentence is unfairly preferred

⇒ Need to consider future cost of uncovered parts


Future Cost Estimation

a la → to the

  • Estimate cost to translate remaining part of input
  • Step 1: estimate future cost for each translation option

– look up translation model cost
– estimate language model cost (no prior context)
– ignore reordering model cost
→ LM × TM = p(to) × p(the|to) × p(to the|a la)


Future Cost Estimation: Step 2

[diagram: competing translation options for “a la” with costs 0.0372, 0.0299, and 0.0354; the cheapest (0.0299) is kept]

  • Step 2: find cheapest cost among translation options


Future Cost Estimation: Step 3

[diagram: spans of “Maria no dio una bofetada a la bruja verde” with their cheapest future-cost paths]

  • Step 3: find cheapest future cost path for each span

– can be done efficiently by dynamic programming
– future cost for every span can be pre-computed
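The dynamic program is the same shape as chart parsing: the future cost of a span is the cheaper of its best direct translation option and the best split into two sub-spans. The option costs below are illustrative negative log-probabilities.

```python
# Pre-compute the cheapest future cost for every span by dynamic programming.
# option_cost[(i, j)] is the cheapest translation-option cost for span i..j
# (illustrative numbers; real costs come from the TM and LM estimates above).
def future_costs(option_cost, n):
    fc = {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best = option_cost.get((i, j), float("inf"))
            for k in range(i + 1, j):            # try every split of the span
                best = min(best, fc[(i, k)] + fc[(k, j)])
            fc[(i, j)] = best
    return fc

costs = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 4.0}   # negative log-probs
fc = future_costs(costs, 2)
print(fc[(0, 2)])  # 3.0: splitting into 0..1 + 1..2 beats the direct option (4.0)
```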


Future Cost Estimation: Application

[diagram: hypothesis (e: ... slap, f: *-***----, p: .043); the uncovered spans have future costs 0.1 and 0.006672, so fc = 0.0006672 and p × fc = .000029]

  • Use future cost estimates when pruning hypotheses

– look up the pre-computed future cost for each maximal contiguous uncovered span
– combine it with the cost accumulated so far when comparing hypotheses for pruning


A* search

  • Pruning might drop hypotheses that lead to the best path (search error)
  • A* search: safe pruning

– future cost estimates have to be accurate or underestimates (admissible)
– a lower bound for the probability is established early by depth-first search: compute the cost of one complete translation
– if cost-so-far plus future cost is worse than the lower bound, the hypothesis can be safely discarded

  • Not commonly done, since not aggressive enough


Limits on Reordering

  • Reordering may be limited

– monotone translation: no reordering at all
– only phrase movements of at most n words

  • Reordering limits speed up search (polynomial instead of exponential)
  • Current reordering models are weak, so limits improve translation quality
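A reordering limit is just a check applied before each hypothesis expansion. A minimal sketch, assuming the limit is measured as the distance between the end of the last translated source phrase and the start of the next one (function and parameter names are illustrative):

```python
# Reject expansions whose jump exceeds a distortion limit (monotone = limit 0).
def allowed(prev_end, next_start, limit):
    """prev_end: index just after the last translated source word;
    next_start: start index of the next source phrase to translate."""
    return abs(next_start - prev_end) <= limit

print(allowed(3, 3, 0))  # True: monotone continuation
print(allowed(3, 6, 2))  # False: a jump of 3 exceeds the limit of 2
```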


Word Lattice Generation

[diagram: the search graph with recombined paths “Mary did not give” and “Joe did not give”, back-pointers retained]

  • Search graph can be easily converted into a word lattice

– can be further mined for n-best lists
→ enables reranking approaches
→ enables discriminative training

[diagram: the resulting word lattice]


Sample N-Best List

  • Simple N-best list:

Translation ||| Reordering LM TM WordPenalty ||| Score
this is a small house ||| 0 -27.0908 -1.83258 -5 ||| -28.9234
this is a little house ||| 0 -28.1791 -1.83258 -5 ||| -30.0117
it is a small house ||| 0 -27.108 -3.21888 -5 ||| -30.3268
it is a little house ||| 0 -28.1963 -3.21888 -5 ||| -31.4152
this is an small house ||| 0 -31.7294 -1.83258 -5 ||| -33.562
it is an small house ||| 0 -32.3094 -3.21888 -5 ||| -35.5283
this is an little house ||| 0 -33.7639 -1.83258 -5 ||| -35.5965
this is a house small ||| -3 -31.4851 -1.83258 -5 ||| -36.3176
this is a house little ||| -3 -31.5689 -1.83258 -5 ||| -36.4015
it is an little house ||| 0 -34.3439 -3.21888 -5 ||| -37.5628
it is a house small ||| -3 -31.5022 -3.21888 -5 ||| -37.7211
this is an house small ||| -3 -32.8999 -1.83258 -5 ||| -37.7325
it is a house little ||| -3 -31.586 -3.21888 -5 ||| -37.8049
this is an house little ||| -3 -32.9837 -1.83258 -5 ||| -37.8163
the house is a little ||| -7 -28.5107 -2.52573 -5 ||| -38.0364
the is a small house ||| 0 -35.6899 -2.52573 -5 ||| -38.2156
is it a little house ||| -4 -30.3603 -3.91202 -5 ||| -38.2723
the house is a small ||| -7 -28.7683 -2.52573 -5 ||| -38.294
it ’s a small house ||| 0 -34.8557 -3.91202 -5 ||| -38.7677
this house is a little ||| -7 -28.0443 -3.91202 -5 ||| -38.9563
it ’s a little house ||| 0 -35.1446 -3.91202 -5 ||| -39.0566
this house is a small ||| -7 -28.3018 -3.91202 -5 ||| -39.2139
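The `|||`-separated layout above is easy to consume programmatically, e.g. for reranking. A small parsing sketch (field names follow the header shown above; the dict layout is my own choice):

```python
# Parse one line of the n-best list shown above.
def parse_nbest_line(line):
    translation, features, score = [part.strip() for part in line.split("|||")]
    reordering, lm, tm, word_penalty = map(float, features.split())
    return {"translation": translation, "reordering": reordering,
            "lm": lm, "tm": tm, "word_penalty": word_penalty,
            "score": float(score)}

entry = parse_nbest_line(
    "this is a small house ||| 0 -27.0908 -1.83258 -5 ||| -28.9234")
print(entry["translation"], entry["score"])  # this is a small house -28.9234
```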


Moses: Open Source Toolkit

  • Open source statistical machine translation system (developed from scratch since 2006)

– state-of-the-art phrase-based approach
– novel methods: factored translation models, confusion network decoding
– support for very large models through memory-efficient data structures

  • Documentation, source code, binaries available at http://www.statmt.org/moses/
  • Development also supported by

– EC-funded TC-STAR project
– US funding agencies DARPA, NSF
– universities (Edinburgh, Maryland, MIT, ITC-irst, RWTH Aachen, ...)


Phrase-based models


Phrase-based translation

Morgen fliege ich nach Kanada zur Konferenz
Tomorrow I will fly to the conference in Canada

  • Foreign input is segmented into phrases

– any sequence of words, not necessarily linguistically motivated

  • Each phrase is translated into English
  • Phrases are reordered


Phrase-based translation model

  • Major components of the phrase-based model

– phrase translation model φ(f|e)
– reordering model ω^d, with distance d = start_i − end_{i−1} − 1
– language model p_LM(e)

  • Bayes rule

argmax_e p(e|f) = argmax_e p(f|e) p(e)
                = argmax_e φ(f|e) p_LM(e) ω^d

  • Sentence f is decomposed into I phrases f̄_1^I = f̄_1, ..., f̄_I
  • Decomposition of φ(f|e)

φ(f̄_1^I | ē_1^I) = ∏_{i=1}^I φ(f̄_i | ē_i) ω^{start_i − end_{i−1} − 1}
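The product over phrases can be sketched directly in code: each phrase contributes its translation probability times a distance-based reordering penalty ω^d. The language model factor is left out here, the example segmentation (1-based, inclusive spans) is invented for illustration, and abs() is my assumption so that backward moves are penalized like forward ones, consistent with the ω^n cost discussed later.

```python
# Sketch of the phrase-based decomposition: product of phi(f_i|e_i) * omega^d.
def model_score(segmentation, omega=0.5):
    """segmentation: list of (start, end, phi) with 1-based inclusive source spans,
    in the order the phrases are translated."""
    p, prev_end = 1.0, 0
    for start, end, phi in segmentation:
        d = abs(start - prev_end - 1)   # distance moved relative to previous phrase
        p *= phi * omega ** d
        prev_end = end
    return p

mono = model_score([(1, 1, 0.8), (2, 3, 0.5)])  # monotone: all distances are 0
jump = model_score([(2, 3, 0.5), (1, 1, 0.8)])  # second source phrase translated first
print(round(mono, 4), round(jump, 4))           # 0.4 0.025
```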


Advantages of phrase-based translation

  • Many-to-many translation can handle non-compositional phrases
  • Use of local context in translation
  • The more data, the longer phrases can be learned


Phrase translation table

  • Phrase translations for “den Vorschlag”:

English           φ(e|f)     English           φ(e|f)
the proposal      0.6227     the suggestions   0.0114
’s proposal       0.1068     the proposed      0.0114
a proposal        0.0341     the motion        0.0091
the idea          0.0250     the idea of       0.0091
this proposal     0.0227     the proposal ,    0.0068
proposal          0.0205     its proposal      0.0068
of the proposal   0.0159     it                0.0068
the proposals     0.0159     ...               ...


How to learn the phrase translation table?

  • Start with the word alignment:

[word alignment matrix: “Maria no daba una bofetada a la bruja verde” ↔ “Mary did not slap the green witch”]

  • Collect all phrase pairs that are consistent with the word alignment


Consistent with word alignment

[diagram: three candidate phrase pairs over the alignment of “Maria no daba” ↔ “Mary did not slap”: one consistent, two inconsistent (marked ✗)]

  • Consistent with the word alignment :=

the phrase pair has to contain all alignment points of all covered words:

(ē, f̄) ∈ BP ⇔ ∀ e_i ∈ ē: (e_i, f_j) ∈ A → f_j ∈ f̄
and ∀ f_j ∈ f̄: (e_i, f_j) ∈ A → e_i ∈ ē
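This definition transcribes almost literally into code, with A as a set of (e_i, f_j) index pairs and phrase spans as index sets (representation chosen for the sketch):

```python
# Direct transcription of the consistency definition: every alignment point
# touching the phrase pair must lie entirely inside it.
def consistent(e_span, f_span, A):
    return all(fj in f_span for (ei, fj) in A if ei in e_span) and \
           all(ei in e_span for (ei, fj) in A if fj in f_span)

# Illustrative alignment (0-indexed): English word 1 aligns to foreign words 1 and 2
A = {(0, 0), (1, 1), (1, 2)}
print(consistent({0}, {0}, A))  # True
print(consistent({1}, {1}, A))  # False: alignment point (1, 2) sticks out
```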


Word alignment induced phrases

[word alignment matrix: “Maria no daba una bofetada a la bruja verde” ↔ “Mary did not slap the green witch”]

(Maria, Mary), (no, did not), (daba una bofetada, slap), (a la, the), (bruja, witch), (verde, green)


Word alignment induced phrases

[word alignment matrix: “Maria no daba una bofetada a la bruja verde” ↔ “Mary did not slap the green witch”]

(Maria, Mary), (no, did not), (daba una bofetada, slap), (a la, the), (bruja, witch), (verde, green), (Maria no, Mary did not), (no daba una bofetada, did not slap), (daba una bofetada a la, slap the), (bruja verde, green witch)


Word alignment induced phrases

[word alignment matrix: “Maria no daba una bofetada a la bruja verde” ↔ “Mary did not slap the green witch”]

(Maria, Mary), (no, did not), (daba una bofetada, slap), (a la, the), (bruja, witch), (verde, green), (Maria no, Mary did not), (no daba una bofetada, did not slap), (daba una bofetada a la, slap the), (bruja verde, green witch), (Maria no daba una bofetada, Mary did not slap), (no daba una bofetada a la, did not slap the), (a la bruja verde, the green witch)


Word alignment induced phrases

[word alignment matrix: “Maria no daba una bofetada a la bruja verde” ↔ “Mary did not slap the green witch”]

(Maria, Mary), (no, did not), (daba una bofetada, slap), (a la, the), (bruja, witch), (verde, green), (Maria no, Mary did not), (no daba una bofetada, did not slap), (daba una bofetada a la, slap the), (bruja verde, green witch), (Maria no daba una bofetada, Mary did not slap), (no daba una bofetada a la, did not slap the), (a la bruja verde, the green witch), (Maria no daba una bofetada a la, Mary did not slap the), (daba una bofetada a la bruja verde, slap the green witch)


Word alignment induced phrases (5)

[word alignment matrix: “Maria no daba una bofetada a la bruja verde” ↔ “Mary did not slap the green witch”]

(Maria, Mary), (no, did not), (daba una bofetada, slap), (a la, the), (bruja, witch), (verde, green), (Maria no, Mary did not), (no daba una bofetada, did not slap), (daba una bofetada a la, slap the), (bruja verde, green witch), (Maria no daba una bofetada, Mary did not slap), (no daba una bofetada a la, did not slap the), (a la bruja verde, the green witch), (Maria no daba una bofetada a la, Mary did not slap the), (daba una bofetada a la bruja verde, slap the green witch), (no daba una bofetada a la bruja verde, did not slap the green witch), (Maria no daba una bofetada a la bruja verde, Mary did not slap the green witch)
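The enumeration above can be sketched as a nested loop over foreign spans, keeping exactly those with a consistent English projection. This is a simplified sketch: it works on index spans, uses a toy alignment, and omits the extension to unaligned boundary words that full extractors also perform.

```python
# Sketch of phrase extraction: collect all spans consistent with the alignment.
# A is a set of (ei, fj) alignment index pairs.
def extract_phrases(n_f, n_e, A, max_len=9):
    pairs = []
    for f1 in range(n_f):
        for f2 in range(f1, min(f1 + max_len, n_f)):
            # English words aligned to the foreign span f1..f2
            es = [ei for (ei, fj) in A if f1 <= fj <= f2]
            if not es:
                continue
            e1, e2 = min(es), max(es)
            # consistent iff no alignment point in e1..e2 leaves the foreign span
            if all(f1 <= fj <= f2 for (ei, fj) in A if e1 <= ei <= e2):
                pairs.append(((f1, f2), (e1, e2)))
    return pairs

# toy example: two words aligned diagonally
print(extract_phrases(2, 2, {(0, 0), (1, 1)}))
```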


Probability distribution of phrase pairs

  • We need a probability distribution φ(f|e) over the collected phrase pairs

⇒ Possible choices

– relative frequency of collected phrases: φ(f̄|ē) = count(f̄, ē) / Σ_{f̄′} count(f̄′, ē)
– or, conversely, φ(e|f)
– use lexical translation probabilities
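The relative-frequency estimate is a two-line computation over the extracted pairs (the example counts are invented for illustration):

```python
from collections import Counter

# Relative-frequency estimate phi(f|e) = count(f, e) / sum_f' count(f', e)
def phi(pairs):
    pair_count = Counter(pairs)
    e_count = Counter(e for (f, e) in pairs)
    return {(f, e): c / e_count[e] for (f, e), c in pair_count.items()}

pairs = [("den Vorschlag", "the proposal")] * 3 + [("dem Vorschlag", "the proposal")]
table = phi(pairs)
print(table[("den Vorschlag", "the proposal")])  # 0.75
```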


Reordering

  • Monotone translation

– do not allow any reordering → worse translations

  • Limiting reordering (to movement over max. number of words) helps
  • Distance-based reordering cost

– moving a foreign phrase over n words: cost ω^n

  • Lexicalized reordering model


Lexicalized reordering models

[diagram: alignment of f1–f7 to e1–e6 with phrase orientations labeled m, m, s, d, d]

[from Koehn et al., 2005, IWSLT]

  • Three orientation types: monotone, swap, discontinuous
  • Probability p(swap|e, f) depends on foreign (and English) phrase involved


Learning lexicalized reordering models

[diagram: for each extracted phrase pair, is there an alignment point to the top left or to the top right?]

[from Koehn et al., 2005, IWSLT]

  • Orientation type is learned during phrase extraction
  • Alignment point to the top left (monotone) or top right (swap)?
  • For more, see [Tillmann, 2003] or [Koehn et al., 2005]
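The top-left/top-right test can be sketched as a lookup in the alignment set. This is a simplified sketch of the idea described above, not the exact extraction rule from the cited papers; the argument names and toy alignments are illustrative.

```python
# Orientation of a phrase pair from alignment points: is there a point to the
# top left (monotone), to the top right (swap), or neither (discontinuous)?
# A is a set of (ei, fj) alignment index pairs; spans are inclusive indices.
def orientation(e_start, f_start, f_end, A):
    if (e_start - 1, f_start - 1) in A:
        return "monotone"        # alignment point to the top left
    if (e_start - 1, f_end + 1) in A:
        return "swap"            # alignment point to the top right
    return "discontinuous"

A = {(0, 0), (1, 1)}
print(orientation(1, 1, 1, A))        # monotone: (0, 0) is to the top left
print(orientation(1, 0, 1, {(0, 2)})) # swap: (0, 2) is to the top right
print(orientation(1, 3, 3, A))        # discontinuous
```

Counting these outcomes per phrase pair over the training data yields the probabilities p(monotone|e, f), p(swap|e, f), p(discontinuous|e, f) used by the lexicalized reordering model.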
