
Pharaoh: a Beam Search Decoder for Phrase-Based Statistical Machine Translation



  1. Pharaoh: a Beam Search Decoder for Phrase-Based Statistical Machine Translation
     Philipp Koehn (koehn@csail.mit.edu)
     Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology

  2. Outline
     - Phrase-Based Statistical MT
     - Beam Search Decoding
     - Experiments
     - Advanced Features

  3. Machine Translation
     - Task: make sense of foreign text like [image of foreign text]
     - Long-standing problem in artificial intelligence
     - Ultimately requires syntax, semantics, pragmatics

  4. Statistical Machine Translation
     - Components: translation model, language model, decoder
     [Diagram: foreign/English parallel text -> statistical analysis -> Translation Model; English text -> statistical analysis -> Language Model; both models feed the Decoding Algorithm]

  5. Phrase-Based Translation
     [Figure: "Morgen fliege ich nach Kanada zur Konferenz" aligned with "Tomorrow I will fly to the conference in Canada"]
     - Foreign input is segmented into phrases: any sequence of words, not necessarily linguistically motivated
     - Each phrase is translated into English
     - Phrases are reordered
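
The three steps on this slide (segment, translate, reorder) can be sketched in a few lines of Python. This is only an illustration: the segmentation, the phrase pairs in `PHRASE_TABLE`, and the output order are assumptions chosen to reproduce the slide's example sentence, not values from a trained model.

```python
# Hypothetical phrase table for the slide's example sentence.
# The phrase pairs are illustrative assumptions, not learned from data.
PHRASE_TABLE = {
    "Morgen": "Tomorrow",
    "fliege": "will fly",
    "ich": "I",
    "nach Kanada": "in Canada",
    "zur Konferenz": "to the conference",
}


def translate(segments, order):
    """Translate each foreign phrase, then emit the phrases in the given order."""
    english = [PHRASE_TABLE[segment] for segment in segments]
    return " ".join(english[i] for i in order)


# Step 1: one possible segmentation of the input into phrases.
segments = ["Morgen", "fliege", "ich", "nach Kanada", "zur Konferenz"]
# Step 3: output order, given as indices into the segment list.
order = [0, 2, 1, 4, 3]

result = translate(segments, order)
print(result)
```

A real decoder has to search over segmentations, translations, and orders rather than being handed them, which is what the beam search part of the talk addresses.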

  6. Phrase-Based Systems
     - A number of research groups developed phrase-based systems (RWTH Aachen, USC/ISI, CMU, IBM, JHU, ITC-irst, MIT, ...)
     - Systems differ in
       - training methods
       - model for phrase translation table
       - reordering models
       - additional feature functions
     - Currently best method for SMT (MT?)
       - top systems in DARPA/NIST evaluation are phrase-based
       - best commercial system for Arabic-English is phrase-based

  7. Pharaoh
     - Translation engine
       - works with various phrase-based models
       - beam search algorithm
       - time complexity roughly linear with input length
       - good quality takes about 1 second per sentence
     - Very good performance in DARPA/NIST Evaluation
     - Freely available for researchers: http://www.isi.edu/licensed-sw/pharaoh/

  8. Outline
     - Phrase-Based Statistical MT
     - Beam Search Decoding
     - Experiments
     - Advanced Features

  9. Decoding Process
     Input: Maria no dio una bofetada a la bruja verde
     - Build translation left to right
       - select foreign words to be translated

  10. Decoding Process
      Input: Maria no dio una bofetada a la bruja verde
      Output so far: Mary
      - Build translation left to right
        - select foreign words to be translated
        - find English phrase translation
        - add English phrase to end of partial translation

  11. Decoding Process
      Input: Maria no dio una bofetada a la bruja verde
      Output so far: Mary
      - Build translation left to right
        - select foreign words to be translated
        - find English phrase translation
        - add English phrase to end of partial translation
        - mark foreign words as translated

  12. Decoding Process
      Input: Maria no dio una bofetada a la bruja verde
      Output so far: Mary did not
      - One to many translation

  13. Decoding Process
      Input: Maria no dio una bofetada a la bruja verde
      Output so far: Mary did not slap
      - Many to one translation

  14. Decoding Process
      Input: Maria no dio una bofetada a la bruja verde
      Output so far: Mary did not slap the
      - Many to one translation

  15. Decoding Process
      Input: Maria no dio una bofetada a la bruja verde
      Output so far: Mary did not slap the green
      - Reordering

  16. Decoding Process
      Input: Maria no dio una bofetada a la bruja verde
      Output: Mary did not slap the green witch
      - Translation finished
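
The walk-through on slides 9 to 16 can be replayed with a short Python sketch: each step selects a span of foreign words, appends its English phrase to the output, and marks the span as translated. The spans in `steps` are read off the slides' example; the function name `decode` and the error handling are assumptions made for illustration, and the choice of steps is given here rather than searched for.

```python
source = "Maria no dio una bofetada a la bruja verde".split()

# Each step is (start, end, english): the source span [start, end) and the
# English phrase appended for it, in the order shown on the slides.
steps = [
    (0, 1, "Mary"),     # Maria
    (1, 2, "did not"),  # no (one to many)
    (2, 5, "slap"),     # dio una bofetada (many to one)
    (5, 7, "the"),      # a la (many to one)
    (8, 9, "green"),    # verde (reordering: translated before "bruja")
    (7, 8, "witch"),    # bruja
]


def decode(source, steps):
    """Build the translation left to right while tracking covered source words."""
    covered = [False] * len(source)
    output = []
    for start, end, english in steps:
        if any(covered[start:end]):
            raise ValueError("a source word would be translated twice")
        for i in range(start, end):
            covered[i] = True       # mark foreign words as translated
        output.append(english)      # add English phrase to end of partial translation
    if not all(covered):
        raise ValueError("untranslated source words remain")
    return " ".join(output)


translation = decode(source, steps)
print(translation)
```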

  17. Translation Options
      Input: Maria no dio una bofetada a la bruja verde
      [Figure: table of translation options beneath the input. Options shown: Mary, not, give, a, slap, to, the, witch, green, did not, a slap, by, green witch, no, to the, did not give, the witch]
      - Look up possible phrase translations
        - many different ways to segment words into phrases
        - many different ways to translate each phrase
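
Looking up translation options amounts to enumerating every contiguous span of the input and checking it against a phrase table. The sketch below assumes a plain dictionary as the phrase table; the English phrases come from the slide, but which source span each one belongs to is an assumption, as are the name `translation_options` and the `max_len` limit.

```python
# Hypothetical phrase table: source phrase -> list of English options.
# The English phrases appear on the slide; their source spans are assumed.
PHRASE_TABLE = {
    "Maria": ["Mary"],
    "no": ["not", "did not", "no"],
    "no dio": ["did not give"],
    "dio": ["give"],
    "dio una bofetada": ["slap"],
    "una": ["a"],
    "una bofetada": ["a slap"],
    "bofetada": ["slap"],
    "a": ["to", "by"],
    "a la": ["to the"],
    "la": ["the"],
    "la bruja": ["the witch"],
    "bruja": ["witch"],
    "bruja verde": ["green witch"],
    "verde": ["green"],
}


def translation_options(words, phrase_table, max_len=3):
    """Return (start, end, english) for every span [start, end) found in the table."""
    options = []
    for start in range(len(words)):
        last_end = min(start + max_len, len(words))
        for end in range(start + 1, last_end + 1):
            phrase = " ".join(words[start:end])
            for english in phrase_table.get(phrase, []):
                options.append((start, end, english))
    return options


words = "Maria no dio una bofetada a la bruja verde".split()
options = translation_options(words, PHRASE_TABLE)
for start, end, english in options:
    print(start, end, " ".join(words[start:end]), "->", english)
```

Collecting the options once, before the search starts, means the decoder does not need to consult the phrase table again during hypothesis expansion.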

  18. Hypothesis Expansion
      Input: Maria no dio una bofetada a la bruja verde
      [Figure: translation options for the input, as on slide 17]
      Hypotheses:
        e: (none)   f: ---------   p: 1
      - Start with null hypothesis
        - e: no English words
        - f: no foreign words covered
        - p: probability 1

  19. Hypothesis Expansion
      Input: Maria no dio una bofetada a la bruja verde
      [Figure: translation options for the input, as on slide 17]
      Hypotheses:
        e: (none)   f: ---------   p: 1
        e: Mary     f: *--------   p: .534
      - Pick translation option
      - Create hypothesis
        - e: add English phrase Mary
        - f: first foreign word covered
        - p: probability 0.534
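
A hypothesis as described on slides 18 and 19 can be modelled as a small record holding the English phrase just added (e), the coverage of foreign words (f), the probability (p), and a link to the hypothesis it was expanded from. The sketch below is one possible representation, not Pharaoh's actual data structure; the names `Hypothesis`, `expand`, and `coverage_string` are assumptions, and the score is simply multiplied by a given option probability, whereas a full decoder would also fold in other model scores such as the language model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Hypothesis:
    english: str                          # e: English phrase added by this step
    coverage: Tuple[bool, ...]            # f: which foreign words are covered
    prob: float                           # p: probability so far
    back: Optional["Hypothesis"] = None   # hypothesis this one was expanded from


def null_hypothesis(n_words):
    """No English words, no foreign words covered, probability 1."""
    return Hypothesis("", (False,) * n_words, 1.0)


def expand(hyp, start, end, english, option_prob):
    """Create a new hypothesis by applying one translation option to span [start, end)."""
    if any(hyp.coverage[start:end]):
        raise ValueError("translation option overlaps covered words")
    coverage = tuple(c or (start <= i < end) for i, c in enumerate(hyp.coverage))
    return Hypothesis(english, coverage, hyp.prob * option_prob, hyp)


def coverage_string(hyp):
    """Render coverage in the slide's notation, e.g. '*--------'."""
    return "".join("*" if c else "-" for c in hyp.coverage)


h0 = null_hypothesis(9)
h1 = expand(h0, 0, 1, "Mary", 0.534)   # 0.534 is the value shown on the slide
print(h1.english, coverage_string(h1), h1.prob)
```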

  20. Hypothesis Expansion
      Input: Maria no dio una bofetada a la bruja verde
      [Figure: translation options for the input, as on slide 17]
      Hypotheses:
        e: (none)   f: ---------   p: 1
        e: Mary     f: *--------   p: .534
        e: witch    f: -------*-   p: .182
      - Add another hypothesis

  21. Hypothesis Expansion
      Input: Maria no dio una bofetada a la bruja verde
      [Figure: translation options for the input, as on slide 17]
      Hypotheses:
        e: (none)     f: ---------   p: 1
        e: Mary       f: *--------   p: .534
        e: witch      f: -------*-   p: .182
        e: ... slap   f: *-***----   p: .043
      - Further hypothesis expansion

  22. Hypothesis Expansion
      Input: Maria no dio una bofetada a la bruja verde
      [Figure: translation options for the input, as on slide 17]
      Hypotheses:
        e: (none)        f: ---------   p: 1
        e: Mary          f: *--------   p: .534
        e: did not       f: **-------   p: .154
        e: slap          f: *****----   p: .015
        e: the           f: *******--   p: .004283
        e: green witch   f: *********   p: .000271
        e: witch         f: -------*-   p: .182
        e: slap          f: *-***----   p: .043
      - ... until all foreign words covered
        - find best hypothesis that covers all foreign words
        - backtrack to read off translation
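
Reading off the translation by backtracking can be sketched as follows: each hypothesis keeps a pointer to its predecessor, so the English output is recovered by following the pointers from the final hypothesis to the null hypothesis and reversing the collected phrases. The class `Hyp` and function `read_off` are hypothetical names; the phrases and probabilities are the ones listed on the slide.

```python
class Hyp:
    """Minimal hypothesis: the English phrase added, the probability, a back pointer."""

    def __init__(self, english, prob, back=None):
        self.english = english
        self.prob = prob
        self.back = back


def read_off(hyp):
    """Follow back pointers to the null hypothesis and return the English sentence."""
    phrases = []
    while hyp is not None:
        if hyp.english:              # the null hypothesis adds no English words
            phrases.append(hyp.english)
        hyp = hyp.back
    return " ".join(reversed(phrases))


# The chain of hypotheses shown on the slide, from null hypothesis to full coverage.
chain = [
    ("", 1.0),
    ("Mary", 0.534),
    ("did not", 0.154),
    ("slap", 0.015),
    ("the", 0.004283),
    ("green witch", 0.000271),
]

final = None
for english, prob in chain:
    final = Hyp(english, prob, final)

translation = read_off(final)
print(translation, final.prob)
```

Storing only the last phrase plus a back pointer keeps each hypothesis small, since the full English string is never copied during expansion.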

  23. Hypothesis Expansion
      Input: Maria no dio una bofetada a la bruja verde
      [Figure: translation options for the input, as on slide 17]
      Hypotheses:
        e: (none)        f: ---------   p: 1
        e: Mary          f: *--------   p: .534
        e: did not       f: **-------   p: .154
        e: slap          f: *****----   p: .015
        e: the           f: *******--   p: .004283
        e: green witch   f: *********   p: .000271
        e: witch         f: -------*-   p: .182
        e: slap          f: *-***----   p: .043
      - Adding more hypotheses
      - Explosion of search space
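
To get a feel for how quickly the search space grows, the sketch below counts complete translation paths in a deliberately simplified setting: every foreign word is translated on its own, each word has the same number of translation options, and words may be translated in any order. These simplifications, and the name `count_derivations`, are assumptions for illustration; the real search space also contains multi-word phrases.

```python
from functools import lru_cache


def count_derivations(n_words, options_per_word):
    """Count complete paths when each word is translated alone, in any order."""
    full = (1 << n_words) - 1   # bitmask with every word covered

    @lru_cache(maxsize=None)
    def count(coverage):
        if coverage == full:
            return 1
        total = 0
        for i in range(n_words):
            if not coverage & (1 << i):
                # Any uncovered word may be translated next, with any of its options.
                total += options_per_word * count(coverage | (1 << i))
        return total

    return count(0)


for n in (3, 6, 9):
    print(n, count_derivations(n, 1), count_derivations(n, 2))
```

Under these assumptions the count is n! times k^n for n words with k options each, so even the nine-word example has 362880 orderings before any choice of translation is considered.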
