Natural Language Processing (CSEP 517): Machine Translation


  1. Natural Language Processing (CSEP 517): Machine Translation
     Noah Smith, © 2017 University of Washington
     nasmith@cs.washington.edu
     May 15, 2017

  2. To-Do List
     - Online quiz: due Sunday
       - Jurafsky and Martin (2008, ch. 25); Collins (2011, 2013)
     - A5 due May 28 (Sunday)

  3. Evaluation
     Intuition: good translations are fluent in the target language and faithful to the original meaning.
     Bleu score (Papineni et al., 2002):
     - Compare to a human-generated reference translation
     - Or, better: multiple references
     - Weighted average of n-gram precision (across different n)
     There are some alternatives; most papers that use them report Bleu, too.
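     A minimal sketch of the "weighted average of n-gram precision" idea, in Python. This is not the reference BLEU implementation: it uses a single reference, no smoothing, and a simplified brevity penalty, and the function names are made up for this illustration.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total > 0 else 0.0

def bleu(candidate, reference, max_n=4):
    """Geometric mean of 1..max_n gram precisions, times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # unsmoothed: any zero precision zeroes the whole score
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))  # brevity penalty
    return bp * math.exp(log_avg)

# Toy example with bigrams only.
print(round(bleu("the cat sat on the mat".split(),
                 "the cat is on the mat".split(), max_n=2), 3))
```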

  4. Warren Weaver to Norbert Wiener, 1947
     "One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'"

  5. Noisy Channel Models Review
     A pattern for modeling a pair of random variables, X and Y:
     source → Y → channel → X

  6. Noisy Channel Models Review
     A pattern for modeling a pair of random variables, X and Y:
     source → Y → channel → X
     - Y is the plaintext, the true message, the missing information, the output

  7. Noisy Channel Models Review
     A pattern for modeling a pair of random variables, X and Y:
     source → Y → channel → X
     - Y is the plaintext, the true message, the missing information, the output
     - X is the ciphertext, the garbled message, the observable evidence, the input

  8. Noisy Channel Models Review
     A pattern for modeling a pair of random variables, X and Y:
     source → Y → channel → X
     - Y is the plaintext, the true message, the missing information, the output
     - X is the ciphertext, the garbled message, the observable evidence, the input
     - Decoding: select y given X = x.
     y^* = \operatorname{argmax}_y p(y \mid x) = \operatorname{argmax}_y \frac{p(x \mid y) \cdot p(y)}{p(x)} = \operatorname{argmax}_y \underbrace{p(x \mid y)}_{\text{channel model}} \cdot \underbrace{p(y)}_{\text{source model}}
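     To make the decoding rule concrete, here is a toy sketch that scores a handful of candidate y values by log p(x | y) + log p(y). The probability tables and candidate set are invented for illustration; a real decoder searches a far larger space.

```python
import math

# Toy source model p(y) and channel model p(x | y); all numbers are invented.
SOURCE = {"the cat": 0.6, "a cat": 0.4}
CHANNEL = {("le chat", "the cat"): 0.7, ("le chat", "a cat"): 0.2}

def decode(x, candidates):
    """Noisy-channel decoding: y* = argmax_y p(x | y) * p(y), in log space,
    over an explicit candidate set."""
    def score(y):
        return math.log(CHANNEL.get((x, y), 1e-12)) + math.log(SOURCE[y])
    return max(candidates, key=score)

print(decode("le chat", ["the cat", "a cat"]))  # -> "the cat"
```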

  9. Bitext/Parallel Text
     Let f and e be two sequences in V^\dagger (French) and \bar{V}^\dagger (English), respectively.
     We're going to define p(F \mid e), the probability over French translations of English sentence e. In a noisy channel machine translation system, we could use this together with a source/language model p(e) to "decode" f into an English translation.
     Where does the data to estimate this come from?
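     Spelled out, the decoding rule this slide alludes to is just the noisy channel rule from the previous slides, instantiated for translation:

     \hat{e} = \operatorname{argmax}_e \; p(e \mid f) = \operatorname{argmax}_e \; \underbrace{p(f \mid e)}_{\text{translation (channel) model}} \cdot \underbrace{p(e)}_{\text{language (source) model}}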

  10. IBM Model 1 (Brown et al., 1993)
     Let \ell and m be the (known) lengths of e and f. Latent variable a = \langle a_1, \ldots, a_m \rangle, each a_i ranging over \{0, \ldots, \ell\} (positions in e).
     - a_4 = 3 means that f_4 is "aligned" to e_3.
     - a_6 = 0 means that f_6 is "aligned" to a special null symbol, e_0.
     p(f \mid e, m) = \sum_{a_1=0}^{\ell} \cdots \sum_{a_m=0}^{\ell} p(f, a \mid e, m) = \sum_{a \in \{0, \ldots, \ell\}^m} p(f, a \mid e, m)
     p(f, a \mid e, m) = \prod_{i=1}^{m} p(a_i \mid i, \ell, m) \cdot p(f_i \mid e_{a_i}) = \prod_{i=1}^{m} \frac{1}{\ell + 1} \cdot \theta_{f_i \mid e_{a_i}} = \left( \frac{1}{\ell + 1} \right)^m \prod_{i=1}^{m} \theta_{f_i \mid e_{a_i}}
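     The last line is easy to compute directly. Below is a small Python sketch of p(f, a | e, m) under Model 1; the function name, the "<null>" token string, and the toy theta values are invented for this example.

```python
def model1_joint_prob(f, e, a, theta):
    """p(f, a | e, m) under IBM Model 1.

    f: target-side tokens (length m)
    e: source-side tokens (length l), WITHOUT the null symbol
    a: alignment; a[i] in {0, ..., l}, where 0 means the null symbol e_0
    theta: dict mapping (f_word, e_word) -> translation probability
    """
    e_with_null = ["<null>"] + list(e)  # e_0 is the special null symbol
    l = len(e)
    prob = 1.0
    for f_i, a_i in zip(f, a):
        # uniform alignment term 1/(l+1) times the translation parameter theta_{f_i | e_{a_i}}
        prob *= (1.0 / (l + 1)) * theta[(f_i, e_with_null[a_i])]
    return prob

# Hypothetical parameters for a tiny two-word example.
theta = {("Noahs", "Noah's"): 0.8, ("Arche", "ark"): 0.9}
print(model1_joint_prob(["Noahs", "Arche"], ["Noah's", "ark"], [1, 2], theta))
```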

  11. Example: f is German
     e: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     f: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     a = \langle 4, \ldots \rangle
     p(f, a \mid e, m) = \frac{1}{17 + 1} \cdot \theta_{Noahs \mid Noah's}

  12. Example: f is German
     e: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     f: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     a = \langle 4, 5, \ldots \rangle
     p(f, a \mid e, m) = \frac{1}{17 + 1} \cdot \theta_{Noahs \mid Noah's} \cdot \frac{1}{17 + 1} \cdot \theta_{Arche \mid ark}

  13. Example: f is German
     e: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     f: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     a = \langle 4, 5, 6, \ldots \rangle
     p(f, a \mid e, m) = \frac{1}{17 + 1} \cdot \theta_{Noahs \mid Noah's} \cdot \frac{1}{17 + 1} \cdot \theta_{Arche \mid ark}
                         \cdot \frac{1}{17 + 1} \cdot \theta_{war \mid was}

  14. Example: f is German
     e: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     f: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     a = \langle 4, 5, 6, 8, \ldots \rangle
     p(f, a \mid e, m) = \frac{1}{17 + 1} \cdot \theta_{Noahs \mid Noah's} \cdot \frac{1}{17 + 1} \cdot \theta_{Arche \mid ark}
                         \cdot \frac{1}{17 + 1} \cdot \theta_{war \mid was} \cdot \frac{1}{17 + 1} \cdot \theta_{nicht \mid not}

  15. Example: f is German
     e: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     f: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     a = \langle 4, 5, 6, 8, 7, \ldots \rangle
     p(f, a \mid e, m) = \frac{1}{17 + 1} \cdot \theta_{Noahs \mid Noah's} \cdot \frac{1}{17 + 1} \cdot \theta_{Arche \mid ark}
                         \cdot \frac{1}{17 + 1} \cdot \theta_{war \mid was} \cdot \frac{1}{17 + 1} \cdot \theta_{nicht \mid not}
                         \cdot \frac{1}{17 + 1} \cdot \theta_{voller \mid filled}

  16. Example: f is German
     e: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     f: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     a = \langle 4, 5, 6, 8, 7, ?, \ldots \rangle
     p(f, a \mid e, m) = \frac{1}{17 + 1} \cdot \theta_{Noahs \mid Noah's} \cdot \frac{1}{17 + 1} \cdot \theta_{Arche \mid ark}
                         \cdot \frac{1}{17 + 1} \cdot \theta_{war \mid was} \cdot \frac{1}{17 + 1} \cdot \theta_{nicht \mid not}
                         \cdot \frac{1}{17 + 1} \cdot \theta_{voller \mid filled} \cdot \frac{1}{17 + 1} \cdot \theta_{Produktionsfaktoren \mid ?}

  17. Example: f is German
     e: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     f: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     a = \langle 4, 5, 6, 8, 7, ?, \ldots \rangle
     p(f, a \mid e, m) = \frac{1}{17 + 1} \cdot \theta_{Noahs \mid Noah's} \cdot \frac{1}{17 + 1} \cdot \theta_{Arche \mid ark}
                         \cdot \frac{1}{17 + 1} \cdot \theta_{war \mid was} \cdot \frac{1}{17 + 1} \cdot \theta_{nicht \mid not}
                         \cdot \frac{1}{17 + 1} \cdot \theta_{voller \mid filled} \cdot \frac{1}{17 + 1} \cdot \theta_{Produktionsfaktoren \mid ?}
     Problem: this alignment isn't possible with IBM Model 1! Each f_i is aligned to at most one e_{a_i}!

  18. Example: f is English
     f: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     e: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     a = \langle 0, \ldots \rangle
     p(f, a \mid e, m) = \frac{1}{10 + 1} \cdot \theta_{Mr \mid null}

  19. Example: f is English
     f: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     e: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     a = \langle 0, 0, 0, \ldots \rangle
     p(f, a \mid e, m) = \frac{1}{10 + 1} \cdot \theta_{Mr \mid null} \cdot \frac{1}{10 + 1} \cdot \theta_{President \mid null}
                         \cdot \frac{1}{10 + 1} \cdot \theta_{, \mid null}

  20. Example: f is English
     f: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     e: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     a = \langle 0, 0, 0, 1, \ldots \rangle
     p(f, a \mid e, m) = \frac{1}{10 + 1} \cdot \theta_{Mr \mid null} \cdot \frac{1}{10 + 1} \cdot \theta_{President \mid null}
                         \cdot \frac{1}{10 + 1} \cdot \theta_{, \mid null} \cdot \frac{1}{10 + 1} \cdot \theta_{Noah's \mid Noahs}

  21. Example: f is English
     f: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     e: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     a = \langle 0, 0, 0, 1, 2, \ldots \rangle
     p(f, a \mid e, m) = \frac{1}{10 + 1} \cdot \theta_{Mr \mid null} \cdot \frac{1}{10 + 1} \cdot \theta_{President \mid null}
                         \cdot \frac{1}{10 + 1} \cdot \theta_{, \mid null} \cdot \frac{1}{10 + 1} \cdot \theta_{Noah's \mid Noahs}
                         \cdot \frac{1}{10 + 1} \cdot \theta_{ark \mid Arche}

  22. Example: f is English
     f: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     e: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     a = \langle 0, 0, 0, 1, 2, 3, \ldots \rangle
     p(f, a \mid e, m) = \frac{1}{10 + 1} \cdot \theta_{Mr \mid null} \cdot \frac{1}{10 + 1} \cdot \theta_{President \mid null}
                         \cdot \frac{1}{10 + 1} \cdot \theta_{, \mid null} \cdot \frac{1}{10 + 1} \cdot \theta_{Noah's \mid Noahs}
                         \cdot \frac{1}{10 + 1} \cdot \theta_{ark \mid Arche} \cdot \frac{1}{10 + 1} \cdot \theta_{was \mid war}

  23. Example: f is English
     f: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     e: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     a = \langle 0, 0, 0, 1, 2, 3, 5, \ldots \rangle
     p(f, a \mid e, m) = \frac{1}{10 + 1} \cdot \theta_{Mr \mid null} \cdot \frac{1}{10 + 1} \cdot \theta_{President \mid null}
                         \cdot \frac{1}{10 + 1} \cdot \theta_{, \mid null} \cdot \frac{1}{10 + 1} \cdot \theta_{Noah's \mid Noahs}
                         \cdot \frac{1}{10 + 1} \cdot \theta_{ark \mid Arche} \cdot \frac{1}{10 + 1} \cdot \theta_{was \mid war}
                         \cdot \frac{1}{10 + 1} \cdot \theta_{filled \mid voller}

  24. Example: f is English
     f: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     e: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     a = \langle 0, 0, 0, 1, 2, 3, 5, 4, \ldots \rangle
     p(f, a \mid e, m) = \frac{1}{10 + 1} \cdot \theta_{Mr \mid null} \cdot \frac{1}{10 + 1} \cdot \theta_{President \mid null}
                         \cdot \frac{1}{10 + 1} \cdot \theta_{, \mid null} \cdot \frac{1}{10 + 1} \cdot \theta_{Noah's \mid Noahs}
                         \cdot \frac{1}{10 + 1} \cdot \theta_{ark \mid Arche} \cdot \frac{1}{10 + 1} \cdot \theta_{was \mid war}
                         \cdot \frac{1}{10 + 1} \cdot \theta_{filled \mid voller} \cdot \frac{1}{10 + 1} \cdot \theta_{not \mid nicht}

  25. How to Estimate Translation Distributions?
     This is a problem of incomplete data: at training time, we see e and f, but not a.

  26. How to Estimate Translation Distributions?
     This is a problem of incomplete data: at training time, we see e and f, but not a.
     Classical solution is to alternate:
     - Given a parameter estimate for θ, align the words.
     - Given aligned words, re-estimate θ.
     Traditional approach uses "soft" alignment.
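     A minimal sketch of this alternation with "soft" alignment, i.e., EM for Model 1, in Python. Variable names and the toy bitext are invented for illustration; a real implementation would add better initialization, handling of rare words, and a convergence check.

```python
from collections import defaultdict

def model1_em(bitext, iterations=10):
    """EM for IBM Model 1 translation parameters theta[(f_word, e_word)].

    bitext: list of (f_tokens, e_tokens) sentence pairs.
    A "<null>" token is prepended to every e sentence.
    """
    f_vocab = {f for fs, _ in bitext for f in fs}
    theta = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization

    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (f, e) pairs
        total = defaultdict(float)   # expected count of e
        # E-step: softly align each f word to every e word in its sentence pair.
        for fs, es in bitext:
            es = ["<null>"] + es
            for f in fs:
                z = sum(theta[(f, e)] for e in es)   # normalizer over alignments
                for e in es:
                    p = theta[(f, e)] / z            # posterior p(a_i = j | f, e)
                    count[(f, e)] += p
                    total[e] += p
        # M-step: re-estimate theta from the expected counts.
        for (f, e), c in count.items():
            theta[(f, e)] = c / total[e]
    return theta

# Tiny toy corpus (invented); after EM, theta[("maison", "house")] should be large.
bitext = [(["la", "maison"], ["the", "house"]),
          (["la", "fleur"], ["the", "flower"])]
theta = model1_em(bitext)
print(round(theta[("maison", "house")], 3))
```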
