natural language processing cse 517 machine translation
play

Natural Language Processing (CSE 517): Machine Translation Noah - PowerPoint PPT Presentation

Natural Language Processing (CSE 517): Machine Translation Noah Smith 2018 c University of Washington nasmith@cs.washington.edu May 23, 2018 1 / 82 Evaluation Intuition: good translations are fluent in the target language and faithful to


  1. Natural Language Processing (CSE 517): Machine Translation Noah Smith � 2018 c University of Washington nasmith@cs.washington.edu May 23, 2018 1 / 82

  2. Evaluation Intuition: good translations are fluent in the target language and faithful to the original meaning. Bleu score (Papineni et al., 2002): ◮ Compare to a human-generated reference translation ◮ Or, better: multiple references ◮ Weighted average of n-gram precision (across different n) There are some alternatives; most papers that use them report Bleu, too. 2 / 82

  3. Warren Weaver to Norbert Wiener, 1947 One naturally wonders if the problem of translation could be conceivably treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’ 3 / 82

  4. Noisy Channel Models Review A pattern for modeling a pair of random variables, X and Y : source − → Y − → channel − → X 4 / 82

  5. Noisy Channel Models Review A pattern for modeling a pair of random variables, X and Y : source − → Y − → channel − → X ◮ Y is the plaintext, the true message, the missing information, the output 5 / 82

  6. Noisy Channel Models Review A pattern for modeling a pair of random variables, X and Y : source − → Y − → channel − → X ◮ Y is the plaintext, the true message, the missing information, the output ◮ X is the ciphertext, the garbled message, the observable evidence, the input 6 / 82

  7. Noisy Channel Models Review A pattern for modeling a pair of random variables, X and Y : source − → Y − → channel − → X ◮ Y is the plaintext, the true message, the missing information, the output ◮ X is the ciphertext, the garbled message, the observable evidence, the input ◮ Decoding: select y given X = x . y ∗ = argmax p ( y | x ) y p ( x | y ) · p ( y ) = argmax p ( x ) y = argmax p ( x | y ) · p ( y ) y � �� � ���� channel model source model 7 / 82

  8. Bitext/Parallel Text Let f and e be two sequences in V † (French) and ¯ V † (English), respectively. Earlier, we defined p ( F | e ) , the probability over French translations of English sentence e (IBM Models 1 and 2). In a noisy channel machine translation system, we could use this together with source/language model p ( e ) to “decode” f into an English translation. Where does the data to estimate this come from? 8 / 82

  9. IBM Model 1 (Brown et al., 1993) Let ℓ and m be the (known) lengths of e and f . Latent variable a = � a 1 , . . . , a m � , each a i ranging over { 0 , . . . , ℓ } (positions in e ). ◮ a 4 = 3 means that f 4 is “aligned” to e 3 . ◮ a 6 = 0 means that f 6 is “aligned” to a special null symbol, e 0 . ℓ ℓ ℓ � � � p ( f | e , m ) = · · · p ( f , a | e , m ) a 1 =0 a 2 =0 a m =0 � = p ( f , a | e , m ) a ∈{ 0 ,...,ℓ } m m � p ( f , a | e , m ) = p ( a i | i, ℓ, m ) · p ( f i | e a i ) i =1 m � m m � 1 1 � � = ℓ + 1 · θ f i | e ai = θ f i | e ai ℓ + 1 i =1 i =1 9 / 82

  10. Example: f is German Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 4 , . . . � 1 p ( f , a | e , m ) = 17 + 1 · θ Noahs | Noah’s 10 / 82

  11. Example: f is German Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 4 , 5 , . . . � 1 1 p ( f , a | e , m ) = 17 + 1 · θ Noahs | Noah’s · 17 + 1 · θ Arche | ark 11 / 82

  12. Example: f is German Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 4 , 5 , 6 , . . . � 1 1 p ( f , a | e , m ) = 17 + 1 · θ Noahs | Noah’s · 17 + 1 · θ Arche | ark 1 · 17 + 1 · θ war | was 12 / 82

  13. Example: f is German Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 4 , 5 , 6 , 8 , . . . � 1 1 p ( f , a | e , m ) = 17 + 1 · θ Noahs | Noah’s · 17 + 1 · θ Arche | ark 1 1 · 17 + 1 · θ war | was · 17 + 1 · θ nicht | not 13 / 82

  14. Example: f is German Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 4 , 5 , 6 , 8 , 7 , . . . � 1 1 p ( f , a | e , m ) = 17 + 1 · θ Noahs | Noah’s · 17 + 1 · θ Arche | ark 1 1 · 17 + 1 · θ war | was · 17 + 1 · θ nicht | not 1 · 17 + 1 · θ voller | filled 14 / 82

  15. Example: f is German Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 4 , 5 , 6 , 8 , 7 , ? , . . . � 1 1 p ( f , a | e , m ) = 17 + 1 · θ Noahs | Noah’s · 17 + 1 · θ Arche | ark 1 1 · 17 + 1 · θ war | was · 17 + 1 · θ nicht | not 1 1 · 17 + 1 · θ voller | filled · 17 + 1 · θ Productionsfactoren | ? 15 / 82

  16. Example: f is German Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 4 , 5 , 6 , 8 , 7 , ? , . . . � 1 1 p ( f , a | e , m ) = 17 + 1 · θ Noahs | Noah’s · 17 + 1 · θ Arche | ark 1 1 · 17 + 1 · θ war | was · 17 + 1 · θ nicht | not 1 1 · 17 + 1 · θ voller | filled · 17 + 1 · θ Productionsfactoren | ? Problem: This alignment isn’t possible with IBM Model 1! Each f i is aligned to at most one e a i ! 16 / 82

  17. Example: f is English Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 0 , . . . � 1 p ( f , a | e , m ) = 10 + 1 · θ Mr | null 17 / 82

  18. Example: f is English Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 0 , 0 , 0 , . . . � 1 1 p ( f , a | e , m ) = 10 + 1 · θ Mr | null · 10 + 1 · θ President | null 1 · 10 + 1 · θ , | null 18 / 82

  19. Example: f is English Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 0 , 0 , 0 , 1 , . . . � 1 1 p ( f , a | e , m ) = 10 + 1 · θ Mr | null · 10 + 1 · θ President | null 1 1 · 10 + 1 · θ , | null · 10 + 1 · θ Noah’s | Noahs 19 / 82

  20. Example: f is English Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 0 , 0 , 0 , 1 , 2 , . . . � 1 1 p ( f , a | e , m ) = 10 + 1 · θ Mr | null · 10 + 1 · θ President | null 1 1 · 10 + 1 · θ , | null · 10 + 1 · θ Noah’s | Noahs 1 · 10 + 1 · θ ark | Arche 20 / 82

  21. Example: f is English Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 0 , 0 , 0 , 1 , 2 , 3 , . . . � 1 1 p ( f , a | e , m ) = 10 + 1 · θ Mr | null · 10 + 1 · θ President | null 1 1 · 10 + 1 · θ , | null · 10 + 1 · θ Noah’s | Noahs 1 1 · 10 + 1 · θ ark | Arche · 10 + 1 · θ was | war 21 / 82

  22. Example: f is English Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 0 , 0 , 0 , 1 , 2 , 3 , 5 , . . . � 1 1 p ( f , a | e , m ) = 10 + 1 · θ Mr | null · 10 + 1 · θ President | null 1 1 · 10 + 1 · θ , | null · 10 + 1 · θ Noah’s | Noahs 1 1 · 10 + 1 · θ ark | Arche · 10 + 1 · θ was | war 1 · 10 + 1 · θ filled | voller 22 / 82

  23. Example: f is English Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 0 , 0 , 0 , 1 , 2 , 3 , 5 , 4 , . . . � 1 1 p ( f , a | e , m ) = 10 + 1 · θ Mr | null · 10 + 1 · θ President | null 1 1 · 10 + 1 · θ , | null · 10 + 1 · θ Noah’s | Noahs 1 1 · 10 + 1 · θ ark | Arche · 10 + 1 · θ was | war 1 1 · 10 + 1 · θ filled | voller · 10 + 1 · θ not | nicht 23 / 82

  24. How to Estimate Translation Distributions? This is a problem of incomplete data : at training time, we see e and f , but not a . 24 / 82

  25. How to Estimate Translation Distributions? This is a problem of incomplete data : at training time, we see e and f , but not a . Classical solution is to alternate : ◮ Given a parameter estimate for θ , align the words. ◮ Given aligned words, re-estimate θ . Traditional approach uses “soft” alignment. 25 / 82

  26. IBM Models 1 and 2, Depicted hidden IBM 1 y 1 y 2 y 3 y 4 a 1 a 2 a 3 a 4 Markov and 2 model e e e e x 1 x 2 x 3 x 4 f 1 f 2 f 3 f 4 26 / 82

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend