Natural Language Processing (CSEP 517): Machine Translation Noah - PowerPoint PPT Presentation

Natural Language Processing (CSEP 517): Machine Translation Noah Smith � 2017 c University of Washington nasmith@cs.washington.edu May 15, 2017 1 / 59

To-Do List ◮ Online quiz: due Sunday ◮ (Jurafsky and Martin, 2008, ch. 25), Collins (2011, 2013) ◮ A5 due May 28 (Sunday) 2 / 59

Evaluation Intuition: good translations are fluent in the target language and faithful to the original meaning. Bleu score (Papineni et al., 2002): ◮ Compare to a human-generated reference translation ◮ Or, better: multiple references ◮ Weighted average of n-gram precision (across different n) There are some alternatives; most papers that use them report Bleu, too. 3 / 59

Warren Weaver to Norbert Wiener, 1947 One naturally wonders if the problem of translation could be conceivably treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’ 4 / 59

Noisy Channel Models Review A pattern for modeling a pair of random variables, X and Y : source − → Y − → channel − → X 5 / 59

Noisy Channel Models Review A pattern for modeling a pair of random variables, X and Y : source − → Y − → channel − → X ◮ Y is the plaintext, the true message, the missing information, the output 6 / 59

Noisy Channel Models Review A pattern for modeling a pair of random variables, X and Y : source − → Y − → channel − → X ◮ Y is the plaintext, the true message, the missing information, the output ◮ X is the ciphertext, the garbled message, the observable evidence, the input 7 / 59

Noisy Channel Models Review A pattern for modeling a pair of random variables, X and Y : source − → Y − → channel − → X ◮ Y is the plaintext, the true message, the missing information, the output ◮ X is the ciphertext, the garbled message, the observable evidence, the input ◮ Decoding: select y given X = x . y ∗ = argmax p ( y | x ) y p ( x | y ) · p ( y ) = argmax p ( x ) y = argmax p ( x | y ) · p ( y ) y � �� channel model source model 8 / 59

Bitext/Parallel Text Let f and e be two sequences in V † (French) and ¯ V † (English), respectively. We’re going to define p ( F | e ) , the probability over French translations of English sentence e . In a noisy channel machine translation system, we could use this together with source/language model p ( e ) to “decode” f into an English translation. Where does the data to estimate this come from? 9 / 59

IBM Model 1 (Brown et al., 1993) Let ℓ and m be the (known) lengths of e and f . Latent variable a = � a 1 , . . . , a m � , each a i ranging over { 0 , . . . , ℓ } (positions in e ). ◮ a 4 = 3 means that f 4 is “aligned” to e 3 . ◮ a 6 = 0 means that f 6 is “aligned” to a special null symbol, e 0 . ℓ ℓ ℓ � � � p ( f | e , m ) = · · · p ( f , a | e , m ) a 1 =0 a 2 =0 a m =0 � = p ( f , a | e , m ) a ∈{ 0 ,...,ℓ } m m � p ( f , a | e , m ) = p ( a i | i, ℓ, m ) · p ( f i | e a i ) i =1 m � m m � 1 1 � � = ℓ + 1 · θ f i | e ai = θ f i | e ai ℓ + 1 i =1 i =1 10 / 59

Example: f is German Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 4 , . . . � 1 p ( f , a | e , m ) = 17 + 1 · θ Noahs | Noah’s 11 / 59

Example: f is German Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 4 , 5 , . . . � 1 1 p ( f , a | e , m ) = 17 + 1 · θ Noahs | Noah’s · 17 + 1 · θ Arche | ark 12 / 59

Example: f is German Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 4 , 5 , 6 , . . . � 1 1 p ( f , a | e , m ) = 17 + 1 · θ Noahs | Noah’s · 17 + 1 · θ Arche | ark 1 · 17 + 1 · θ war | was 13 / 59

Example: f is German Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 4 , 5 , 6 , 8 , . . . � 1 1 p ( f , a | e , m ) = 17 + 1 · θ Noahs | Noah’s · 17 + 1 · θ Arche | ark 1 1 · 17 + 1 · θ war | was · 17 + 1 · θ nicht | not 14 / 59

Example: f is German Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 4 , 5 , 6 , 8 , 7 , . . . � 1 1 p ( f , a | e , m ) = 17 + 1 · θ Noahs | Noah’s · 17 + 1 · θ Arche | ark 1 1 · 17 + 1 · θ war | was · 17 + 1 · θ nicht | not 1 · 17 + 1 · θ voller | filled 15 / 59

Example: f is German Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 4 , 5 , 6 , 8 , 7 , ? , . . . � 1 1 p ( f , a | e , m ) = 17 + 1 · θ Noahs | Noah’s · 17 + 1 · θ Arche | ark 1 1 · 17 + 1 · θ war | was · 17 + 1 · θ nicht | not 1 1 · 17 + 1 · θ voller | filled · 17 + 1 · θ Productionsfactoren | ? 16 / 59

Example: f is German Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 4 , 5 , 6 , 8 , 7 , ? , . . . � 1 1 p ( f , a | e , m ) = 17 + 1 · θ Noahs | Noah’s · 17 + 1 · θ Arche | ark 1 1 · 17 + 1 · θ war | was · 17 + 1 · θ nicht | not 1 1 · 17 + 1 · θ voller | filled · 17 + 1 · θ Productionsfactoren | ? Problem: This alignment isn’t possible with IBM Model 1! Each f i is aligned to at most one e a i ! 17 / 59

Example: f is English Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 0 , . . . � 1 p ( f , a | e , m ) = 10 + 1 · θ Mr | null 18 / 59

Example: f is English Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 0 , 0 , 0 , . . . � 1 1 p ( f , a | e , m ) = 10 + 1 · θ Mr | null · 10 + 1 · θ President | null 1 · 10 + 1 · θ , | null 19 / 59

Example: f is English Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 0 , 0 , 0 , 1 , . . . � 1 1 p ( f , a | e , m ) = 10 + 1 · θ Mr | null · 10 + 1 · θ President | null 1 1 · 10 + 1 · θ , | null · 10 + 1 · θ Noah’s | Noahs 20 / 59

Example: f is English Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 0 , 0 , 0 , 1 , 2 , . . . � 1 1 p ( f , a | e , m ) = 10 + 1 · θ Mr | null · 10 + 1 · θ President | null 1 1 · 10 + 1 · θ , | null · 10 + 1 · θ Noah’s | Noahs 1 · 10 + 1 · θ ark | Arche 21 / 59

Example: f is English Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 0 , 0 , 0 , 1 , 2 , 3 , . . . � 1 1 p ( f , a | e , m ) = 10 + 1 · θ Mr | null · 10 + 1 · θ President | null 1 1 · 10 + 1 · θ , | null · 10 + 1 · θ Noah’s | Noahs 1 1 · 10 + 1 · θ ark | Arche · 10 + 1 · θ was | war 22 / 59

Example: f is English Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 0 , 0 , 0 , 1 , 2 , 3 , 5 , . . . � 1 1 p ( f , a | e , m ) = 10 + 1 · θ Mr | null · 10 + 1 · θ President | null 1 1 · 10 + 1 · θ , | null · 10 + 1 · θ Noah’s | Noahs 1 1 · 10 + 1 · θ ark | Arche · 10 + 1 · θ was | war 1 · 10 + 1 · θ filled | voller 23 / 59

Example: f is English Mr President , Noah's ark was filled not with production factors , but with living creatures . Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe . a = � 0 , 0 , 0 , 1 , 2 , 3 , 5 , 4 , . . . � 1 1 p ( f , a | e , m ) = 10 + 1 · θ Mr | null · 10 + 1 · θ President | null 1 1 · 10 + 1 · θ , | null · 10 + 1 · θ Noah’s | Noahs 1 1 · 10 + 1 · θ ark | Arche · 10 + 1 · θ was | war 1 1 · 10 + 1 · θ filled | voller · 10 + 1 · θ not | nicht 24 / 59

How to Estimate Translation Distributions? This is a problem of incomplete data : at training time, we see e and f , but not a . 25 / 59

How to Estimate Translation Distributions? This is a problem of incomplete data : at training time, we see e and f , but not a . Classical solution is to alternate : ◮ Given a parameter estimate for θ , align the words. ◮ Given aligned words, re-estimate θ . Traditional approach uses “soft” alignment. 26 / 59

Natural Language Processing (CSEP 517): Machine Translation Noah - PowerPoint PPT Presentation

Natural Language Processing (CSEP 517): Machine Translation Noah Smith 2017 c University of Washington nasmith@cs.washington.edu May 15, 2017 1 / 59 To-Do List Online quiz: due Sunday (Jurafsky and Martin, 2008, ch. 25), Collins

CSEP 517 Natural Language Processing Luke Zettlemoyer Machine Translation, Sequence-to-sequence

Natural Language Processing (CSEP 517): Machine Translation (Continued), Summarization, &

CSEP 517 Natural Language Processing Language Models Luke Zettlemoyer Slides adapted from Dan

CSEP 517 Natural Language Processing Introduction Luke Zettlemoyer Slides adapted from Dan

CSEP 517: Natural Language Processing New PMP Course! Instructor: Luke Zettlemoyer Autumn 2013

CSEP 517 Natural Language Processing Autumn 2018 Introduction Luke Zettlemoyer Slides adapted

Natural Language Processing (CSEP 517): Computational Pragmatics Chenhao Tan 2017 c

Natural Language Processing (CSEP 517): Introduction & Language Models Noah Smith c 2017

CSP 517 Natural Language Processing Winter 2015 Machine Translation: Word Alignment Yejin Choi

CSEP 517 Natural Language Processing Autumn 2015 Parsing (Trees) Yejin Choi - University of

CSEP 517 Natural Language Processing Frame Semantics Luke Zettlemoyer Slides adapted from Yejin

CSEP 517: Natural Language Processing Recurrent Neural Networks Autumn 2018 Luke Zettlemoyer

CSEP 517 Natural Language Processing Autumn 2015 Introduction Yejin Choi Slides adapted

Natural Language Processing (CSEP 517): Distributional Semantics Roy Schwartz 2017 c

CSEP 517 Natural Language Processing Coreference Resolution Luke Zettlemoyer University of

Natural Language Processing (CSEP 517): Dependency Syntax and Parsing Noah Smith 2017 c

Phrase-Based Models Philipp Koehn 15 September 2020 Philipp Koehn Machine Translation:

Gaussian Approximation of Quantization Error for Inference from Compressed Data Alon Kipnis

Lightning Introductions Digital Computing Beyond Moores Law May 3-4, 2018 Sarita

World Scout Badge World Scout Badge History of Scouting History of Scouting in Singapore

Coordinator Orientation July 15, 2020 Jennifer Norton and Anika Harris Welcome and Introductions

Decentralized Proof-Term Library Michael Nahas affliliated with Radboud Universiteit Nijmegen

Multilingual Entity Linking: Comparing English and Spanish Henry Rosales-M endez, Barbara

Identification of Transliterated Foreign Words in Hebrew Script Yoav Goldberg Michael Elhadad

Natural Language Processing (CSEP 517): Machine Translation Noah - PowerPoint PPT Presentation

Natural Language Processing (CSEP 517): Machine Translation Noah Smith 2017 c University of Washington nasmith@cs.washington.edu May 15, 2017 1 / 59 To-Do List Online quiz: due Sunday (Jurafsky and Martin, 2008, ch. 25), Collins

CSEP 517 Natural Language Processing Luke Zettlemoyer Machine Translation, Sequence-to-sequence

Natural Language Processing (CSEP 517): Machine Translation (Continued), Summarization, &amp;

CSEP 517 Natural Language Processing Language Models Luke Zettlemoyer Slides adapted from Dan

CSEP 517 Natural Language Processing Introduction Luke Zettlemoyer Slides adapted from Dan

CSEP 517: Natural Language Processing New PMP Course! Instructor: Luke Zettlemoyer Autumn 2013

CSEP 517 Natural Language Processing Autumn 2018 Introduction Luke Zettlemoyer Slides adapted

Natural Language Processing (CSEP 517): Computational Pragmatics Chenhao Tan 2017 c

Natural Language Processing (CSEP 517): Introduction &amp; Language Models Noah Smith c 2017

CSP 517 Natural Language Processing Winter 2015 Machine Translation: Word Alignment Yejin Choi

CSEP 517 Natural Language Processing Autumn 2015 Parsing (Trees) Yejin Choi - University of

CSEP 517 Natural Language Processing Frame Semantics Luke Zettlemoyer Slides adapted from Yejin

CSEP 517: Natural Language Processing Recurrent Neural Networks Autumn 2018 Luke Zettlemoyer

CSEP 517 Natural Language Processing Autumn 2015 Introduction Yejin Choi Slides adapted

Natural Language Processing (CSEP 517): Distributional Semantics Roy Schwartz 2017 c

CSEP 517 Natural Language Processing Coreference Resolution Luke Zettlemoyer University of

Natural Language Processing (CSEP 517): Dependency Syntax and Parsing Noah Smith 2017 c

Phrase-Based Models Philipp Koehn 15 September 2020 Philipp Koehn Machine Translation:

Gaussian Approximation of Quantization Error for Inference from Compressed Data Alon Kipnis

Lightning Introductions Digital Computing Beyond Moores Law May 3-4, 2018 Sarita

World Scout Badge World Scout Badge History of Scouting History of Scouting in Singapore

Coordinator Orientation July 15, 2020 Jennifer Norton and Anika Harris Welcome and Introductions

Decentralized Proof-Term Library Michael Nahas affliliated with Radboud Universiteit Nijmegen

Multilingual Entity Linking: Comparing English and Spanish Henry Rosales-M endez, Barbara

Identification of Transliterated Foreign Words in Hebrew Script Yoav Goldberg Michael Elhadad

Natural Language Processing (CSEP 517): Machine Translation (Continued), Summarization, &

Natural Language Processing (CSEP 517): Introduction & Language Models Noah Smith c 2017