SLIDE 1

Natural Language Processing (CSE 517): Machine Translation

Noah Smith

© 2018 University of Washington, nasmith@cs.washington.edu

May 23, 2018

1 / 82

SLIDE 2

Evaluation

Intuition: good translations are fluent in the target language and faithful to the original meaning.

Bleu score (Papineni et al., 2002):
◮ Compare to a human-generated reference translation.
◮ Or, better: multiple references.
◮ Weighted average of n-gram precision (across different n).
There are some alternatives; most papers that use them report Bleu, too.

2 / 82
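The core of Bleu can be sketched in a few lines. This is a simplified single-reference version, assuming pre-tokenized input; the real metric (Papineni et al., 2002) clips counts against multiple references, is computed at the corpus level, and is usually smoothed:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified single-reference Bleu: a geometric mean of clipped
    n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        clipped = sum((cand & ref).values())        # counts clipped by reference
        precisions.append(clipped / max(1, sum(cand.values())))
    if min(precisions) == 0.0:
        return 0.0                                  # no smoothing in this sketch
    brevity = min(1.0, math.exp(1.0 - len(reference) / len(candidate)))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)

hyp = "the cat is on the mat".split()
print(bleu(hyp, hyp))  # a perfect match scores 1.0
```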

SLIDE 3

Warren Weaver to Norbert Wiener, 1947

One naturally wonders if the problem of translation could be conceivably treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’

3 / 82

SLIDE 4

Noisy Channel Models

Review

A pattern for modeling a pair of random variables, X and Y:

source → Y → channel → X

◮ Y is the plaintext: the true message, the missing information, the output.
◮ X is the ciphertext: the garbled message, the observable evidence, the input.
◮ Decoding: select y given X = x:

y* = argmax_y p(y | x)
   = argmax_y p(x | y) · p(y) / p(x)
   = argmax_y p(x | y) · p(y)

where p(x | y) is the channel model and p(y) is the source model.

7 / 82
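Decoding under the noisy channel pattern is a single argmax over candidates; a minimal sketch, with made-up probability tables (all values illustrative):

```python
import math

# Toy log-probability tables; every value here is invented for illustration.
SOURCE = {"the cat": 0.6, "a cat": 0.4}                  # p(y), source model
CHANNEL = {("le chat", "the cat"): 0.7,
           ("le chat", "a cat"): 0.2}                    # p(x | y), channel model

def decode(x, candidates):
    # y* = argmax_y p(x | y) * p(y); the p(x) denominator is constant in y
    return max(candidates,
               key=lambda y: math.log(CHANNEL[(x, y)]) + math.log(SOURCE[y]))

print(decode("le chat", ["the cat", "a cat"]))  # "the cat"
```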

SLIDE 8

Bitext/Parallel Text

Let f and e be two sequences in V† (French) and V̄† (English), respectively. Earlier, we defined p(F | e), the probability over French translations of English sentence e (IBM Models 1 and 2). In a noisy channel machine translation system, we could use this together with a source/language model p(e) to "decode" f into an English translation. Where does the data to estimate this come from?

8 / 82

SLIDE 9

IBM Model 1

(Brown et al., 1993)

Let ℓ and m be the (known) lengths of e and f. Latent variable a = ⟨a1, . . . , am⟩, each ai ranging over {0, . . . , ℓ} (positions in e).
◮ a4 = 3 means that f4 is "aligned" to e3.
◮ a6 = 0 means that f6 is "aligned" to a special null symbol, e0.

p(f | e, m) = Σ_{a1=0}^{ℓ} Σ_{a2=0}^{ℓ} · · · Σ_{am=0}^{ℓ} p(f, a | e, m) = Σ_{a ∈ {0,...,ℓ}^m} p(f, a | e, m)

p(f, a | e, m) = Π_{i=1}^{m} p(ai | i, ℓ, m) · p(fi | e_{ai})
              = Π_{i=1}^{m} (1 / (ℓ + 1)) · θ_{fi | e_{ai}}
              = (1 / (ℓ + 1))^m · Π_{i=1}^{m} θ_{fi | e_{ai}}

9 / 82
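The joint probability p(f, a | e, m) above is a product over French positions; a small sketch (names and the toy θ table are illustrative, not from any toolkit):

```python
import math

def model1_log_prob(f, e, a, theta):
    """log p(f, a | e, m) under IBM Model 1: a uniform alignment
    probability 1/(l+1) per French word, times the lexical translation
    parameter theta[(f_i, e_{a_i})]. `e` includes the special null
    symbol at position 0."""
    l = len(e) - 1                        # e[0] is the null symbol
    logp = 0.0
    for i, a_i in enumerate(a):
        logp += math.log(1.0 / (l + 1))   # p(a_i | i, l, m) is uniform
        logp += math.log(theta[(f[i], e[a_i])])
    return logp

theta = {("Arche", "ark"): 0.5}
lp = model1_log_prob(["Arche"], ["<null>", "ark"], [1], theta)  # log(1/2 * 0.5)
```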

SLIDE 10

Example: f is German

e: Mr President , Noah's ark was filled not with production factors , but with living creatures .
f: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .

Building up the alignment one f-side word at a time, a = ⟨4, 5, 6, 8, 7, ?, . . .⟩:

p(f, a | e, m) = 1/(17+1) · θ_{Noahs | Noah's} · 1/(17+1) · θ_{Arche | ark} · 1/(17+1) · θ_{war | was} · 1/(17+1) · θ_{nicht | not} · 1/(17+1) · θ_{voller | filled} · 1/(17+1) · θ_{Produktionsfaktoren | ?}

Problem: this alignment isn't possible with IBM Model 1! Each fi is aligned to at most one e_{ai}, so "Produktionsfaktoren" cannot be aligned to both "production" and "factors."

16 / 82

SLIDE 17

Example: f is English

e: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
f: Mr President , Noah's ark was filled not with production factors , but with living creatures .

Building up the alignment one English word at a time, a = ⟨0, 0, 0, 1, 2, 3, 5, 4, . . .⟩:

p(f, a | e, m) = 1/(10+1) · θ_{Mr | null} · 1/(10+1) · θ_{President | null} · 1/(10+1) · θ_{, | null} · 1/(10+1) · θ_{Noah's | Noahs} · 1/(10+1) · θ_{ark | Arche} · 1/(10+1) · θ_{was | war} · 1/(10+1) · θ_{filled | voller} · 1/(10+1) · θ_{not | nicht}

23 / 82

SLIDE 24

How to Estimate Translation Distributions?

This is a problem of incomplete data: at training time, we see e and f, but not a. The classical solution is to alternate:
◮ Given a parameter estimate for θ, align the words.
◮ Given aligned words, re-estimate θ.
The traditional approach uses "soft" alignments (expectation-maximization).

25 / 82
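The alternation above can be sketched as EM for Model 1: the E-step computes soft (fractional) alignments under the current θ, and the M-step renormalizes the expected counts. A minimal sketch, assuming `bitext` is a list of (f_words, e_words) pairs with a null token on the e side (no smoothing, names illustrative):

```python
from collections import defaultdict

def model1_em(bitext, iterations=5):
    """EM for IBM Model 1 with soft alignments (sketch)."""
    theta = defaultdict(lambda: 1.0)            # uniform-ish initialization
    for _ in range(iterations):
        count = defaultdict(float)              # expected counts c(f, e)
        total = defaultdict(float)              # normalizer per English word
        for f_words, e_words in bitext:
            for f in f_words:
                z = sum(theta[(f, e)] for e in e_words)
                for e in e_words:
                    p = theta[(f, e)] / z       # E-step: soft alignment
                    count[(f, e)] += p
                    total[e] += p
        for (f, e), c in count.items():         # M-step: renormalize
            theta[(f, e)] = c / total[e]
    return theta

bitext = [(["Arche"], ["<null>", "ark"]),
          (["Arche", "war"], ["<null>", "ark", "was"])]
theta = model1_em(bitext)
# co-occurrence pulls "war" toward "was" rather than "ark"
```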

SLIDE 26

IBM Models 1 and 2, Depicted

[Figure: a hidden Markov model (states y1 . . . y4 emitting x1 . . . x4) shown side by side with IBM Models 1 and 2 (alignments a1 . . . a4 selecting words of e to emit f1 . . . f4).]

26 / 82

SLIDE 27

Variations

◮ Dyer et al. (2013) introduced a new parameterization (this is fast align):

δ_{j | i, ℓ, m} ∝ exp(−λ |i/m − j/ℓ|)

◮ IBM Models 3–5 (Brown et al., 1993) introduced increasingly more powerful ideas, such as "fertility" and "distortion."

27 / 82

SLIDE 28

From Alignment to (Phrase-Based) Translation

Obtaining word alignments in a parallel corpus is a common first step in building a machine translation system.

1. Align the words.
2. Extract and score phrase pairs.
3. Estimate a global scoring function to optimize (a proxy for) translation quality.
4. Decode French sentences into English ones.

(We’ll discuss 2–4.) The noisy channel pattern isn’t taken quite so seriously when we build real systems, but language models are really, really important nonetheless.

28 / 82

SLIDE 29

Phrases?

Phrase-based translation uses automatically-induced phrases . . . not the ones given by a phrase-structure parser.

29 / 82

SLIDE 30

Examples of Phrases

Courtesy of Chris Dyer.

German        English       p(f̄ | ē)
das Thema     the issue     0.41
das Thema     the point     0.72
das Thema     the subject   0.47
das Thema     the thema     0.99
es gibt       there is      0.96
es gibt       there are     0.72
morgen        tomorrow      0.90
fliege ich    will I fly    0.63
fliege ich    will fly      0.17
fliege ich    I will fly    0.13

30 / 82

SLIDE 31

Phrase-Based Translation Model

Originated by Koehn et al. (2003).

The random variable A captures the segmentation of the sentences into phrases, the alignment between them, and reordering.

[Figure: phrase alignment between "Morgen fliege ich nach Pittsburgh zur Konferenz" and "Tomorrow I will fly to the conference in Pittsburgh".]

p(f, a | e) = p(a | e) · Π_{i=1}^{|a|} p(f̄_i | ē_i)

31 / 82

SLIDE 32

Extracting Phrases

After inferring word alignments, apply heuristics.

32 / 82

(The original deck repeats this slide several times, stepping through a worked phrase-extraction figure that is not preserved in this transcript.)

SLIDE 40

Scoring Whole Translations

s(e, a; f) = β_{l.m.} log p(e) + β_{t.m.} log p(f, a | e) + β_{r.t.m.} log p(e, a | f)

where the three terms are the language model, the translation model, and the reverse translation model, respectively.

Remarks:
◮ Segmentation, alignment, and reordering are all predicted as well (not marginalized).
◮ This does not factor nicely.
◮ I am simplifying!
◮ A reverse translation model is typically included.
◮ Each log-probability is treated as a "feature," and the weights β are optimized for Bleu performance.

42 / 82
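The weighted-feature score s(e, a; f) is just a weighted sum of log-probabilities; a tiny sketch with invented feature values and weights:

```python
import math

def score(features, weights):
    """Weighted combination of log-probability "features", as in
    phrase-based MT scoring; names and values here are illustrative."""
    return sum(weights[name] * value for name, value in features.items())

features = {"lm": math.log(0.01),    # log p(e)
            "tm": math.log(0.2),     # log p(f, a | e)
            "rtm": math.log(0.1)}    # log p(e, a | f)
weights = {"lm": 1.0, "tm": 0.8, "rtm": 0.5}
s = score(features, weights)
```

In real systems the weights are tuned on held-out data to maximize Bleu rather than set by hand.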

SLIDE 43

Decoding: Example

[Figure: translation-option lattice for the Spanish sentence "Maria no dio una bofetada a la bruja verde"; each word or span has candidate English phrases such as "Mary," "did not," "slap," "to the," "witch," and "green witch."]

43 / 82


SLIDE 46

Decoding

Adapted from Koehn et al. (2006).

Typically accomplished with beam search.

Initial state: ⟨◦ ◦ . . . ◦ (|f| uncovered positions), "", score 0⟩
Goal state: ⟨• • . . . •, e*, with (approximately) the highest score⟩

Reaching a new state:
◮ Find an uncovered span of f for which a phrasal translation (f̄, ē) exists in the input.
◮ The new state appends ē to the output and "covers" f̄.
◮ The score of the new state adds language model and translation model components to the global score.

46 / 82
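The state-expansion loop above can be sketched as a toy beam search. This is a simplification (no distortion limit, no future-cost estimate, and a stand-in `score_fn` in place of real language/translation model components); all names are illustrative:

```python
import heapq

def beam_decode(src_len, phrase_options, score_fn, beam_size=5):
    """Sketch of phrase-based beam search. A state is (covered, output):
    the set of covered source positions and the target phrases so far.
    `phrase_options` maps (start, end) source spans to candidate
    target phrases."""
    beam = [(frozenset(), ())]
    while beam:
        finished = [s for s in beam if len(s[0]) == src_len]
        if finished:
            return max(finished, key=lambda s: score_fn(s[1]))[1]
        expansions = []
        for covered, output in beam:
            for (i, j), phrases in phrase_options.items():
                span = set(range(i, j))
                if span & covered:
                    continue                      # span already covered
                for phrase in phrases:
                    expansions.append((frozenset(covered | span),
                                       output + (phrase,)))
        # keep only the best `beam_size` partial hypotheses
        beam = heapq.nlargest(beam_size, expansions,
                              key=lambda s: score_fn(s[1]))
    return ()

options = {(0, 1): ["Mary"], (1, 2): ["did not", "no"]}
out = beam_decode(2, options, lambda words: 1.0 if "did not" in words else 0.0)
```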

SLIDE 47

Decoding Example

[The lattice figure is repeated on these slides, with used options progressively crossed out.]

◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦, "", 0

• ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦, "Mary", log p_{l.m.}(Mary) + log p_{t.m.}(Maria | Mary)

• • ◦ ◦ ◦ ◦ ◦ ◦ ◦, "Mary did not", log p_{l.m.}(Mary did not) + log p_{t.m.}(Maria | Mary) + log p_{t.m.}(no | did not)

• • • • • ◦ ◦ ◦ ◦, "Mary did not slap", log p_{l.m.}(Mary did not slap) + log p_{t.m.}(Maria | Mary) + log p_{t.m.}(no | did not) + log p_{t.m.}(dio una bofetada | slap)

50 / 82

SLIDE 51

Machine Translation: Remarks

Sometimes phrases are organized hierarchically (Chiang, 2007). There is extensive research on syntax-based machine translation (Galley et al., 2004), but it requires considerable engineering to match phrase-based systems. More recent work explores semantics-based machine translation (Jones et al., 2012); its promise remains to be seen! Some good pre-neural overviews: Lopez (2008); Koehn (2009).

51 / 82

SLIDE 52

Natural Language Processing (CSE 517): Neural Machine Translation

Noah Smith

© 2018 University of Washington, nasmith@cs.washington.edu

May 25, 2018

52 / 82

SLIDE 53

Neural Machine Translation

The original idea was proposed by Forcada and Ñeco (1997); a resurgence in interest started around 2013. A strong starting point for current work: Bahdanau et al. (2014). (My exposition is borrowed with gratitude from a lecture by Chris Dyer.) This approach eliminates (hard) alignment and phrases. Take care: here, the terms "encoder" and "decoder" are used differently than in the noisy channel pattern.

53 / 82

SLIDE 54

High-Level Model

p(E = e | f) = p(E = e | encode(f)) = Π_{j=1}^{|e|} p(ej | e0, . . . , ej−1, encode(f))

The encoding of the source sentence is a deterministic function of the words in that sentence.

54 / 82

SLIDE 55

Building Block: Recurrent Neural Network

Review from earlier in the course!

◮ Each input element is understood to be an element of a sequence: x1, x2, . . . , xℓ.
◮ At each timestep t:
  ◮ The tth input element xt is processed alongside the previous state st−1 to calculate the new state st.
  ◮ The tth output is a function of the state st.
  ◮ The same functions are applied at each timestep:
    st = g_recurrent(xt, st−1)
    yt = g_output(st)

55 / 82
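The two recurrences can be written directly as code. A minimal sketch of a vanilla (Elman-style) RNN step with randomly initialized parameters; the tanh nonlinearity and the dimensions are assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_state = 4, 8

# Parameters of g_recurrent and g_output (random here; learned in practice).
W_x = rng.normal(size=(d_state, d_in))
W_s = rng.normal(size=(d_state, d_state))
W_y = rng.normal(size=(d_state, d_state))

def g_recurrent(x_t, s_prev):
    # new state from the current input and the previous state
    return np.tanh(W_x @ x_t + W_s @ s_prev)

def g_output(s_t):
    # the output is a function of the state alone
    return W_y @ s_t

s = np.zeros(d_state)
for x in rng.normal(size=(3, d_in)):     # a length-3 input sequence
    s = g_recurrent(x, s)
    y = g_output(s)
```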

SLIDE 56

Neural MT Source-Sentence Encoder

[Figure: the source sentence "Ich möchte ein Bier" goes through embedding lookups, then a forward RNN and a backward RNN; their states are stacked to form the source sentence encoding.]

F is a d × m matrix encoding the source sentence f (length m).

56 / 82

SLIDE 57

Decoder: Contextual Language Model

Two inputs: the previous word and the source-sentence context.

st = g_recurrent(e_{et−1}, F at, st−1), where F at is the "context"
yt = g_output(st)
p(Et = v | e1, . . . , et−1, f) = [yt]_v

(The forms of the two component g's are suppressed; just remember that they (i) have parameters and (ii) are differentiable with respect to those parameters.) The neural language model we discussed earlier (Mikolov et al., 2010) didn't have the context as an input to g_recurrent.

57 / 82

SLIDE 58

Neural MT Decoder

[Figure, built up over several slides: at each timestep the decoder computes attention weights a1, a2, . . . over the source encoding, forms the context as a weighted sum of source-word representations, and emits the next target word; the example generates "I'd like a beer STOP".]

69 / 82

SLIDE 70

Computing “Attention”

Let V s_{t−1} be the "expected" input embedding for timestep t. (Parameters: V.)

Attention is a_t = softmax(F⊤ V s_{t−1}).

Context is F a_t, i.e., a weighted sum of the source words' in-context representations.

70 / 82
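The two lines of attention math translate almost directly into code. A sketch with random F, V, and decoder state (dimensions are arbitrary choices for illustration):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # for numerical stability
    e = np.exp(z)
    return e / e.sum()

d, m = 6, 4                  # encoding dimension, source length
rng = np.random.default_rng(1)
F = rng.normal(size=(d, m))  # source encoding: one column per source word
V = rng.normal(size=(d, d))  # parameters mapping states to "expected inputs"
s_prev = rng.normal(size=d)  # decoder state s_{t-1}

a_t = softmax(F.T @ (V @ s_prev))   # attention weights over m source words
context = F @ a_t                   # weighted sum of source columns
```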

SLIDE 71

Learning and Decoding

log p(e | encode(f)) = Σ_{i=1}^{|e|} log p(ei | e_{0:i−1}, encode(f))

is differentiable with respect to all parameters of the neural network, allowing "end-to-end" training. Trick: train on shorter sentences first, then add in longer ones. Decoding typically uses beam search.

71 / 82

SLIDE 72

Remarks

We covered two approaches to machine translation:
◮ Phrase-based statistical MT following Koehn et al. (2003), including probabilistic noisy channel models for alignment (a key preprocessing step; Brown et al., 1993), and
◮ Neural MT with attention, following Bahdanau et al. (2014).
Note two key differences:
◮ Noisy channel p(e) × p(f | e) vs. a "direct" model p(e | f)
◮ Alignment as a discrete random variable vs. attention as a deterministic, differentiable function
At the moment, neural MT is winning when you have enough data; if not, phrase-based MT dominates. When monolingual target-language data is plentiful, we'd like to use it! Recent neural models try (Sennrich et al., 2016; Xia et al., 2016; Yu et al., 2017).

72 / 82

SLIDE 73

Summarization

73 / 82

SLIDE 74

Automatic Text Summarization

Mani (2001) provides a survey from before statistical methods came to dominate; a more recent survey is by Das and Martins (2008).
Parallel history to machine translation:
◮ Noisy channel view (Knight and Marcu, 2002)
◮ Automatic evaluation (Lin, 2004)
Differences:
◮ Natural data sources are less obvious.
◮ Human information needs are less obvious.
We'll briefly consider two subtasks: compression and selection.

74 / 82

SLIDE 75

Sentence Compression as Structured Prediction

(McDonald, 2006)

Input: a sentence.
Output: the same sentence, with some words deleted.
McDonald's approach:
◮ Define a scoring function for compressed sentences that factors locally in the output.
◮ He factored into bigrams but considered input parse-tree features.
◮ Decoding is dynamic programming (not unlike Viterbi).
◮ Learn feature weights from a corpus of compressed sentences, using the structured perceptron or similar.

75 / 82

SLIDE 76

Sentence Selection

Input: one or more documents and a "budget".
Output: a within-budget subset of sentences (or passages) from the input.
Challenge: diminishing returns as more sentences are added to the summary.
Classical greedy method: "maximum marginal relevance" (Carbonell and Goldstein, 1998).
Casting the problem as submodular optimization: Lin and Bilmes (2009).
Joint selection and compression: Martins and Smith (2009).

76 / 82
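The greedy MMR idea can be sketched in a few lines: repeatedly pick the sentence that balances relevance against redundancy with what is already selected. The `relevance` and `similarity` functions are assumed inputs, and the budget is a sentence count here (rather than, say, a word count):

```python
def mmr_select(candidates, relevance, similarity, budget, lam=0.7):
    """Greedy maximum marginal relevance (after Carbonell and
    Goldstein, 1998). Sketch only; names are illustrative."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < budget:
        def mmr(s):
            # penalize similarity to the most similar already-chosen item
            redundancy = max((similarity(s, t) for t in selected), default=0.0)
            return lam * relevance(s) - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```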

SLIDE 77

Natural Language Processing (CSE 517): Closing Thoughts

Noah Smith

© 2018 University of Washington, nasmith@cs.washington.edu

May 25, 2018

77 / 82

SLIDE 78

Topics We Didn’t Cover

◮ Applications:

◮ Sentiment and opinion analysis
◮ Information extraction
◮ Question answering (and information retrieval more broadly)
◮ Dialog systems

◮ Formalisms:

◮ Grammars beyond CFG and CCG
◮ Logical semantics beyond first-order predicate calculus
◮ Discourse structure
◮ Pragmatics

◮ Tasks:

◮ Segmentation and morphological analysis
◮ Coreference resolution and entity linking
◮ Entailment and paraphrase

◮ Toolkits (AllenNLP, Stanford Core NLP, NLTK, . . . )

78 / 82

SLIDE 79

Recurring Themes

Most lectures included discussion of:
◮ Representations or tasks (input/output)
◮ Evaluation criteria
◮ Models (often with a few variations)
◮ Learning/estimation algorithms
◮ Inference algorithms
◮ Practical advice
◮ Linguistic, statistical, and computational perspectives
For each "kind of problem," keep these elements separate in your mind, and reuse them where possible.

79 / 82

SLIDE 80

References I

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In Proc. of ICLR, 2014. URL https://arxiv.org/abs/1409.0473.

Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311, 1993.

Jaime Carbonell and Jade Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proc. of SIGIR, 1998.

David Chiang. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228, 2007.

Dipanjan Das and André F. T. Martins. A survey of methods for automatic text summarization, 2008.

Chris Dyer, Victor Chahuneau, and Noah A. Smith. A simple, fast, and effective reparameterization of IBM Model 2. In Proc. of NAACL, 2013.

Mikel L. Forcada and Ramón P. Ñeco. Recursive hetero-associative memories for translation. In International Work-Conference on Artificial Neural Networks, 1997.

Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. What's in a translation rule? In Proc. of NAACL, 2004.

Bevan Jones, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, and Kevin Knight. Semantics-based machine translation with hyperedge replacement grammars. In Proc. of COLING, 2012.

Kevin Knight and Daniel Marcu. Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence, 139(1):91–107, 2002.

80 / 82

SLIDE 81

References II

Philipp Koehn. Statistical Machine Translation. Cambridge University Press, 2009.

Philipp Koehn, Franz Josef Och, and Daniel Marcu. Statistical phrase-based translation. In Proc. of NAACL, 2003.

Philipp Koehn, Marcello Federico, Wade Shen, Nicola Bertoldi, Ondrej Bojar, Chris Callison-Burch, Brooke Cowan, Chris Dyer, Hieu Hoang, and Richard Zens. Open source toolkit for statistical machine translation: Factored translation models and confusion network decoding, 2006. Final report of the 2006 JHU summer workshop.

Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Proc. of ACL Workshop: Text Summarization Branches Out, 2004.

Hui Lin and Jeff A. Bilmes. How to select a good training-data subset for transcription: Submodular active selection for sequences. In Proc. of Interspeech, 2009.

Adam Lopez. Statistical machine translation. ACM Computing Surveys, 40(3):8, 2008.

Inderjeet Mani. Automatic Summarization. John Benjamins Publishing, 2001.

André F. T. Martins and Noah A. Smith. Summarization with a joint model for sentence extraction and compression. In Proc. of the ACL Workshop on Integer Linear Programming for Natural Language Processing, 2009.

Ryan T. McDonald. Discriminative sentence compression with soft syntactic evidence. In Proc. of EACL, 2006.

81 / 82

SLIDE 82

References III

Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Černocký, and Sanjeev Khudanpur. Recurrent neural network based language model. In Proc. of Interspeech, 2010. URL http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proc. of ACL, 2002.

Rico Sennrich, Barry Haddow, and Alexandra Birch. Improving neural machine translation models with monolingual data. In Proc. of ACL, 2016. URL http://www.aclweb.org/anthology/P16-1009.

Yingce Xia, Di He, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying Ma. Dual learning for machine translation. In NIPS, 2016.

Lei Yu, Phil Blunsom, Chris Dyer, Edward Grefenstette, and Tomas Kocisky. The neural noisy channel. In Proc. of ICLR, 2017.

82 / 82