SLIDE 1

Out of GIZA: Efficient Word Alignment Models for SMT

Yanjun Ma

National Centre for Language Technology, School of Computing, Dublin City University
NCLT Seminar Series, March 4, 2009

Y. Ma (DCU) · Out of Giza · 1 / 28

SLIDE 2

Outline

1. Contexts
2. HMM and IBM Model 4
3. Improved HMM Alignment Models
4. Simultaneous Word Alignment and Phrase Extraction


SLIDE 4

Word Alignment and SMT

All SMT systems rely on word alignment:
- Word-based SMT
- Phrase-based SMT
- Hiero (hierarchical SMT)
- Syntax-based SMT, e.g. tree-to-string, string-to-tree, tree-to-tree

The GIZA++ implementation of IBM Model 4 is dominant; the "Viterbi" alignment from IBM Model 4 is what is typically used.


SLIDE 8

Efficient Model: HMM [Vogel et al., 1996]

HMM emission (translation) model: p(t_j | s_{a_j})

SLIDE 9

HMM transition (alignment) model: p(a_j | a_{j-1}), which depends only on the jump width a_j - a_{j-1}

p(t, a | s) = ∏_{j=1}^{J} p(a_j | a_{j-1}) · p(t_j | s_{a_j})    (1)
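Equation (1) factorises the joint probability into one transition term and one emission term per target position. The factorisation can be sketched as below; the sentence pair, the alignment and all probability values are made-up toy numbers, not trained parameters:

```python
def hmm_alignment_likelihood(src, tgt, alignment, trans, emit):
    """p(t, a | s) = prod_j p(a_j | a_{j-1}) * p(t_j | s_{a_j}).
    trans[d] is the jump-width probability p(a_j - a_{j-1} = d);
    emit[(t, s)] is the translation probability p(t | s).
    The initial alignment a_0 is taken to be position 0."""
    p, prev = 1.0, 0
    for j, i in enumerate(alignment):
        p *= trans.get(i - prev, 0.0) * emit.get((tgt[j], src[i]), 0.0)
        prev = i
    return p

# Hypothetical toy example: two-word sentences, monotone alignment.
src = ["la", "maison"]
tgt = ["the", "house"]
trans = {0: 0.6, 1: 0.3, -1: 0.1}   # jump-width distribution
emit = {("the", "la"): 0.8, ("house", "maison"): 0.7}
print(hmm_alignment_likelihood(src, tgt, [0, 1], trans, emit))   # 0.6*0.8 * 0.3*0.7 ≈ 0.1008
```

Missing table entries default to probability 0, so any alignment using an unseen word pair scores zero in this toy setup.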

SLIDE 11

Deficient Models: IBM Models 3 and 4

Model 3: zero-order distortion model

SLIDE 12

Model 4: first-order distortion model

SLIDE 13

Derivation

P(t_1^J, a_1^J | s_1^I)
  = P(t_1^J, B_0^I | s_1^I)    (2)
  = P(B_0 | B_1^I) × ∏_{i=1}^{I} P(B_i | B_1^{i-1}, s_1^I) × P(t_1^J | B_0^I, s_1^I)    (3)
  = P(B_0 | B_1^I) × ∏_{i=1}^{I} p(B_i | B_{i-1}, s_i)    [fertility-distortion]    (4)
    × ∏_{i=0}^{I} ∏_{j∈B_i} p(t_j | s_i)    [translation]    (5)

SLIDE 17

Model 3 fertility and distortion:

p(B_i | B_{i-1}, s_i) = p(φ_i | s_i) [fertility] · φ_i! ∏_{j∈B_i} p(j | i, J) [distortion]    (6)

Model 4 fertility and distortion:

p(B_i | B_{i-1}, s_i) = p(φ_i | s_i) [fertility] · p_{=1}(B_{i1} − B_{ρ(i)} | ...) [first word] · ∏_{k=2}^{φ_i} p_{>1}(B_{ik} − B_{i,k−1} | ...) [remaining words]    (7)

SLIDE 19

Decoding

HMM:
- Viterbi decoding: â = argmax_a p(a | s, t)
- Posterior decoding: align (j, i) iff p(a_j = i | s, t) ≥ δ

IBM Models 3 and 4:
- No efficient algorithm available
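Posterior decoding amounts to thresholding a matrix of link posteriors. A minimal sketch, assuming the posteriors have already been computed (e.g. by forward-backward); the matrix values are hypothetical:

```python
def posterior_decode(posteriors, delta=0.5):
    """Posterior decoding: keep the link (j, i) iff p(a_j = i | s, t) >= delta.
    posteriors[j][i] is the posterior probability that target word j
    aligns to source word i, assumed precomputed."""
    return {(j, i)
            for j, row in enumerate(posteriors)
            for i, p in enumerate(row)
            if p >= delta}

# Hypothetical 2x2 posterior matrix for a two-word sentence pair.
post = [[0.9, 0.1],
        [0.3, 0.7]]
print(sorted(posterior_decode(post)))   # [(0, 0), (1, 1)]
```

Lowering δ adds lower-confidence links, trading precision for recall; unlike Viterbi decoding, a target word can end up with zero or several links.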

SLIDE 21

Advantages of HMM models

- Efficient parameter estimation: the forward-backward (Baum-Welch) algorithm
- The resulting posterior probabilities are useful

Figure: Eric B. Baum (son of Leonard E. Baum, who was the inventor of the algorithm) and Lloyd R. Welch
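The E-step of Baum-Welch reduces to the forward-backward recursions, which also yield the state posteriors mentioned above. A minimal, self-contained sketch for a generic discrete HMM (all tables are toy values, not trained alignment parameters):

```python
def forward_backward(init, trans, emit_seq):
    """State posteriors gamma[t][i] for a discrete HMM.
    init[i]: initial state probabilities; trans[i][j]: transition probabilities;
    emit_seq[t][i]: p(observation at time t | state i), precomputed."""
    T, N = len(emit_seq), len(init)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]
    for i in range(N):                       # forward pass
        alpha[0][i] = init[i] * emit_seq[0][i]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][i] * trans[i][j] for i in range(N)) * emit_seq[t][j]
    for t in range(T - 2, -1, -1):           # backward pass
        for i in range(N):
            beta[t][i] = sum(trans[i][j] * emit_seq[t+1][j] * beta[t+1][j] for j in range(N))
    Z = sum(alpha[T-1])                      # total observation likelihood
    return [[alpha[t][i] * beta[t][i] / Z for i in range(N)] for t in range(T)]

# Toy two-state HMM over a three-step observation sequence.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit_seq = [[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]]
gamma = forward_backward(init, trans, emit_seq)
```

Both passes cost O(T·N²), which is what makes HMM training tractable where the fertility models of IBM Models 3 and 4 are not.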

SLIDE 24

Disadvantages of standard HMM models

- The objective is maximising the likelihood
- There is no guarantee that the optimised parameters correspond to more accurate alignments
- Making the model more complicated sometimes does help, e.g. IBM Model 4


SLIDE 28

Improved HMM models

Two more sophisticated HMM models:
- Segmental HMM: a word-to-phrase alignment model
- Constrained HMM: an agreement-guided alignment model

SLIDE 29

HMM Word-to-Phrase Alignment [Deng and Byrne, 2008]

Introducing a segmentation model: a segmental HMM

SLIDE 30

P(t, a | s) = P(v_1^K, K, a_1^K, h_1^K, φ_1^K | s)    (8)
  = P(K | J, s)    [segmentation]    (9)
  × P(a_1^K, φ_1^K, h_1^K | K, J, s)    [alignment-fertility]    (10)
  × P(v_1^K | a_1^K, φ_1^K, h_1^K, K, J, s)    [translation]    (11)

SLIDE 34

HMM Word-to-Phrase Alignment

P(a_1^K, φ_1^K, h_1^K | K, J, s)
  = ∏_{k=1}^{K} P(a_k, h_k, φ_k | a_{k−1}, φ_{k−1}, h_{k−1}, K, J, s)    (12)
  = ∏_{k=1}^{K} p(a_k | a_{k−1}, h_k; I) [alignment] · d(h_k) [null alignment] · n(φ_k; s_{a_k}) [fertility]    (13)
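Equation (13) is a product of three terms per target phrase: a jump (alignment) term, a NULL-alignment term d(h_k) and a fertility term n(φ_k; s_{a_k}). A toy sketch of that product; all tables are hypothetical, and the conditioning of the jump term on h_k is dropped for brevity:

```python
def alignment_fertility_prob(aligns, nulls, ferts, p_jump, d_null, n_fert, src):
    """Eq. (13) as a product over the K target phrases:
    p(a_k | a_{k-1}) * d(h_k) * n(phi_k; s_{a_k}), with a_0 taken as 0."""
    p, prev = 1.0, 0
    for a, h, phi in zip(aligns, nulls, ferts):
        p *= p_jump.get(a - prev, 0.0) * d_null.get(h, 0.0) * n_fert.get((phi, src[a]), 0.0)
        prev = a
    return p

src = ["s0", "s1", "s2"]
p_jump = {1: 0.5, 0: 0.3, -1: 0.2}         # jump-width distribution
d_null = {0: 0.2, 1: 0.8}                   # h_k = 1: real alignment, 0: NULL
n_fert = {(1, "s1"): 0.6, (2, "s2"): 0.4}   # n(phi; source word)
print(alignment_fertility_prob([1, 2], [1, 1], [1, 2], p_jump, d_null, n_fert, src))
```

The fertility term φ_k is what lets a single source word generate a multi-word target phrase, which the plain word-to-word HMM of Eq. (1) cannot express.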

SLIDE 36

Performance of HMM Word-to-Phrase Alignment

- MTTK implementation, used by the Cambridge University Engineering Department
- Arabic–English NIST 2008: 6th out of 16, the third-best university participant (behind LIUM and ISI)
- Consistent performance on Chinese–English across corpora of different sizes
- Parallelised to handle large amounts of data (e.g. 10M sentence pairs)

SLIDE 38

Agreement-Constrained HMM Alignment [Ganchev et al., 2008]

Objective:

argmin_{q(a) ∈ Q} KL(q(a) || p_θ(a | s, t))   s.t.   E_q[f(s, t, a)] ≤ b    (14)

Figure: the forward and backward models p_θ→(a | s, t) and p_θ←(a | s, t), and the corresponding projected distributions q→(a) and q←(a)

SLIDE 40

Agreement-Constrained HMM Alignment [Ganchev et al., 2008]

Constrained E-step (within EM)

SLIDE 41

Performance of Agreement-Constrained HMM

- PostCAT implementation
- Evaluation: six language pairs, from 100,000 to 1M sentence pairs
- Outperforms IBM Model 4 (16 out of 18 cases)
- However, results get slightly worse when the training data exceeds 1M sentence pairs

SLIDE 42

Algorithm 1: Agreement-Constrained HMM Alignment

1: λ_ij ← 0, ∀i, j
2: for T iterations do
3:   θ′_t→(t_j | s_i) ← θ_t→(t_j | s_i) · e^{λ_ij}, ∀i, j
4:   θ′_t←(s_i | t_j) ← θ_t←(s_i | t_j) · e^{−λ_ij}, ∀i, j
5:   q→ ← forwardBackward(θ′_t→, θ_a→)
6:   q← ← forwardBackward(θ′_t←, θ_a←)
7: end for
8: return (q→, q←)
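The inner loop of Algorithm 1 (lines 3-6) can be sketched as below. This is only a structural sketch: forwardBackward is a stub that renormalises the reweighted translation table (real inference would run the HMM recursions), the projected-gradient update of λ between iterations is not shown on the slide and is omitted here, and all names and values are hypothetical:

```python
import math

def forward_backward(theta_t, theta_a):
    """Stub standing in for HMM posterior inference: renormalises each row of
    the reweighted translation table into a posterior-like matrix (a toy)."""
    return [[x / sum(row) for x in row] for row in theta_t]

def agreement_constrained_inference(theta_fwd, theta_bwd, theta_a_fwd,
                                    theta_a_bwd, lam, T=3):
    """Each direction's translation table is reweighted by exp(+lambda_ij)
    resp. exp(-lambda_ij) before rerunning inference (lines 3-6).
    lam stays fixed here since the lambda update step is not shown.
    Both tables are i-by-j matrices; theta_bwd[i][j] plays theta_t<-(s_i|t_j)."""
    q_fwd = q_bwd = None
    for _ in range(T):
        fwd = [[theta_fwd[i][j] * math.exp(lam[i][j])
                for j in range(len(lam[0]))] for i in range(len(lam))]
        bwd = [[theta_bwd[i][j] * math.exp(-lam[i][j])
                for j in range(len(lam[0]))] for i in range(len(lam))]
        q_fwd = forward_backward(fwd, theta_a_fwd)
        q_bwd = forward_backward(bwd, theta_a_bwd)
    return q_fwd, q_bwd

theta = [[0.7, 0.3], [0.2, 0.8]]
lam = [[0.1, -0.1], [-0.1, 0.1]]
qf, qb = agreement_constrained_inference(theta, theta, None, None, lam)
```

The opposite signs on λ in the two directions are what pushes the forward and backward posteriors towards agreeing on the same links.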


SLIDE 44

Phrase Pair Extraction

- State of the art: using the Viterbi alignment only
- Alternative: using all possible alignments consistent with a phrase pair

A(i1, i2; j1, j2) = {a = a_1^J : a_j ∈ [i1, i2] iff j ∈ [j1, j2]}    (15)
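The consistency condition in Eq. (15) is a simple biconditional per target position, which can be checked directly (the example alignment below is hypothetical):

```python
def consistent(a, i1, i2, j1, j2):
    """Membership in A(i1,i2; j1,j2) from Eq. (15): a_j falls inside the
    source span [i1,i2] exactly when j falls inside the target span [j1,j2]."""
    return all((i1 <= aj <= i2) == (j1 <= j <= j2) for j, aj in enumerate(a))

# a = (0, 1, 1): target word 0 aligns to source 0, target words 1 and 2 to source 1.
print(consistent((0, 1, 1), 1, 1, 1, 2))   # True: spans [1,1] / [1,2] are consistent
print(consistent((0, 1, 0), 1, 1, 1, 2))   # False: target word 2 aligns outside [1,1]
```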

SLIDE 46

Phrase Pair Extraction via Model-Based Posteriors

Derivation:

P(t, A(i1, i2; j1, j2) | s; θ) = Σ_{a ∈ A(i1,i2;j1,j2)} P(t, a | s; θ)    (16)

P(A(i1, i2; j1, j2) | s, t; θ) = P(t, A(i1, i2; j1, j2) | s; θ) / P(t | s; θ)    (17)

where P(t | s; θ) = Σ_a P(t, a | s; θ).
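For tiny sentences, Eqs. (16)-(17) can be evaluated by brute-force enumeration: sum the joint over the alignments consistent with the phrase pair, then divide by the sum over all alignments. A sketch under a toy HMM-style joint; the sentence pair and all probability tables are made-up values:

```python
from itertools import product

def joint(a, trans, emit, src, tgt):
    """Toy HMM-style joint P(t, a | s): jump-width transition times emission."""
    p, prev = 1.0, 0
    for j, i in enumerate(a):
        p *= trans.get(i - prev, 0.0) * emit.get((tgt[j], src[i]), 0.0)
        prev = i
    return p

def phrase_posterior(i1, i2, j1, j2, trans, emit, src, tgt):
    """Eqs. (16)-(17): mass on alignments consistent with the phrase pair,
    normalised by P(t | s), the mass over all alignments."""
    consistent = lambda a: all((i1 <= ai <= i2) == (j1 <= j <= j2)
                               for j, ai in enumerate(a))
    alignments = list(product(range(len(src)), repeat=len(tgt)))
    num = sum(joint(a, trans, emit, src, tgt) for a in alignments if consistent(a))
    den = sum(joint(a, trans, emit, src, tgt) for a in alignments)
    return num / den

# Hypothetical toy tables; the phrase pair is (s_0, t_0), spans [0,0] and [0,0].
src, tgt = ["la", "maison"], ["the", "house"]
trans = {0: 0.6, 1: 0.3, -1: 0.1}
emit = {("the", "la"): 0.8, ("house", "maison"): 0.7,
        ("the", "maison"): 0.1, ("house", "la"): 0.05}
print(round(phrase_posterior(0, 0, 0, 0, trans, emit, src, tgt), 3))   # 0.788
```

Enumeration is exponential in the target length; the point of the model-based approach is that for HMM-family models these sums can instead be computed efficiently with constrained forward-backward recursions.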

SLIDE 48

Evaluation

Significant gains when used to augment the original phrase extraction strategy

SLIDE 49

References

Deng, Y. and Byrne, W. (2008). HMM word and phrase alignment for statistical machine translation. IEEE Transactions on Audio, Speech, and Language Processing, 16(3):494–507.

Ganchev, K., Graca, J., and Taskar, B. (2008). Better alignments = better translations? In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, Columbus, OH.

Vogel, S., Ney, H., and Tillmann, C. (1996). HMM-based word alignment in statistical translation. In Proceedings of the 16th International Conference on Computational Linguistics, pages 836–841, Copenhagen, Denmark.