Out of GIZA: Efficient Word Alignment Models for SMT (Yanjun Ma, presentation transcript)


  1. Out of GIZA—Efficient Word Alignment Models for SMT. Yanjun Ma, National Centre for Language Technology, School of Computing, Dublin City University. NCLT Seminar Series, March 4, 2009. Y. Ma (DCU) Out of Giza 1 / 28

  2. Outline: 1 Contexts; 2 HMM and IBM Model 4; 3 Improved HMM Alignment Models; 4 Simultaneous Word Alignment and Phrase Extraction



  6. Word Alignment and SMT. All SMT systems rely on word alignment: Word-Based SMT; Phrase-Based SMT; Hiero, hierarchical SMT; Syntax-Based SMT, e.g. tree-to-string, string-to-tree, tree-to-tree. The Giza implementation of IBM Model 4 is dominant. The “Viterbi” alignment from IBM Model 4 is used.


  8. Efficient Model: HMM [Vogel et al., 1996]. HMM emission (translation) model: p(t_j \mid s_{a_j})


  10. HMM transition (alignment) model: p(a_j \mid a_{j-1}), which depends only on the jump width a_j - a_{j-1}.

      p(t, a \mid s) = \prod_{j=1}^{J} p(a_j \mid a_{j-1}) \cdot p(t_j \mid s_{a_j})   (1)
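To make Eq. (1) concrete, here is a minimal Python sketch that scores one candidate alignment of a toy sentence pair. Every probability table below is an invented illustration (not trained values), and the jump-width parameterisation of the transition is the standard assumption for this model.

```python
# Toy score of one alignment under the HMM model of Eq. (1):
# p(t, a | s) = prod_j p(a_j | a_{j-1}) * p(t_j | s_{a_j}).
# All probability tables are invented illustrations, not trained values.
transition = {-1: 0.1, 0: 0.3, 1: 0.5, 2: 0.1}             # p(jump = a_j - a_{j-1})
emission = {("la", "the"): 0.9, ("maison", "house"): 0.8}  # p(t_j | s_i)

def joint_prob(source, target, alignment):
    prob, prev = 1.0, 0                  # conventional start position a_0 = 0
    for j, a_j in enumerate(alignment):
        prob *= transition.get(a_j - prev, 1e-6)              # transition term
        prob *= emission.get((target[j], source[a_j]), 1e-6)  # emission term
        prev = a_j
    return prob

p = joint_prob(["the", "house"], ["la", "maison"], [0, 1])
print(p)  # 0.3 * 0.9 * 0.5 * 0.8 = 0.108
```

The 1e-6 fallback stands in for smoothing of unseen events; a real implementation would work in log space to avoid underflow on long sentences.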

  11. Deficient Models: IBM Models 3 and 4. Model 3: zero-order distortion model

  12. Model 4: first-order distortion model


  16. Derivation

      P(t_1^J, a_1^J \mid s_1^I) = P(t_1^J, B_0^I \mid s_1^I)   (2)

      = P(B_0 \mid B_1^I) \times \prod_{i=1}^{I} P(B_i \mid B_1^{i-1}, s_1^I) \times P(t_1^J \mid B_0^I, s_1^I)   (3)

      = P(B_0 \mid B_1^I) \times \prod_{i=1}^{I} \underbrace{p(B_i \mid B_{i-1}, s_i)}_{\text{fertility-distortion}}   (4)

      \times \prod_{i=0}^{I} \prod_{j \in B_i} \underbrace{p(t_j \mid s_i)}_{\text{translation}}   (5)


  18. Model 3 fertility and distortion

      p(B_i \mid B_{i-1}, s_i) = \underbrace{p(\phi_i \mid s_i)}_{\text{fertility}} \, \phi_i! \, \prod_{j \in B_i} \underbrace{p(j \mid i, J)}_{\text{distortion}}   (6)

      Model 4 fertility and distortion

      p(B_i \mid B_{i-1}, s_i) = \underbrace{p(\phi_i \mid s_i)}_{\text{fertility}} \cdot \underbrace{p_{=1}(B_{i1} - B_{\rho(i)} \mid \cdots)}_{\text{first word}} \cdot \underbrace{\prod_{k=2}^{\phi_i} p_{>1}(B_{ik} - B_{i,k-1} \mid \cdots)}_{\text{remaining words}}   (7)
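A quick numerical sketch of the Model 3 factor in Eq. (6): fertility probability times phi! times a distortion probability per aligned position. The probability tables are invented toy values for illustration only.

```python
import math

# Toy evaluation of the Model 3 fertility-distortion factor of Eq. (6);
# the probability tables are invented illustrations, not trained values.
fert = {("house", 2): 0.3}                  # p(phi_i | s_i)
dist = {(1, 1, 2): 0.6, (2, 1, 2): 0.4}     # p(j | i, J), keyed (j, i, J)

def model3_factor(s_i, i, B_i, J):
    """p(phi_i | s_i) * phi_i! * prod_{j in B_i} p(j | i, J)."""
    phi = len(B_i)                          # fertility = size of the tablet B_i
    prob = fert[(s_i, phi)] * math.factorial(phi)
    for j in B_i:
        prob *= dist[(j, i, J)]
    return prob

p = model3_factor("house", 1, [1, 2], 2)    # 0.3 * 2! * 0.6 * 0.4 = 0.144
```

The zero-order character of Model 3 is visible here: each distortion term p(j | i, J) ignores where the previous tablet's words landed, which is exactly what Model 4's first-order terms in Eq. (7) fix.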


  20. Decoding

      HMM. Viterbi decoding: \hat{a} = \arg\max_a p(a \mid s, t). Posterior decoding: align t_j to s_i iff p(a_j = i \mid s, t) \ge \delta.

      IBM Models 3 and 4. No efficient algorithm available.
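The Viterbi decoding the slide refers to is a standard dynamic program; a minimal sketch over a toy 2-source-word, 2-target-word HMM (all tables invented for illustration):

```python
import numpy as np

# Viterbi decoding sketch for a toy HMM alignment model:
# a-hat = argmax_a p(a | s, t). All tables are invented toy values.
trans = np.array([[0.7, 0.3],   # trans[i_prev, i] = p(a_j = i | a_{j-1} = i_prev)
                  [0.4, 0.6]])
emit = np.array([[0.9, 0.2],    # emit[i, j] = p(t_j | s_i)
                 [0.1, 0.8]])
init = np.array([0.5, 0.5])     # distribution of the first alignment position

def viterbi(trans, emit, init):
    I, J = emit.shape
    delta = init * emit[:, 0]                # best score ending in each state
    back = np.zeros((J, I), dtype=int)       # backpointers
    for j in range(1, J):
        scores = delta[:, None] * trans      # scores[i_prev, i]
        back[j] = scores.argmax(axis=0)
        delta = scores.max(axis=0) * emit[:, j]
    a = [int(delta.argmax())]                # backtrace the best path
    for j in range(J - 1, 0, -1):
        a.append(int(back[j, a[-1]]))
    return a[::-1]

a_hat = viterbi(trans, emit, init)
print(a_hat)  # [0, 1]: the monotone alignment wins on this toy example
```

This runs in O(J · I^2) time, which is what makes exact decoding tractable for the HMM but not for Models 3 and 4, whose fertility terms break the chain structure.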


  23. Advantages of HMM models. Efficient parameter estimation: the forward-backward (Baum-Welch) algorithm. Figure: Eric B. Baum (son of Leonard E. Baum, inventor of the algorithm) and Lloyd R. Welch. The resulting posterior probabilities are useful.
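The posteriors the slide calls useful come from the forward-backward recursions; a minimal sketch on the same kind of toy model as above (all tables invented), computing p(a_j = i | s, t) for every target position:

```python
import numpy as np

# Forward-backward sketch computing alignment posteriors p(a_j = i | s, t)
# for a toy 2-state HMM; all probability tables are invented illustrations.
trans = np.array([[0.7, 0.3],    # trans[i_prev, i]
                  [0.4, 0.6]])
emit = np.array([[0.9, 0.2],     # emit[i, j] = p(t_j | s_i)
                 [0.1, 0.8]])
init = np.array([0.5, 0.5])

def posteriors(trans, emit, init):
    I, J = emit.shape
    fwd = np.zeros((I, J))
    bwd = np.zeros((I, J))
    fwd[:, 0] = init * emit[:, 0]                        # forward pass
    for j in range(1, J):
        fwd[:, j] = (fwd[:, j - 1] @ trans) * emit[:, j]
    bwd[:, -1] = 1.0                                     # backward pass
    for j in range(J - 2, -1, -1):
        bwd[:, j] = trans @ (emit[:, j + 1] * bwd[:, j + 1])
    post = fwd * bwd
    return post / post.sum(axis=0)                       # normalise per target word

post = posteriors(trans, emit, init)
```

Each column of `post` is a distribution over source positions for one target word; thresholding it at delta gives the posterior decoding of slide 20, and the same quantities are the expected counts that Baum-Welch accumulates in the E-step.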


  26. Disadvantages of standard HMM models. The objective is maximising the likelihood. There is no guarantee that the optimised parameters correspond to more accurate alignments. Complicating the model (sometimes!) does help, e.g. IBM Model 4.


  28. Improved HMM models. Two more sophisticated HMM models: the segmental HMM (word-to-phrase alignment) model, and the constrained HMM (agreement-guided alignment) model.

  29. HMM Word-to-Phrase Alignment [Deng and Byrne, 2008]. Introduces a segmentation model: a segmental HMM.


  33.
      P(t, a \mid s) = P(v_1^K, K, a_1^K, h_1^K, \phi_1^K \mid s)   (8)

      = \underbrace{P(K \mid J, s)}_{\text{segmentation}}   (9)

      \times \underbrace{P(a_1^K, \phi_1^K, h_1^K \mid K, J, s)}_{\text{alignment-fertility}}   (10)

      \times \underbrace{P(v_1^K \mid a_1^K, \phi_1^K, h_1^K, K, J, s)}_{\text{translation}}   (11)


  35. HMM Word-to-Phrase Alignment

      P(a_1^K, \phi_1^K, h_1^K \mid K, J, s) = \prod_{k=1}^{K} P(a_k, h_k, \phi_k \mid a_{k-1}, \phi_{k-1}, h_{k-1}, K, J, s)   (12)

      = \prod_{k=1}^{K} \underbrace{p(a_k \mid a_{k-1}, h_k; I)}_{\text{alignment}} \cdot \underbrace{d(h_k)}_{\text{null alignment}} \cdot \underbrace{n(\phi_k; s_{a_k})}_{\text{fertility}}   (13)
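A small numerical sketch of the factorised product in Eq. (13). Everything below (the jump, null, and fertility tables, the example source words and links) is an invented illustration of the factor structure, not Deng and Byrne's parameterisation.

```python
# Toy evaluation of the alignment-fertility factor of Eq. (13); every
# table below is an invented illustration, not a trained model.
jump = {(0, 1): 0.6, (1, 2): 0.5}            # p(a_k | a_{k-1}, h_k = 1; I), keyed (a_prev, a_k)
null = {0: 0.2, 1: 0.8}                      # d(h_k): h_k = 0 marks a NULL link
fert = {("the", 1): 0.9, ("house", 1): 0.7}  # n(phi_k; s_{a_k})

def alignment_fertility(source, links):
    """prod_k p(a_k | a_{k-1}, h_k; I) * d(h_k) * n(phi_k; s_{a_k})."""
    prob, a_prev = 1.0, 0
    for a_k, h_k, phi_k in links:            # one (position, null flag, fertility) per phrase
        prob *= jump.get((a_prev, a_k), 1e-6)
        prob *= null[h_k]
        prob *= fert.get((source[a_k - 1], phi_k), 1e-6)
        a_prev = a_k
    return prob

p = alignment_fertility(["the", "house"], [(1, 1, 1), (2, 1, 1)])
# 0.6 * 0.8 * 0.9  *  0.5 * 0.8 * 0.7 = 0.12096
```

Because each factor conditions only on the previous step, the whole model keeps the chain structure that makes forward-backward training tractable, while the fertility term phi_k lets one source word generate a target phrase.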


  37. Performance of HMM Word-to-Phrase Alignment. MTTK implementation. Used by Cambridge University Engineering Department for Arabic–English NIST 2008 (6th out of 16; third-best university participant, behind LIUM and ISI). Consistent performance on Chinese–English across differently sized corpora. Parallelised to handle large amounts of data (e.g. 10M sentence pairs).


  39. Agreement Constrained HMM Alignment [Ganchev et al., 2008]

      Objective: \arg\min_{q(a) \in Q} \; KL(q(a) \,\|\, p_\theta(a \mid s, t)) \quad \text{s.t.} \quad E_q[f(s, t, a)] \le b   (14)

      Figure: the directional posteriors \overrightarrow{p}_\theta(a \mid s, t), \overleftarrow{p}_\theta(a \mid s, t) and the projected distributions \overrightarrow{q}(a), \overleftarrow{q}(a)
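The projection in Eq. (14) is typically solved in the dual: q(a) is proportional to p(a) exp(-lambda f(a)), with lambda found by projected gradient ascent. The sketch below illustrates that generic mechanism on an invented three-alignment toy distribution; it is not the PostCAT implementation, and p, f, b are made-up values.

```python
import numpy as np

# Toy sketch of the KL projection of Eq. (14):
# argmin_q KL(q || p) s.t. E_q[f] <= b, solved in the dual with
# q(a) proportional to p(a) * exp(-lam * f(a)) and projected ascent on lam >= 0.
# The distribution p and feature f are invented for illustration.
p = np.array([0.5, 0.3, 0.2])      # p_theta(a | s, t) over 3 candidate alignments
f = np.array([1.0, 0.0, 2.0])      # constraint feature f(s, t, a)
b = 0.5                            # bound: require E_q[f] <= 0.5

lam = 0.0
for _ in range(500):
    q = p * np.exp(-lam * f)
    q /= q.sum()                       # current projected distribution
    grad = q @ f - b                   # dual gradient: E_q[f] - b
    lam = max(0.0, lam + 0.1 * grad)   # projected ascent keeps lam >= 0
```

Since E_p[f] = 0.9 violates the bound, lambda grows until the reweighted q satisfies E_q[f] = b; when the constraint is already satisfied, lambda stays at 0 and q simply equals p, so the projection never moves the posterior further than the constraints require.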

  40. Agreement Constrained HMM Alignment [Ganchev et al., 2008]. Constrained E-step within EM.

  41. Performance of Agreement Constrained HMM. PostCAT implementation. Evaluation: six language pairs, from 100,000 to 1M sentence pairs. Outperforms IBM Model 4 (16 out of 18 times). However, results get slightly worse when the training data exceeds 1M sentence pairs.
