

  1. A Class-Based Agreement Model for Generating Accurately Inflected Translations
     ACL 2012, Jeju
     Spence Green (Stanford University), John DeNero (Google)

  2–3. Local Agreement Error
     Input: The car goes quickly.
     Reference: [Arabic script]
       (1) a. the-car+F go+F with-speed
     Google Translate: [Arabic script]
       (2) a. *the-car+F go+M with-speed

  4–5. Long-distance Agreement Error
     Input: The one who is speaking is my wife.
     Reference:
       (3) a. celle qui parle , c'est ma femme
              one+F who speak , is my wife+F
     Google Translate:
       (4) a. celui qui parle est ma femme
              one+M who speak is my spouse+F

  6. Agreement Errors: Really Annoying
     Ref: John runs to his house.
     MT:  John run to her house.

  7–8. Agreement Errors in Phrase-Based MT
     Agreement relations cross phrase boundaries: [Arabic example]
     Language model should help?
     ◮ Sparser n-gram counts
     ◮ LM may back off more often

  9–10. Possible Solutions
     Morphological generation, e.g. [Minkov et al. 2007]
     ◮ Useful when correct translations aren't in the phrase table
     Our work: model agreement with a new feature
     ◮ Large phrase tables already contain many word forms

  11. Key Idea: Morphological Word Classes
     'car' [Arabic script]:
       [ CAT  noun
         AGR  [ GEN  fem
                NUM  sg ] ]
     'to go' [Arabic script]:
       [ CAT  verb
         AGR  [ GEN  fem
                NUM  sg
                PER  3 ] ]

  12–13. Key Idea: Morphological Word Classes
     'car' [Arabic script]    → noun+fem+sg
     'to go' [Arabic script]  → verb+fem+sg+3
     A linearized feature structure is equally expressive, assuming a fixed attribute order.
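
To make the linearization concrete, here is a minimal Python sketch (names illustrative, not from the paper) that flattens a category plus its agreement features into a single class string under a fixed attribute order:

    # Illustrative sketch: a fixed attribute order makes the flattened
    # class string unambiguous.
    ATTR_ORDER = ["GEN", "NUM", "PER"]

    def linearize(cat, agr):
        parts = [cat] + [agr[a] for a in ATTR_ORDER if a in agr]
        return "+".join(parts)

    print(linearize("noun", {"GEN": "fem", "NUM": "sg"}))
    # -> noun+fem+sg
    print(linearize("verb", {"GEN": "fem", "NUM": "sg", "PER": "3"}))
    # -> verb+fem+sg+3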

  14–15. A Class-Based Agreement Model
     [Tagged Arabic example: N+F  V+F  ADV]

  16–17. Outline
     1. Model Formulation (for Arabic)
     2. MT Decoder Integration
     3. English-Arabic Evaluation

  18–19. Agreement Model Formulation
     Implemented as a decoder feature. When each hypothesis h ∈ H is extended:
       ŝ = segment(h)
       τ = tag(ŝ)
       q(h) = score(τ)
       return q(h)
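
A minimal Python sketch of this control flow; segmenter, tagger, class_lm, and the hypothesis object are hypothetical stand-ins for the CRF segmenter, CRF tagger, and generative class LM described on the following slides:

    # Sketch only: the decoder API and component objects are assumed.
    def agreement_feature(hypothesis, segmenter, tagger, class_lm):
        segments = segmenter.segment(hypothesis.target_words)   # s-hat
        classes = tagger.tag(segments)                          # tau
        return class_lm.score(classes)                          # q(h)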

  20. Step 1: Segmentation
     [Arabic word] segmented into morphemes (glossed 'and they will write it'):
     Conj 'and' + Prt 'will' + Verb+Masc+3+Pl 'they write' + Pron+Fem+Sg 'it'

  21. Step 1: Segmentation
     Character-level CRF: p(ŝ | words)
     Features: centered 5-character window
     Label set:
     ◮ I: inside segment
     ◮ O: outside segment (whitespace)
     ◮ B: beginning of segment
     ◮ F: do not segment (punctuation, digits, ASCII)
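
For illustration, a small Python sketch (assumed, not the paper's code) that inverts such a character-level labeling back into segments; the Buckwalter-style transliteration w+s+yktbwn+hA is an assumption consistent with the glosses on the previous slide:

    # Sketch: "B" begins a new segment, "I" continues it, "O" is the
    # whitespace between words, and "F" marks characters that are never
    # segmented, so they simply extend the current chunk.
    def labels_to_segments(chars, labels):
        segments, current = [], []
        for ch, lab in zip(chars, labels):
            if lab == "O":              # whitespace closes any open segment
                if current:
                    segments.append("".join(current))
                    current = []
            elif lab == "B":            # boundary: start a new segment
                if current:
                    segments.append("".join(current))
                current = [ch]
            else:                       # "I" or "F": extend the current segment
                current.append(ch)
        if current:
            segments.append("".join(current))
        return segments

    # Transliteration assumed from the previous slide's glosses:
    # w+s+yktbwn+hA ('and they will write it').
    print(labels_to_segments(list("wsyktbwnhA"),
                             ["B", "B", "B", "I", "I", "I", "I", "I", "B", "I"]))
    # -> ['w', 's', 'yktbwn', 'hA']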

  22. Step 2: Tagging
     [Tagged Arabic example: N+F  V+F  ADV]

  23–24. Step 2: Tagging
     Token-level CRF: p(τ | ŝ)
     Features: current and previous words, affixes, etc.
     Label set: morphological classes (89 for Arabic)
     ◮ Gender, number, person, definiteness
     What about incomplete hypotheses?
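
As a sketch of the kind of emission features such a tagger might use, here are a few illustrative templates in Python; the exact templates are assumptions standing in for "current and previous words, affixes, etc.":

    # Illustrative feature templates only, not the paper's exact set.
    def tag_features(segments, i):
        word = segments[i]
        prev = segments[i - 1] if i > 0 else "<s>"
        return {
            "word=" + word: 1.0,          # current word identity
            "prev_word=" + prev: 1.0,     # previous word identity
            "prefix2=" + word[:2]: 1.0,   # affixes carry inflection cues
            "suffix2=" + word[-2:]: 1.0,
        }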

  25–26. Step 3: Scoring
     Problem: a discriminative model score p(τ | ŝ) is not comparable across hypotheses
     ◮ MST parser score: works [Galley and Manning 2009]
     ◮ CRF score: fails [this paper]
     Solution: generative scoring of class sequences

  27–28. Step 3: Scoring
     Simple bigram LM trained on gold class sequences:
       τ* = argmax_τ p(τ | ŝ)
       q(h) = p(τ*) = ∏_i p(τ*_i | τ*_{i−1})
     The order of the scoring model depends on the MT decoder design.
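
A minimal runnable sketch of such a bigram class LM in Python; the add-one smoothing is an assumption, since the slides do not specify a smoothing scheme:

    # Bigram LM over morphological class sequences (smoothing assumed).
    import math
    from collections import Counter

    class ClassBigramLM:
        def __init__(self, tag_sequences, num_classes=89):
            self.context_counts = Counter()
            self.bigram_counts = Counter()
            self.vocab_size = num_classes + 1   # classes plus start symbol
            for tags in tag_sequences:          # gold class sequences
                prev = "<s>"
                for t in tags:
                    self.context_counts[prev] += 1
                    self.bigram_counts[(prev, t)] += 1
                    prev = t

        def log_prob(self, tag, prev):
            """Add-one-smoothed log p(tau_i | tau_{i-1})."""
            return math.log(
                (self.bigram_counts[(prev, tag)] + 1)
                / (self.context_counts[prev] + self.vocab_size))

        def score(self, tags):
            """q(h) = log p(tau*) = sum_i log p(tau*_i | tau*_{i-1})."""
            logp, prev = 0.0, "<s>"
            for t in tags:
                logp += self.log_prob(t, prev)
                prev = t
            return logp

    lm = ClassBigramLM([["noun+fem+sg", "verb+fem+sg+3", "adv"]])
    print(lm.score(["noun+fem+sg", "verb+fem+sg+3", "adv"]))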

  29. Outline
     1. Model Formulation (for Arabic)
     2. MT Decoder Integration
     3. English-Arabic Evaluation

  30–31. MT Decoder Integration
     Tagger CRF:
     1. Remove next-word features
     2. Only tag the boundary for goal hypotheses
     Hypothesis state: last segment + class
     LM history: [Arabic words]
     Agreement history: [Arabic segment] / verb+fem+sg+3
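
A sketch of how a decoder might carry this state and score extensions incrementally; the decoder API is hypothetical, and log_prob is assumed to behave like the bigram LM sketch above:

    # Per-hypothesis agreement state (decoder API hypothetical).
    # Alongside the n-gram LM history, each hypothesis keeps its last
    # segment and class so new segments can be scored incrementally
    # against p(tau_i | tau_{i-1}).
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class AgreementState:
        last_segment: str   # e.g. the final Arabic segment of the hypothesis
        last_class: str     # e.g. "verb+fem+sg+3"

    def extend(state, new_segments, new_classes, class_lm):
        """Score newly appended segments against the carried-over class
        context; returns the updated state and the score increment."""
        delta, prev = 0.0, state.last_class
        for c in new_classes:
            delta += class_lm.log_prob(c, prev)   # as in the LM sketch above
            prev = c
        return AgreementState(new_segments[-1], prev), delta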

  32. Outline
     1. Model Formulation (for Arabic)
     2. MT Decoder Integration
     3. English-Arabic Evaluation

  33. Component Models (Arabic Only)
                 Full (%)   Incremental (%)
     Segmenter   98.6       –
     Tagger      96.3       96.2
     Data: Penn Arabic Treebank [Maamouri et al. 2004]
     Setup: dev set, standard split [Rambow et al. 2005]

  34. Translation Quality
     Phrase-based decoder [Och and Ney 2004]
     ◮ Phrase frequency, lexicalized re-ordering model, etc.
     Bitext: 502M English-Arabic tokens
     LM: 4-gram from 600M Arabic tokens

  35. Translation Quality: NIST Newswire
     [Bar chart: BLEU-4 (uncased), baseline vs. +agreement, on MT04 (tune), MT02, MT03, MT05]
     Average gain: +1.04 BLEU (significant at p ≤ 0.01)

  36. Translation Quality: NIST Mixed Genre
     [Bar chart: BLEU-4 (uncased), baseline vs. +agreement, on MT06 and MT08]
     Average gain: +0.29 BLEU (significant at p ≤ 0.02)

  37–38. Human Evaluation
     MT05 output: 74.3% of hypotheses differed from the baseline
     Sampled 100 sentence pairs; manually counted agreement errors
     Result: 15.4% error reduction, p ≤ 0.01 (78 vs. 66 errors)

  39–41. Analysis: Phrase Table Coverage
     Hypothesis: inflected forms are in the phrase table, but go unused
     Analysis: measure MT05 reference unigram coverage
     Baseline unigram coverage:  44.6%
     Matching phrase pairs:      67.8%
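
One plausible reading of this measurement, as a type-level coverage computation in Python (an assumption, not the paper's exact procedure):

    # Fraction of reference word types found in a candidate vocabulary,
    # e.g. the baseline output or the target side of matching phrase pairs.
    def unigram_coverage(reference_tokens, candidate_vocab):
        ref_types = set(reference_tokens)
        return len(ref_types & set(candidate_vocab)) / len(ref_types)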

  42. Conclusion: Implementation is Easy
     You need:
     1. A CRF package
     2. Know-how for implementing decoder features
     3. A morphologically annotated corpus

  43–45. Conclusion: Contributions
     Translation quality improvement in a large-scale system
     Classes and segmentation predicted during decoding
     ◮ Modeling flexibility
     Foundation for structured language models
     ◮ Future work: long-distance relations

  46. Segmenter: nlp.stanford.edu/software/
     Thanks. [Arabic script]

  47. References
     Galley, M. and C. D. Manning (2009). "Quadratic-time dependency parsing for machine translation". In: ACL-IJCNLP.
     Maamouri, M. et al. (2004). "The Penn Arabic Treebank: Building a large-scale annotated Arabic corpus". In: NEMLAR.
     Minkov, E., K. Toutanova, and H. Suzuki (2007). "Generating Complex Morphology for Machine Translation". In: ACL.
     Och, F. J. and H. Ney (2004). "The alignment template approach to statistical machine translation". In: Computational Linguistics 30.4, pp. 417–449.
     Rambow, O. et al. (2005). Parsing Arabic Dialects. Tech. rep. Johns Hopkins University.
