A Class-Based Agreement Model for Generating Accurately Inflected Translations (ACL 2012, Jeju)


SLIDE 1

A Class-Based Agreement Model for Generating Accurately Inflected Translations ACL 2012 // Jeju

Spence Green Stanford University John DeNero Google

SLIDE 2

Local Agreement Error

Input: The car goes quickly.
Reference (1) a.:

  • the-car+F
  • go+F
  • with-speed

SLIDE 3

Local Agreement Error

Input: The car goes quickly.
Reference (1) a.:

  • the-car+F
  • go+F
  • with-speed

Google Translate (2) a.:

  • the-car+F
  • go+M
  • with-speed

SLIDE 4

Long-distance Agreement Error

Input: The one who is speaking is my wife.
Reference (3) a.:

  celle   qui   parle   ,   c'est   ma   femme
  one+F   who   speak   ,   is      my   wife+F

SLIDE 5

Long-distance Agreement Error

Input: The one who is speaking is my wife.
Reference (3) a.:

  celle   qui   parle   ,   c'est   ma   femme
  one+F   who   speak   ,   is      my   wife+F

Google Translate (4) a.:

  celui   qui   parle   est   ma   femme
  one+M   who   speak   is    my   spouse+F

SLIDE 6

Agreement Errors: Really Annoying

Ref: John runs to his house.
MT:  John run to her house.

SLIDE 7

Agreement Errors in Phrase-Based MT

Agreement relations cross phrase boundaries

SLIDE 8

Agreement Errors in Phrase-Based MT

Agreement relations cross phrase boundaries

  • Language model should help?

◮ Sparser n-gram counts
◮ LM may back off more often

SLIDE 9

Possible Solutions

Morphological generation, e.g. [Minkov et al. 2007]

◮ Useful when correct translations aren’t in phrase table

SLIDE 10

Possible Solutions

Morphological generation, e.g. [Minkov et al. 2007]

◮ Useful when correct translations aren’t in phrase table

Our work: model agreement with a new feature

◮ Large phrase tables already contain many word forms

SLIDE 11

Key Idea: Morphological Word Classes

  • ‘car’

      [ CAT  noun
        AGR  [ GEN  fem
               NUM  sg ] ]

  • ‘to go’

      [ CAT  verb
        AGR  [ GEN  fem
               NUM  sg
               PER  3 ] ]

SLIDE 12

Key Idea: Morphological Word Classes

  • ‘car’

noun+fem+sg

  • ‘to go’

verb+fem+sg+3

SLIDE 13

Key Idea: Morphological Word Classes

  • ‘car’

noun+fem+sg

  • ‘to go’

verb+fem+sg+3

The linearized feature structure is equally expressive, assuming a fixed attribute order.
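As a toy illustration (not the paper's code; the function and the attribute-order constant are invented names), the linearization under a fixed attribute order can be sketched in Python:

```python
# Minimal sketch: linearize a nested feature structure into a class string,
# assuming a fixed ordering of the agreement attributes.
ATTR_ORDER = ["GEN", "NUM", "PER"]  # assumed fixed attribute order

def linearize(cat, agr):
    """Flatten a {attribute: value} agreement structure into one class label."""
    parts = [cat] + [str(agr[a]) for a in ATTR_ORDER if a in agr]
    return "+".join(parts)

print(linearize("noun", {"GEN": "fem", "NUM": "sg"}))            # noun+fem+sg
print(linearize("verb", {"GEN": "fem", "NUM": "sg", "PER": 3}))  # verb+fem+sg+3
```

The fixed order is what makes the flat string equally expressive: two feature structures map to the same class only when they agree on every attribute.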

SLIDE 14

A Class-based Agreement Model

  • N+F
  • V+F
  • ADV
SLIDE 15

A Class-based Agreement Model

  • N+F
  • V+F
  • ADV
SLIDE 16
  • 1. Model Formulation (for Arabic)
  • 2. MT Decoder Integration
  • 3. English-Arabic Evaluation
SLIDE 17
  • 1. Model Formulation (for Arabic)
  • 2. MT Decoder Integration
  • 3. English-Arabic Evaluation
SLIDE 18

Agreement Model Formulation

Implemented as a decoder feature

SLIDE 19

Agreement Model Formulation

Implemented as a decoder feature. When each hypothesis h ∈ H is extended:

  ŝ = segment(h)
  τ = tag(ŝ)
  q(h) = score(τ)
  return q(h)
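The segment, tag, score pipeline can be sketched as a decoder feature function. Everything here is a hypothetical stand-in: the real components are the CRF segmenter, the CRF tagger, and the class LM described on the following slides.

```python
# Sketch of the agreement feature (hypothetical names): when a hypothesis h
# is extended, segment its target text, tag the segments with morphological
# classes, and return the score of the class sequence.
def agreement_feature(h, segment, tag, score):
    s_hat = segment(h)   # s-hat = segment(h)
    tau = tag(s_hat)     # tau = tag(s-hat)
    return score(tau)    # q(h) = score(tau)

# Toy stand-ins for the three component models:
q = agreement_feature(
    "AlsyArp tdhb",                                  # hypothesis target text (transliterated)
    segment=lambda h: h.split(),                     # trivial whitespace "segmenter"
    tag=lambda s: ["noun+fem+sg", "verb+fem+sg+3"],  # fixed "tagger" output
    score=lambda tau: -0.5 * len(tau),               # dummy class-sequence score
)
# q is the feature value added to the hypothesis score
```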

SLIDE 20

Step 1: Segmentation

  Conj   Prt    Verb+Masc+3+Pl   Pron+Fem+Sg
  and    will   they-write       it

SLIDE 21

Step 1: Segmentation

Character-level CRF: p(ŝ | words)
Features: centered 5-character window
Label set:

◮ I: inside segment
◮ O: outside segment (whitespace)
◮ B: beginning of segment
◮ F: do not segment (punctuation, digits, ASCII)
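A sketch of the centered 5-character window the segmenter uses as features; the feature names and the padding symbol are my own invention, not the paper's:

```python
# Hypothetical sketch: extract a centered 5-character window of features
# around position i, padding the edges with '#'.
def window_features(text, i, size=5):
    half = size // 2
    padded = "#" * half + text + "#" * half
    window = padded[i : i + size]  # window centered on original position i
    return {f"char[{j - half}]": c for j, c in enumerate(window)}

feats = window_features("wsyktbwnhA", 0)
# {'char[-2]': '#', 'char[-1]': '#', 'char[0]': 'w', 'char[1]': 's', 'char[2]': 'y'}
```

The CRF then predicts one of the four labels (I, O, B, F) per character from such windows.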

SLIDE 22

Step 2: Tagging

  • N+F
  • V+F
  • ADV
SLIDE 23

Step 2: Tagging

Token-level CRF: p(τ | ŝ)
Features: current and previous words, affixes, etc.
Label set: morphological classes (89 for Arabic)

◮ Gender, number, person, definiteness

SLIDE 24

Step 2: Tagging

Token-level CRF: p(τ | ŝ)
Features: current and previous words, affixes, etc.
Label set: morphological classes (89 for Arabic)

◮ Gender, number, person, definiteness

What about incomplete hypotheses?

SLIDE 25

Step 3: Scoring

Problem: Discriminative model score p(τ|ˆ s) not comparable across hypotheses

◮ MST parser score: works?

[Galley and Manning 2009]

◮ CRF score: fail

[this paper]

SLIDE 26

Step 3: Scoring

Problem: Discriminative model score p(τ|ˆ s) not comparable across hypotheses

◮ MST parser score: works?

[Galley and Manning 2009]

◮ CRF score: fail

[this paper]

Solution: Generative scoring of class sequences

SLIDE 27

Step 3: Scoring

Simple bigram LM trained on gold class sequences:

  τ* = argmax_τ p(τ | ŝ)

  q(h) = p(τ*) = ∏_i p(τ*_i | τ*_{i−1})

SLIDE 28

Step 3: Scoring

Simple bigram LM trained on gold class sequences:

  τ* = argmax_τ p(τ | ŝ)

  q(h) = p(τ*) = ∏_i p(τ*_i | τ*_{i−1})

The order of the scoring model depends on the MT decoder design.
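A toy rendering of the generative scoring step, with invented bigram probabilities; the point is only that an agreeing class sequence outscores a disagreeing one under a class LM trained on gold sequences:

```python
import math

# q(h) = p(tau*) = prod_i p(tau*_i | tau*_{i-1}), computed in log space.
def class_lm_score(tau, bigram_logprob, start="<s>"):
    score, prev = 0.0, start
    for t in tau:
        score += bigram_logprob[(prev, t)]
        prev = t
    return score

# Invented probabilities: a class LM trained on gold sequences should assign
# agreeing noun-verb pairs higher probability than disagreeing ones.
bigram_logprob = {
    ("<s>", "noun+fem+sg"): math.log(0.4),
    ("noun+fem+sg", "verb+fem+sg+3"): math.log(0.5),    # agreeing pair
    ("noun+fem+sg", "verb+masc+sg+3"): math.log(0.05),  # disagreeing pair
}

good = class_lm_score(["noun+fem+sg", "verb+fem+sg+3"], bigram_logprob)
bad = class_lm_score(["noun+fem+sg", "verb+masc+sg+3"], bigram_logprob)
assert good > bad  # the agreement error is penalized
```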

SLIDE 29
  • 1. Model Formulation (for Arabic)
  • 2. MT Decoder Integration
  • 3. English-Arabic Evaluation
SLIDE 30

MT Decoder Integration

Tagger CRF

  • 1. Remove next-word features
  • 2. Only tag boundary for goal hypotheses

SLIDE 31

MT Decoder Integration

Tagger CRF

  • 1. Remove next-word features
  • 2. Only tag boundary for goal hypotheses

Hypothesis state: last segment + class-LM history

  • Agreement history: / verb+fem+sg+3
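A possible shape for that per-hypothesis state (a hypothetical sketch, not the actual decoder's data structure): the last, possibly incomplete segment, plus the class-LM history needed to score the next class.

```python
from dataclasses import dataclass

# Per-hypothesis state for the agreement feature: the last segment may be
# retagged when the hypothesis is extended; the LM history holds the most
# recent morphological class(es).
@dataclass(frozen=True)  # frozen => hashable, usable for hypothesis recombination
class AgreementState:
    last_segment: str  # e.g. "tdhb" (transliterated)
    lm_history: tuple  # e.g. ("verb+fem+sg+3",)

state = AgreementState(last_segment="tdhb", lm_history=("verb+fem+sg+3",))
```

Keeping the state hashable lets the decoder recombine hypotheses that share the agreement-relevant suffix, just as it does for n-gram LM state.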

SLIDE 32
  • 1. Model Formulation (for Arabic)
  • 2. MT Decoder Integration
  • 3. English-Arabic Evaluation
SLIDE 33

Component Models (Arabic Only)

            Full (%)   Incremental (%)
Segmenter   98.6       –
Tagger      96.3       96.2

Data: Penn Arabic Treebank [Maamouri et al. 2004]
Setup: dev set, standard split [Rambow et al. 2005]

SLIDE 34

Translation Quality

Phrase-based decoder [Och and Ney 2004]

◮ Phrase frequency, lexicalized re-ordering model, etc.

Bitext: 502M English-Arabic tokens
LM: 4-gram trained on 600M Arabic tokens

SLIDE 35

Translation Quality: NIST Newswire

BLEU-4 (uncased)

             MT04 (tune)   MT02   MT03   MT05
Baseline     18.1          23.9   18.9   22.6
+Agreement   18.9          24.8   20.3   23.5

Average gain: +1.04 BLEU (significant at p ≤ 0.01)

SLIDE 36

Translation Quality: NIST Mixed Genre

BLEU-4 (uncased)

             MT06   MT08
Baseline     14.7   14.3
+Agreement   15.0   14.5

Average gain: +0.29 BLEU (significant at p ≤ 0.02)

SLIDE 37

Human Evaluation

MT05 output: 74.3% of hypotheses differed from the baseline
Sampled 100 sentence pairs
Manually counted agreement errors

SLIDE 38

Human Evaluation

MT05 output: 74.3% of hypotheses differed from the baseline
Sampled 100 sentence pairs
Manually counted agreement errors

Result: 15.4% error reduction, p ≤ 0.01 (78 vs. 66 errors)

SLIDE 39

Analysis: Phrase Table Coverage

Hypothesis: Inflected forms in phrase table, but unused

SLIDE 40

Analysis: Phrase Table Coverage

Hypothesis: Inflected forms in phrase table, but unused
Analysis: Measure MT05 reference unigram coverage

SLIDE 41

Analysis: Phrase Table Coverage

Hypothesis: Inflected forms in phrase table, but unused
Analysis: Measure MT05 reference unigram coverage

Matching phrase pairs:       67.8%
Baseline unigram coverage:   44.6%

SLIDE 42

Conclusion: Implementation is Easy

You need:

  • 1. CRF package
  • 2. Know-how for implementing decoder features
  • 3. Morphologically annotated corpus

SLIDE 43

Conclusion: Contributions

Translation quality improvement in a large-scale system

SLIDE 44

Conclusion: Contributions

Translation quality improvement in a large-scale system
Classes and segmentation predicted during decoding

◮ Modeling flexibility

SLIDE 45

Conclusion: Contributions

Translation quality improvement in a large-scale system
Classes and segmentation predicted during decoding

◮ Modeling flexibility

Foundation for structured language models

◮ Future work: long-distance relations

SLIDE 46

Segmenter: nlp.stanford.edu/software/

Thanks.

SLIDE 47

References

Galley, M. and C. D. Manning (2009). "Quadratic-time dependency parsing for machine translation". In: ACL-IJCNLP.
Maamouri, M. et al. (2004). "The Penn Arabic Treebank: Building a large-scale annotated Arabic corpus". In: NEMLAR.
Minkov, E., K. Toutanova, and H. Suzuki (2007). "Generating Complex Morphology for Machine Translation". In: ACL.
Och, F. J. and H. Ney (2004). "The alignment template approach to statistical machine translation". In: Computational Linguistics 30.4, pp. 417–449.
Rambow, O. et al. (2005). Parsing Arabic Dialects. Tech. rep. Johns Hopkins University.