SLIDE 1

Improved Models of Distortion Cost for Statistical Machine Translation

Spence Green, Michel Galley, and Christopher D. Manning Stanford University June 4, 2010

SLIDE 3

Motivation A New Cost Model Evaluation Conclusion Phrase-based MT Limit v. Cost

Motivation

Why phrase-based MT?

◮ Fast, simple, and scalable
◮ Good performance for many language pairs (Zollmann et al., 2008; Lopez, 2008; etc.)

Reordering in (baseline) phrase-based decoders controlled by:

◮ A distortion cost model
◮ A distortion limit

Cost model is poor, so a low distortion limit is typically used

Green, Galley, and Manning Improved Models of Distortion Cost for SMT 2 / 37

SLIDES 4-10

Motivating Example

[Figure sequence: graphics not recoverable from the extraction]

SLIDE 12

Distortion Limit v. Distortion Cost

Cost is a soft constraint

◮ Does not prune the search space
◮ Feature in the log-linear decoder framework

Limit is a hard constraint

◮ Prunes translations from the search space

For Moses, a low(er) distortion limit improves translation quality!
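
The contrast between the soft cost and the hard limit can be sketched in a few lines of Python. This is a minimal illustration, not the Moses implementation: it assumes the standard linear distortion |start - previous end - 1| over 1-indexed source spans, and it assumes a simplified limit that rejects any single jump larger than `dlimit`. The function names are hypothetical.

```python
def linear_distortion(prev_end, next_start):
    """Linear distortion between consecutive phrases: |next_start - prev_end - 1|."""
    return abs(next_start - prev_end - 1)


def total_distortion_cost(spans):
    """Soft constraint: sum the distortion over spans given in translation order.

    spans: list of (start, end) source positions, 1-indexed and inclusive.
    """
    cost, prev_end = 0, 0  # position 0 stands for the sentence start
    for start, end in spans:
        cost += linear_distortion(prev_end, start)
        prev_end = end
    return cost


def within_limit(spans, dlimit):
    """Hard constraint (simplified assumption): every jump must be <= dlimit."""
    prev_end = 0
    for start, end in spans:
        if linear_distortion(prev_end, start) > dlimit:
            return False  # the hypothesis would be pruned from the search space
        prev_end = end
    return True


monotone = [(1, 2), (3, 3), (4, 6)]
reordered = [(1, 2), (4, 6), (3, 3)]
print(total_distortion_cost(monotone), total_distortion_cost(reordered))
print(within_limit(reordered, 3), within_limit(reordered, 6))
```

The cost only changes a hypothesis score, so the reordered hypothesis stays in the search space; the limit removes it outright whenever a jump is too long.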

SLIDE 13

Translation Quality Decreases at High Distortion Limits

[Figure: Moses BLEU-4 performance (y-axis) against distortion limit (x-axis); panels: Arabic-English, Chinese-English]

SLIDE 14

Hard Constraints Reduce Reference Reachability

[Figure: reference reachability (%) against translation option limit, one curve each for dlimit = 6, 9, 12, 15]

(Auli, Lopez, Hoang, and Koehn, 2009)

SLIDE 15

Motivation A New Cost Model Evaluation Conclusion Future Cost Estimation Discriminative Distortion Cost Implementation

A New Distortion Cost Model

Guide search without hard constraints

◮ Maintain baseline performance at high distortion limits
◮ Solution: Improve heuristic search with future cost estimation (Moore and Quirk, 2007)

Encourage linguistically appropriate reorderings

◮ Solution: Transition-based discriminative distortion model

Worst-case O(n) cost computation

◮ Maintain linear running time of decoding!

SLIDES 16-23

Search Errors at High Distortion Limits

[Figure sequence: graphics not recoverable from the extraction]

SLIDE 24

An Admissible Future Cost Heuristic

$s_j$ ← First uncovered source position
$s_{j'}$ ← First source position of phrase $p$
$C_j$ ← Coverage set to the right of $s_j$
$D(s_{j'}, s_j)$ ← Linear distortion from $s_{j'}$ to $s_j$

When $j' > j$, the estimate is

$F = |C_j| + D(s_{j'}, s_j)$

Update the estimate at each translation step $n$:

$\Delta F = F_n - F_{n-1}, \quad n > 0$
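
A rough Python sketch of this estimate follows. It rests on several assumptions that the slide does not state: source positions are 0-indexed, the coverage set is read as the covered positions to the right of the first uncovered position, D is taken to be the plain distance between the two positions, and the estimate is set to 0 when the phrase starts at the first uncovered position. All function names are hypothetical.

```python
def first_uncovered(coverage, n):
    """Return the first source index in range(n) that is not covered, or n."""
    for i in range(n):
        if i not in coverage:
            return i
    return n


def future_cost(coverage, phrase_start, n):
    """Estimate F = |C_j| + D(s_j', s_j) for a phrase starting at phrase_start.

    coverage: set of covered source indices before the phrase is applied.
    """
    j = first_uncovered(coverage, n)
    if phrase_start <= j:
        return 0  # assumption: the slide defines F only for j' > j
    covered_right = sum(1 for i in coverage if i > j)  # |C_j| under our reading
    return covered_right + abs(phrase_start - j)  # assumed form of D


def future_cost_deltas(estimates):
    """Per-step updates F_n - F_{n-1} for n > 0."""
    return [cur - prev for prev, cur in zip(estimates, estimates[1:])]


print(future_cost({0, 1, 4}, 3, 6))
print(future_cost_deltas([0, 2, 0]))
```

Scoring the difference between consecutive estimates, rather than the estimate itself, keeps the accumulated feature value equal to the latest estimate at every step.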

SLIDES 25-28

Linear Distortion with Future Cost

[Figure sequence: graphics not recoverable from the extraction]

SLIDE 29

Transition-based Discriminative Distortion Cost

Problem: Cost model still penalizes all reorderings

◮ Consider a verb-final language like Japanese
◮ Skipping over the entire verb complement is good
◮ Model should prefer particular reorderings

A transition-based discriminative distortion model

◮ Idea: compute cost of word-to-word transitions
◮ Source side features
◮ Discretized transition classes

SLIDE 30

Procedure

1. Classify discretized transitions with a log-linear model

$p(D_{j,j'} \mid \bar{s}, j, j') \propto \exp\left[\bar{\lambda} \cdot \bar{h}(\bar{s}, j, j', D_{j,j'})\right]$

2. Train with sorted word-to-word alignments

◮ e.g. Arabic ⇒ Arabic′ (English word order)

3. Query model for transitions at each translation step
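
The classification in step 1 can be illustrated with a small softmax over class scores. This is only a sketch of a generic log-linear classifier: the feature names and the dictionary-based representation are invented for the example and are not taken from the talk.

```python
import math


def class_probabilities(weights, features_by_class):
    """Normalize exp(lambda . h) over all candidate distortion classes.

    weights: dict mapping feature name -> weight (lambda).
    features_by_class: dict mapping class -> dict of feature name -> value (h).
    """
    scores = {
        c: sum(weights.get(name, 0.0) * value for name, value in feats.items())
        for c, feats in features_by_class.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp_scores.values())
    return {c: e / z for c, e in exp_scores.items()}


def transition_log_prob(weights, features_by_class, observed_class):
    """Log-probability of one class, usable as a decoder feature score."""
    return math.log(class_probabilities(weights, features_by_class)[observed_class])


# Hypothetical features that conjoin a source POS tag with a candidate class.
weights = {"pos=ADJ&class=+1": 1.0}
candidates = {
    "+1": {"pos=ADJ&class=+1": 1.0},
    "0": {"pos=ADJ&class=0": 1.0},
}
print(class_probabilities(weights, candidates))
```

Because the class appears inside each feature, a single weight vector can score every candidate class for the same source context.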

SLIDE 31

Features

This evaluation

◮ Words and POS tags
◮ Relative source sentence position (discretized)
◮ Source sentence length (discretized)

Future work

◮ POS tag chains (bigram, trigram, etc.)
◮ Agreement morphology
◮ Subject has been translated? (binary, global)

SLIDE 32

Further Motivation

Incremental processing like shift-reduce parsing

◮ Process source items in decoding order
◮ Classes are discrete jumps instead of “operations”
◮ Beam search via the decoder
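
To make "source items in decoding order" and "discrete jumps" concrete, the sketch below turns a word alignment into a source order and then into signed jumps. It assumes one alignment link per source word and 0-indexed positions; the helper names are hypothetical, and the same routine could plausibly produce training transitions from sorted alignments.

```python
def source_order_from_alignment(alignment):
    """Sort source indices by the target index they align to.

    alignment: list of (source_index, target_index) pairs, one per source word.
    Ties on the target side are broken by source index.
    """
    return [s for s, _ in sorted(alignment, key=lambda st: (st[1], st[0]))]


def jumps(source_order):
    """Signed jump between consecutive source positions; 0 is a monotone step."""
    return [cur - prev - 1 for prev, cur in zip(source_order, source_order[1:])]


alignment = [(0, 1), (1, 0), (2, 2)]  # the first two source words swap
order = source_order_from_alignment(alignment)
print(order, jumps(order))
```

Each jump would then be mapped to one of the discretized transition classes before it is passed to the classifier.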

SLIDE 33

Implementation: Constant-time During Decoding

Nine discrete distortion classes

◮ Same number of training examples per class

Separation into inbound/outbound models (Al-Onaizan and Papineni, 2006)

◮ Simplifies caching during decoding
◮ Future work: Combine into a single model

Model has four decoder features

◮ Inbound and outbound scores
◮ Alignment penalty
◮ Future cost estimate
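
One way to obtain nine classes with the same number of training examples each is equal-frequency binning of the observed jump sizes. The sketch below is an assumption about how such boundaries could be chosen, not the authors' procedure, and the function names are hypothetical.

```python
from bisect import bisect_right


def equal_frequency_boundaries(values, num_classes=9):
    """Pick num_classes - 1 cut points so each class holds about the same count."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // num_classes] for k in range(1, num_classes)]


def distortion_class(value, boundaries):
    """Map a jump size to a class index in range(len(boundaries) + 1)."""
    return bisect_right(boundaries, value)


observed_jumps = list(range(-9, 9))  # toy training data: 18 distinct jump sizes
boundaries = equal_frequency_boundaries(observed_jumps)
print(boundaries)
print([distortion_class(v, boundaries) for v in (-9, 0, 8)])
```

With real data, many jumps share the same value, so the classes come out only approximately balanced. The lookup is a binary search over eight fixed boundaries, which keeps the per-transition work constant during decoding.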

SLIDE 34

Motivation A New Cost Model Evaluation Conclusion High Distortion Limit d5 v. d15

Evaluation

MT system is Phrasal (Cer et al., 2010)

◮ Baseline: Moses feature set
◮ Lexicalized reordering model of Galley and Manning (2008)

NIST MT09 Ar-En constrained track training data

◮ Removed UN and comparable data

◮ Same baseline, faster experiments

◮ 6.20M English and 5.73M Arabic tokens

Evaluated BLEU-4 on MT03/05/06/08 at dlimit = 15

SLIDE 35

High Distortion Limit (15)

SLIDE 36

Improvement Over the Low Distortion Limit Baseline

SLIDE 37

Conclusion

Contributions of this work

◮ Fixed search errors caused by linear distortion
◮ Added a distortion model with linguistic features
◮ Modest improvement over Moses at a high distortion limit

Software: Phrasal

http://nlp.stanford.edu/phrasal/

Arabic NLP tools

http://nlp.stanford.edu/projects/arabic.shtml

SLIDE 38

Thank You!

Thanks to Daniel Cer and Claude Reichard.

SLIDE 39

Distortion Cost Curve for the adjective American

[Figure: Inbound Distortion Model; panels: First Quintile, Middle Quintile, Last Quintile; x-axis: Distortion Class (1-9); annotations: "From left", "From right"]