Improved Models of Distortion Cost for Statistical Machine Translation
Spence Green, Michel Galley, and Christopher D. Manning
Stanford University
June 4, 2010
Motivation A New Cost Model Evaluation Conclusion Phrase-based MT Limit v. Cost
Motivation
Why phrase-based MT?
◮ Fast, simple, and scalable
◮ Good performance for many language pairs (Zollmann et al., 2008; Lopez, 2008; etc.)
Reordering in (baseline) phrase-based decoders is controlled by:
◮ A distortion cost model
◮ A distortion limit
Cost model is poor, so a low distortion limit is typically used
Green, Galley, and Manning Improved Models of Distortion Cost for SMT 2 / 37
Motivating Example
[Figure sequence, slides 3-9: graphics not recoverable from the extraction]
Distortion Limit v. Distortion Cost
Cost is a soft constraint
◮ Does not prune the search space
◮ Feature in the log-linear decoder framework
Limit is a hard constraint
◮ Prunes translations from the search space
For Moses, low(er) distortion limit improves translation quality!
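The soft/hard distinction on this slide can be sketched in a few lines. This is a minimal illustration, not the Moses or Phrasal implementation: it assumes the commonly used linear distortion |start − previous end − 1| over 0-based source indices, and the function names and the `weight` parameter are hypothetical. Real decoders often measure the limit relative to the first uncovered source position, which this sketch does not model.

```python
def linear_distortion(prev_end, next_start):
    """Linear distortion between two consecutively translated phrases.

    Assumed form: |next_start - prev_end - 1|, so a monotone step
    (the next phrase starts right after the previous one) costs 0.
    """
    return abs(next_start - prev_end - 1)


def allowed_by_limit(prev_end, next_start, dlimit):
    """Hard constraint: a jump longer than dlimit is pruned outright."""
    return linear_distortion(prev_end, next_start) <= dlimit


def soft_cost(prev_end, next_start, weight):
    """Soft constraint: a weighted penalty added to the log-linear score.

    The hypothesis stays in the search space; it only scores lower.
    """
    return -weight * linear_distortion(prev_end, next_start)
```

With a limit of 5, a jump from source position 2 to position 9 is removed from the search space, while the soft cost alone would keep it and merely lower its score.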
Translation Quality Decreases at High Distortion Limits
[Figure: Moses BLEU-4 performance versus distortion limit. Panels: Arabic-English (distortion limit ticks 2-10, BLEU ticks 42-44) and Chinese-English (distortion limit ticks 2-14, BLEU ticks 29-32)]
Hard Constraints Reduce Reference Reachability
[Figure: reference reachability (%, ticks 10-35) versus translation option limit (ticks 20-820), one curve each for dlimit = 6, 9, 12, and 15]
(Auli, Lopez, Hoang, and Koehn, 2009)
Motivation A New Cost Model Evaluation Conclusion Future Cost Estimation Discriminative Distortion Cost Implementation
A New Distortion Cost Model
Guide search without hard constraints
◮ Maintain baseline performance at high distortion limits
◮ Solution: Improve heuristic search with future cost estimation (Moore and Quirk, 2007)
Encourage linguistically appropriate reorderings
◮ Solution: Transition-based discriminative distortion model
Worst-case O(n) cost computation
◮ Maintain linear running time of decoding!
Search Errors at High Distortion Limits
[Figure sequence, slides 14-21: graphics not recoverable from the extraction]
An Admissible Future Cost Heuristic
$s_j$ ← first uncovered source position
$s_{j'}$ ← first source position of phrase $p$
$C_j$ ← coverage set to the right of $s_j$
$D(s_{j'}, s_j)$ ← linear distortion from $s_{j'}$ to $s_j$
When $j' > j$, the estimate is $F = |C_j| + D(s_{j'}, s_j)$
Update the estimate at each translation step $n$: $\Delta F = F_n - F_{n-1}$, $n > 0$
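The definitions above can be turned into a small sketch. Several details are assumptions rather than statements from the slide: source positions are 0-based integers, the coverage set already includes the phrase just applied, D is taken to be the absolute distance |j′ − j|, and the estimate is set to 0 when j′ ≤ j (the slide only defines the case j′ > j). The function names are hypothetical.

```python
def future_cost(covered, sentence_len, phrase_start):
    """Estimate F = |C_j| + D(s_j', s_j) for one translation step.

    covered      -- set of covered source positions (assumed to include
                    the phrase that was just applied)
    sentence_len -- number of source positions
    phrase_start -- j', the first source position of the applied phrase
    """
    uncovered = [i for i in range(sentence_len) if i not in covered]
    if not uncovered:
        return 0  # everything is translated, nothing left to pay for
    j = uncovered[0]  # first uncovered source position
    if phrase_start <= j:
        return 0  # assumption: no estimate unless the phrase lies to the right
    c_j = sum(1 for i in covered if i > j)  # |C_j|: covered positions right of j
    return c_j + abs(phrase_start - j)  # assumed D(s_j', s_j) = |j' - j|


def future_cost_deltas(estimates):
    """Delta F = F_n - F_(n-1) for n > 0, given the per-step estimates."""
    return [estimates[n] - estimates[n - 1] for n in range(1, len(estimates))]
```

For example, with positions 0, 1, 4, and 5 covered in an eight-word sentence and the last phrase starting at position 4, the first gap is at position 2, two covered positions lie to its right, and the jump distance is 2, so the estimate is 4.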
Linear Distortion with Future Cost
[Figure sequence, slides 23-26: graphics not recoverable from the extraction]
Transition-based Discriminative Distortion Cost
Problem: Cost model still penalizes all reorderings
◮ Consider a verb-final language like Japanese
◮ Skipping over the entire verb complement is good
◮ Model should prefer particular reorderings
A transition-based discriminative distortion model
◮ Idea: compute cost of word-to-word transitions
◮ Source-side features
◮ Discretized transition classes
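To make "word-to-word transitions" concrete, the sketch below lists the jumps between consecutive source positions in the order they are translated. The signed difference used here is an assumed raw representation before any discretization into classes; the function name is hypothetical.

```python
def transitions(order):
    """Return (from, to, jump) for consecutive source positions.

    order -- source positions in the order they are translated.
    A jump of +1 is a monotone step; any other value is a reordering.
    """
    return [(a, b, b - a) for a, b in zip(order, order[1:])]
```

For a verb-final source sentence translated in the order `[0, 3, 1, 2, 4]`, the first transition is a forward jump of 3 (skipping ahead to the verb) and the second is a backward jump of 2 (returning to the skipped material).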
Procedure
1. Classify discretized transitions with a log-linear model
   $p_{\lambda}(D_{j,j'} \mid \bar{s}, j, j') \propto \exp\left[\bar{\lambda} \cdot \bar{h}(\bar{s}, j, j', D_{j,j'})\right]$
2. Train with sorted word-to-word alignments
   ◮ e.g. Arabic ⇒ Arabic′ (English word order)
3. Query model for transitions at each translation step
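The procedure can be illustrated with a small sketch. It is not the authors' code: it assumes one alignment link per source word, binary indicator features, and weights stored per (feature, class) pair, and every name in it is hypothetical. The first function reorders source positions into target order (step 2); the second normalizes exp(λ · h) over the transition classes (steps 1 and 3).

```python
import math


def source_order_from_alignment(alignment):
    """Sort source positions by the target position they align to.

    alignment -- dict mapping source index -> target index
                 (assumes exactly one link per source word).
    Ties are broken by source index.
    """
    return sorted(alignment, key=lambda j: (alignment[j], j))


def class_probabilities(weights, active_features, classes):
    """Log-linear distribution over transition classes.

    weights         -- dict mapping (feature, cls) -> float
    active_features -- names of the binary features that fire
    classes         -- iterable of class labels
    """
    scores = {
        c: sum(weights.get((f, c), 0.0) for f in active_features)
        for c in classes
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}
```

A decoder feature would typically use the log of the probability returned for the class of the transition actually taken.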
Features
This evaluation
◮ Words and POS tags
◮ Relative source sentence position (discretized)
◮ Source sentence length (discretized)
Future work
◮ POS tag chains (bigram, trigram, etc.)
◮ Agreement morphology
◮ Subject has been translated? (binary, global)
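A rough sketch of the features used in this evaluation follows. The quintile discretization of relative position is suggested by the quintile panels on the backup slide, but the sentence-length bin edges, the feature-name strings, and the function names are all hypothetical choices made for illustration.

```python
import bisect

# Hypothetical bin edges for source sentence length (not from the slides).
LENGTH_EDGES = (10, 20, 30, 40)


def position_quintile(j, length):
    """Discretize the relative position of word j into quintiles 0-4."""
    return min(4, (5 * j) // length)


def length_bin(length, edges=LENGTH_EDGES):
    """Discretize sentence length into len(edges) + 1 bins."""
    return bisect.bisect_right(edges, length)


def source_features(words, tags, j):
    """Binary indicator features for source position j."""
    n = len(words)
    return [
        f"word={words[j]}",
        f"pos={tags[j]}",
        f"relpos={position_quintile(j, n)}",
        f"len={length_bin(n)}",
    ]
```

Because every feature is computed from the source sentence alone, the values for each position can be computed once per sentence rather than once per hypothesis.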
Further Motivation
Incremental processing like shift-reduce parsing
◮ Process source items in decoding order
◮ Classes are discrete jumps instead of “operations”
◮ Beam search via the decoder
Implementation: Constant-time During Decoding
Nine discrete distortion classes
◮ Same number of training examples per class
Separation into inbound/outbound models (Al-Onaizan and Papineni, 2006)
◮ Simplifies caching during decoding
◮ Future work: Combine into a single model
Model has four decoder features
◮ Inbound and outbound scores
◮ Alignment penalty
◮ Future cost estimate
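One way to obtain nine classes with the same number of training examples per class is equal-frequency binning of the observed jump distances. The slides do not give the actual class boundaries, so the sketch below is only an assumed procedure, with hypothetical function names.

```python
import bisect


def equal_frequency_edges(values, num_classes=9):
    """Choose cut points so each class gets about the same number of values.

    values -- observed jump distances from the training data
    Returns num_classes - 1 cut points in ascending order.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // num_classes] for k in range(1, num_classes)]


def distortion_class(value, edges):
    """Map a jump distance to a class index in 0 .. len(edges)."""
    return bisect.bisect_right(edges, value)
```

When one value dominates the training data (monotone steps, for instance), several cut points can coincide and the classes become uneven, so a real implementation would need to handle ties explicitly.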
Motivation A New Cost Model Evaluation Conclusion High Distortion Limit d5 v. d15
Evaluation
MT system is Phrasal (Cer et al., 2010)
◮ Baseline: Moses feature set
◮ Lexicalized reordering model of Galley and Manning (2008)
NIST MT09 Ar-En constrained track training data
◮ Removed UN and comparable data
◮ Same baseline, faster experiments
◮ 6.20M English and 5.73M Arabic tokens
Evaluated BLEU-4 on MT03/05/06/08 at dlimit = 15
High Distortion Limit (15)
[Results figure not recoverable from the extraction]
Improvement Over the Low Distortion Limit Baseline
[Results figure not recoverable from the extraction]
Motivation A New Cost Model Evaluation Conclusion
Conclusion
Contributions of this work
◮ Fixed search errors caused by linear distortion
◮ Added a distortion model with linguistic features
◮ Modest improvement over Moses at a high distortion limit
Software: Phrasal
http://nlp.stanford.edu/phrasal/
Arabic NLP tools
http://nlp.stanford.edu/projects/arabic.shtml
Thank You!
Thanks to Daniel Cer and Claude Reichard.
Distortion Cost Curve for the adjective American
[Figure: inbound distortion model, three panels (First Quintile, Middle Quintile, Last Quintile); x-axis: Distortion Class (1-9), with ends labeled "From left" and "From right"; y-axis ticks from −7 to −1]