Translation as Weighted Deduction
Adam Lopez
University of Edinburgh
Moses (Koehn et al., ACL 2007) and Hiero (Chiang, CL 2007)
BLEU (Lopez, Coling 2008): Moses 30.7, Hiero 32.6
- rules: Moses is phrase-based; Hiero is hierarchical phrase-based
- parameters: Moses has 15 features; Hiero has 5 features
- search: Moses uses stack decoding; Hiero uses cube pruning
Combinations of rules, parameters, and search:
- phrase-based rules, 15 features, stack decoding
- hierarchical phrase-based rules, 5 features, cube pruning
- synchronous TAG rules, 5 features, cube pruning
This talk is not about how to improve your BLEU score by 1.9.
This talk is about building and analyzing translation models and algorithms in a modular way.
rules, parameters, search (e.g., phrase-based, 15 features, stack decoding) correspond to:
- rules: deductive logic
- parameters: semiring
- search: (hyper)graph algorithms
Example: 北 风 呼啸
- word-to-word translation
- no reordering
Rules: 北/north, 北/northerly, 风/wind, 风/winds, 呼啸/whistles, 呼啸/strong
Enumerating every translation:
- north wind whistles
- northerly wind whistles
- north winds whistles
- northerly winds whistles
- north wind strong
- northerly wind strong
- north winds strong
- northerly winds strong
Notice: complexity is O(2^L) for sentence length L.
北 风 呼啸 with weighted rules:
- 北/north (.8), 北/northerly (.2)
- 风/wind (.6), 风/winds (.4)
- 呼啸/whistles (.7), 呼啸/strong (.3)
Lattice: north | northerly, then wind | winds, then whistles | strong
Complexity is O(L) for sentence length L.
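The contrast between enumerating translations and walking the lattice can be sketched as follows. This is a minimal illustration that assumes the rule weights from the slide; the names `RULES`, `enumerate_all`, and `best_monotone` are hypothetical and not part of the talk.

```python
from itertools import product

# Weighted word-to-word rules from the slide: (target word, weight) per source word.
RULES = {
    "北": [("north", 0.8), ("northerly", 0.2)],
    "风": [("wind", 0.6), ("winds", 0.4)],
    "呼啸": [("whistles", 0.7), ("strong", 0.3)],
}

def enumerate_all(source):
    """Brute force: score every combination, O(2^L) with two options per word."""
    results = []
    for combo in product(*(RULES[f] for f in source)):
        weight = 1.0
        for _, w in combo:
            weight *= w
        results.append((weight, " ".join(e for e, _ in combo)))
    return results

def best_monotone(source):
    """Left-to-right pass over the lattice: one local choice per position, O(L)."""
    weight, words = 1.0, []
    for f in source:
        e, w = max(RULES[f], key=lambda pair: pair[1])
        weight *= w
        words.append(e)
    return weight, " ".join(words)

source = ["北", "风", "呼啸"]
all_translations = enumerate_all(source)  # eight candidates for three words
best = best_monotone(source)              # same best translation, found in one pass
```

The single pass is only valid here because every weight is local to one position; that restriction is what the later slides on non-local features relax.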
Lattice with items [0] [1] [2] [3]: north | northerly from [0] to [1], wind | winds from [1] to [2], whistles | strong from [2] to [3].
General rule:
$$\frac{[i] \quad R(f_{i+1}/e_j)}{[i+1]}$$
Instance for 风:
$$\frac{[1] \quad R(风/\text{wind})}{[2]}$$
- i ranges over sentence length.
- Determine complexity from inspection (McAllester, Proc. Static Analysis 1999).
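One way to read the rule [i], R(f_{i+1}/e_j) ⊢ [i + 1] operationally is as agenda-driven forward chaining. The sketch below is an assumption about how such a prover might look, not the talk's implementation; `prove` and `RULES` are hypothetical names. Counting rule instantiations makes the "complexity from inspection" argument concrete: one free variable i, so the work is linear in sentence length.

```python
from collections import deque

# Axioms R(f/e) for the running example (weights omitted: this only tests provability).
RULES = {
    "北": ["north", "northerly"],
    "风": ["wind", "winds"],
    "呼啸": ["whistles", "strong"],
}

def prove(source):
    """Forward chaining for the rule [i], R(f_{i+1}/e_j) => [i+1]."""
    agenda = deque([0])   # the axiom item [0]
    proved = set()
    instantiations = 0
    while agenda:
        i = agenda.popleft()
        if i in proved:
            continue      # each item is expanded only once
        proved.add(i)
        if i < len(source):
            for _e in RULES[source[i]]:
                instantiations += 1   # one rule instance per (i, e_j) pair
                agenda.append(i + 1)
    return proved, instantiations

proved, instantiations = prove(["北", "风", "呼啸"])
```

For the three-word example this proves the items [0] through [3] using six rule instances, two per source position.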
Compute many quantities on same graph (Goodman, CL 1999):
- Viterbi: ([0, 1], max, ×)
- sum: ([0, 1], +, ×)
- Boolean: ({⊤, ⊥}, ∪, ∩)
- Reverse (outside) values
- Expectation semiring (Eisner 2002)
- Approximation semiring (Gimpel & Smith 2009)
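The idea of computing several quantities on the same graph can be sketched by parameterizing one traversal with a semiring. This is a minimal illustration under the assumption that the graph is the three-position lattice from the running example; the `Semiring` class, `goal_value`, and the `lift` argument are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any

VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)
SUM = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)

# Rule weights per source position, taken from the example slide.
WEIGHTS = [[0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]

def goal_value(semiring, lift=lambda w: w):
    """Value of the goal item [L]; `lift` maps a rule weight into the semiring."""
    value = semiring.one                  # value of the axiom item [0]
    for options in WEIGHTS:
        total = semiring.zero
        for w in options:
            # Combine antecedent and rule with times, alternatives with plus.
            total = semiring.plus(total, semiring.times(value, lift(w)))
        value = total
    return value

viterbi = goal_value(VITERBI)                    # weight of the best derivation
total = goal_value(SUM)                          # total weight of all derivations
provable = goal_value(BOOLEAN, lambda w: w > 0)  # is the goal derivable at all?
```

Only the semiring changes between the three calls; the traversal itself is untouched.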
Basic Idea
- Supply a logic and a semiring, get a complete algorithm.
- Does it work for most translation models?
$$\frac{[i'', V] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i', V \vee 0^{i} 1^{i'-i} 0^{I-i'}]} \qquad V \wedge 0^{i} 1^{i'-i} 0^{I-i'} = 0^{I}, \quad |i - i''| \le d$$
- [i′′, V]: previous last position translated, previous coverage vector
- R(f_{i+1}...f_{i′}/e_j...e_{j′}): phrase pair
- [i′, V ∨ 0^i 1^{i′−i} 0^{I−i′}]: last position translated, coverage vector
- V ∧ 0^i 1^{i′−i} 0^{I−i′} = 0^I: only translate previously untranslated words
- |i − i′′| ≤ d: distortion limit
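The coverage-vector rule can be sketched as a function that checks both side conditions and builds the consequent item. This is an illustrative sketch only: it represents V as a tuple of bits, ignores the target side of the phrase pair, and uses the hypothetical names `span_vector` and `apply_rule`.

```python
def span_vector(i, i_prime, I):
    """Bit vector 0^i 1^(i'-i) 0^(I-i') marking source words i+1..i'."""
    return tuple([0] * i + [1] * (i_prime - i) + [0] * (I - i_prime))

def apply_rule(item, i, i_prime, d):
    """Apply a phrase pair covering f_{i+1}..f_{i'} to item [i'', V].

    Returns the consequent item [i', V OR span], or None if a side condition fails.
    """
    last, V = item
    span = span_vector(i, i_prime, len(V))
    if any(v & s for v, s in zip(V, span)):
        return None                       # V AND span must equal 0^I
    if abs(i - last) > d:
        return None                       # distortion limit |i - i''| <= d
    return (i_prime, tuple(v | s for v, s in zip(V, span)))

start = (0, (0, 0, 0, 0))                    # nothing translated, I = 4
jumped = apply_rule(start, 1, 3, d=2)        # translate words 2..3 first
blocked = apply_rule(jumped, 0, 1, d=2)      # jump back of 3 exceeds d = 2
allowed = apply_rule(jumped, 0, 1, d=3)      # permitted once d = 3
overlap = apply_rule(jumped, 2, 4, d=3)      # word 3 is already covered
```

Because V has I bits, the number of distinct items can grow exponentially in sentence length, which is why the choice of distortion strategy matters.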
Phrase-based Models
Three distortion strategies (illustrated with d = 3):
- Window length d: Moses (Hoang & Koehn, pc)
- Max distortion d: see, e.g., Moore & Quirk 2007
- First d uncovered: see, e.g., Tillman & Ney 2003, Zens & Ney 2004
Deductive rules shown for these strategies:
$$\frac{[i'', V] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i', V \vee 0^{i} 1^{i'-i} 0^{I-i'}]} \qquad V \wedge 0^{i} 1^{i'-i} 0^{I-i'} = 0^{I}, \quad |i - i''| \le d$$
$$\frac{[i, C] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i', C \ll i' - i]} \qquad C \wedge 1^{i'-i} 0^{d-i'+i} = 0^{d}, \quad i' - i \le d$$
$$\frac{[i, C] \quad R(f_{i'} \ldots f_{i''} / e_j \ldots e_{j'})}{[i, C \vee 0^{i'-i} 1^{i''-i'} 0^{d-i''+i}]} \qquad C \wedge 0^{i'-i} 1^{i''-i'} 0^{d-i''+i} = 0^{d}, \quad i'' - i \le d$$
$$\frac{[i, U] \quad R(f_{i'} \ldots f_{i''} / e_j \ldots e_{j'})}{[i'', U - [i', i''] \vee [i'', i'' + d - |U - [i', i'']|]]} \qquad i' > i, \quad f_{i+1} \in U$$
$$\frac{[i, U] \quad R(f_{i'} \ldots f_{i''} / e_j \ldots e_{j'})}{[i, U - [i', i''] \vee [\max(U \vee i) + 1, \max(U \vee i) + 1 + d - |U - [i', i'']|]]} \qquad i' < i, \quad [f_{i'}, f_{i''}] \subset U$$
Complexities shown: $O(n^3 d^2)$, $O(n d^2 2^d)$, $O\left(n d \binom{n}{d+1}\right)$
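To get a feel for how differently the three complexity expressions on this slide grow, the sketch below simply evaluates them for a given n and d. It assumes the expressions read as n^3 d^2, n d^2 2^d, and n d C(n, d+1); the function name `bounds` is hypothetical, and nothing here asserts which expression belongs to which strategy.

```python
from math import comb

def bounds(n, d):
    """Evaluate the three complexity expressions (constants ignored)."""
    return {
        "n^3 d^2": n**3 * d**2,
        "n d^2 2^d": n * d**2 * 2**d,
        "n d C(n, d+1)": n * d * comb(n, d + 1),
    }

values = bounds(n=20, d=3)  # a 20-word sentence with d = 3, as on the slide
```

Even at this small size the three expressions differ by more than two orders of magnitude.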
These models are not the same.
- Each can generate translations that the other cannot (regardless of d).
- Different complexities.
- Reported results will be impossible to replicate with your (different) strategy.
Good News
- Most translation models are a few lines of deductive logic.
- Computation of any semiring for free.
- You might conclude: give a logic and a semiring, get a complete algorithm.
Result
- Given:
  - A logic
  - A semiring
- Get: a complete algorithm
Bad News
- Our models use non-local features.
- We need approximate search algorithms (and we need to be able to tweak them).
Non-local features
n-gram rule:
$$\frac{[e_q, \ldots, e_{q+n-2}] \quad R(e_q, \ldots, e_{q+n-1})}{[e_{q+1}, \ldots, e_{q+n-1}]}$$
minimal logic:
$$\frac{[i] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i']}$$
complete logic:
$$\frac{[i, e_{j-n+1}, \ldots, e_{j-1}] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'}) \quad R(e_{j-n+1}, \ldots, e_j) \ldots R(e_{j'-n+1} \ldots e_{j'})}{[i', e_{j'-n+2} \ldots e_{j'}]}$$
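The step from the minimal to the complete logic can be sketched for n = 2 on the word-to-word example: items carry the previous target word so that a bigram weight can be applied with each rule. The bigram weights below are invented purely for illustration, and `BIGRAM`, `RULES`, and `viterbi_complete` are hypothetical names.

```python
RULES = {
    "北": [("north", 0.8), ("northerly", 0.2)],
    "风": [("wind", 0.6), ("winds", 0.4)],
    "呼啸": [("whistles", 0.7), ("strong", 0.3)],
}
# Hypothetical bigram weights R(e_{j-1}, e_j); unlisted pairs default to 0.5.
BIGRAM = {("north", "wind"): 0.9, ("northerly", "winds"): 0.8, ("wind", "whistles"): 0.7}

def viterbi_complete(source, start="<s>"):
    """Viterbi over items [i, previous target word] (complete logic, n = 2)."""
    chart = {(0, start): (1.0, [])}
    for i, f in enumerate(source):
        new_chart = {}
        for (_, prev), (weight, words) in chart.items():
            for e, w in RULES[f]:
                # Translation rule weight times the non-local bigram weight.
                score = weight * w * BIGRAM.get((prev, e), 0.5)
                key = (i + 1, e)
                if key not in new_chart or score > new_chart[key][0]:
                    new_chart[key] = (score, words + [e])
        chart = new_chart
    return chart

final = viterbi_complete(["北", "风", "呼啸"])
best_score, best_words = max(final.values())
```

The minimal logic has one item per position; the complete logic has one item per position and context word, which is where the extra search cost comes from.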
Deductive logics provide useful tools to manipulate search algorithms:
- PRODUCT (Cohen et al. ICLP 2009)
- Fold-Unfold (Eisner & Blatz 2006; Johnson 2007)
Result
- Given:
  - A complete logic
  - A semiring
- Get: a complete algorithm
- Problem: how to deal with approximate search?
Search
stack decoding (Koehn 2004)
Result
- Given:
  - A complete logic
  - A semiring
  - A stack predicate
  - Pruning parameters
- Get: a complete algorithm
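How a stack predicate and a pruning parameter turn a logic into an approximate search can be sketched as below. This is a simplified illustration, not Moses or the talk's implementation: it assumes the Viterbi semiring, histogram pruning with a fixed beam size, and rules that always move an item to a later stack. The names `stack_decode`, `expand`, and `stack_of` are hypothetical.

```python
import heapq

RULES = {
    "北": [("north", 0.8), ("northerly", 0.2)],
    "风": [("wind", 0.6), ("winds", 0.4)],
    "呼啸": [("whistles", 0.7), ("strong", 0.3)],
}
SOURCE = ["北", "风", "呼啸"]

def expand(item):
    """Yield (rule weight, consequent item) for items [i, last target word]."""
    i, _prev = item
    if i < len(SOURCE):
        for e, w in RULES[SOURCE[i]]:
            yield w, (i + 1, e)

def stack_decode(initial, expand, stack_of, num_stacks, beam_size):
    """Process stacks in order, keeping only the beam_size best items in each."""
    stacks = [dict() for _ in range(num_stacks)]
    stacks[stack_of(initial)][initial] = 1.0
    for s in range(num_stacks):
        # Histogram pruning: discard everything outside the top beam_size items.
        kept = heapq.nlargest(beam_size, stacks[s].items(), key=lambda kv: kv[1])
        stacks[s] = dict(kept)
        for item, weight in kept:
            for rule_weight, new_item in expand(item):
                score = weight * rule_weight
                target = stacks[stack_of(new_item)]
                if score > target.get(new_item, 0.0):
                    target[new_item] = score   # Viterbi: keep the best weight
    return stacks

# Stack predicate: group items by the number of source words translated.
def by_position(item):
    return item[0]

pruned = stack_decode((0, "<s>"), expand, by_position, len(SOURCE) + 1, beam_size=1)
full = stack_decode((0, "<s>"), expand, by_position, len(SOURCE) + 1, beam_size=2)
```

Changing `stack_of` or `beam_size` changes the search without touching the logic, which is the modularity the slide describes.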
Stack Pruning Effects
[Figure. Axes: sentence length; number of items retained in stacks. Panels: Window length d; First d uncovered.]
Search: Cube Pruning (Chiang, 2007; Huang & Chiang, 2007)
Row weights: 0.4 0.3 0.2
Column weights, R(f_{i+1}/e_j): 0.7 0.2 0.1
Products of row and column weights:
0.28 0.08 0.04
0.21 0.06 0.03
0.14 0.04 0.02
× R(e_{j−n+1}, ..., e_j):
0.5 0.4 0.5
0.9 0.3 0.6
0.5 0.3 0.4
=
0.14 0.03 0.02
0.18 0.02 0.02
0.07 0.01 0.01
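The grid on these slides can be explored lazily in the style of cube pruning: start at the top-left corner and repeatedly pop the best frontier cell, pushing its right and lower neighbours. The sketch below is a simplified illustration using the slide's numbers, not the algorithm as published; `ROWS`, `COLS`, `NONLOCAL`, `score`, and `cube_prune` are hypothetical names.

```python
import heapq

ROWS = [0.4, 0.3, 0.2]                      # weights along one grid dimension
COLS = [0.7, 0.2, 0.1]                      # rule weights R(f_{i+1}/e_j)
NONLOCAL = [[0.5, 0.4, 0.5],                # n-gram weights R(e_{j-n+1}, ..., e_j)
            [0.9, 0.3, 0.6],
            [0.5, 0.3, 0.4]]

def score(r, c):
    """Full weight of a cell: local product times the non-local factor."""
    return ROWS[r] * COLS[c] * NONLOCAL[r][c]

def cube_prune(k):
    """Pop up to k cells best-first, starting from the top-left corner."""
    heap = [(-score(0, 0), 0, 0)]           # max-heap via negated scores
    seen = {(0, 0)}
    popped = []
    while heap and len(popped) < k:
        neg, r, c = heapq.heappop(heap)
        popped.append((r, c, -neg))
        for nr, nc in ((r + 1, c), (r, c + 1)):
            if nr < len(ROWS) and nc < len(COLS) and (nr, nc) not in seen:
                seen.add((nr, nc))
                heapq.heappush(heap, (-score(nr, nc), nr, nc))
    return popped

top3 = cube_prune(3)
order = [(r, c) for r, c, _ in top3]
```

With these numbers the first cell popped is not the best one: the non-local factor makes the grid non-monotonic, so the pop order is only approximately best-first. That is why cube pruning is an approximate search.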
Result
- Given:
  - A minimal logic
  - A complete logic
  - A semiring
  - Pruning parameters
- Get: a complete algorithm
Conclusion
- Translation can easily be cast in the deductive framework.
- Analysis reveals inconsistencies.
- Modify models with logic transforms.
- Easy to describe non-local features.
- Search strategies can be incorporated into deductive systems.
Future Work
- Other approximate search strategies.
- Modular implementation.
- Exploration of novel models.