Translation as Weighted Deduction (Adam Lopez, University of Edinburgh)


SLIDE 1

Translation as Weighted Deduction

Adam Lopez, University of Edinburgh

SLIDE 2

Moses: Koehn et al., ACL 2007
Hiero: Chiang, CL 2007

SLIDE 7

              Moses                       Hiero
              (Koehn et al., ACL 2007)    (Chiang, CL 2007)
BLEU          30.7                        32.6                         (Lopez, Coling 2008)
rules         phrase-based                hierarchical phrase-based
parameters    15 features                 5 features
search        stack decoding              cube pruning

SLIDE 10

rules         phrase-based      hierarchical phrase-based    synchronous TAG
parameters    15 features       5 features                   5 features
search        stack decoding    cube pruning                 cube pruning

SLIDE 11

This talk is not about how to improve your BLEU score by 1.9.

SLIDE 12

This talk is about building and analyzing translation models and algorithms in a modular way.

SLIDE 16

rules         phrase-based      →  deductive logic
parameters    15 features       →  semiring
search        stack decoding    →  (hyper)graph algorithms

SLIDE 18

北 风 呼啸

  • word-to-word translation
  • no reordering

北/north        风/wind     呼啸/whistles
北/northerly    风/winds    呼啸/strong

SLIDE 19

北 风 呼啸

north wind whistles
northerly wind whistles
north winds whistles
northerly winds whistles
north wind strong
northerly wind strong
north winds strong
northerly winds strong

Notice: complexity is O(2^L) for sentence length L.

北/north        风/wind     呼啸/whistles
北/northerly    风/winds    呼啸/strong
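
The exhaustive enumeration can be sketched in a few lines of Python. This is only an illustration of the O(2^L) blow-up using the six word-to-word rules above; the names `RULES` and `enumerate_translations` are made up for this sketch and are not from the talk.

```python
from itertools import product

# Word-to-word rules from the slide: two candidate translations per source word.
RULES = {
    "北": ["north", "northerly"],
    "风": ["wind", "winds"],
    "呼啸": ["whistles", "strong"],
}

def enumerate_translations(source):
    """Return every monotone word-to-word translation of `source` (a list of words)."""
    options = [RULES[word] for word in source]
    # The Cartesian product has one entry per combination of choices,
    # so its size is the product of the number of options per word.
    return [" ".join(choice) for choice in product(*options)]

translations = enumerate_translations(["北", "风", "呼啸"])
print(len(translations))  # 2 options for each of 3 words
```

With two options per word, every additional source word doubles the list.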

SLIDE 20

北 风 呼啸

北/north (.8)        风/wind (.6)     呼啸/whistles (.7)
北/northerly (.2)    风/winds (.4)    呼啸/strong (.3)

Complexity is O(L) for sentence length L.

Lattice edges: north | northerly, wind | winds, whistle | strong
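
A minimal sketch of why the weighted lattice needs only O(L) work: because there is no reordering and no interaction between neighbouring words, the best path into each node can be found in one left-to-right pass. The function name `best_translation` is hypothetical; the weights are the ones on the slide.

```python
# Weighted word-to-word rules from the slide: (translation, probability).
RULES = {
    "北": [("north", 0.8), ("northerly", 0.2)],
    "风": [("wind", 0.6), ("winds", 0.4)],
    "呼啸": [("whistles", 0.7), ("strong", 0.3)],
}

def best_translation(source):
    """One left-to-right pass that keeps only the best path into each lattice node."""
    best_weight, best_words = 1.0, []    # value of node [0]
    for word in source:                  # step from node [i] to node [i + 1]
        translation, weight = max(RULES[word], key=lambda rule: rule[1])
        best_weight *= weight
        best_words.append(translation)
    return best_weight, " ".join(best_words)

weight, sentence = best_translation(["北", "风", "呼啸"])
print(sentence, weight)
```

The loop body does a constant amount of work per source word (bounded by the number of rules per word), which is where the linear bound comes from.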

SLIDE 27

Lattice: nodes [0] [1] [2] [3]; edges north | northerly, wind | winds, whistle | strong

Inference rule:

\[ \frac{[i] \quad R(f_{i+1}/e_j)}{[i+1]} \]

Instance:

\[ \frac{[1] \quad R(\text{风}/\text{wind})}{[2]} \]

  • i ranges over sentence length
  • Determine complexity from inspection (McAllester, Proc. Static Analysis 1999)
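
As a rough illustration of the rule on this slide, the sketch below forward-chains from the axiom [0] and records every rule instantiation. The agenda-based loop and the names `deduce` and `instantiations` are assumptions of this sketch, not code from the talk. Counting the instantiations shows the "complexity from inspection" idea: one free variable i over sentence positions, times the rules per word.

```python
RULES = {
    "北": ["north", "northerly"],
    "风": ["wind", "winds"],
    "呼啸": ["whistles", "strong"],
}

def deduce(source, rules):
    """Forward-chain the rule  [i], R(f_{i+1}/e_j)  =>  [i+1]  starting from [0]."""
    proved = {0}                  # items [i] proved so far
    agenda = [0]                  # items whose consequences are not yet explored
    instantiations = []           # (antecedent i, source word, target word, consequent)
    while agenda:
        i = agenda.pop()
        if i == len(source):
            continue              # goal item [L]: nothing left to translate
        # source[i] is f_{i+1} because Python lists are 0-indexed.
        for target in rules.get(source[i], []):
            instantiations.append((i, source[i], target, i + 1))
            if i + 1 not in proved:
                proved.add(i + 1)
                agenda.append(i + 1)
    return proved, instantiations

proved, instantiations = deduce(["北", "风", "呼啸"], RULES)
print(sorted(proved), len(instantiations))
```

The instance shown on the slide, [1] with R(风/wind) yielding [2], appears in the list as `(1, "风", "wind", 2)`.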

SLIDE 28

Compute many quantities on the same graph (Goodman, CL 1999)

\[ \frac{[i] \quad R(f_{i+1}/e_j)}{[i+1]} \]

  • Viterbi: ([0, 1], max, ×)
  • sum: ([0, 1], +, ×)
  • Boolean: ({⊤, ⊥}, ∪, ∩)
  • Reverse (outside) values
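
The semiring idea can be sketched by writing the lattice computation once and passing the operations in as a parameter. The `Semiring` class, `goal_value`, and the `lift` argument are hypothetical names for this sketch; the three semirings are the ones listed above, applied to the weights of the 北 风 呼啸 example.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]    # combines alternative derivations
    times: Callable[[Any, Any], Any]   # combines the parts of one derivation
    zero: Any
    one: Any

VITERBI = Semiring(max, operator.mul, 0.0, 1.0)
SUM = Semiring(operator.add, operator.mul, 0.0, 1.0)
BOOLEAN = Semiring(operator.or_, operator.and_, False, True)

# LATTICE[i] holds the rule weights on the edges from node [i] to node [i + 1].
LATTICE = [[0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]

def goal_value(lattice, semiring, lift=lambda w: w):
    """Value of the final node; `lift` maps a rule weight into the semiring."""
    value = semiring.one                      # value of the axiom [0]
    for edge_weights in lattice:
        total = semiring.zero
        for w in edge_weights:
            total = semiring.plus(total, semiring.times(value, lift(w)))
        value = total
    return value

best = goal_value(LATTICE, VITERBI)                              # best derivation
mass = goal_value(LATTICE, SUM)                                  # total probability
reachable = goal_value(LATTICE, BOOLEAN, lift=lambda w: w > 0)   # any derivation?
```

Only the semiring argument changes between the three calls; the traversal of the graph is identical.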

SLIDE 29

Compute many quantities on the same graph (Goodman, CL 1999)

\[ \frac{[i] \quad R(f_{i+1}/e_j)}{[i+1]} \]

  • Expectation semiring (Eisner 2002)
  • Approximation semiring (Gimpel & Smith 2009)
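
A small sketch of a first-order expectation semiring on the same lattice, assuming the usual formulation with pairs (p, r): addition is componentwise and multiplication is (p1 p2, p1 r2 + p2 r1). The choice of what to count (here, how many words use the second-listed, lower-probability rule) is purely an illustrative assumption, as are the function names.

```python
def e_plus(a, b):
    """Semiring addition: componentwise sum of (p, r) pairs."""
    return (a[0] + b[0], a[1] + b[1])

def e_times(a, b):
    """Semiring multiplication: (p1 * p2, p1 * r2 + p2 * r1)."""
    return (a[0] * b[0], a[0] * b[1] + b[0] * a[1])

E_ZERO, E_ONE = (0.0, 0.0), (1.0, 0.0)

# Each edge is (probability, value). The value is 1 for the second-listed
# rule of each word (northerly, winds, strong) and 0 otherwise.
LATTICE = [
    [(0.8, 0), (0.2, 1)],
    [(0.6, 0), (0.4, 1)],
    [(0.7, 0), (0.3, 1)],
]

def expectation(lattice):
    """Return (total probability, sum over paths of probability times path value)."""
    value = E_ONE
    for edges in lattice:
        total = E_ZERO
        for p, v in edges:
            # An edge with probability p and value v is lifted to (p, p * v).
            total = e_plus(total, e_times(value, (p, p * v)))
        value = total
    return value

z, r = expectation(LATTICE)
expected_count = r / z   # expected number of second-listed rules per translation
```

The traversal is the same as for the other semirings; only the element type and the two operations differ.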

SLIDE 30

Basic Idea

  • Supply a logic and a semiring, get a complete algorithm.
  • Does it work for most translation models?

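SLIDE 32

\[ \frac{[i'', V] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i', V \vee 0^{i} 1^{i'-i} 0^{I-i'}]} \qquad V \wedge 0^{i} 1^{i'-i} 0^{I-i'} = 0^{I}, \quad |i - i''| \le d \]

  • [i'', V]: previous last position translated, previous coverage vector
  • R(f_{i+1}...f_{i'}/e_j...e_{j'}): phrase pair
  • [i', V ∨ 0^i 1^{i'−i} 0^{I−i'}]: last position translated, coverage vector
  • V ∧ 0^i 1^{i'−i} 0^{I−i'} = 0^I: only translate previously untranslated words
  • |i − i''| ≤ d: distortion limit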
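
One way to read this rule operationally is as a function from an antecedent item and a source span to a consequent item, or to nothing when a side condition fails. The sketch below stores the coverage vector as an integer bitmask in which bit k stands for source word f_{k+1}; that encoding and the name `apply_phrase` are assumptions of this sketch.

```python
def apply_phrase(item, i, i_prime, d):
    """Apply a phrase pair covering f_{i+1}..f_{i'} to the item [i'', V].

    Returns the consequent item [i', V | mask], or None if a side condition fails.
    """
    last, coverage = item                        # i'' and V
    mask = ((1 << (i_prime - i)) - 1) << i       # the vector 0^i 1^(i'-i) 0^(I-i')
    if coverage & mask:                          # V AND mask must be all zeros
        return None
    if abs(i - last) > d:                        # distortion limit |i - i''| <= d
        return None
    return (i_prime, coverage | mask)

# Translate word 1, then word 3, then jump back to word 2 (distortion limit 3).
step1 = apply_phrase((0, 0b000), 0, 1, d=3)
step2 = apply_phrase(step1, 2, 3, d=3)
step3 = apply_phrase(step2, 1, 2, d=3)
```

The sentence length I does not appear in the code because an unbounded integer needs no explicit trailing zeros.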

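SLIDE 35

Phrase-based Models

  • Window length d: Moses (Hoang & Koehn, p.c.)
  • Max distortion d: see, e.g., Moore & Quirk 2007
  • First d uncovered: see, e.g., Tillmann & Ney 2003, Zens & Ney 2004

Items [i, V]:

\[ \frac{[i'', V] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i', V \vee 0^{i} 1^{i'-i} 0^{I-i'}]} \qquad V \wedge 0^{i} 1^{i'-i} 0^{I-i'} = 0^{I}, \quad |i - i''| \le d \]

Items [i, C]:

\[ \frac{[i, C] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i', C \ll i' - i]} \qquad C \wedge 1^{i'-i} 0^{d-i'+i} = 0^{d}, \quad i' - i \le d \]

\[ \frac{[i, C] \quad R(f_{i'} \ldots f_{i''} / e_j \ldots e_{j'})}{[i, C \vee 0^{i'-i} 1^{i''-i'} 0^{d-i''+i}]} \qquad C \wedge 0^{i'-i} 1^{i''-i'} 0^{d-i''+i} = 0^{d}, \quad i'' - i \le d \]

Items [i, U]:

\[ \frac{[i, U] \quad R(f_{i'} \ldots f_{i''} / e_j \ldots e_{j'})}{[i'', U - [i', i''] \vee [i'', i'' + d - |U - [i', i'']|]]} \qquad i' > i, \quad f_{i+1} \in U \]

\[ \frac{[i, U] \quad R(f_{i'} \ldots f_{i''} / e_j \ldots e_{j'})}{[i, U - [i', i''] \vee [\max(U \vee i) + 1, \max(U \vee i) + 1 + d - |U - [i', i'']|]]} \qquad i' < i, \quad [f_{i'}, f_{i''}] \subset U \]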

SLIDE 41

Phrase-based Models

d = 3

Complexities shown on the slide: O(n^3 d^2), O(n d^2 2^d), O(n d \binom{n}{d+1})

  • Window length d: Moses (Hoang & Koehn, p.c.)
  • Max distortion d: see, e.g., Moore & Quirk 2007
  • First d uncovered: see, e.g., Tillmann & Ney 2003, Zens & Ney 2004

SLIDE 42

Phrase-based Models

These models are not the same.

  • Each can generate translations that the other cannot (regardless of d).
  • Different complexities.
  • Reported results will be impossible to replicate with your (different) strategy.

SLIDE 43

Good News

  • Most translation models are a few lines of deductive logic.
  • Computation of any semiring for free.
  • You might conclude: give a logic and a semiring, get a complete algorithm.

SLIDE 45

Result

  • Given:
      • A logic
      • A semiring
  • Get: a complete algorithm

SLIDE 46

Bad News

  • Our models use non-local features.
  • We need approximate search algorithms (and we need to be able to tweak them).

SLIDE 49

Non-local features

\[ \frac{[e_q, \ldots, e_{q+n-2}] \quad R(e_q, \ldots, e_{q+n-1})}{[e_{q+1}, \ldots, e_{q+n-1}]} \]
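
This rule describes an n-gram language model as a deduction over items that remember the last n−1 target words. A minimal sketch follows; the function name `lm_score`, the toy bigram table `BIGRAMS`, and the fallback weight 0.1 are all invented for illustration.

```python
def lm_score(words, ngram_weight, n):
    """Apply  [e_q..e_{q+n-2}], R(e_q..e_{q+n-1})  =>  [e_{q+1}..e_{q+n-1}]  left to right."""
    state = tuple(words[: n - 1])        # first item: the initial n-1 words
    score = 1.0
    for word in words[n - 1:]:
        ngram = state + (word,)
        score *= ngram_weight(ngram)     # weight of the rule R(e_q, ..., e_{q+n-1})
        state = ngram[1:]                # the consequent item drops the oldest word
    return score, state

# Hypothetical bigram weights, not taken from the talk.
BIGRAMS = {("north", "wind"): 0.5, ("wind", "whistles"): 0.4}

score, final_state = lm_score(
    ["north", "wind", "whistles"],
    lambda ngram: BIGRAMS.get(ngram, 0.1),
    n=2,
)
```

The item carries exactly the context needed to score the next word, which is what makes the feature non-local with respect to a single translation rule.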

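SLIDE 55

Non-local features

Minimal logic:

\[ \frac{[i] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i']} \]

Complete logic:

\[ \frac{[i, e_{j-n+1}, \ldots, e_{j-1}] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'}) \quad R(e_{j-n+1}, \ldots, e_j) \; \ldots \; R(e_{j'-n+1} \ldots e_{j'})}{[i', e_{j'-n+2} \ldots e_{j'}]} \]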
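
To make the difference between the two logics concrete, here is a sketch of a single application of the complete-logic rule: the item carries the n−1 word context, and the weight multiplies the phrase-pair weight by one n-gram weight per new target word. The name `apply_complete_rule`, the start symbol `<s>`, and all numeric weights are assumptions of this sketch.

```python
def apply_complete_rule(item, i_prime, target, phrase_weight, ngram_weight):
    """Apply a phrase pair with target words `target` to the item [i, context]."""
    i, context = item                         # context = (e_{j-n+1}, ..., e_{j-1})
    weight = phrase_weight                    # R(f_{i+1}...f_{i'} / e_j...e_{j'})
    history = tuple(context)
    for word in target:                       # one n-gram rule per new target word
        ngram = history + (word,)
        weight *= ngram_weight(ngram)         # R(e_{k-n+1}, ..., e_k)
        history = ngram[1:]                   # keep only the last n-1 words
    return (i_prime, history), weight

# Hypothetical bigram weights; "<s>" is an assumed sentence-start symbol.
BIGRAMS = {("<s>", "north"): 0.5, ("north", "wind"): 0.4}

new_item, weight = apply_complete_rule(
    item=(0, ("<s>",)),
    i_prime=2,
    target=("north", "wind"),
    phrase_weight=0.5,
    ngram_weight=lambda ngram: BIGRAMS.get(ngram, 0.1),
)
```

Dropping the context from the item and the n-gram factors from the weight gives back the minimal logic.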

SLIDE 57

Deductive logics provide useful tools to manipulate search algorithms:

  • PRODUCT (Cohen et al. ICLP 2009)
  • Fold-Unfold (Eisner & Blatz 2006; Johnson 2007)

SLIDE 58

Result

  • Given:
      • A complete logic
      • A semiring
  • Get: a complete algorithm
  • Problem: how to deal with exact search?

SLIDE 59

Result

  • Given:
      • A complete logic
      • A semiring
  • Get: a complete algorithm
  • Problem: how to deal with approximate search?

SLIDE 61

Search

Stack decoding (Koehn 2004)

SLIDE 68

Result

  • Given:
      • A complete logic
      • A semiring
      • A stack predicate
      • Pruning parameters
  • Get: a complete algorithm
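
The ingredients listed above can be sketched as the arguments of one generic routine. This is a simplified illustration, not the talk's implementation: `successors` plays the role of the logic, the max-and-multiply update plays the role of the Viterbi semiring, `stack_of` is the stack predicate, and `beam_size` is the only pruning parameter. It assumes every rule moves an item to a strictly later stack.

```python
import heapq

SOURCE = ["北", "风", "呼啸"]
RULES = {
    "北": [("north", 0.8), ("northerly", 0.2)],
    "风": [("wind", 0.6), ("winds", 0.4)],
    "呼啸": [("whistles", 0.7), ("strong", 0.3)],
}

def successors(item):
    """The logic: items are (source words covered, target words so far)."""
    i, words = item
    if i == len(SOURCE):
        return
    for target, weight in RULES[SOURCE[i]]:
        yield (i + 1, words + (target,)), weight

def stack_decode(axiom, successors, stack_of, num_stacks, beam_size):
    """Process stacks in order, pruning each one before expanding its items."""
    stacks = [dict() for _ in range(num_stacks)]    # stack -> {item: best weight}
    stacks[stack_of(axiom)][axiom] = 1.0
    for stack in stacks:
        # Histogram pruning: keep only the beam_size highest-weight items.
        survivors = heapq.nlargest(beam_size, stack.items(), key=lambda kv: kv[1])
        stack.clear()
        stack.update(survivors)
        for item, weight in survivors:
            for new_item, rule_weight in successors(item):
                target = stacks[stack_of(new_item)]
                new_weight = weight * rule_weight
                if new_weight > target.get(new_item, 0.0):   # Viterbi: keep the max
                    target[new_item] = new_weight
    return stacks

stacks = stack_decode(
    axiom=(0, ()),
    successors=successors,
    stack_of=lambda item: item[0],     # stack predicate: number of words covered
    num_stacks=len(SOURCE) + 1,
    beam_size=1,
)
best_item, best_weight = max(stacks[-1].items(), key=lambda kv: kv[1])
```

Swapping in a different `stack_of` or `beam_size` changes the search behaviour without touching the logic.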

SLIDE 70

Stack Pruning Effects

[Plot: number of items retained in stacks vs. sentence length, for Window length d]

SLIDE 71

Stack Pruning Effects

[Plot: number of items retained in stacks vs. sentence length, for First d uncovered]

SLIDE 77

Search

Cube Pruning (Chiang, 2007; Huang & Chiang, 2007)

Products of the weights 0.4, 0.3, 0.2 (rows) and 0.7, 0.2, 0.1 (columns), labeled R(f_{i+1}/e_j):

           0.7     0.2     0.1
  0.4      0.28    0.08    0.04
  0.3      0.21    0.06    0.03
  0.2      0.14    0.04    0.02

× weights labeled R(e_{j−n+1}, ..., e_j):

  0.5    0.4    0.5
  0.9    0.3    0.6
  0.5    0.3    0.4

= combined weights:

  0.14    0.03    0.02
  0.18    0.02    0.02
  0.07    0.01    0.01
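
A compact sketch of the cube pruning idea on the grid from this slide: start at the top-left corner, pop cells best-first by their combined weight, and lazily push each popped cell's right and lower neighbours. The variable names and the choice of k = 3 are assumptions of this sketch; the numbers are the ones on the slide, left unrounded.

```python
import heapq

ROWS = [0.4, 0.3, 0.2]                  # row weights from the slide
COLS = [0.7, 0.2, 0.1]                  # column weights from the slide
NONLOCAL = [[0.5, 0.4, 0.5],            # weights labeled R(e_{j-n+1}, ..., e_j)
            [0.9, 0.3, 0.6],
            [0.5, 0.3, 0.4]]

def cube_prune(rows, cols, nonlocal_weight, k):
    """Pop up to k grid cells best-first, exploring the grid lazily from (0, 0)."""
    def combined(r, c):
        return rows[r] * cols[c] * nonlocal_weight[r][c]

    frontier = [(-combined(0, 0), 0, 0)]     # max-heap via negated weights
    seen = {(0, 0)}
    popped = []
    while frontier and len(popped) < k:
        negative_weight, r, c = heapq.heappop(frontier)
        popped.append((-negative_weight, r, c))
        for nr, nc in ((r + 1, c), (r, c + 1)):
            if nr < len(rows) and nc < len(cols) and (nr, nc) not in seen:
                seen.add((nr, nc))
                heapq.heappush(frontier, (-combined(nr, nc), nr, nc))
    # Cells can be popped out of order because the non-local weights break
    # the monotonicity of the grid, so sort the results before returning.
    return sorted(popped, reverse=True)

top = cube_prune(ROWS, COLS, NONLOCAL, k=3)
cells = [(r, c) for _, r, c in top]
```

On this grid the corner cell is popped first even though the cell below it has a higher combined weight, which is why the method is approximate in general.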

SLIDE 80

Result

  • Given:
      • A minimal logic
      • A complete logic
      • A semiring
      • Pruning parameters
  • Get: a complete algorithm

SLIDE 81

Conclusion

  • Translation can easily be cast in the deductive framework.
  • Analysis reveals inconsistencies.
  • Modify models with logic transforms.
  • Easy to describe non-local features.
  • Search strategies can be incorporated into deductive systems.

SLIDE 82

Future Work

  • Other approximate search strategies.
  • Modular implementation.
  • Exploration of novel models.

SLIDE 83

Thanks

Phil Blunsom, Chris Callison-Burch, Chris Dyer, Hieu Hoang, Martin Kay, Philipp Koehn, Josh Schroeder, Lane Schwartz