SLIDE 1

Inversion Transduction Grammars

Wilker Aziz 3/5/17

SLIDE 2

Word-based Translation

Mary did not slap the green witch
Mary no dio una bofetada a la bruja verde

Every French word is generated by an English word (or null)

SLIDE 3

Generative Story IBM≥3: Given E

Mary did not slap the green witch

SLIDE 4

Generative Story IBM≥3: Fertility

Mary did not slap the green witch
Mary did not slap slap slap the green witch

SLIDE 5

Generative Story IBM≥3: NULL insertion

Mary did not slap the green witch
Mary did not slap slap slap the green witch NULL

SLIDE 6

Generative Story IBM≥3: Translation

Mary did not slap the green witch
Mary did not slap slap slap the green witch NULL
Mary no dio una bofetada a la verde bruja

SLIDE 7

Generative Story IBM≥3: Distortion

Mary did not slap the green witch
Mary did not slap slap slap the green witch NULL
Mary no dio una bofetada a la verde bruja
Mary no dio una bofetada a la bruja verde

SLIDE 8

Discussion

  • IBM models do not constrain divergence with respect to word order
  • Distortion step must consider all the m! permutations of the m French words
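To make the m! figure concrete, a quick check of how fast the distortion search space grows with sentence length:

```python
import math

# number of permutations the distortion step must consider for m French words
print([math.factorial(m) for m in (5, 10, 15)])
```

Already at m = 10 there are over three million orderings, which is why unconstrained distortion is intractable in practice.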

SLIDE 9

All permutations: sensible or not?

If we do not impose structural constraints (yet they do exist)

  • the model will have to learn (rather implicitly) how not to violate them
  • which ought to require more data


SLIDE 12

Practical consequences

Estimation

  • modelling outcomes that, even though possible, are not plausible (unlikely to be observed)

Generation

  • NP-completeness!

SLIDE 17

NP-completeness

NP-complete problem

  • Generalised TSP [Knight, 1999; Zaslavskiy et al., 2009]
  • Perfect matching [DeNero and Klein, 2008]
  • All permutations [Asveld, 2006; 2008]
SLIDE 18

All permutations

Let Σn = {a1, ..., an}

  • S ➝ A_Σn
  • A_X ➝ a A_{X−{a}} for X ⊆ Σn, #X ≥ 2, a ∈ X
  • A_{a} ➝ a

Regular grammar (there is an equivalent FSA)

Asveld (2006, 2008)

SLIDE 19

Complexity

Note that nonterminals are indexed by subsets of Σn, i.e. the power set of Σn

  • 2^n nonterminals (states)
  • n × 2^n productions (transitions)
  • n! strings (paths)

SLIDE 20

Example: 3 elements

S ➝ A123
A123 ➝ a1 A23 | a2 A13 | a3 A12
A12 ➝ a1 A2 | a2 A1
A13 ➝ a1 A3 | a3 A1
A23 ➝ a2 A3 | a3 A2
A1 ➝ a1
A2 ➝ a2
A3 ➝ a3
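Asveld's construction can be simulated directly: each nonterminal is a subset of the alphabet, and a rule peels one symbol off the subset. A minimal sketch (function and variable names are made up for illustration):

```python
from itertools import permutations

def expand(X):
    """Enumerate all strings derivable from nonterminal A_X of the
    permutation grammar: A_X -> a A_{X-{a}} for each a in X; A_{a} -> a."""
    if len(X) == 1:
        (a,) = X
        return [[a]]
    strings = []
    for a in sorted(X):                 # choose a production A_X -> a A_{X-{a}}
        for tail in expand(X - {a}):    # recurse on the smaller subset
            strings.append([a] + tail)
    return strings

sigma = frozenset(["a1", "a2", "a3"])
strings = expand(sigma)
print(len(strings))  # 3! = 6, one derivation per permutation
```

Each derivation visits a strictly shrinking subset, which is exactly why the grammar needs one nonterminal per subset: 2^n states but n! accepted strings.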

SLIDE 21

"IBM constraint"

Distortion limit in generation but not in estimation

  • any reasons why that may be unsatisfactory?

SLIDE 22

Constraining permutations without a distortion limit

Inversion Transduction Grammars (ITGs) [Wu, 1995; 1997]

  • Binarizable permutations
  • two streams are simultaneously generated
  • context-free backbone


SLIDE 24

Number of Permutations

(figure from [Wu, 1997])
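Wu (1997) shows that a binary ITG cannot realise the "inside-out" permutations 2413 and 3142, and the permutations it can realise are exactly those avoiding both as patterns (the separable permutations). A brute-force check (helper names are made up):

```python
from itertools import combinations, permutations

def contains_pattern(p, pat):
    """True if p has a subsequence order-isomorphic to pat."""
    k = len(pat)
    for idx in combinations(range(len(p)), k):
        sub = [p[i] for i in idx]
        if all((sub[a] < sub[b]) == (pat[a] < pat[b])
               for a in range(k) for b in range(k) if a != b):
            return True
    return False

def itg_permutations(n):
    """Permutations a binary ITG can produce: avoid 2413 and 3142."""
    return [p for p in permutations(range(1, n + 1))
            if not contains_pattern(p, (2, 4, 1, 3))
            and not contains_pattern(p, (3, 1, 4, 2))]

print([len(itg_permutations(n)) for n in range(1, 5)])  # [1, 2, 6, 22]
```

For n = 4 the ITG covers 22 of the 24 permutations, excluding only 2413 and 3142; the sequence 1, 2, 6, 22, 90, ... grows far more slowly than n!.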


SLIDE 32

ITG

      English    French
S ➝   X          X          copy
X ➝   X1 X2      X1 X2      copy
X ➝   X1 X2      X2 X1      invert
X ➝   e          f          transduce
X ➝   e          ε          delete
X ➝   ε          f          insert
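An ITG tree over these rules generates both streams at once: the English yield reads off in order, while the French yield swaps the children of inverted nodes. A minimal sketch, with a made-up tuple encoding of trees (`"mono"`, `"inv"`, `"lex"` are illustrative names, and `None` stands for ε in the delete/insert rules):

```python
def yields(node):
    """Return (English yield, French yield) of an ITG tree."""
    tag = node[0]
    if tag == "lex":                 # X -> e / f (or delete/insert via None)
        _, e, f = node
        return ([e] if e else []), ([f] if f else [])
    _, left, right = node
    le, lf = yields(left)
    re, rf = yields(right)
    if tag == "mono":                # X -> X1 X2 / X1 X2 (copy)
        return le + re, lf + rf
    return le + re, rf + lf          # "inv": X -> X1 X2 / X2 X1 (invert)

tree = ("mono", ("lex", "the", "la"),
                ("inv", ("lex", "green", "verde"),
                        ("lex", "witch", "bruja")))
e, f = yields(tree)
print(" ".join(e), "|", " ".join(f))  # the green witch | la bruja verde
```

One inverted node is enough to produce the adjective-noun swap between English and Spanish without any distortion parameters.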

SLIDE 34

ITG Trees

(figure: two ITG trees for "I really miss you" ↔ "Sinto tanto sua falta", annotated with the variables A, B, E, F)
SLIDE 35

Model

Joint probability model: P(T) = P(A, B, E, F)

t = ⟨r1, ..., rN⟩
e = yield1(t)
f = yield2(t)
a = alignment(t)
b = bracketing(t)

P(T = t) = P(A = a, B = b, E = e, F = f) = ∏_{i=1}^{N} θ_{ri}
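With one multinomial parameter per rule, P(T = t) is just the product of θ over the rules used in the derivation. A sketch with made-up parameter values (the tree encoding `"mono"`/`"inv"`/`"lex"` is illustrative, not Wu's notation):

```python
# made-up rule parameters theta_r, one multinomial weight per rule
theta = {
    "mono": 0.6, "inv": 0.3,
    ("the", "la"): 0.5, ("green", "verde"): 0.4, ("witch", "bruja"): 0.4,
}

def prob(node):
    """P(T = t): multiply one theta per rule r_1..r_N used in the tree."""
    tag = node[0]
    if tag == "lex":
        return theta[(node[1], node[2])]
    return theta[tag] * prob(node[1]) * prob(node[2])

tree = ("mono", ("lex", "the", "la"),
                ("inv", ("lex", "green", "verde"),
                        ("lex", "witch", "bruja")))
p = prob(tree)  # 0.6 * 0.5 * 0.3 * 0.4 * 0.4
```

The alignment a and bracketing b are both deterministic functions of the tree, so scoring the tree scores all four variables at once.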


SLIDE 42

Parametrisation

Multinomial: one parameter per rule

  • θ[] one parameter for monotone
  • θ<> one parameter for swap
  • θe/f one parameter per word pair
  • θe/ε one parameter per deleted English word
  • θε/f one parameter per inserted French word

SLIDE 43

MLE

We do not typically construct treebanks of ITG trees

  • expected counts instead of observed counts

Expectations from parse forests

  • Inside-Outside [Baker, 1979; Lari and Young, 1990; Goodman, 1999]

Typically initialised with IBM1

θ_{X➝α} = ⟨n(X ➝ α)⟩_{P(A,B|F,E)} / Σ_{α′} ⟨n(X ➝ α′)⟩_{P(A,B|F,E)}
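The M-step of this EM procedure just renormalises expected rule counts per left-hand side. A sketch with made-up expected counts (in practice these come from Inside-Outside over the parse forests):

```python
from collections import defaultdict

# made-up expected counts <n(X -> alpha)> gathered from parse forests
expected = {
    ("X", "mono"): 10.0,
    ("X", "inv"): 4.0,
    ("X", ("the", "la")): 6.0,
}

# denominator: total expected count per left-hand side
totals = defaultdict(float)
for (lhs, rhs), count in expected.items():
    totals[lhs] += count

# theta_{X -> alpha} = <n(X -> alpha)> / sum_{alpha'} <n(X -> alpha')>
theta = {rule: count / totals[rule[0]] for rule, count in expected.items()}
print(theta[("X", "mono")])  # 10 / 20 = 0.5
```

The update is identical in shape to ordinary PCFG estimation; the only difference is that the counts are expectations under the current posterior rather than counts read off a treebank.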

SLIDE 44

Difficulties

  • Inference: complexity O(l^3 m^3)
  • Model: too few reordering parameters
  • Decisions: ambiguity
  • Disambiguation problem is NP-complete [Sima'an, 1996]

argmax_A P(A|F, E) = argmax_A Σ_B P(A, B|F, E) ≈ argmax_{A,B} P(A, B|F, E)
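The approximation replaces the sum over bracketings with a joint maximisation, and the two can disagree. A toy illustration (the distribution values are invented):

```python
# toy joint distribution P(A, B | F, E) over two alignments, two bracketings
P = {("a1", "b1"): 0.30, ("a1", "b2"): 0.30,
     ("a2", "b1"): 0.35, ("a2", "b2"): 0.05}

# exact decision: argmax_A sum_B P(A, B | F, E)
marginal = {}
for (a, b), p in P.items():
    marginal[a] = marginal.get(a, 0.0) + p
best_marginal = max(marginal, key=marginal.get)

# Viterbi approximation: argmax_{A,B} P(A, B | F, E), keep only A
best_joint = max(P, key=P.get)[0]

print(best_marginal, best_joint)  # a1 a2 -- the approximation picks differently
```

Here a1 has the larger marginal (0.6 vs 0.4), but a2 owns the single most probable (alignment, bracketing) pair, so the Viterbi shortcut returns the wrong alignment.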

SLIDE 45

Bibliography

  • Knight, Kevin. 1999. Decoding complexity in word-replacement translation models. In Computational Linguistics. MIT Press.
  • Zaslavskiy, Mikhail and Dymetman, Marc and Cancedda, Nicola. 2009. Phrase-based statistical machine translation as a traveling salesman problem. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1.
  • DeNero, John and Klein, Dan. 2008. The Complexity of Phrase Alignment Problems. In Proceedings of ACL-08: HLT.
  • Asveld, Peter R. J. 2006. Generating All Permutations by Context-free Grammars in Chomsky Normal Form. In Theoretical Computer Science. Elsevier Science Publishers Ltd.
  • Asveld, Peter R. J. 2008. Generating All Permutations by Context-free Grammars in Greibach Normal Form. In Theoretical Computer Science. Elsevier Science Publishers Ltd.
  • Wu, D. 1995. An Algorithm for Simultaneously Bracketing Parallel Texts by Aligning Words. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. ACL.

SLIDE 46

Bibliography

  • Wu, D. 1997. Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora. In Computational Linguistics. MIT Press.
  • Baker, James K. 1979. Trainable grammars for speech recognition. In Proceedings of the Spring Conference of the Acoustical Society of America.
  • Lari, Karim and Young, Steve J. 1990. The estimation of stochastic context-free grammars using the inside-outside algorithm. In Computer Speech and Language.
  • Goodman, Joshua. 1999. Semiring parsing. In Computational Linguistics.
  • Sima'an, Khalil. 1996. Computational complexity of probabilistic disambiguation by means of tree-grammars. In Proceedings of the 16th Conference on Computational Linguistics - Volume 2.