SLIDE 1

University of Alberta April 26, 2007

ITG for Joint Phrasal Translation Modeling

Colin Cherry, University of Alberta
Dekang Lin, Google Inc.

SLIDE 2

The Gist

  • Joint phrasal translation models (JPTM) learn a bilingual phrase table using EM

  • Phrasal ITG:

– Use synchronous parsing to replace hill climbing & sampling with dynamic programming

  • Do resulting phrase tables improve translation?
SLIDE 3

Outline

  • Phrasal Translation Models
  • We build on:

– Phrase extraction, JPTM, ITG

  • Phrasal ITG

– Helpful constraints

  • Results
  • Summary & Future Work
SLIDE 4

Phrasal translation model

  • Ultimately interested in a bilingual phrase table

– Lists and scores possible phrasal translations

English                  French                       P(e|f)  P(f|e)
…
ethical food             alimentation éthique         0.95    0.16
ethical foreign policy   politique étrangère morale   0.23    0.01
ethical foundations      fondements éthiques          0.10    0.03

SLIDE 5

Surface Heuristic

  • Alignments provided by GIZA++ combination
  • Surface heuristic:

– Count each consistent phrase as occurring once
– Aggregate counts over all sentence pairs

[Figure: word-alignment grid for "he likes red cars" / "il aime les voitures rouges"]
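
The counting step above can be sketched in a few lines of Python. This is a minimal illustration, not the extraction code used for the talk: the function names, the `max_len` cap, and the word alignment chosen for the example pair (with "les" left unaligned) are all assumptions made for the example. A span pair is treated as consistent when it contains at least one link and no link connects a word inside the box to a word outside it.

```python
from collections import Counter


def consistent_phrase_pairs(n_e, n_f, links, max_len=4):
    """Return half-open span pairs (e1, e2, f1, f2) consistent with the links."""
    pairs = []
    for e1 in range(n_e):
        for e2 in range(e1 + 1, min(n_e, e1 + max_len) + 1):
            for f1 in range(n_f):
                for f2 in range(f1 + 1, min(n_f, f1 + max_len) + 1):
                    # A link is inside the box when both of its ends fall in it.
                    has_link = any(e1 <= e < e2 and f1 <= f < f2 for e, f in links)
                    # A link crosses the box when exactly one end falls in it.
                    crosses = any((e1 <= e < e2) != (f1 <= f < f2) for e, f in links)
                    if has_link and not crosses:
                        pairs.append((e1, e2, f1, f2))
    return pairs


def surface_counts(corpus, max_len=4):
    """Count each consistent phrase pair once per sentence pair, then aggregate."""
    counts = Counter()
    for english, french, links in corpus:
        spans = consistent_phrase_pairs(len(english), len(french), links, max_len)
        for e1, e2, f1, f2 in spans:
            counts[(" ".join(english[e1:e2]), " ".join(french[f1:f2]))] += 1
    return counts


# Hypothetical alignment for the example pair; "les" (index 2) is unaligned.
english = "he likes red cars".split()
french = "il aime les voitures rouges".split()
links = {(0, 0), (1, 1), (2, 4), (3, 3)}
counts = surface_counts([(english, french, links)])
```

Under this assumed alignment, the unaligned "les" can attach freely, so both ("red cars", "voitures rouges") and ("red cars", "les voitures rouges") are counted, while ("cars", "rouges") is rejected because its links leave the box.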

SLIDE 9

Joint Phrasal Model (JPTM)

  • Introduced by Marcu and Wong (2002)
  • Trained with EM, like the IBM models
  • Sentence pair built simultaneously

– Generate a bag of bilingual phrase pairs
– Permute the phrases to form e and f

SLIDE 10

Joint Phrasal Model

[Figure: alignment grid for "he likes red cars" / "il aime les voitures rouges"]

Reason over an exponential number of phrasal alignments

Space is huge; in practice the task is accomplished by sampling around a high-probability point
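
To get a feel for how fast this space grows, the sketch below counts phrasal alignments under a simplifying assumption that is mine, not the talk's: every word belongs to exactly one contiguous phrase, and phrases are matched one-to-one with no null phrases. The function name is hypothetical.

```python
from math import comb, factorial


def count_phrasal_alignments(n_e, n_f):
    """Count one-to-one phrasal alignments between sentences of n_e and n_f words."""
    total = 0
    for k in range(1, min(n_e, n_f) + 1):
        # Pick k-1 of the n-1 word boundaries on each side to get k phrases,
        # then match the k phrases on one side to the k on the other.
        total += comb(n_e - 1, k - 1) * comb(n_f - 1, k - 1) * factorial(k)
    return total


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, count_phrasal_alignments(n, n))
```

Under these assumptions the four-word and five-word example pair already admits 229 phrasal alignments, and the count grows factorially with sentence length.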

SLIDE 12

Joint Phrasal Model

[Figure: word-alignment grid for "he likes red cars" / "il aime les voitures rouges"]

Birch et al. (2006): Constrained JPTM

Explore only phrasal alignments consistent with a high-precision word alignment

SLIDE 13

Inversion Transduction Grammar

  • Introduced by Wu (1997)

– Transduction:

  • C → red / rouge

– Inversion:

  • A → [A C]
  • B → <A C>

[Figure: tree fragments for the transduction rule C → red / rouge and for the straight and inverted combinations of A and C]

SLIDE 14

ITG Parse

[Figure: ITG parse of "he likes red cars" / "il aime les voitures rouges"]
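
A small sketch of how straight and inverted rules generate both sentences at once. The tuple encoding, the helper names, and the particular tree are assumptions for illustration; treating "les voitures" as a single terminal is a simplification, since a word-level ITG would need some other way to account for "les".

```python
def lex(e, f):
    """Terminal rule: emit an English/French pair."""
    return ("lex", e, f)


def straight(left, right):
    """[L R]: children appear in the same order in both languages."""
    return ("straight", left, right)


def inverted(left, right):
    """<L R>: children appear in reversed order on the French side."""
    return ("inverted", left, right)


def yield_pair(node):
    """Read off the English and French word sequences of a derivation tree."""
    if node[0] == "lex":
        return [node[1]], [node[2]]
    kind, left, right = node
    left_e, left_f = yield_pair(left)
    right_e, right_f = yield_pair(right)
    if kind == "straight":
        return left_e + right_e, left_f + right_f
    return left_e + right_e, right_f + left_f


# One possible derivation of the example pair (assumed, not taken from the slide).
tree = straight(
    lex("he", "il"),
    straight(
        lex("likes", "aime"),
        inverted(lex("red", "rouges"), lex("cars", "les voitures")),
    ),
)
e_words, f_words = yield_pair(tree)
```

The single inverted node is what lets "red cars" surface as "les voitures rouges" while the rest of the sentence stays in the same order.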

SLIDE 15

Phrasal ITG

  • Any phrase pair can be produced by the lexicon
  • Choose between straight, inverted, and now phrasal

[Figure: three analyses of "calm down" / "calmez vous"]

SLIDE 16

Training Phrasal ITG

  • All phrase pairs share mass as a joint model
  • Can be trained unsupervised with inside-outside
  • No more expensive than binary bracketing:

– Phrases were already being explored as constituents
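
The inside pass that inside-outside relies on might look roughly like the sketch below. It is a toy under stated assumptions, not the grammar from the paper: a single nonterminal, fixed rule probabilities `p_straight`, `p_inverted` and `p_phrase`, no null alignments, and a hypothetical `phrase_prob` callback standing in for the phrasal lexicon. Every bispan is scored both as a phrase pair and as a combination of two smaller bispans, which is why adding the phrasal rule does not change the set of constituents explored.

```python
from functools import lru_cache


def inside_probability(e, f, phrase_prob,
                       p_straight=0.3, p_inverted=0.3, p_phrase=0.4):
    """Sum over all phrasal ITG derivations of the sentence pair (e, f)."""
    e, f = tuple(e), tuple(f)

    @lru_cache(maxsize=None)
    def inside(i, j, k, l):
        # Phrasal rule: emit e[i:j] / f[k:l] directly from the lexicon.
        total = p_phrase * phrase_prob(e[i:j], f[k:l])
        # Binary rules: try every split point on both sides.
        for m in range(i + 1, j):
            for n in range(k + 1, l):
                total += p_straight * inside(i, m, k, n) * inside(m, j, n, l)
                total += p_inverted * inside(i, m, n, l) * inside(m, j, k, n)
        return total

    return inside(0, len(e), 0, len(f))


# Toy lexicon with made-up probabilities, for illustration only.
table = {
    (("calm",), ("calmez",)): 0.1,
    (("down",), ("vous",)): 0.1,
    (("calm", "down"), ("calmez", "vous")): 0.2,
}
result = inside_probability(
    ["calm", "down"], ["calmez", "vous"],
    lambda ep, fp: table.get((ep, fp), 0.0),
)
```

With this toy lexicon the total combines the phrasal analysis with the straight analysis; the inverted analysis contributes nothing because its word pairs have zero probability in the table.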

SLIDE 17

The hope

  • By moving to exact expectation:

– Create more accurate statistics
– Find a larger variety of phrase pairs

SLIDE 18

The problem - still slow: O(n^6)

  • ITG algorithms can be pruned:

– O(n^4) potential constituents are considered
– O(n^2) time spent considering all ways to build each constituent

  • Fixed link pruning: Eliminate constituents that are not consistent with a given word alignment

– Skip them and treat them as having 0 probability

  • One link can potentially rule out 50% of constituents
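
The effect of a fixed link can be checked by brute force on small inputs. In this sketch (function names are hypothetical) a bispan survives when every link has either both ends inside it or both ends outside it; unlike phrase extraction, a bispan containing no links at all is still allowed.

```python
def spans(n):
    """All non-empty half-open spans over n positions."""
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)]


def is_consistent(bispan, links):
    """A link may not have exactly one end inside the bispan."""
    i, j, k, l = bispan
    return all((i <= a < j) == (k <= b < l) for a, b in links)


def surviving_bispans(n_e, n_f, links):
    """Bispans that fixed-link pruning would keep."""
    return [
        (i, j, k, l)
        for i, j in spans(n_e)
        for k, l in spans(n_f)
        if is_consistent((i, j, k, l), links)
    ]


if __name__ == "__main__":
    n = 20
    total = len(surviving_bispans(n, n, []))
    kept = len(surviving_bispans(n, n, [(n // 2, n // 2)]))
    print(f"one central link removes {1 - kept / total:.1%} of bispans")
```

For two-word sentences there are 9 bispans, and a single link between the first words leaves 5 of them. For longer sentences, a link near the middle lies inside about half of the spans on each side, so roughly half of all bispans contain exactly one end of it and are pruned.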
SLIDE 19

Fixed Link Speed-up

  • Used GIZA++ intersection alignments
  • Inside-outside on first 100 sentences of corpus
  • Compared to Tic-tac-toe (Zhang & Gildea 2005)

[Figure: bar chart of time in seconds (log scale, 1 to 1000): No prune 415, Tic-tac-toe 37, Fixed-link 5]

SLIDE 20

What about the ITG constraint?

  • ITG can’t handle this due to discontinuous constituents
  • Check fixed links used for pruning

– If they are non-ITG, drop from training set

  • In our French-English Europarl set, this results in a reduction in data of less than 1%

[Figure: example sentence pair whose alignment violates the ITG constraint]

English: 12 are acceptable to the commission / Mr. Burtone / fully or in part
French: 12 sont acceptables pour la commission / M. Burtone / en tout ou partie
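
One way such a check could be implemented is sketched below. It assumes the fixed links are one-to-one, so they can be written as a permutation giving the target position of each linked source word in source order. The talk does not say how the check was done; the shift-reduce test used here is a standard technique for deciding whether a permutation can be built from straight and inverted combinations, and the function name is hypothetical.

```python
def is_itg_permutation(perm):
    """True if perm can be built using only straight and inverted merges."""
    stack = []
    for value in perm:
        stack.append((value, value))
        # Greedily merge the top two blocks while they cover adjacent ranges.
        while len(stack) >= 2:
            (lo1, hi1), (lo2, hi2) = stack[-2], stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) == 1


if __name__ == "__main__":
    print(is_itg_permutation([1, 0, 3, 2]))  # two local swaps
    print(is_itg_permutation([1, 3, 0, 2]))  # the classic "inside-out" pattern
```

Sentence pairs whose link permutation fails this test would be the ones dropped from the training set.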

SLIDE 21

Experiments

  • Conditionalize joint tables to P(e|f) and P(f|e)
  • French-English Europarl Set

– 25 length limit, 400k sentence pairs

  • SMT Workshop Baseline MT System

– Pharaoh, MERT Training on 500 tuning pairs

  • Included unnormalized IBM Model 1 features for all
  • Compared to:

– JPTM constrained with GIZA++ Intersect
– Surface Heuristic Extraction with GIZA++ GDF
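
The conditionalization step in the first bullet amounts to dividing each joint probability by the appropriate marginal. A minimal sketch, with a hypothetical function name and toy probabilities that are not from the talk:

```python
from collections import defaultdict


def conditionalize(joint):
    """Turn a joint table P(e, f) into P(e|f) and P(f|e)."""
    e_total = defaultdict(float)
    f_total = defaultdict(float)
    for (e, f), p in joint.items():
        e_total[e] += p
        f_total[f] += p
    p_e_given_f = {(e, f): p / f_total[f] for (e, f), p in joint.items()}
    p_f_given_e = {(e, f): p / e_total[e] for (e, f), p in joint.items()}
    return p_e_given_f, p_f_given_e


# Toy joint table over placeholder phrases, for illustration only.
joint = {("e1", "f1"): 0.2, ("e1", "f2"): 0.2, ("e2", "f1"): 0.6}
p_e_given_f, p_f_given_e = conditionalize(joint)
```

Here P(e1|f1) is 0.2 / 0.8 = 0.25 and P(f1|e1) is 0.2 / 0.4 = 0.5, which shows how the same joint entry can yield quite different scores in the two directions.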

SLIDE 22

Results: BLEU Scores

[Figure: bar chart of BLEU scores (axis 28.0 to 31.0) for C-JPTM, Phrasal ITG, and Surface; bar values not recoverable from the extracted text]

SLIDE 23

Results: Table Size

(in millions of entries)

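[Figure: bar chart of phrase table size (axis 2 to 12) for C-JPTM, Phrasal ITG, and Surface; bar values not recoverable from the extracted text]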

SLIDE 24

Summary

  • Phrasal ITG that learns phrases from bitext

– Similar to JPTM

  • Complete expectations do matter

– Other JPTMs could benefit from improving their search and sampling methods

  • A new ITG pruning technique

– 80 times faster inside-outside

SLIDE 25

Future: Eliminate Frequency Limits

  • Must constrain any joint model to use phrases that occur with a minimum frequency

– Otherwise sentence = phrase is ML solution

[Figure: alignment grid for "he likes red cars" / "il aime les voitures rouges"]

SLIDE 26

Future: Eliminate Frequency Limits

  • Must constrain any joint model to use phrases that occur with a minimum frequency

– Otherwise sentence = phrase is ML solution

[Figure: bar chart of BLEU scores (axis 28.0 to 31.0) for C-JPTM >=5, Phrasal ITG >=5, Surface, and Surface >=5; bar values not recoverable from the extracted text]

SLIDE 27

Future: Eliminate Frequency Limits

  • Must constrain any joint model to use phrases that occur with a minimum frequency

– Otherwise sentence = phrase is ML solution

[Figure: bar chart of BLEU scores (axis 28.0 to 31.0) for C-JPTM >=5, Phrasal ITG >=5, Surface, and Surface >=5; bar values not recoverable from the extracted text]

Apply Bayesian methods (priors) to replace these limits (Goldwater et al. 2006)

SLIDE 28

This isn’t the whole story…

  • Explored the same model as a phrasal aligner
  • Needs additional constraints to work:

– Fixed links help select phrases that are non-compositional

  • Alignments work well with surface heuristic
  • Details in the paper!
SLIDE 29

Questions? Comments? Suggestions?

Support provided by:

Alberta Ingenuity Fund
Alberta Informatics Circle of Research Excellence

SLIDE 30

Along the way…

  • Adapt consistency constraints from heuristic phrase extraction for ITG parsing

  • Deal with the ITG constraint in large data