Improving Trees and Alignments for Syntax-Based Machine Translation


Improving Trees and Alignments for Syntax-Based Machine Translation. Kevin Knight, USC/Information Sciences Institute. Joint work with Steven DeNeefe, Daniel Marcu, Wei Wang, and Jonathan May. SRI, July 12, 2007.


slide-1
SLIDE 1

Improving Trees and Alignments for Syntax-Based Machine Translation

Kevin Knight USC/Information Sciences Institute

SRI, July 12, 2007

joint work with Steven DeNeefe, Daniel Marcu, Wei Wang, and Jonathan May

slide-2
SLIDE 2

Syntactic Approaches to MT

  • Use of syntactic information (noun, verb, etc.) in the translation process:

– Manually constructed rule-based systems
– Statistical systems

  • Wu & Wong, 1998
  • Yamada & Knight, 2001-2002
  • Galley et al, 2004

– Contrast with phrase-based statistical approaches

slide-3
SLIDE 3

Phrase-Based Output

Gunman of police killed .

Decoder Hypothesis #1

枪手 被 警方 击毙 .

slide-4
SLIDE 4

Phrase-Based Output

Gunman of police attack .

Decoder Hypothesis #7

枪手 被 警方 击毙 .

slide-5
SLIDE 5

Phrase-Based Output

Gunman by police killed .

Decoder Hypothesis #12

枪手 被 警方 击毙 .

slide-6
SLIDE 6

Phrase-Based Output

Killed gunman by police .

Decoder Hypothesis #134

枪手 被 警方 击毙 .

slide-7
SLIDE 7

Phrase-Based Output

Gunman killed the police .

Decoder Hypothesis #9,329

枪手 被 警方 击毙 .

slide-8
SLIDE 8

Phrase-Based Output

Gunman killed by police .

Decoder Hypothesis #50,654

枪手 被 警方 击毙 .

Problematic:

  • Output lacks English auxiliary and determiner
  • Re-ordering relies on luck, instead of on the Chinese passive marker

slide-9
SLIDE 9

Syntax-Based Output

The gunman killed by police .

[Parse tree over the output; node labels: DT NN VBD IN NN NPB PP NP-C VP S]

Decoder Hypothesis #1

枪手 被 警方 击毙 .

slide-10
SLIDE 10

Syntax-Based Output

Gunman by police shot .

[Parse tree over the output; node labels: NN IN NN VBD NPB PP NP-C VP S]

Decoder Hypothesis #16

枪手 被 警方 击毙 .

slide-11
SLIDE 11

Syntax-Based Output

The gunman was killed by police .

[Parse tree over the output; node labels: DT NN AUX VBN IN NN NPB PP NP-C VP S]

Decoder Hypothesis #1923

枪手 被 警方 击毙 .

slide-12
SLIDE 12

Why Might Syntax Help?

  • Phrase-based MT output is “n-grammatical”, not grammatical

– Every sentence needs a subject and a verb

  • Re-ordering is poorly explained as “distortion” -- better explained as syntactic transformation

– Arabic to English, VSO -> SVO

  • Function words have syntactic effects even if they are not themselves translated

slide-13
SLIDE 13

Why Might Syntax Hurt?

  • Less freedom to glue pieces of output together -- search space has fewer output strings
  • Search space is more difficult to navigate
  • Rule extraction from bilingual text has limitations (this talk)

[Figure label: available phrase-based translations]

slide-14
SLIDE 14

Why Might Syntax Hurt?

  • Less freedom to glue pieces of output together -- search space has fewer output strings
  • Search space is more difficult to navigate
  • Rule extraction from bilingual text has limitations (this talk)

[Figure labels: available phrase-based translations; available syntax-based translations]

slide-16
SLIDE 16

Comparing Phrase-Based Extraction with Syntax-Based Extraction

  • Quantitatively compare

– A typical phrase-based bilingual extraction algorithm (ATS, Och & Ney 2004)
– A typical syntax-based bilingual extraction algorithm (GHKM, Galley et al 2004)
– These algorithms picked from two good-scoring NIST-06 systems

  • Identify areas of improvement for syntax-based rule coverage

slide-17
SLIDE 17

Phrase-Based and Syntax-Based Pattern Extraction

GHKM [Galley et al 2004]: etree + alignment + cstring -> syntax transformation rules consistent with word alignment

ATS [Och & Ney, 2004]: estring + alignment + cstring -> phrase pairs consistent with word alignment

slide-18
SLIDE 18

ATS (Och & Ney, 2004)

i felt obliged to do my part

我 有 责任 尽 一份 力

PHRASE PAIRS ACQUIRED:

felt -> 有
felt obliged -> 有 责任
felt obliged to do -> 有 责任 尽
obliged -> 责任
obliged to do -> 责任 尽
do -> 尽
part -> 一份
part -> 一份 力
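
The extraction of phrase pairs consistent with a word alignment can be sketched as follows. This is a minimal illustration, not the ATS implementation: the word alignment `links` is an assumption (the slide text does not record it), and `extract_phrase_pairs` and `max_len` are hypothetical names. Because this sketch also absorbs unaligned words at phrase boundaries, it returns a superset of the pairs listed on the slide.

```python
def extract_phrase_pairs(e_words, f_words, links, max_len=4):
    """Return all phrase pairs consistent with the word alignment `links`.

    `links` is a set of (english_index, foreign_index) pairs.
    """
    f_aligned = {f for _, f in links}
    pairs = set()
    for e_lo in range(len(e_words)):
        for e_hi in range(e_lo, min(e_lo + max_len, len(e_words))):
            f_linked = [f for e, f in links if e_lo <= e <= e_hi]
            if not f_linked:
                continue  # the English span has no aligned word
            f_lo, f_hi = min(f_linked), max(f_linked)
            # Consistency: no word inside the foreign span may link
            # to an English word outside the English span.
            if any(f_lo <= f <= f_hi and not e_lo <= e <= e_hi
                   for e, f in links):
                continue
            # Also accept variants that absorb adjacent unaligned foreign words.
            lo = f_lo
            while lo >= 0 and (lo == f_lo or lo not in f_aligned):
                hi = f_hi
                while hi < len(f_words) and (hi == f_hi or hi not in f_aligned):
                    if hi - lo < max_len:
                        pairs.add((" ".join(e_words[e_lo:e_hi + 1]),
                                   " ".join(f_words[lo:hi + 1])))
                    hi += 1
                lo -= 1
    return pairs


e_words = "i felt obliged to do my part".split()
f_words = "我 有 责任 尽 一份 力".split()
# Assumed alignment: i-我, felt-有, obliged-责任, do-尽, part-一份.
# "to", "my" and 力 are left unaligned.
links = {(0, 0), (1, 1), (2, 2), (4, 3), (6, 4)}

pairs = extract_phrase_pairs(e_words, f_words, links)
for english, foreign in sorted(pairs):
    print(f"{english} -> {foreign}")
```

Under this assumed alignment, "part" pairs with both 一份 and 一份 力 because 力 is unaligned and may be attached to the neighbouring phrase.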

slide-22
SLIDE 22

GHKM (Galley et al, 2004)

i felt obliged to do my part

我 有 责任 尽 一份 力

English tree: S(NP-C(NPB(PRP(i))) VP(VBD(felt) VP-C(VBN(obliged) SG-C(VP(TO(to) VP-C(VB(do) NP-C(NPB(PRP$(my) NN(part)))))))))

RULES ACQUIRED:

VBD(felt) -> 有
VBN(obliged) -> 责任
VP(x0:VBD VP-C(x1:VBN x2:SG-C)) -> x0 x1 x2
VP(VBD(felt) VP-C(VBN(obliged) x0:SG-C)) -> 有 责任 x0
S(x0:NP-C x1:VP) -> x0 x1

slide-28
SLIDE 28

GHKM (Galley et al, 2004)

i felt obliged to do my part

我 有 责任 尽 一份 力

English tree: S(NP-C(NPB(PRP(i))) VP(VBD(felt) VP-C(VBN(obliged) SG-C(VP(TO(to) VP-C(VB(do) NP-C(NPB(PRP$(my) NN(part)))))))))

RULES ACQUIRED:

VBD(felt) -> 有
VBN(obliged) -> 责任
VP(x0:VBD VP-C(x1:VBN x2:SG-C)) -> x0 x1 x2
VP(VBD(felt) VP-C(VBN(obliged) x0:SG-C)) -> 有 责任 x0
S(x0:NP-C x1:VP) -> x0 x1

Minimal rules tile the tree/string/alignment triple. Composed rules are made by combining those tiles.
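
One way to see where the tiles can be cut is to find the tree nodes whose foreign span is not interleaved with anything aligned outside the node. In the GHKM literature these are usually called frontier nodes, and minimal rules are rooted at them. The sketch below is an illustration under assumptions, not the authors' code: the tree is written as nested tuples, `frontier_nodes` is a hypothetical helper, and the alignment `ALIGN` is assumed (here "part" is linked to both 一份 and 力).

```python
# A tree node is (label, child, ...); an English leaf word is a plain string.
TREE = (
    "S",
    ("NP-C", ("NPB", ("PRP", "i"))),
    ("VP",
     ("VBD", "felt"),
     ("VP-C",
      ("VBN", "obliged"),
      ("SG-C",
       ("VP",
        ("TO", "to"),
        ("VP-C",
         ("VB", "do"),
         ("NP-C", ("NPB", ("PRP$", "my"), ("NN", "part")))))))),
)

# Assumed alignment: English leaf index -> set of foreign word indices
# over 我 有 责任 尽 一份 力.
ALIGN = {0: {0}, 1: {1}, 2: {2}, 4: {3}, 6: {4, 5}}


def frontier_nodes(tree, align):
    """Return (label, first_leaf, last_leaf) for every frontier node."""
    nodes = []

    def walk(node, i):
        # Returns the index of the next unvisited leaf.
        if isinstance(node, str):
            return i + 1
        j = i
        for child in node[1:]:
            j = walk(child, j)
        nodes.append((node[0], i, j - 1))
        return j

    n_leaves = walk(tree, 0)
    result = []
    for label, lo, hi in nodes:
        # Foreign positions aligned to leaves under this node.
        span = set().union(*(align.get(k, set()) for k in range(lo, hi + 1)))
        # Foreign positions aligned to leaves outside this node.
        outside = set().union(*(align.get(k, set())
                                for k in range(n_leaves) if not lo <= k <= hi))
        if span and not any(min(span) <= f <= max(span) for f in outside):
            result.append((label, lo, hi))
    return result


frontier = frontier_nodes(TREE, ALIGN)
for label, lo, hi in frontier:
    print(label, lo, hi)
```

With this alignment every node except TO and PRP$ (which dominate only unaligned words) qualifies, so a rule can be rooted at VBD, VBN, VP, or S, as in the rules listed above.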

slide-29
SLIDE 29

GHKM Syntax Rules

Multilevel Re-Ordering: S(NP1 VP(VB NP2)) -> VB, NP1, NP2
Non-constituent Phrases: S(PRO(there) VP(VB(are) NP)) -> hay, NP
Lexicalized Re-Ordering: NP(NP2 PP(P(of) NP1)) -> NP1, , NP2
Phrasal Translation: VP(VBZ(is) VBG(singing)) -> está, cantando
Non-contiguous Phrases: VP(VB(put) NP PRT(on)) -> poner, NP
Context-Sensitive Word Insertion: NPB(DT(the) NNS) -> NNS

slide-31
SLIDE 31

ATS and GHKM Methods Do Not Coincide

[Venn diagram: GHKM Phrase Pairs Relevant to NIST-02 vs. ATS Phrase Pairs Relevant to NIST-02; region counts: 43k, 161k, 134k]

  • GHKM has no built-in phrase size limit -- ATS does.
  • GHKM pulls unaligned English words into phrases.
  • GHKM only gets minimal rules to explain each segment pair.
  • GHKM forced to incorporate unaligned English words into phrases.
  • GHKM forced to incorporate some unaligned foreign words into phrases.
  • GHKM misses phrases due to parse failures.
  • GHKM phrases come with applicability conditions.

slide-32
SLIDE 32

ATS and GHKM Methods Overlap

[Venn diagram: GHKM Phrase Pairs Relevant to NIST-02 vs. ATS Phrase Pairs actually used in 1-best decodings of NIST-02 (1,994 = 2 per sentence); region count: 1,994]

  • GHKM only gets minimal rules to explain each segment pair.
  • GHKM forced to incorporate unaligned English words into phrases.
  • GHKM forced to incorporate some unaligned foreign words into phrases.
  • GHKM misses phrases due to parse failures.
  • GHKM phrases come with applicability conditions.

slide-33
SLIDE 33

ATS and GHKM Methods Overlap

[Venn diagram: GHKM Phrase Pairs Relevant to NIST-02 vs. ATS Phrase Pairs actually used in 1-best decodings of NIST-02 (1,994 = 2 per sentence); region count: 1,994]

  • GHKM only gets minimal rules to explain each segment pair.
  • GHKM forced to incorporate unaligned English words into phrases.
  • GHKM forced to incorporate some unaligned foreign words into phrases.
  • GHKM misses phrases due to parse failures.
  • GHKM phrases come with applicability conditions.

GOAL: REDUCE THIS NUMBER (1,994)

slide-34
SLIDE 34

Four Ideas for Improving Syntax-Based Rule Extraction

  • Acquire larger rules

– Composed rules (Galley et al, 06)
– Phrasal rules (Marcu et al, 06)

  • Acquire more general rules

– Re-structure English trees (Wang et al, 07)
– Re-align tree/string pairs (May & Knight, 07)

slide-35
SLIDE 35

Larger, Composed Rules

English tree: A(B(e1 e2) C(e3))
Foreign string: c1 c2 c3

Minimal GHKM Rules:

B(e1 e2) -> c1 c2
C(e3) -> c3
A(x0:B x1:C) -> x0 x1

Additional Composed Rules:

A(B(e1 e2) x0:C) -> c1 c2 x0
A(x0:B C(e3)) -> x0 c3
A(B(e1 e2) C(e3)) -> c1 c2 c3 (“big phrasal rule”)
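
Composition can be sketched as substituting one rule into a variable of another. The code below is a minimal illustration under assumptions: rules are (LHS tree, RHS token list) pairs, variables are strings such as `x0:B`, and `compose`, `renumber`, and `show` are hypothetical helpers. It only handles the case on this slide, where the rule being substituted in has no variables of its own.

```python
def renumber(lhs, rhs):
    """Rename remaining variables to x0, x1, ... in left-to-right LHS order."""
    mapping = {}

    def walk(node):
        if isinstance(node, str):
            if ":" in node:  # a variable such as "x1:C"
                name, label = node.split(":")
                mapping[name] = f"x{len(mapping)}"
                return f"{mapping[name]}:{label}"
            return node
        return (node[0],) + tuple(walk(kid) for kid in node[1:])

    new_lhs = walk(lhs)
    return new_lhs, [mapping.get(tok, tok) for tok in rhs]


def compose(parent, var, child):
    """Substitute the variable-free rule `child` for variable `var` in `parent`."""
    (p_lhs, p_rhs), (c_lhs, c_rhs) = parent, child

    def subst(node):
        if isinstance(node, str):
            if node.startswith(var + ":"):
                assert node.split(":")[1] == c_lhs[0], "root label mismatch"
                return c_lhs
            return node
        return (node[0],) + tuple(subst(kid) for kid in node[1:])

    new_rhs = []
    for tok in p_rhs:
        new_rhs.extend(c_rhs if tok == var else [tok])
    return renumber(subst(p_lhs), new_rhs)


def show(rule):
    lhs, rhs = rule

    def fmt(node):
        if isinstance(node, str):
            return node
        return f"{node[0]}({' '.join(fmt(kid) for kid in node[1:])})"

    return f"{fmt(lhs)} -> {' '.join(rhs)}"


# The three minimal rules from the slide.
rule_b = (("B", "e1", "e2"), ["c1", "c2"])
rule_c = (("C", "e3"), ["c3"])
rule_a = (("A", "x0:B", "x1:C"), ["x0", "x1"])

a_with_b = compose(rule_a, "x0", rule_b)
a_with_c = compose(rule_a, "x1", rule_c)
a_with_both = compose(a_with_b, "x0", rule_c)

for rule in (a_with_b, a_with_c, a_with_both):
    print(show(rule))
```

The three printed rules correspond to the additional composed rules on the slide; the last one has no variables left, which is why it behaves like a phrase pair.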

slide-39
SLIDE 39

Larger, Composed Rules

Composed limit (internal      # of rules    Unacquired phrase pairs used
nodes in composed rule)       acquired      in ATS 1-best decodings
0 = minimal                   2.5m          1994
2                             12.4m         1478
3                             26.9m         1096
4                             55.8m         900

slide-40
SLIDE 40

“Phrasal” Syntax Rules

  • SPMT Model 1 (Marcu et al 2006)

– consider each foreign phrase up to length L
– extract smallest possible syntax rule that does not violate alignments

Method        Unacquired ATS Phrase Pairs
Minimal       1994
Composed 4    900
SPMT M1       676
Both          663

slide-41
SLIDE 41

Restructuring English Training Trees

Original tree, aligned to c1 c2 c3:

NPB(DT(the) JJ(Israeli) NNP(Prime) NNP(Minister) NNP(Ariel) NNP(Sharon))

Restructured tree, aligned to c1 c2 c3:

NPB(DT(the) NPB(JJ(Israeli) NPB(NNP(pr.) NPB(NNP(min.) NPB(NNP(Ariel) NNP(Sharon))))))
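
The restructuring of a flat node into nested binary nodes can be sketched as below. This is an illustration under assumptions rather than the method used in the talk: `right_binarize`, `left_binarize`, and the `inner` parameter are hypothetical names, and by default the new intermediate nodes simply reuse the parent label, as in the slide text.

```python
def right_binarize(label, children, inner=None):
    """Nest children to the right: (label c1 (inner c2 (inner c3 c4)))."""
    inner = inner or label

    def build(kids, name):
        if len(kids) <= 2:
            return (name, *kids)
        return (name, kids[0], build(kids[1:], inner))

    return build(list(children), label)


def left_binarize(label, children, inner=None):
    """Nest children to the left: (label (inner (inner c1 c2) c3) c4)."""
    inner = inner or label

    def build(kids, name):
        if len(kids) <= 2:
            return (name, *kids)
        return (name, build(kids[:-1], inner), kids[-1])

    return build(list(children), label)


# Children of the flat NPB node, written as (tag, word) pairs.
kids = [("DT", "the"), ("JJ", "Israeli"), ("NNP", "Prime"),
        ("NNP", "Minister"), ("NNP", "Ariel"), ("NNP", "Sharon")]

right_tree = right_binarize("NPB", kids)
left_tree = left_binarize("NPB", kids)
print(right_tree)
print(left_tree)
```

Right binarization creates a node covering "Ariel Sharon", while left binarization creates one covering "the Israeli"; which new constituents line up with foreign phrases determines which extra rules become extractable.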

slide-42
SLIDE 42

Restructuring English Training Trees

Method             Unacquired ATS Phrase Pairs
Minimal            1994
+ Composed 4       900
+ SPMT M1          663
+ Restructuring    458

slide-43
SLIDE 43

Effects of Coverage Improvements on Syntax-Based MT Accuracy

NIST Bleu r4n4

                                Chinese/English           Arabic/English
                                (trained on 9.8m words)   (trained on 4.1m words)
                                Dev-02    Test-03         Dev-02    Test-03
ATS                             36.00     34.31           50.88     51.04
GHKM minimal                    39.11     38.85           49.81     50.46
GHKM composed 2                 41.59     40.90           51.18     51.52
GHKM composed 3                 42.28     41.62           51.96     52.04
GHKM composed 4                 42.63     41.82           52.05     52.26
GHKM minimal + SPMT             41.01     40.34           50.74     51.81
GHKM composed 4 + SPMT          43.30     42.17           52.15     52.12
+ Left binarization of etrees   43.45     42.41           52.86     52.42

slide-44
SLIDE 44

Can We Do Better?

  • Improved binarization methods
  • Improved word alignment of tree/string pairs

slide-45
SLIDE 45

Why are Penn Treebank Trees Problematic?

[Figure: Penn Treebank tree aligned to 俄罗斯 首相 维克多·切尔诺梅尔金 及 其 同事; highlighted segment: 维克多·切尔诺梅尔金 及 其 同事, marked "?"]

slide-46
SLIDE 46

Why are Penn Treebank Trees Problematic?

[Figure: rules R1 and R2 over the segments 俄罗斯 首相 维克多·切尔诺梅尔金 and 及 其 同事; the segment 维克多·切尔诺梅尔金 及 其 同事 is marked "?"]

slide-47
SLIDE 47

Binarizing English Trees

[Figure: right-binarized and left-binarized versions of the English tree, with rules R3, R4, R5, R6 over the segments 俄罗斯 首相, 维克多·切尔诺梅尔金, and 及 其 同事]

slide-48
SLIDE 48

Simple Binarizations

维克多·切尔诺梅尔金

slide-49
SLIDE 49

Parallel Binarization

R L

维克多·切尔诺梅尔金

slide-50
SLIDE 50

Parallel Binarization

R L R L

维克多·切尔诺梅尔金

slide-51
SLIDE 51

Forest-Based Rule Extraction

  • Gets all minimal rules consistent with word alignment and some binarization
  • Run EM algorithm to determine best binarization of each node in each tree

slide-52
SLIDE 52

Binarization Using EM

e-tree f-string, alignment

slide-53
SLIDE 53

Binarization Using EM

e-tree -> parallel binarization -> e-forest

f-string, alignment

slide-54
SLIDE 54

Binarization Using EM

e-tree -> parallel binarization -> e-forest
e-forest + f-string, alignment -> forest-based extraction of minimal rules -> rules

slide-55
SLIDE 55

Binarization Using EM

e-tree -> parallel binarization -> e-forest
e-forest + f-string, alignment -> forest-based extraction of minimal rules -> rules
rules -> derivation forests -> EM

slide-56
SLIDE 56

Binarization Using EM

e-tree -> parallel binarization -> e-forest
e-forest + f-string, alignment -> forest-based extraction of minimal rules -> rules
rules -> derivation forests -> EM -> Viterbi derivation for each example
Viterbi derivation -> project e-tree -> binarized e-tree

slide-57
SLIDE 57

Binarization Using EM

e-tree -> parallel binarization -> e-forest
e-forest + f-string, alignment -> forest-based extraction of minimal rules -> rules
rules -> derivation forests -> EM -> Viterbi derivation for each example
Viterbi derivation -> project e-tree -> binarized e-tree
binarized e-tree + f-string, alignment -> composed rule extraction (Galley et al., 2006) -> rules for decoding

slide-58
SLIDE 58

Binarization Using EM

e-tree -> parallel binarization -> e-forest
e-forest + f-string, alignment -> forest-based extraction of minimal rules -> rules
rules -> derivation forests -> EM -> Viterbi derivation for each example
Viterbi derivation -> project e-tree -> binarized e-tree
binarized e-tree + f-string, alignment -> composed rule extraction (Galley et al., 2006) -> rules for decoding

???

slide-59
SLIDE 59

Experimental Results

Type of Binarization    # of Rules Learned    Test Bleu (NIST-03)
None                    63.4m                 36.94
Left                    114.0m                37.47 (p=0.047)
Right                   113.0m                37.49 (p=0.044)
Head                    113.8m                37.54 (p=0.086)
EM                      115.6m                37.94 (p=0.0047)

slide-60
SLIDE 60

Tree binarized by EM training

slide-61
SLIDE 61

Last Topic: Alignment

  • GIZA++ string-based alignments

– are errorful
– don’t match our syntax-based MT system

  • Would like to use the tree-based translation model to align data

slide-62
SLIDE 62

Last Topic: Alignment

English trees + Foreign strings -> GIZA++ -> initial word alignments
-> GHKM syntax rule extraction -> minimal rules
-> EM alignment (“Training Tree Transducers”, Graehl & Knight ’04) -> Viterbi derivations
-> Improved word alignments (throw away GIZA alignments)
-> GHKM syntax rule extraction -> better rules for decoding

Details in May & Knight, 07

Result: +0.5-1.0 Bleu

slide-63
SLIDE 63

Conclusions

  • Phrase-based and syntax-based extraction algorithms have different coverage.
  • Syntax-based coverage can be improved:

– composed rules
– phrasal rules
– binarizing English trees with EM
– re-aligning tree/string pairs with EM

  • Improvements lead to better translation accuracy.

slide-64
SLIDE 64

the end