11,001 New Features for Statistical Machine Translation


slide-1
SLIDE 1

11,001 NEW FEATURES FOR STATISTICAL MACHINE TRANSLATION

David Chiang, Kevin Knight, Wei Wang

slide-2
SLIDE 2

MOTIVATION

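[Figure: English parse tree over the words "Maria did not slap the green witch" (tags: NNP Maria; VBD did; RB not; VB slap; DT JJ NN the green witch; nodes S, NP, VP), aligned to the Spanish source sentence below.]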

Maria no dió una bofetada a la bruja verde

slide-3
SLIDE 3

MOTIVATION

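[Figure: English parse tree over the words "Maria did not slapped the green witch" (tags: NNP Maria; VBD did; RB not; VBD slapped; DT JJ NN the green witch; nodes S, NP, VP), aligned to the Spanish source sentence below.]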

Maria no dió una bofetada a la bruja verde

slide-4
SLIDE 4

MOTIVATION

[Figure: parse tree and Spanish source sentence repeated unchanged from slide 3.]

slide-5
SLIDE 5

MOTIVATION

[Figure: parse tree and Spanish source sentence repeated unchanged from slide 3.]

slide-6
SLIDE 6

MOTIVATION

  • Minimum error rate training (MERT) works for fewer than 30 features
  • Margin Infused Relaxed Algorithm (MIRA)
      • Online large-margin discriminative training
      • Scales better to large feature sets
      • Enables freer exploration of features
slide-7
SLIDE 7

RESULTS

System   Training   Features   BLEU
Hiero    MERT       11         36.1
Hiero    MIRA       10,990     37.6
Syntax   MERT       25         39.5
Syntax   MIRA       283        40.6

GALE 2008 Chinese-English data

slide-8
SLIDE 8

OVERVIEW

  • Training
  • Features
  • Experiments
slide-9
SLIDE 9

Training

slide-10
SLIDE 10

MIRA

  • Crammer and Singer, 2003
  • Applied to statistical MT by Watanabe et al., 2007
  • Chiang, Marton, and Resnik, 2008:
      • use more of the forest
      • parallelize training
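
The large-margin idea behind MIRA can be sketched on a single pair of hypotheses. This is a minimal sketch under stated assumptions, not the forest-based, parallel trainer of Chiang, Marton, and Resnik (2008): it assumes one oracle hypothesis and one model prediction per update, sparse feature dictionaries, and a clipping constant `C`. The function name `mira_update` and all of its parameters are hypothetical.

```python
def mira_update(weights, feats_oracle, feats_pred, loss, C=0.01):
    """One clipped large-margin update on a single (oracle, prediction) pair.

    weights, feats_oracle, feats_pred: dicts mapping feature name -> value.
    loss: how much worse the prediction is than the oracle (e.g. a BLEU gap).
    C: assumed cap on the step size.
    """
    # Feature difference between the oracle and the model's prediction.
    keys = set(feats_oracle) | set(feats_pred)
    diff = {k: feats_oracle.get(k, 0.0) - feats_pred.get(k, 0.0) for k in keys}
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        # Identical feature vectors: no direction to move in.
        return dict(weights)

    # Current margin: model score of the oracle minus that of the prediction.
    margin = sum(weights.get(k, 0.0) * v for k, v in diff.items())

    # Smallest step that makes the margin at least the loss, clipped to [0, C].
    tau = min(C, max(0.0, (loss - margin) / norm_sq))

    new_weights = dict(weights)
    for k, v in diff.items():
        new_weights[k] = new_weights.get(k, 0.0) + tau * v
    return new_weights


if __name__ == "__main__":
    w = mira_update({}, {"a": 1.0}, {"b": 1.0}, loss=1.0, C=10.0)
    print(w)
```

Because the step is clipped at `C`, a single noisy example cannot move the weights arbitrarily far, which is one reason this family of updates is considered suitable for large feature sets.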
slide-11
SLIDE 11

MERT

[Figure: plot with axes BLEU and model score]

slide-12
SLIDE 12

MERT

[Figure: plot with axes BLEU and model score]

slide-13
SLIDE 13

MIRA

[Figure: plot with axes BLEU and model score, annotated with "margin" and "loss"]

slide-14
SLIDE 14

MIRA

[Figure: plot with axes BLEU and model score]

slide-15
SLIDE 15

FOREST-BASED TRAINING

[Figure: plot with axes BLEU and model score]
slide-16
SLIDE 16

PARALLEL TRAINING

[Figure: timeline of parallel learners, each alternating "decode" and "update" steps, with "broadcast" steps between learners]

  • Run n MIRA learners in parallel
  • Share information among learners

Hiero: n = 20; Syntax: n = 73

slide-17
SLIDE 17

Features

slide-18
SLIDE 18

DISCOUNT FEATURES

  • Low counts are often overestimates
  • Introduce a count=1 feature that fires on 1-count rules, etc.
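
A count-based indicator feature of this kind can be sketched as follows. This is a minimal sketch, assuming each rule carries its training count and that low counts up to a cutoff each get their own feature; the cutoff `max_count` and both function names are hypothetical, since the slide only says "count=1 ..., etc."

```python
def discount_features(rule_count, max_count=3):
    """Indicator features for one rule seen `rule_count` times in training.

    `max_count` is an assumed cutoff: counts above it fire no feature.
    """
    if 1 <= rule_count <= max_count:
        return {f"count={rule_count}": 1.0}
    return {}


def derivation_discount_features(rule_counts, max_count=3):
    """Sum the indicator features over all rules used in a derivation."""
    totals = {}
    for count in rule_counts:
        for name, value in discount_features(count, max_count).items():
            totals[name] = totals.get(name, 0.0) + value
    return totals


if __name__ == "__main__":
    # A derivation using two 1-count rules, one 2-count rule, one frequent rule.
    print(derivation_discount_features([1, 1, 2, 7]))
```

A learned negative weight on `count=1` would then act as a discount on derivations that lean on rarely observed rules.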

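[Figure: example rule marked count=1, pairing the Chinese side "晚上 NP1 左右" with English PP structures (PP → IN NP1, with IN = "from" and IN = "around"). Glosses: 晚上 "p.m.", 左右 "around".]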

slide-19
SLIDE 19

TARGET SYNTAX FEATURES

[Figure: insert-were feature. A rule VP → VBD("were") VP inserts "were" above the VP "expelled by NK UN inspectors".]

slide-20
SLIDE 20

TARGET SYNTAX FEATURES

[Figure: two target-side tree fragments, each marked bad-rewrite. Recoverable text: "the best-selling book", "published", "his autobiography …", and "art for the generation in mind".]

slide-21
SLIDE 21

TARGET SYNTAX FEATURES

[Figure: node=, feature. Two trees of the form S → NP VP for "said" (VBD) and "Yoon", one containing a "," node and one without.]
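
Assuming that a feature such as node=, counts the target-tree nodes carrying that label (the results slide groups these under "node count"), the computation can be sketched as below. The tuple-based tree encoding, the example tree, and the name `node_count_features` are all hypothetical.

```python
from collections import Counter


def node_count_features(tree):
    """Count node labels in a tree.

    A tree is a tuple (label, child, child, ...); leaves are plain strings.
    Returns a Counter mapping "node=<label>" to the number of such nodes.
    """
    counts = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue  # a word, not a syntactic node
        label, *children = node
        counts[f"node={label}"] += 1
        stack.extend(children)
    return counts


if __name__ == "__main__":
    # Hypothetical tree; not a transcription of the tree on the slide.
    tree = ("S", (",", ","), ("NP", "Yoon"), ("VP", ("VBD", "said")))
    print(node_count_features(tree))
```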

slide-22
SLIDE 22

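TARGET SYNTAX FEATURES

[Figure: root=IN and root=VP features for the source phrase 第一个 站 出来 (gloss: first / stand / come out). Target fragments shown: a PP with IN "from" and NP "the first leg" (root=IN); "the first" with an SBAR "to" plus VP "stand" (root=VP); "stand up".]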

slide-23
SLIDE 23

SOURCE CONTEXT FEATURES

Marton & Resnik 2008; Chiang et al. 2008

  • Use external parser to infer source-side syntax
  • Rewards and penalties for matching/crossing brackets

[Figure: source sentence 这 是 一个 值得 关注 和 研究 的 新 动向 . (gloss: this / is / a / merit / attention / and / study / new / trend). The translated phrase "new trends in the study" crosses a source VP bracket and fires cross-VP.]

slide-24
SLIDE 24

SOURCE CONTEXT FEATURES

Marton & Resnik 2008; Chiang et al. 2008

  • Use external parser to infer source-side syntax
  • Rewards and penalties for matching/crossing brackets

[Figure: source sentence 这 是 一个 值得 关注 和 研究 的 新 动向 . (gloss: this / is / a / merit / attention / and / study / new / trend). The translated phrase "meriting attention and study" matches a source VP bracket and fires match-VP.]
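
The match/cross distinction can be sketched as a span comparison. This is a minimal sketch, assuming half-open source spans and a flat list of labeled constituents from the external parser; the function name `bracket_features` and the VP span used in the example are assumptions for illustration, not values taken from the slides.

```python
def bracket_features(span, constituents):
    """Fire match-X / cross-X features for one translated source span.

    span: (start, end), half-open, the source words a rule covers.
    constituents: list of (label, start, end) from a source-side parser.
    """
    i, j = span
    feats = {}
    for label, a, b in constituents:
        if (i, j) == (a, b):
            name = f"match-{label}"  # span is exactly this constituent
        elif (i < a < j < b) or (a < i < b < j):
            name = f"cross-{label}"  # overlaps, but neither contains the other
        else:
            continue  # disjoint or properly nested: no feature
        feats[name] = feats.get(name, 0.0) + 1.0
    return feats


if __name__ == "__main__":
    # Positions: 0 这 1 是 2 一个 3 值得 4 关注 5 和 6 研究 7 的 8 新 9 动向 10 .
    # Assumed parse: a VP over 值得 关注 和 研究, i.e. the span (3, 7).
    parse = [("VP", 3, 7)]
    print(bracket_features((3, 7), parse))   # span of "meriting attention and study"
    print(bracket_features((6, 10), parse))  # span of "new trends in the study"
```

With learned weights, match features can reward derivations that respect source constituents and cross features can penalize those that do not, without forcing either as a hard constraint.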

slide-25
SLIDE 25

SOURCE CONTEXT FEATURES

Chiang et al. 2008

[Figure: source sentence 挪威 恢复 在 斯里兰卡 的 和平 斡旋 (gloss: Norway / restore / in / Sri Lanka / peace / mediation) with two candidate translations: "to restore peace in Sri Lanka , the Norwegian mediation" and "Norway restoring peace mediation in Sri Lanka".]

slide-26
SLIDE 26

SOURCE CONTEXT FEATURES

  • Word context features: similar to Watanabe et al. 2007 and work on WSD in MT (Chan et al. 2007, Carpuat & Wu 2007)
  • Relate a word's translation with its left or right neighbor on the source side (just the 100 most frequent types)

Feature templates: fi & fi-1 & e, and fi & fi+1 & e
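
The two templates can be sketched as a function over word-aligned translation pairs. This is a minimal sketch: the mapping of words outside the frequent sets to `<unk>`, the boundary tokens `<s>` and `</s>`, the alignment format, and the name `word_context_features` are all assumptions made for illustration.

```python
def word_context_features(src, alignment, frequent_src, frequent_tgt):
    """Fire left- and right-neighbor context features.

    src: list of source words.
    alignment: list of (i, e) pairs, meaning src[i] is translated as word e.
    frequent_src, frequent_tgt: sets of frequent word types (assumed to be
    the most frequent types; everything else is mapped to "<unk>").
    """
    def keep(word, vocab):
        return word if word in vocab else "<unk>"

    feats = {}
    for i, e in alignment:
        f = keep(src[i], frequent_src)
        left = keep(src[i - 1], frequent_src) if i > 0 else "<s>"
        right = keep(src[i + 1], frequent_src) if i + 1 < len(src) else "</s>"
        target = keep(e, frequent_tgt)
        for name in (
            f"fi={f} & fi-1={left} & e={target}",
            f"fi={f} & fi+1={right} & e={target}",
        ):
            feats[name] = feats.get(name, 0.0) + 1.0
    return feats


if __name__ == "__main__":
    src = ["他", "说", ",", "由于"]
    feats = word_context_features(src, [(2, "that")], {"说", ","}, {"that"})
    for name in sorted(feats):
        print(name)
```

Restricting each slot to a small frequent vocabulary keeps the number of distinct features bounded, which matters when the tuning set is small.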

slide-27
SLIDE 27

SOURCE CONTEXT FEATURES

[Figure: two word context features for the comma after 说 in the source sentence 他 说 , 由于 没有 配音 , 他 不得不 (gloss: he / said / because / no / voice / he / had to):]

  • fi=, & fi-1=说 & e=that: "he said that because of the lack of voice , he had to"
  • fi=, & fi-1=说 & e=,: "since there is no voice , he said , he had to"

slide-28
SLIDE 28

Experiments

slide-29
SLIDE 29

TRAINING DATA

                 Hiero   Syntax
Parallel data    260M    65M
Language model   2G      1G
MERT/MIRA        58k     58k
Test             57k     57k

GALE 2008 Chinese-English data

slide-30
SLIDE 30

RESULTS (HIERO)

Training   Features                            #        BLEU
MERT       baseline                            11       36.1
MIRA       +source-side syntax, +distortion    56       36.9
MIRA       +discount                           61       37.3
MIRA       +word context                       10,990   37.6

Chinese-English

slide-31
SLIDE 31

RESULTS (SYNTAX)

Training   Features                               #     BLEU
MERT       baseline                               25    39.5
MIRA       baseline                               25    39.8
MIRA       rule overlap                           132   39.9
MIRA       node count                             136   40.0
MIRA       +discount, +bad rewrite, +insertion    283   40.6

Chinese-English

slide-32
SLIDE 32

CONCLUSIONS

  • Using underutilized information for new features:
      • Source context is computationally efficient
      • Target syntax provides a rich structure
  • MIRA is working well on new features, systems, languages