Lattice and Hypergraph MERT, Graham Neubig, Nara Institute of Science and Technology (NAIST)



SLIDE 1

Lattice and Hypergraph MERT

Graham Neubig Nara Institute of Science and Technology (NAIST)

12/20/2012

SLIDE 2

Papers Introduced:

  • “Lattice-based Minimum Error Rate Training for Statistical Machine Translation”, Wolfgang Macherey, Franz Josef Och, Ignacio Thayer, Jakob Uszkoreit (Google), EMNLP 2008
  • “Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices”, Shankar Kumar, Wolfgang Macherey, Chris Dyer, Franz Och (Google/University of Maryland), ACL-IJCNLP 2009

SLIDE 3

Summary

  • Minimum error rate training (MERT) is used to train the parameters for machine translation
  • Normal MERT uses n-best lists
  • However, there is not enough diversity in n-best lists → unstable training & large accuracy fluctuations
  • As a solution, these papers perform MERT over:
    • lattices for phrase-based translation [Macherey+ 08]
    • hypergraphs for tree-based translation [Kumar+ 09]
  • This leads to more stable training in fewer iterations

SLIDE 4

Tuning/MERT

SLIDE 5

Tuning

  • Scores of translation, reordering, and language models:

                                      LM    TM    RM    Total
      ○ Taro visited Hanako           -4    -3    -1      -8
      ☓ the Taro visited the Hanako   -5    -4    -1     -10
      ☓ Hanako visited Taro           -2    -3    -2      -7   ← Best Score ☓

  • If we add weights, we can get better answers:

                                      LM        TM        RM        Total
      ○ Taro visited Hanako           0.2*-4    0.3*-3    0.5*-1    -2.2   ← Best Score ○
      ☓ the Taro visited the Hanako   0.2*-5    0.3*-4    0.5*-1    -2.7
      ☓ Hanako visited Taro           0.2*-2    0.3*-3    0.5*-2    -2.3

  • Tuning finds these weights: wLM=0.2 wTM=0.3 wRM=0.5
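
The effect of the weights can be checked with a few lines of Python. This is a minimal sketch that uses the scores from the slide, read as negative values (for example LM=-4 for the first hypothesis); the names `hypotheses` and `best_hypothesis` are illustrative and not part of the original slides.

```python
# Feature scores (LM, TM, RM) for each hypothesis, as on the slide.
hypotheses = {
    "Taro visited Hanako": (-4, -3, -1),
    "the Taro visited the Hanako": (-5, -4, -1),
    "Hanako visited Taro": (-2, -3, -2),
}

def best_hypothesis(weights):
    """Return the hypothesis with the highest weighted sum of feature scores."""
    def score(features):
        return sum(w * f for w, f in zip(weights, features))
    return max(hypotheses, key=lambda h: score(hypotheses[h]))

unweighted = best_hypothesis((1.0, 1.0, 1.0))  # every feature counts equally
tuned = best_hypothesis((0.2, 0.3, 0.5))       # wLM, wTM, wRM from the slide
print(unweighted)
print(tuned)
```

With equal weights the wrong hypothesis wins; with the tuned weights the correct one does, which is the point of the two tables.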

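SLIDE 6

MERT

  • MERT performs iterations to increase the score [Och 03]:
    1) Decode the source (dev) with the model and the current weights to get the n-best (dev)
    2) Find better weights by comparing the n-best (dev) with the reference (dev)
    3) Decode again with the new weights, and repeat

    source (dev):     太郎が花子を訪問した (“Taro visited Hanako”)
    n-best (dev):     the Taro visited the Hanako / Hanako visited Taro / Taro visited Hanako / ...
    reference (dev):  Taro visited Hanako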

SLIDE 7

MERT

  • These slides cover the “Find better weights” step of the MERT iterations [Och 03]

SLIDE 8

MERT Weight Update:

  • Adjust one weight at a time

                      wLM     wTM     wRM     Score
      Initial:        0.1     0.1     0.1     0.20
      Optimize wLM:   0.4     0.1     0.1     0.32
      Optimize wTM:   0.4     0.1     0.1     0.32
      Optimize wRM:   0.4     0.1     0.3     0.40
      Optimize wLM:   0.35    0.1     0.3     0.41
      Optimize wTM:   ...
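
Adjusting one weight at a time is a form of coordinate ascent. The sketch below is only an illustration of that outer loop: `coordinate_ascent`, the candidate grid, and the toy `score` function are assumptions made for this example, and real MERT replaces the grid search with the exact line search described on the following slides.

```python
def coordinate_ascent(score, weights, candidates, sweeps=2):
    """Repeatedly optimize one weight at a time while the others stay fixed."""
    weights = dict(weights)
    for _ in range(sweeps):
        for name in weights:
            # Try every candidate value for this weight only.
            weights[name] = max(
                candidates, key=lambda v: score({**weights, name: v})
            )
    return weights

# Toy objective (hypothetical, chosen for illustration): one peak per weight.
target = {"wLM": 0.4, "wTM": 0.1, "wRM": 0.3}

def score(w):
    return -sum((w[k] - target[k]) ** 2 for k in w)

start = {"wLM": 0.1, "wTM": 0.1, "wRM": 0.1}
grid = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
result = coordinate_ascent(score, start, grid)
print(result)
```

Because the toy objective is separable, one sweep is already enough here; with a real translation metric the weights interact, which is why the table above returns to wLM after wRM.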

SLIDE 9

Updating One Weight:

  • We start with:
    • n-best list (one per sentence)
    • fixed weights: wLM=-1, wTM=1
    • weight to be adjusted: wRM=???

      f1      φLM   φTM   φRM   BLEU*
      e1,1     1     0    -1     0
      e1,2     0     1     0     1
      e1,3     1     0     1     0

      f2      φLM   φTM   φRM   BLEU*
      e2,1     1     0    -2     0
      e2,2     3     0     1     0
      e2,3     2     1     2     1

* Calculating BLEU for one sentence is a simplification; usually we compute it for the whole corpus

SLIDE 10

Updating One Weight:

  • Next, transform each hypothesis into a line:

      y = a x + b

  • Where:
    • a is the value of the feature to be adjusted
    • b is the weighted sum of the fixed features
    • x is the weight to be adjusted (unknown)
    • y is the resulting score of the hypothesis

SLIDE 11

Updating One Weight:

  • Example: wLM=-1, wTM=1, wRM=???

      y = a x + b,   a = φRM,   b = wLM φLM + wTM φTM

      f1      φLM   φTM   φRM      a     b
      e1,1     1     0    -1      -1    -1
      e1,2     0     1     0       0     1
      e1,3     1     0     1       1    -1
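
The transformation of a hypothesis into a line can be written out as follows. This is a minimal sketch under the slide's setting (wLM=-1, wTM=1, wRM free), with blank cells of the feature table read as 0; the names `fixed_weights`, `nbest_f1`, and `to_line` are illustrative.

```python
fixed_weights = {"LM": -1, "TM": 1}  # wLM and wTM from the slide
free_feature = "RM"                  # wRM is the weight being adjusted

# Feature values of the n-best hypotheses for sentence f1.
nbest_f1 = {
    "e1,1": {"LM": 1, "TM": 0, "RM": -1},
    "e1,2": {"LM": 0, "TM": 1, "RM": 0},
    "e1,3": {"LM": 1, "TM": 0, "RM": 1},
}

def to_line(features):
    """Return (a, b) such that the model score equals a * wRM + b."""
    a = features[free_feature]
    b = sum(w * features[name] for name, w in fixed_weights.items())
    return a, b

lines_f1 = {name: to_line(phi) for name, phi in nbest_f1.items()}
print(lines_f1)
```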

SLIDE 12

Updating One Weight:

  • Draw lines on a graph:

      y = a x + b

      f1 hypotheses:   a1,1=-1 b1,1=-1    a1,2=0 b1,2=1     a1,3=1 b1,3=-1
      f2 hypotheses:   a2,1=-2 b2,1=-1    a2,2=1 b2,2=-3    a2,3=2 b2,3=-1

[Figure: two panels, “f1 hypotheses” and “f2 hypotheses”, plotting the lines for e1,1 to e1,3 and e2,1 to e2,3 over x and y from -4 to 4, each line marked as BLEU=1 or BLEU=0]

SLIDE 13

Updating One Weight:

  • Find the lines that are highest for each range of x:
  • This is called the convex hull (or upper envelope)

[Figure: two panels, “f1 hypotheses” and “f2 hypotheses”, showing the lines for e1,1 to e1,3 and e2,1 to e2,3 with the highest line in each range of x]
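
One common way to find the upper envelope is to sort the lines by slope and sweep from left to right. The sketch below shows that approach for the three f1 lines; the function name `upper_envelope` and the tuple layout `(a, b, label)` are assumptions of this example, not the papers' implementation.

```python
def upper_envelope(lines):
    """Return [(x_start, (a, b, label)), ...] sorted by slope.

    Each line y = a*x + b is the highest one from its x_start up to the
    x_start of the next entry.
    """
    # For equal slopes only the line with the largest intercept can be on top.
    best = {}
    for a, b, label in lines:
        if a not in best or b > best[a][1]:
            best[a] = (a, b, label)
    hull = []
    for a, b, label in sorted(best.values(), key=lambda line: line[0]):
        x = float("-inf")
        while hull:
            x0, (a0, b0, _) = hull[-1]
            x = (b0 - b) / (a - a0)  # where the new line overtakes the last one
            if x > x0:
                break
            hull.pop()               # the last line is never the highest
            x = float("-inf")
        hull.append((x, (a, b, label)))
    return hull

f1_lines = [(-1, -1, "e1,1"), (0, 1, "e1,2"), (1, -1, "e1,3")]
envelope_f1 = upper_envelope(f1_lines)
for x_start, (a, b, label) in envelope_f1:
    print(f"{label}: y = {a}x + {b} is highest from x = {x_start}")
```

For f1 all three lines appear on the envelope, with e1,2 on top between x=-2 and x=2.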

SLIDE 14

Updating One Weight:

  • Using the convex hull, find scores at each range:

[Figure: for f1 and f2, the upper envelope of the hypothesis lines and the BLEU (0 or 1) of the highest line in each range of x]

SLIDE 15

Updating One Weight:

  • Combine multiple sentences into a single error surface:

[Figure: BLEU as a function of x for f1 and for f2, and their sum (“total accuracy”) as a function of x]

SLIDE 16

Updating One Weight:

  • Choose the middle of the best region:

      wRM ← 1.0

[Figure: total accuracy (BLEU) as a function of x, with the best region marked]

SLIDE 17

Summary

  • For each sentence:
    • Create lines for each n-best hypothesis
    • Combine lines and find upper envelope
    • Transform upper envelope into error surface
  • Combine error surfaces into one
  • Find the range with the highest score
  • Set the weight to the middle of the range
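
The steps above can be sketched end to end on the two-sentence example. This is a simplified illustration, not the original implementation: it sums sentence-level BLEU instead of computing corpus-level BLEU, does not merge neighbouring regions with equal score, and steps 1.0 into unbounded regions. Each line is computed from its feature row with wLM=-1 and wTM=1 (for e2,3 this gives a=2, b=-1), and all function names are assumptions.

```python
NEG_INF, POS_INF = float("-inf"), float("inf")

def upper_envelope(lines):
    """Return [(x_start, (a, b, label)), ...] for the highest line per range."""
    best = {}
    for a, b, label in lines:
        if a not in best or b > best[a][1]:
            best[a] = (a, b, label)
    hull = []
    for a, b, label in sorted(best.values(), key=lambda line: line[0]):
        x = NEG_INF
        while hull:
            x0, (a0, b0, _) = hull[-1]
            x = (b0 - b) / (a - a0)
            if x > x0:
                break
            hull.pop()
            x = NEG_INF
        hull.append((x, (a, b, label)))
    return hull

def error_surface(lines, bleu):
    """Steps [(x_start, score), ...]: score of the highest line in each range."""
    return [(x, bleu[label]) for x, (_, _, label) in upper_envelope(lines)]

def score_at(x, surface):
    """Score of one sentence when the adjusted weight equals x."""
    current = surface[0][1]
    for x_start, s in surface:
        if x_start <= x:
            current = s
    return current

def best_weight(surfaces):
    """Return (total score, middle) of the region with the highest total."""
    cuts = sorted({x for s in surfaces for x, _ in s if x != NEG_INF})
    bounds = [NEG_INF] + cuts + [POS_INF]
    best = None
    for lo, hi in zip(bounds, bounds[1:]):
        if lo == NEG_INF and hi == POS_INF:
            mid = 0.0
        elif lo == NEG_INF:
            mid = hi - 1.0   # assumption: step into an unbounded region
        elif hi == POS_INF:
            mid = lo + 1.0
        else:
            mid = (lo + hi) / 2
        total = sum(score_at(mid, s) for s in surfaces)
        if best is None or total > best[0]:
            best = (total, mid)
    return best

f1 = [(-1, -1, "e1,1"), (0, 1, "e1,2"), (1, -1, "e1,3")]
f2 = [(-2, -1, "e2,1"), (1, -3, "e2,2"), (2, -1, "e2,3")]
bleu = {"e1,1": 0, "e1,2": 1, "e1,3": 0, "e2,1": 0, "e2,2": 0, "e2,3": 1}
total, w_rm = best_weight([error_surface(f1, bleu), error_surface(f2, bleu)])
print(total, w_rm)
```

Both sentences are translated correctly only for x between 0 and 2, so the sketch returns the middle of that region, matching wRM ← 1.0.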

SLIDE 18

Summary

  • Problem with this procedure: not enough diversity in the n-best list

SLIDE 19

Result of Lack of Diversity

  • Unstable training:

[Figure from [Macherey+ 08]: traditional MERT shown in green]

SLIDE 20

Lattice MERT

SLIDE 21

Translation Lattice

  • Represent many hypotheses compactly:

      (Taro | the Taro) → (met | visited) → (Hanako | the Hanako)

      8 hypotheses in only 6 edges

  • MERT on lattices can solve the diversity problem
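
The count on the slide can be reproduced by enumerating the paths. This is a small sketch that assumes the lattice is three consecutive groups of two parallel edges, as the word list suggests; `stages` is an illustrative name.

```python
from itertools import product

# The lattice on the slide, written as three consecutive sets of parallel edges.
stages = [
    ["Taro", "the Taro"],
    ["met", "visited"],
    ["Hanako", "the Hanako"],
]

num_edges = sum(len(stage) for stage in stages)
hypotheses = [" ".join(path) for path in product(*stages)]
print(num_edges, len(hypotheses))
```

The number of paths grows multiplicatively with each stage while the number of edges grows additively, which is why a lattice holds far more hypotheses than an n-best list of similar size.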

SLIDE 22

Factoring Feature Functions

  • Each edge in the lattice has a feature value:

      Edge          φLM   φTM   φRM
      Taro           1     1     2
      the Taro       2     0     1
      met            0    -1    -1
      visited       -2    -1    -1
      Hanako         1     1    -1
      the Hanako     2     0     0

  • A hypothesis's features are the sum of its edge features:

      Taro met Hanako:
      (φLM=1, φTM=1, φRM=2) + (φLM=0, φTM=-1, φRM=-1) + (φLM=1, φTM=1, φRM=-1) = (φLM=2, φTM=1, φRM=0)
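
Summing edge features along a path is a short computation. In this sketch the assignment of feature vectors to edge labels is inferred from the line parameters on the later slides (for example “the Taro” has a=1, b=-2), so treat it as an assumption; `edge_features` and `path_features` are illustrative names.

```python
# (φLM, φTM, φRM) per edge; the edge-to-vector assignment is inferred.
edge_features = {
    "Taro": (1, 1, 2),
    "the Taro": (2, 0, 1),
    "met": (0, -1, -1),
    "visited": (-2, -1, -1),
    "Hanako": (1, 1, -1),
    "the Hanako": (2, 0, 0),
}

def path_features(edges):
    """Sum the feature vectors of the edges along one path."""
    return tuple(sum(values) for values in zip(*(edge_features[e] for e in edges)))

taro_met_hanako = path_features(["Taro", "met", "Hanako"])
print(taro_met_hanako)
```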

SLIDE 23

MERT on Lattices:

  • For each sentence:
    • Transform each edge into lines
    • Find the upper envelope for the lattice
    • Transform upper envelope into error surface
  • Combine error surfaces into one
  • Find the range with the highest score
  • Set the weight to the middle of the range

Only different part!! The first two steps (lines from edges, upper envelope over the lattice) differ from n-best MERT.

SLIDE 24

First, Transform each Edge into Lines

wLM=-1, wTM=1, wRM=???

      y = a x + b,   a = φRM,   b = wLM φLM + wTM φTM

      Edge          φLM   φTM   φRM      a     b
      Taro           1     1     2       2     0
      the Taro       2     0     1       1    -2
      met            0    -1    -1      -1    -1
      visited       -2    -1    -1      -1     1
      Hanako         1     1    -1      -1     0
      the Hanako     2     0     0       0    -2

SLIDE 25

Finding the Upper Envelope for Lattices:

  • Can be done with dynamic programming:
    1) Start with a flat envelope for the initial node
    2) Calculate the upper envelope for the next node using the previous nodes

SLIDE 26

Start with Flat Envelope

      y = a x + b

      a=0 b=0 “” (initial node, empty hypothesis)

[Figure: the lattice with its edge lines, and a plot of the flat envelope y=0 over x and y from -4 to 4]

SLIDE 27

Add First Node

      y = a x + b

      a=2 b=0 “Taro”
      a=1 b=-2 “the Taro”

[Figure: plots of the flat initial envelope (a=0 b=0 “”) and of the two lines for the first node]

SLIDE 28

Add Second

      y = a x + b

      a=1 b=-1 “Taro met”
      a=0 b=-3 “the Taro met”
      a=1 b=1 “Taro visited”
      a=0 b=-1 “the Taro visited”

[Figure: plots of the first-node lines (“Taro”, “the Taro”) and of the four lines obtained by adding the “met” and “visited” edges]

SLIDE 29

Add Second

  • Delete all lines not in upper envelope:

      X a=1 b=-1 “Taro met”
      X a=0 b=-3 “the Taro met”

SLIDE 30

Add Second

      y = a x + b

      a=1 b=1 “Taro visited”
      a=0 b=-1 “the Taro visited”

[Figure: plot of the upper envelope that remains for the second node]

SLIDE 31

Add Third

      y = a x + b

      a=1 b=-1 “Taro visited the Hanako”
      a=-1 b=-1 “the Taro visited Hanako”
      a=0 b=1 “Taro visited Hanako”
      a=0 b=-3 “the Taro visited the Hanako”

[Figure: plots of the second-node envelope (“Taro visited”, “the Taro visited”) and of the four lines obtained by adding the “Hanako” and “the Hanako” edges]

SLIDE 32

Add Third

  • Delete the line not in the upper envelope:

      X a=0 b=-3 “the Taro visited the Hanako”

SLIDE 33

Add Third

      y = a x + b

      a=1 b=-1 “Taro visited the Hanako”
      a=-1 b=-1 “the Taro visited Hanako”
      a=0 b=1 “Taro visited Hanako”

[Figure: plot of the final upper envelope for the lattice]
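
The walkthrough above can be sketched as a small dynamic program. This is an illustration under stated assumptions, not the papers' implementation: nodes are numbered in topological order, each edge is `(source, target, a, b, word)` with the line parameters for wLM=-1, wTM=1, and the names `upper_envelope` and `lattice_envelope` are hypothetical.

```python
def upper_envelope(lines):
    """Keep only the lines y = a*x + b that are highest for some x, sorted by slope."""
    best = {}
    for a, b, text in lines:
        if a not in best or b > best[a][1]:
            best[a] = (a, b, text)
    hull = []  # entries: (x_start, line)
    for line in sorted(best.values(), key=lambda l: l[0]):
        a, b, _ = line
        x = float("-inf")
        while hull:
            x0, (a0, b0, _) = hull[-1]
            x = (b0 - b) / (a - a0)  # where the new line overtakes the last one
            if x > x0:
                break
            hull.pop()               # the last line is never the highest
            x = float("-inf")
        hull.append((x, line))
    return [line for _, line in hull]

def lattice_envelope(edges, num_nodes):
    """Upper envelope at the final node of a lattice."""
    envelopes = {0: [(0, 0, "")]}     # 1) flat envelope for the initial node
    for node in range(1, num_nodes):  # 2) build each node from previous nodes
        candidates = []
        for src, dst, a, b, word in edges:
            if dst == node:
                # Extend every surviving line of the source node by this edge.
                for pa, pb, text in envelopes[src]:
                    candidates.append((pa + a, pb + b, (text + " " + word).strip()))
        envelopes[node] = upper_envelope(candidates)
    return envelopes[num_nodes - 1]

edges = [
    (0, 1, 2, 0, "Taro"), (0, 1, 1, -2, "the Taro"),
    (1, 2, -1, -1, "met"), (1, 2, -1, 1, "visited"),
    (2, 3, -1, 0, "Hanako"), (2, 3, 0, -2, "the Hanako"),
]
final = lattice_envelope(edges, 4)
for a, b, text in final:
    print(f"a={a} b={b} “{text}”")
```

Pruning at every node is what keeps the computation small: only lines that can still be on top are extended, so the 8 full hypotheses are never enumerated.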

SLIDE 34

Improved Stability

[Figure: traditional MERT in green, lattice MERT in red]

SLIDE 35

Hypergraph MERT

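SLIDE 36

Translation Hypergraph

[Figure: parse tree over the source sentence 友達 と ご飯 を 食べ た (“ate a meal with a friend”) with nodes VP0-5, PP0-1, VP2-5, PP2-3, VP4-5, N0, P1, N2, P3, V4, SUF5, and the translation hypergraph over the nodes VP0-5, VP2-5, VP4-5, N0, N2]

Translations and scores shown on the hyperedges:

      x1 with x0: 0.56
      x1 x0: 0.6
      friend: 0.12
      my friend: 0.3
      ate: 0.5
      a meal: 0.5
      rice: 0.3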

SLIDE 37

Hypergraph MERT

  • Almost exactly the same as lattice MERT

      Lattice MERT:      each edge is a line, e.g. “Taro” a=2, b=0 and “the Taro” a=1, b=-2
      Hypergraph MERT:   each hyperedge is a line, e.g. “x1 with x0” (nodes VP0-5, VP2-5, N0) a=2 b=0

[Figure: side-by-side envelope plots for lattice MERT and hypergraph MERT]

SLIDE 38

Summary

SLIDE 39

Summary

  • n-best MERT is unstable because of the lack of diversity in the n-best list
  • This problem can be solved by lattice or hypergraph MERT
  • The algorithm finds the upper envelope for each sentence efficiently using dynamic programming