Lattice and Hypergraph MERT
Graham Neubig
Nara Institute of Science and Technology (NAIST)
12/20/2012

Papers Introduced:
● “Lattice-based Minimum Error Rate Training for Statistical Machine Translation”
  Wolfgang Macherey, Franz Josef Och, Ignacio Thayer, Jakob Uszkoreit (Google)
  EMNLP 2008
● “Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices”
  Shankar Kumar, Wolfgang Macherey, Chris Dyer, Franz Och (Google/University of Maryland)
  ACL-IJCNLP 2009

Summary
● Minimum error rate training (MERT) is used to train the parameters for machine translation
● Normal MERT uses n-best lists
● However, there is not enough diversity in n-best lists
  → unstable training & large accuracy fluctuations
● As a solution, these papers perform MERT over
  ● lattices for phrase-based translation [Macherey+ 08]
  ● hypergraphs for tree-based translation [Kumar+ 09]
● This leads to more stable training in fewer iterations

Tuning/MERT

Tuning
● Scores of translation, reordering, and language models

                                    LM    TM    RM    Score
  ○ Taro visited Hanako             -4    -3    -1     -8
  ☓ the Taro visited the Hanako     -5    -4    -1    -10
  ☓ Hanako visited Taro             -2    -3    -2     -7    ← Best Score ☓

● If we add weights, we can get better answers:

                                    LM        TM        RM        Score
  ○ Taro visited Hanako             0.2*-4    0.3*-3    0.5*-1    -2.2    ← Best Score ○
  ☓ the Taro visited the Hanako     0.2*-5    0.3*-4    0.5*-1    -2.7
  ☓ Hanako visited Taro             0.2*-2    0.3*-3    0.5*-2    -2.3

● Tuning finds these weights: w_LM=0.2, w_TM=0.3, w_RM=0.5
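The effect of the weights in the two tables can be checked with a few lines of Python. This is only a sketch: the feature values are copied from the slide, while the function names `score` and `best` are illustrative and not from the original.

```python
# Feature values (LM, TM, RM) for each hypothesis, copied from the tables above.
hypotheses = {
    "Taro visited Hanako": (-4, -3, -1),
    "the Taro visited the Hanako": (-5, -4, -1),
    "Hanako visited Taro": (-2, -3, -2),
}

def score(features, weights):
    """Weighted sum of the feature values (a linear model score)."""
    return sum(w * f for w, f in zip(weights, features))

def best(weights):
    """Hypothesis with the highest weighted score under the given weights."""
    return max(hypotheses, key=lambda h: score(hypotheses[h], weights))

unweighted = best((1.0, 1.0, 1.0))  # plain sum of the three scores
tuned = best((0.2, 0.3, 0.5))       # weights from the slide
print(unweighted)
print(tuned)
```

With all weights equal to 1 the wrong hypothesis wins; with the tuned weights the correct one does.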

MERT
● MERT performs iterations to increase the score [Och 03]
[Figure: the MERT loop. The dev source (太郎が花子を訪問した) is decoded with the model and the current weights into a dev n-best list (“the Taro visited the Hanako”, “Hanako visited Taro”, “Taro visited Hanako”, ...). The n-best list and the dev reference (“Taro visited Hanako”) are used to find better weights, which are fed back into decoding.]
● These slides cover the “Find better weights” step

MERT Weight Update:
● Adjust one weight at a time

                     w_LM    w_TM    w_RM    Score
  Initial:           0.1     0.1     0.1     0.20
  Optimize w_LM:     0.4     0.1     0.1     0.32
  Optimize w_TM:     0.4     0.1     0.1     0.32
  Optimize w_RM:     0.4     0.1     0.3     0.4
  Optimize w_LM:     0.35    0.1     0.3     0.41
  Optimize w_TM:

Updating One Weight:
● We start with:

  n-best list
  f_1      φ_LM  φ_TM  φ_RM  BLEU*        f_2      φ_LM  φ_TM  φ_RM  BLEU*
  e_1,1    1     0     -1    0            e_2,1    1     0     -2    0
  e_1,2    0     1     0     1            e_2,2    3     0     1     0
  e_1,3    1     0     1     0            e_2,3    2     1     2     1

  fixed weights: w_LM=-1, w_TM=1
  weight to be adjusted: w_RM=???

* Calculating BLEU for one sentence is a simplification; usually we compute it over the whole corpus

Updating One Weight:
● Next, transform each hypothesis into a line: y = a x + b
● Where:
  ● a is the value of the feature to be adjusted
  ● b is the weighted sum of the fixed features
  ● x is the weight to be adjusted (unknown)

Updating One Weight:
● Example: y = a x + b
  w_LM=-1, w_TM=1, w_RM=???
  a = φ_RM
  b = w_LM φ_LM + w_TM φ_TM

  f_1      φ_LM  φ_TM  φ_RM     a            b
  e_1,1    1     0     -1       a_1,1=-1     b_1,1=-1
  e_1,2    0     1     0        a_1,2=0      b_1,2=1
  e_1,3    1     0     1        a_1,3=1      b_1,3=-1
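The transformation in this example can be written as a small helper. A minimal sketch, assuming features are stored as dictionaries keyed by feature name; `to_line` is a hypothetical name, not something from the papers.

```python
def to_line(features, fixed_weights, adjust):
    """Turn one hypothesis into a line y = a*x + b.

    features:      dict of feature name -> value for the hypothesis
    fixed_weights: dict of feature name -> weight for the features held fixed
    adjust:        name of the feature whose weight x is being optimized
    """
    a = features[adjust]  # slope: the value of the feature being adjusted
    b = sum(fixed_weights[name] * value  # intercept: weighted sum of the rest
            for name, value in features.items() if name != adjust)
    return a, b

fixed = {"LM": -1, "TM": 1}
f1 = {
    "e_1,1": {"LM": 1, "TM": 0, "RM": -1},
    "e_1,2": {"LM": 0, "TM": 1, "RM": 0},
    "e_1,3": {"LM": 1, "TM": 0, "RM": 1},
}
lines_f1 = {name: to_line(phi, fixed, "RM") for name, phi in f1.items()}
print(lines_f1)
```

The resulting (a, b) pairs agree with the values listed for e_1,1, e_1,2 and e_1,3.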

Updating One Weight:
● Draw lines on a graph: y = a x + b
[Figure: two plots, “f_1 hypotheses” and “f_2 hypotheses”, each with axes from -4 to 4 and one line per hypothesis; the legend distinguishes BLEU=1 lines from BLEU=0 lines]

  f_1 hypotheses                  f_2 hypotheses
  e_1,1: a_1,1=-1, b_1,1=-1       e_2,1: a_2,1=-2, b_2,1=-1
  e_1,2: a_1,2=0,  b_1,2=1        e_2,2: a_2,2=1,  b_2,2=-3
  e_1,3: a_1,3=1,  b_1,3=-1       e_2,3: a_2,3=2,  b_2,3=-1

Updating One Weight:
● Find the lines that are highest for each range of x:
[Figure: the “f_1 hypotheses” and “f_2 hypotheses” plots (axes from -4 to 4) with the highest line in each range of x highlighted]
● This is called the convex hull (or upper envelope)
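One common way to compute an upper envelope is to sort the lines by slope and keep a stack of the lines that are highest somewhere. The sketch below takes that approach; it is an illustration under that assumption, not necessarily the procedure used in the papers, and `upper_envelope` is a hypothetical name.

```python
import math

def upper_envelope(lines):
    """Upper envelope of lines given as (a, b, label), where y = a*x + b.

    Returns (x_from, x_to, label) segments from left to right.
    """
    hull = []    # lines currently on the envelope, by increasing slope
    starts = []  # starts[i]: x from which hull[i] is the highest line
    for a, b, label in sorted(lines, key=lambda l: (l[0], l[1])):
        if hull and hull[-1][0] == a:  # same slope: larger intercept wins
            hull.pop()
            starts.pop()
        while hull:
            a0, b0, _ = hull[-1]
            x = (b0 - b) / (a - a0)    # new line overtakes hull[-1] at x
            if x > starts[-1]:
                break
            hull.pop()                 # hull[-1] is never the highest line
            starts.pop()
        else:
            x = -math.inf              # nothing to the left of the new line
        hull.append((a, b, label))
        starts.append(x)
    ends = starts[1:] + [math.inf]
    return [(s, e, line[2]) for s, e, line in zip(starts, ends, hull)]

f1_lines = [(-1, -1, "e_1,1"), (0, 1, "e_1,2"), (1, -1, "e_1,3")]
for x_from, x_to, label in upper_envelope(f1_lines):
    print(label, x_from, x_to)
```

For the f_1 lines this reports e_1,1 as highest below x = -2, e_1,2 between -2 and 2, and e_1,3 above 2.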

Updating One Weight:
● Using the convex hull, find scores at each range:
[Figure: for each of f_1 and f_2, the upper-envelope plot (axes from -4 to 4) above a plot of BLEU (0 to 1) against x (-4 to 4)]

Updating One Weight:
● Combine multiple sentences into a single error plane:
[Figure: the BLEU (0 to 1) against x (-4 to 4) plots for f_1 and f_2, combined into a single “total accuracy” plot on the same axes]

Updating One Weight:
● Choose middle of best region:
[Figure: the “total accuracy” plot, BLEU (0 to 1) against x (-4 to 4)]
  w_RM ← 1.0
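The combination and selection steps can be sketched as follows. Two assumptions are made here: the per-sentence surfaces are entered by hand from the example (f_1 scores 1 only while e_1,2 is the highest line, f_2 only while e_2,3 is), and sentence scores are simply averaged, in line with the simplified per-sentence BLEU used on these slides. All function names are illustrative.

```python
import math

# Piecewise-constant surfaces as (x_from, x_to, BLEU) segments.
# Assumption: intervals worked out from the example lines, not given as numbers on the slides.
surface_f1 = [(-math.inf, -2.0, 0), (-2.0, 2.0, 1), (2.0, math.inf, 0)]
surface_f2 = [(-math.inf, 0.0, 0), (0.0, math.inf, 1)]

def value_at(surface, x):
    """Score of a piecewise-constant surface at the point x."""
    for lo, hi, score in surface:
        if lo <= x < hi:
            return score
    raise ValueError("x is not covered by the surface")

def probe(lo, hi, margin=1.0):
    """Middle of (lo, hi); for an unbounded range, step `margin` in from the finite end."""
    if math.isinf(lo) and math.isinf(hi):
        return 0.0
    if math.isinf(lo):
        return hi - margin
    if math.isinf(hi):
        return lo + margin
    return (lo + hi) / 2.0

def combine(surfaces):
    """Average the surfaces over the union of all their breakpoints."""
    cuts = sorted({x for s in surfaces for lo, hi, _ in s for x in (lo, hi)})
    merged = []
    for lo, hi in zip(cuts, cuts[1:]):
        x = probe(lo, hi)
        total = sum(value_at(s, x) for s in surfaces) / len(surfaces)
        merged.append((lo, hi, total))
    return merged

def best_weight(surfaces):
    """Middle of the range with the highest combined score."""
    lo, hi, _ = max(combine(surfaces), key=lambda seg: seg[2])
    return probe(lo, hi)

w_rm = best_weight([surface_f1, surface_f2])
print(w_rm)  # 1.0
```

The best combined range is 0 < x < 2, whose middle is the w_RM value chosen on the slide.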

Summary
● For each sentence:
  ● Create lines for each n-best hypothesis
  ● Combine lines and find upper envelope
  ● Transform upper envelope into error surface
● Combine error surfaces into one
● Find the range with the highest score
● Set the weight to the middle of the range

Summary
● For each sentence:
  ● Create lines for each n-best hypothesis   ← Problem! (not enough diversity)
  ● Combine lines and find upper envelope
  ● Transform upper envelope into error surface
● Combine error surfaces into one
● Find the range with the highest score
● Set the weight to the middle of the range

Result of Lack of Diversity
● Unstable training:
[Figure: training curves from [Macherey+ 08]; traditional MERT is shown in green]

Lattice MERT

Translation Lattice
● Represent many hypotheses compactly:
[Figure: a lattice with three consecutive pairs of edges: “Taro” / “the Taro”, then “met” / “visited”, then “Hanako” / “the Hanako”]
  8 hypotheses in only 6 edges
● MERT on lattices can solve the diversity problem
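The count of 8 hypotheses from 6 edges can be reproduced by counting paths through the lattice. In this sketch the node numbering (0 to 3) is an assumption made for illustration; the slide only shows the edge labels.

```python
from collections import defaultdict

# Edges as (from_node, to_node, words); node ids are assumed, not from the slide.
edges = [
    (0, 1, "Taro"), (0, 1, "the Taro"),
    (1, 2, "met"), (1, 2, "visited"),
    (2, 3, "Hanako"), (2, 3, "the Hanako"),
]

def count_paths(edges, start, goal):
    """Number of distinct start-to-goal paths.

    Assumes node ids are in topological order (every edge goes to a larger id).
    """
    paths = defaultdict(int)
    paths[start] = 1
    for src, dst, _ in sorted(edges, key=lambda e: e[0]):
        paths[dst] += paths[src]  # every path reaching src extends along this edge
    return paths[goal]

n_hypotheses = count_paths(edges, 0, 3)
print(len(edges), n_hypotheses)
```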

Factoring Feature Functions
● Each edge in the lattice has a feature value:

  edge          φ_LM  φ_TM  φ_RM
  Taro          1     1     2
  met           0     -1    -1
  Hanako        1     1     -1
  the Taro      2     0     1
  visited       -2    -1    -1
  the Hanako    2     0     0

● Hypothesis's features are sum of edge features:
  Taro (1, 1, 2) + met (0, -1, -1) + Hanako (1, 1, -1)
  → “Taro met Hanako”: φ_LM=2, φ_TM=1, φ_RM=0
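Summing edge features along a path takes only a few lines. A minimal sketch, with edge features stored as (φ_LM, φ_TM, φ_RM) tuples; `path_features` is a hypothetical helper name.

```python
# (phi_LM, phi_TM, phi_RM) for each edge, as given on the slide.
edge_features = {
    "Taro": (1, 1, 2),
    "met": (0, -1, -1),
    "Hanako": (1, 1, -1),
    "the Taro": (2, 0, 1),
    "visited": (-2, -1, -1),
    "the Hanako": (2, 0, 0),
}

def path_features(path):
    """Feature vector of a hypothesis: the element-wise sum over its edges."""
    return tuple(sum(column) for column in zip(*(edge_features[e] for e in path)))

taro_met_hanako = path_features(["Taro", "met", "Hanako"])
print(taro_met_hanako)  # (2, 1, 0)
```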

MERT on Lattices:
● For each sentence:
  ● Transform each edge into lines               ← Only different part!!
  ● Find the upper envelope for the lattice      ← Only different part!!
  ● Transform upper envelope into error surface
● Combine error surfaces into one
● Find the range with the highest score
● Set the weight to the middle of the range

First, Transform each Edge into Lines
  y = a x + b
  w_LM=-1, w_TM=1, w_RM=???
  a = φ_RM
  b = w_LM φ_LM + w_TM φ_TM

  edge          φ_LM  φ_TM  φ_RM     a     b
  Taro          1     1     2        2     0
  met           0     -1    -1       -1    -1
  Hanako        1     1     -1       -1    0
  the Taro      2     0     1        1     -2
  visited       -2    -1    -1       -1    1
  the Hanako    2     0     0        0     -2

Finding the Upper Envelope for Lattices:
● Can be done with dynamic programming
  ● 1) Start with flat envelope for initial node
  ● 2) Calculate upper envelope for next node using previous nodes

Start with Flat Envelope
[Figure: the lattice with each edge labeled by its line (Taro: a=2, b=0; met: a=-1, b=-1; Hanako: a=-1, b=0; the Taro: a=1, b=-2; visited: a=-1, b=1; the Hanako: a=0, b=-2), and a plot of y = a x + b with axes from -4 to 4]
  Initial node: a=0, b=0 “”

Add First Node
[Figure: the same lattice; plots of y = a x + b (axes from -4 to 4) for the initial node and the first node]
  Initial node: a=0, b=0 “”
  First node:   a=2, b=0 “Taro”
                a=1, b=-2 “the Taro”

Add Second Node
[Figure: the same lattice; plots of y = a x + b (axes from -4 to 4) for the first node and the second node]
  First node:   a=2, b=0 “Taro”
                a=1, b=-2 “the Taro”
  Second node:  a=1, b=1 “Taro visited”
                a=1, b=-1 “Taro met”
                a=0, b=-1 “the Taro visited”
                a=0, b=-3 “the Taro met”

Add Second Node
● Delete all lines not in upper envelope
[Figure: the same plots, with the deleted lines crossed out]
  First node:   a=2, b=0 “Taro”
                a=1, b=-2 “the Taro”
  Second node:  a=1, b=1 “Taro visited”
                a=1, b=-1 “Taro met”             ☓ deleted
                a=0, b=-1 “the Taro visited”
                a=0, b=-3 “the Taro met”         ☓ deleted

Add Second Node
[Figure: the same plots after deletion, showing only the lines on the upper envelope]
  First node:   a=2, b=0 “Taro”
                a=1, b=-2 “the Taro”
  Second node:  a=1, b=1 “Taro visited”
                a=0, b=-1 “the Taro visited”
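The node-by-node procedure can be sketched as a small dynamic program over the example lattice. This is a hedged illustration: the node numbering, the function names, and the sort-by-slope pruning are assumptions, and the papers' actual algorithm may differ in detail.

```python
import math

def upper_envelope(lines):
    """Keep only the lines (a, b, label) that are highest for some x."""
    hull, starts = [], []
    for a, b, label in sorted(lines, key=lambda l: (l[0], l[1])):
        if hull and hull[-1][0] == a:  # same slope: larger intercept wins
            hull.pop()
            starts.pop()
        while hull:
            a0, b0, _ = hull[-1]
            x = (b0 - b) / (a - a0)    # new line overtakes hull[-1] at x
            if x > starts[-1]:
                break
            hull.pop()                 # hull[-1] is never the highest line
            starts.pop()
        else:
            x = -math.inf
        hull.append((a, b, label))
        starts.append(x)
    return hull

# Edge lines from the slides as (from_node, to_node, words, a, b); node ids are assumed.
edges = [
    (0, 1, "Taro", 2, 0), (0, 1, "the Taro", 1, -2),
    (1, 2, "met", -1, -1), (1, 2, "visited", -1, 1),
    (2, 3, "Hanako", -1, 0), (2, 3, "the Hanako", 0, -2),
]

def lattice_envelopes(edges, start=0):
    """Upper envelope at every node, visiting nodes in increasing (topological) order."""
    incoming = {}
    for src, dst, words, a, b in edges:
        incoming.setdefault(dst, []).append((src, words, a, b))
    env = {start: [(0, 0, "")]}        # flat envelope for the initial node
    for node in sorted(incoming):
        candidates = []
        for src, words, a, b in incoming[node]:
            for a0, b0, prefix in env[src]:
                # Extending a partial hypothesis adds slopes and intercepts.
                candidates.append((a0 + a, b0 + b, (prefix + " " + words).strip()))
        env[node] = upper_envelope(candidates)  # delete lines not in the envelope
    return env

env = lattice_envelopes(edges)
for a, b, words in env[2]:
    print(a, b, words)
```

At the second node only “the Taro visited” (a=0, b=-1) and “Taro visited” (a=1, b=1) survive, matching the slide. Because each node keeps only its envelope, later nodes never have to consider the deleted lines.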
