Learning to Translate with Multiple Objectives


Learning to Translate with Multiple Objectives. Kevin Duh (NAIST), Katsuhito Sudoh (NTT), Xianchao Wu (Baidu), Hajime Tsukada (NTT), Masaaki Nagata (NTT)


slide-1
SLIDE 1

Learning to Translate with Multiple Objectives

Kevin Duh (NAIST), Katsuhito Sudoh (NTT), Xianchao Wu (Baidu), Hajime Tsukada (NTT), Masaaki Nagata (NTT)

slide-2
SLIDE 2

How many metrics have been proposed for MT evaluation?

slide-3
SLIDE 3

BLEU TER RIBES METEOR WER PER SemPos DepOverlap ParaEval GTM RTE NIST IMPACT RED NCT SEPIA TESLA

slide-4
SLIDE 4

How many metrics are used for MT optimization?

slide-5
SLIDE 5

BLEU

slide-6
SLIDE 6

Metrics for Evaluation: BLEU TER RIBES METEOR WER PER SemPos DepOverlap ParaEval GTM RTE NIST IMPACT RED NCT SEPIA TESLA

Metrics for Optimization: BLEU

slide-7
SLIDE 7

Each metric has its strengths.

→ Optimize with multiple metrics

slide-8
SLIDE 8

Outline

  • 1. Motivation
  • 2. Basic Concepts: Pareto optimality
  • 3. Multiobjective optimization in MT
  • 4. Experiments
slide-9
SLIDE 9

Outline

  • 1. Motivation
  • 2. Basic Concepts: Pareto optimality
  • 3. Multiobjective optimization in MT
  • 4. Experiments
slide-10
SLIDE 10

Multiobjective optimization

$$\max_{w} \; [F_1(w), F_2(w), \ldots, F_K(w)]$$

Find one w that simultaneously optimizes K objectives.

But what does it mean to be "optimum"?

slide-11
SLIDE 11

Multiobjective optimization of your ACL Hotel

Hotel                  Customer Reviews   Distance to Conference Center   Price (KRW)
The Shilla Jeju        4.5 stars          5 minutes                       230,000
Hotel Lotte Jeju       4.5 stars          5 minutes                       200,000
Poonglim Resort        3 stars            10 minutes                      120,000
Hana Hotel             3 stars            5 minutes                       120,000
Gyulhyanngi Pension    2 stars            10 minutes                       90,000

Vilfredo Pareto, Economist (1848-1923): "You're irrational! That choice is not Pareto Optimal!"

slide-12
SLIDE 12

How to define optimality

[Figure: scatter plot of points A to G; axes: Objective 1 and Objective 2, each ranging from 0.1 to 0.5]

slide-13
SLIDE 13

[Figure: scatter plot of points A to G; axes: Objective 1 and Objective 2, each ranging from 0.1 to 0.5]

  • A point p is weakly Pareto-optimal iff there does not exist another point q such that F_k(q) > F_k(p) for all k

slide-14
SLIDE 14

[Figure: scatter plot of points A to G; axes: Objective 1 and Objective 2, each ranging from 0.1 to 0.5. Point labels: "Pareto & Weakly-Pareto" and "Weakly-Pareto"]

  • A point p is Pareto-optimal iff there does not exist a q such that F_k(q) >= F_k(p) for all k and F_k(q) > F_k(p) for at least one k
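
The two definitions (weakly Pareto-optimal and Pareto-optimal) can be checked directly on a finite set of score vectors. The following is a minimal Python sketch, assuming higher is better for every objective; the function names and the three sample points are illustrative and are not taken from the slides. It also shows that a point can be weakly Pareto-optimal without being Pareto-optimal, while every Pareto-optimal point is also weakly Pareto-optimal.

```python
from typing import Sequence

Point = Sequence[float]

def strictly_better_everywhere(q: Point, p: Point) -> bool:
    """True if F_k(q) > F_k(p) for all k."""
    return all(qk > pk for qk, pk in zip(q, p))

def dominates(q: Point, p: Point) -> bool:
    """True if F_k(q) >= F_k(p) for all k and F_k(q) > F_k(p) for at least one k."""
    return (all(qk >= pk for qk, pk in zip(q, p))
            and any(qk > pk for qk, pk in zip(q, p)))

def is_weakly_pareto_optimal(p: Point, points: Sequence[Point]) -> bool:
    """No other point beats p on every objective at once."""
    return not any(strictly_better_everywhere(q, p) for q in points)

def is_pareto_optimal(p: Point, points: Sequence[Point]) -> bool:
    """No other point is at least as good everywhere and better somewhere."""
    return not any(dominates(q, p) for q in points)

# Hypothetical (Objective 1, Objective 2) scores, chosen only for illustration.
points = [(0.4, 0.3), (0.4, 0.2), (0.1, 0.1)]

for p in points:
    print(p, is_pareto_optimal(p, points), is_weakly_pareto_optimal(p, points))
```

Here (0.4, 0.2) ties with (0.4, 0.3) on Objective 1 and loses on Objective 2, so it is dominated but not beaten on every objective: weakly Pareto-optimal only.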

slide-15
SLIDE 15

[Figure: scatter plot of points A to G; axes: Objective 1 and Objective 2, each ranging from 0.1 to 0.5]

Given a set of points, the subset of Pareto-optimal points forms the Pareto Frontier
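
As a worked example, the Pareto Frontier can be computed for the hotel table from slide 11. This is a small Python sketch under stated assumptions: more stars is better, and shorter distance and lower price are better (so both are negated to turn every objective into "higher is better"). The helper names `pareto_frontier` and `objectives` are hypothetical.

```python
def dominates(q, p):
    """q is at least as good as p on every objective and strictly better on one."""
    return (all(a >= b for a, b in zip(q, p))
            and any(a > b for a, b in zip(q, p)))

def pareto_frontier(items, key):
    """Return the items whose objective vector key(item) is not dominated."""
    vectors = [key(item) for item in items]
    return [item for item, p in zip(items, vectors)
            if not any(dominates(q, p) for q in vectors)]

# (name, stars, minutes to conference center, price in KRW), from slide 11.
hotels = [
    ("The Shilla Jeju", 4.5, 5, 230_000),
    ("Hotel Lotte Jeju", 4.5, 5, 200_000),
    ("Poonglim Resort", 3.0, 10, 120_000),
    ("Hana Hotel", 3.0, 5, 120_000),
    ("Gyulhyanngi Pension", 2.0, 10, 90_000),
]

def objectives(hotel):
    """Assumed preference direction: maximize stars, minimize minutes and price."""
    _, stars, minutes, price = hotel
    return (stars, -minutes, -price)

frontier_names = [h[0] for h in pareto_frontier(hotels, objectives)]
print(frontier_names)
# → ['Hotel Lotte Jeju', 'Hana Hotel', 'Gyulhyanngi Pension']
```

Under these assumptions, The Shilla Jeju is dominated by Hotel Lotte Jeju (same reviews and distance, higher price), and Poonglim Resort is dominated by Hana Hotel (same reviews and price, longer distance).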
slide-16
SLIDE 16

Outline

  • 1. Motivation
  • 2. Basic Concepts: Pareto optimality
  • 3. Multiobjective optimization in MT
  • 4. Experiments
slide-17
SLIDE 17

Optimization in Machine Translation

[Diagram labels: Sentence from Development Set, Decode, N-best, Optimization, Weights, Reference & Evaluation Metrics]

slide-18
SLIDE 18

Baseline: Linear Combination

$$\max_{w} \sum_{k=1}^{K} \alpha_k F_k(w), \qquad \alpha_k \ge 0, \quad \sum_{k=1}^{K} \alpha_k = 1$$

Here α_k is the importance of each objective.

Advantages:

  • 1. Single-objective tools can be used
  • 2. Sufficiency: If w* is a solution, then it's weakly Pareto

Disadvantages:

  • 1. How to set α?
  • 2. No necessary conditions: some Pareto points can never be obtained, whatever the setting of α.

slide-19
SLIDE 19

Pareto points not on Convex Hull are missed

[Figure: scatter plot of points A to G; axes: Objective 1 and Objective 2, each ranging from 0.1 to 0.5; annotated with the α1 ranges 0 ≤ α1 ≤ 0.5, 0.5 ≤ α1 ≤ 1, and α1 = 1]
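
The effect can be reproduced on a toy example. The Python sketch below sweeps α1 over [0, 1] for a linear combination of two objectives (with α2 = 1 − α1) and records which point wins. The three points P, Q, R and their coordinates are hypothetical, chosen so that R is Pareto-optimal but lies inside the convex hull; they are not the points A to G from the slide.

```python
def dominates(q, p):
    """q is at least as good as p on every objective and strictly better on one."""
    return (all(a >= b for a, b in zip(q, p))
            and any(a > b for a, b in zip(q, p)))

def best_under_linear_combination(points, alpha1):
    """Return the label maximizing alpha1 * F1 + (1 - alpha1) * F2."""
    return max(points,
               key=lambda label: alpha1 * points[label][0]
               + (1.0 - alpha1) * points[label][1])

# Hypothetical (Objective 1, Objective 2) scores.
points = {"P": (0.5, 0.1), "Q": (0.1, 0.5), "R": (0.25, 0.25)}

# All three points are Pareto-optimal: none is dominated by another.
pareto = {label for label, p in points.items()
          if not any(dominates(q, p) for q in points.values())}

# Sweep alpha1 from 0 to 1 in steps of 0.01 and collect the winners.
winners = {best_under_linear_combination(points, i / 100) for i in range(101)}

print(sorted(pareto))   # all of P, Q, R
print(sorted(winners))  # R is never selected
```

For any α1, the better of P and Q scores at least 0.3 while R scores 0.25, so no setting of α recovers R.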

slide-20
SLIDE 20
New method: Directly optimize Pareto Front
slide-21
SLIDE 21

New method: Directly optimize Pareto Front

[Figure: scatter plot of points A to G; axes: Objective 1 and Objective 2, each ranging from 0.1 to 0.5]

  • Step 1: Compute Pareto Frontier on N-best List. Complexity O(#objectives * N^2)
  • Step 2: Find w separating Pareto vs. Non-Pareto

slide-22
SLIDE 22

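Multi-objective Pairwise Ranking Optimization

$$\min_{w} \; \|w\|^2 + c \sum_{ij} \xi_{ij}$$

$$\text{s.t.} \quad w^{T}\Phi(x, y_i) - w^{T}\Phi(x, y_j) \ge 1 - \xi_{ij} \qquad \forall\, y_i \in \text{ParetoFront},\; y_j \notin \text{ParetoFront}$$

Annotations: Regularizer ($\|w\|^2$), Slack ($\xi_{ij}$), Feature vector ($\Phi$), Input sentence ($x$), Good hypothesis ($y_i$), Poor hypothesis ($y_j$)

i.e. score of Pareto hypothesis should be higher than non-Pareto hypotheses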
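
To make the objective concrete, here is a minimal Python sketch for a single input sentence. It is not the authors' implementation: the constrained problem is rewritten in its equivalent hinge-loss form, ||w||^2 + c * Σ max(0, 1 − w·(Φ(x, y_i) − Φ(x, y_j))), and minimized with plain subgradient descent, which is a substitute for whatever solver was actually used. The feature vectors, metric scores, and hyperparameters (`c`, `lr`, `epochs`) are all hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]

def dominates(q: Vector, p: Vector) -> bool:
    """q is at least as good as p on every metric and strictly better on one."""
    return (all(a >= b for a, b in zip(q, p))
            and any(a > b for a, b in zip(q, p)))

def pareto_mask(metric_scores: List[Vector]) -> List[bool]:
    """Step 1: flag hypotheses on the Pareto frontier, in O(#objectives * N^2)."""
    return [not any(dominates(q, p) for q in metric_scores)
            for p in metric_scores]

def train_pairwise_ranker(features: List[Vector], metric_scores: List[Vector],
                          c: float = 1.0, lr: float = 0.01,
                          epochs: int = 300) -> List[float]:
    """Step 2: subgradient descent on ||w||^2 + c * sum of pairwise hinge losses."""
    mask = pareto_mask(metric_scores)
    good = [phi for phi, on_front in zip(features, mask) if on_front]
    poor = [phi for phi, on_front in zip(features, mask) if not on_front]
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        grad = [2.0 * wk for wk in w]  # gradient of the regularizer ||w||^2
        for phi_i in good:
            for phi_j in poor:
                diff = [a - b for a, b in zip(phi_i, phi_j)]
                margin = sum(wk * dk for wk, dk in zip(w, diff))
                if margin < 1.0:  # constraint violated, so the slack is positive
                    grad = [g - c * dk for g, dk in zip(grad, diff)]
        w = [wk - lr * g for wk, g in zip(w, grad)]
    return w

# Hypothetical N-best list: 4 hypotheses, 2 features, 2 metrics each.
features = [(1.0, 0.2), (0.8, 0.9), (0.1, 0.3), (0.2, 0.1)]
metric_scores = [(0.30, 0.70), (0.25, 0.75), (0.20, 0.60), (0.10, 0.65)]

w = train_pairwise_ranker(features, metric_scores)
mask = pareto_mask(metric_scores)
scores = [sum(wk * fk for wk, fk in zip(w, phi)) for phi in features]
lowest_pareto = min(s for s, on_front in zip(scores, mask) if on_front)
highest_other = max(s for s, on_front in zip(scores, mask) if not on_front)
print(mask, lowest_pareto > highest_other)
```

On this toy data the first two hypotheses are on the Pareto front, and the learned w scores both of them above the two dominated hypotheses.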

slide-23
SLIDE 23

Outline

  • 1. Motivation
  • 2. Basic Concepts: Pareto optimality
  • 3. Multiobjective optimization in MT
  • 4. Experiments
slide-24
SLIDE 24

Experiment Setup

  • Compare Linear Combination vs. Pareto
    – Both use pairwise rank optimization, but different objective.
    – For Linear Combination, multiple α settings (α1 = {1, 0.7, 0.5, 0.3, 0})
    – 5 runs, 20 iterations each. Collect/visualize set of solutions.
  • Task 1: NIST Zh-En
    – Optimize BLEU & NTER. NTER = max(1 - TER, 0)
    – Moses decoder, 7M train sentences, 1.6k dev, 8 features
  • Task 2: PubMed En-Ja
    – Optimize BLEU & RIBES. RIBES = permutation metric [Isozaki, EMNLP10]
    – Moses decoder, 0.2M train sentences, 2k dev, 14 features
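
The NTER definition used in Task 1 is a one-line transformation. TER is an error rate, so lower is better, and it can exceed 1 when the number of edits is larger than the reference length; NTER flips the direction so that higher is better, like BLEU, and clips at 0. A small sketch, with the function name `nter` being hypothetical:

```python
def nter(ter: float) -> float:
    """Normalized TER as defined on the slide: max(1 - TER, 0)."""
    return max(1.0 - ter, 0.0)

print(nter(0.25))  # a TER of 0.25 maps to 0.75
print(nter(1.3))   # TER above 1 is clipped to 0.0
```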

slide-25
SLIDE 25

Result Visualization

[Figure: result plot with solutions labeled (α1=1, α2=0) and (α1=0.5, α2=0.5)]

slide-26
SLIDE 26
[Figures: PubMed Result and NIST Result]

OBSERVATIONS:

  • 1. Pareto > Linear Combination for any α

slide-27
SLIDE 27
[Figures: PubMed Result and NIST Result]

OBSERVATIONS:

  • 1. Pareto > Linear Combination for any α
  • 2. Metric tunability: Pareto outperforms single-objective optimization of RIBES

slide-28
SLIDE 28

Analysis: Number of Pareto Points

slide-29
SLIDE 29

Analysis: Metric Tunability

[Figure: RIBES vs. BLEU; sampling of 10k random w's]

slide-30
SLIDE 30

Summary & Final Thoughts

slide-31
SLIDE 31

Metrics for Evaluation: BLEU TER RIBES METEOR WER PER SemPos DepOverlap ParaEval GTM RTE NIST IMPACT RED NCT SEPIA TESLA

Metrics for Optimization: BLEU

slide-32
SLIDE 32

BLEU TER RIBES METEOR WER PER SemPos DepOverlap ParaEval GTM RTE NIST IMPACT RED NCT SEPIA TESLA

Metrics for Evaluation and Optimization

slide-33
SLIDE 33

Vilfredo Pareto (1848-1923)

slide-34
SLIDE 34

Multi-objective problems are everywhere if we look

  • Speed & Accuracy
    – Parsing [Eisner2011]
  • Intrinsic & Extrinsic Metrics
    – Parser & downstream Machine Translation [Hall2011]
  • Multiple datasets
    – Recommendation system [Agarawl2011]
  • Escape local optima
    – Hard & Soft EM in grammar induction [Spitkovsky2011]

slide-35
SLIDE 35

Thanks for your attention!

Do you have a multi-objective problem?

slide-36
SLIDE 36

NIST Result

slide-37
SLIDE 37

PubMed Result