Translation Model Adaptation Using Genre-Revealing Text Features



slide-1
SLIDE 1

Translation Model Adaptation Using Genre-Revealing Text Features

Marlies van der Wees, Arianna Bisazza, Christof Monz

SLIDE 7

Domain adaptation for SMT

✤ Prioritize translation candidates that are most relevant to a specific task

[Figure labels: Heterogeneous training data; Specific translation task]

✤ What type of domain information to use?

Example phrase table (two Arabic source phrases, three English candidates each):

source             target           p(f|e)   p(e|f)   …
(Arabic phrase 1)  praise be to     0.1      0.2      …
(Arabic phrase 1)  praise for       0.2      0.2      …
(Arabic phrase 1)  thank            0.1      0.2      …
(Arabic phrase 2)  my dear          0.2      0.1      …
(Arabic phrase 2)  my love          0.2      0.1      …
(Arabic phrase 2)  my sweetheart    0.1      0.1      …

SLIDE 10

Dimensions of domains

✤ Topic refers to general subject
  ✦ politics, sports, tennis

✤ Genre refers to function, style, text type
  ✦ editorials, newswire, user-generated text
  ✦ orthogonal to topic

✤ Provenance refers to document’s origin
  ✦ LDC2005T13, Europarl, EMEA

SLIDE 14

The problem with provenance

Provenance information has proven useful for adaptation in SMT, but is it the best representation of a domain?

✤ It’s not an intrinsic text property

✤ We might need manual labeling
  ✦ labor-intensive
  ✦ arbitrary

✤ Often combines particular topic and genre

SLIDE 17

Disentangling topic and genre in SMT*

✤ Experiments on controlled test set: Gen&Topic

✤ Genre has larger impact on SMT than topic

✤ We want to adapt to different genres in a test corpus!

Example sentences by topic (Culture, Economy) and genre (News, Comment):
  ✦ Culture, News: The 12 contestants competed during a May 3rd Prime.
  ✦ Culture, Comment: You allowed Barwas to represent Iraq while she sings in Kurdish!!!
  ✦ Economy, News: Yemen is mulling the establishment of 13 industrial zones.
  ✦ Economy, Comment: What development in Yemen are you talking about?

* Van der Wees et al., 2015

SLIDE 23

What information to use for adaptation?

✤ Provenance information
  ✦ manual grouping of sub-corpora

✤ Topic information
  ✦ unsupervised LDA-inferred topics

✤ Genre information
  ✦ determine and exploit intrinsic genre-revealing text features
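
As a rough illustration of what an "intrinsic genre-revealing text feature" could look like, the sketch below computes a few surface-level rates per sentence. The feature set (exclamation marks, question marks, second-person pronouns, digits) and the name `genre_features` are assumptions chosen for illustration, loosely motivated by the Gen&Topic example sentences. The slides do not list the actual features at this point, and the task's source language is Arabic, whereas this toy example runs on English text.

```python
import re

# Hypothetical feature inventory; not taken from the slides.
SECOND_PERSON = {"you", "your", "yours"}


def genre_features(sentence):
    """Map a sentence to a small dict of surface-level rates (per token)."""
    # Split into word tokens and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    return {
        "exclamation": sum(t == "!" for t in tokens) / n,
        "question": sum(t == "?" for t in tokens) / n,
        "second_person": sum(t in SECOND_PERSON for t in tokens) / n,
        "digits": sum(t.isdigit() for t in tokens) / n,
    }


# Two of the example sentences from the Gen&Topic slide.
news = genre_features(
    "Yemen is mulling the establishment of 13 industrial zones."
)
comment = genre_features(
    "You allowed Barwas to represent Iraq while she sings in Kurdish!!!"
)
print(news)
print(comment)
```

On these two sentences the comment scores higher on exclamation marks and second-person pronouns, while the newswire sentence scores higher on digits. A real feature set would need to be selected and validated on data.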

SLIDE 27

Genre adaptation: the task

✤ Arabic-English phrase-based SMT

✤ Two multi-genre evaluation sets:
  ✦ Gen&Topic*:
      • newswire (NW)
      • comments (UG)
  ✦ NIST:
      • newswire (NW)
      • weblogs (UG)

✤ Translation model adaptation

* ilps.science.uva.nl/resources/gen-topic/

SLIDE 29

Genre adaptation: general framework

✤ Vector space model (VSM) for translation model adaptation*
  ✦ Phrase vector for each phrase pair: < w_1 … w_N >
  ✦ Vector for development set: < w_1(dev) … w_N(dev) >
  ✦ Similarity score per phrase pair, shown as an extra phrase-table column

source             target           p(f|e)   p(e|f)   …   phrase vector    similarity score
(Arabic phrase 1)  praise be to     0.1      0.2      …   < w_1 … w_N >    0.1
(Arabic phrase 1)  praise for       0.2      0.2      …   < w_1 … w_N >    0.2
(Arabic phrase 1)  thank            0.1      0.2      …   < w_1 … w_N >    0.4
(Arabic phrase 2)  my dear          0.2      0.1      …   < w_1 … w_N >    0.3
(Arabic phrase 2)  my love          0.2      0.1      …   < w_1 … w_N >    0.4
(Arabic phrase 2)  my sweetheart    0.1      0.1      …   < w_1 … w_N >    0.1

* Following Chen et al., 2013
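
The vector space idea on this slide can be sketched as follows: each phrase pair carries a weight vector, the development set carries one too, and their similarity becomes an additional phrase-table score. This is a minimal sketch, assuming cosine similarity. The slide only says "similarity score", so the function used by Chen et al., 2013 may differ. The names `cosine` and `add_similarity_feature` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # define similarity with an all-zero vector as 0
    return dot / (norm_u * norm_v)


def add_similarity_feature(phrase_table, dev_vector, sim=cosine):
    """Return copies of the entries with an extra 'similarity' score."""
    return [
        dict(entry, similarity=sim(entry["vector"], dev_vector))
        for entry in phrase_table
    ]


# Hypothetical toy data with N = 3 dimensions.
dev_vector = [0.7, 0.2, 0.1]
phrase_table = [
    {"target": "praise be to", "vector": [0.8, 0.1, 0.1]},
    {"target": "thank", "vector": [0.1, 0.1, 0.8]},
]

scored = add_similarity_feature(phrase_table, dev_vector)
for entry in scored:
    print(entry["target"], round(entry["similarity"], 3))
```

Passing the similarity function as a parameter keeps the sketch agnostic about which measure is used. In a phrase-based system the resulting score would typically be tuned as one more feature alongside p(f|e) and p(e|f).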

slide-38
SLIDE 38

Translation Model Adaptation Using Genre-Revealing Text Features

Genre adaptation: general framework

✤ Vector space model (VSM) for translation model adaptation*

[Figure: phrase table with columns source, target, p(f|e), p(e|f), …; two Arabic source phrases with three candidate translations each ("praise be to", "praise for", "thank"; "my dear", "my love", "my sweetheart"). Each phrase pair has a phrase vector <w_1 … w_N>, which is compared with the vector for the development set <w_1(dev) … w_N(dev)> to give a similarity score per phrase pair (0.1, 0.2, 0.4, 0.3, 0.4, 0.1 in the example).]

* Following Chen et al., 2013
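The framework above can be sketched in a few lines: each phrase pair carries a vector, the development set carries one vector, and their similarity becomes an extra phrase-table score. This is only an illustrative sketch. The slides do not state which similarity function is used, so cosine similarity is an assumption here, and the names `cosine_similarity`, `add_similarity_feature`, and all toy values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def add_similarity_feature(phrase_table, dev_vector):
    """Return copies of the phrase-table entries, each extended with a similarity score."""
    scored = []
    for entry in phrase_table:
        score = cosine_similarity(entry["vector"], dev_vector)
        scored.append({**entry, "similarity": score})
    return scored


# Toy data: every value below is invented for illustration.
dev_vector = [0.9, 0.1, 0.0]
phrase_table = [
    {"source": "src_a", "target": "my dear", "vector": [0.8, 0.2, 0.0]},
    {"source": "src_a", "target": "my sweetheart", "vector": [0.0, 0.1, 0.9]},
]
scored = add_similarity_feature(phrase_table, dev_vector)
```

In this toy setup the first candidate, whose vector resembles the development-set vector, receives the higher similarity score, so a decoder that uses the score as a feature would tend to prefer it.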

slide-43
SLIDE 43

Translation Model Adaptation Using Genre-Revealing Text Features 9

How to construct genre-informed vectors?

✤ Original version: provenance information

✦ following Chen et al., 2013

✤ Our version: intrinsic genre information

✦ document-level genre features borrowed from text classification literature

✦ directly observable in raw text

✦ we also test: to what extent can LDA-inferred ‘topics’ distinguish our genres?

slide-45
SLIDE 45

Translation Model Adaptation Using Genre-Revealing Text Features 10

Genre adaptation: genre-revealing features

✤ Seven most discriminative features between NW and UG are used in final VSM version
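To make "directly observable in raw text" concrete, here is a minimal sketch of document-level feature extraction. The seven features actually used are not listed in this transcript, so the four features below (punctuation rates, pronoun rate, average word length) are hypothetical stand-ins, and the English-only pronoun list is an assumption made for brevity.

```python
import re

# Hypothetical, English-only pronoun list; an Arabic list would be needed for source-side features.
FIRST_SECOND_PERSON = {"i", "me", "my", "we", "us", "our", "you", "your"}


def genre_features(document):
    """Compute a few surface features of one document from its raw text."""
    tokens = document.split()                           # whitespace tokens
    n_tokens = max(len(tokens), 1)                      # avoid division by zero
    words = re.findall(r"[^\W\d_]+", document.lower())  # alphabetic words only
    n_words = max(len(words), 1)
    return {
        "exclamation_rate": document.count("!") / n_tokens,
        "question_rate": document.count("?") / n_tokens,
        "pronoun_rate": sum(w in FIRST_SECOND_PERSON for w in words) / n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words,
    }


# Invented example documents, one informal and one newswire-like.
ug_doc = "I love you!! Do you love me?"
nw_doc = "The minister announced a new budget on Tuesday."
ug_features = genre_features(ug_doc)
nw_features = genre_features(nw_doc)
```

Features of this kind need no labels or metadata, which is what lets them stand in for manual provenance information.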

slide-49
SLIDE 49

Translation Model Adaptation Using Genre-Revealing Text Features 11

Genre adaptation: three hypotheses

The proposed genre-revealing features…

1. enhance translation performance for NW and UG (measured in BLEU)

2. can be projected across languages (values computed for Arabic and English)

3. encourage translation consistency (since lexical choice is more tailored towards different genres)

slide-50
SLIDE 50

Translation Model Adaptation Using Genre-Revealing Text Features 12

Enhanced translation performance

[Bar chart: +BLEU over baseline (y-axis, 0.2 to 1.0) for manual provenance labels vs. automatic features (genre+LDA) on four test sets: G&T NW, G&T UG, NIST NW, NIST UG.]

✤ Automatic features can replace manual labels

slide-51
SLIDE 51

Translation Model Adaptation Using Genre-Revealing Text Features 13

Projection across languages

✤ Features can be extracted on either side of the bitext

[Bar chart: +BLEU over baseline (y-axis, 0.2 to 1.0) for source-side vs. target-side genre features on four test sets: G&T NW, G&T UG, NIST NW, NIST UG.]

slide-57
SLIDE 57

Translation Model Adaptation Using Genre-Revealing Text Features 14

Increased translation consistency*

✤ Repeated phrase: any phrase that occurs at least twice in a single document

✤ If all translations are identical (except for punctuation or stopwords): consistent translation

* Following Carpuat and Simard, 2012
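The definition above can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation script: the input format (per-document lists of source phrase and translation pairs), the tiny stopword list, the ASCII-only punctuation stripping, and counting per repeated phrase type are all hypothetical choices.

```python
import string
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "the", "of", "to", "in"}
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)  # ASCII punctuation only


def normalize(translation):
    """Lowercase, strip punctuation, and drop stopwords."""
    cleaned = translation.lower().translate(_PUNCT_TABLE)
    return tuple(w for w in cleaned.split() if w not in STOPWORDS)


def consistency_percentage(documents):
    """Percentage of repeated phrases that are translated consistently.

    documents: list of documents, each a list of (source_phrase, translation) pairs.
    Returns None if no phrase is repeated within any document.
    """
    repeated = 0
    consistent = 0
    for doc in documents:
        by_source = defaultdict(list)
        for source, translation in doc:
            by_source[source].append(normalize(translation))
        for translations in by_source.values():
            if len(translations) < 2:
                continue  # not a repeated phrase
            repeated += 1
            if len(set(translations)) == 1:
                consistent += 1
    if repeated == 0:
        return None
    return 100.0 * consistent / repeated


# Invented example: p1 is consistent after normalization, p2 is not, p3 is not repeated.
doc = [
    ("p1", "the health sector"),
    ("p1", "health sector."),
    ("p2", "global"),
    ("p2", "worldwide"),
    ("p3", "thanks"),
]
result = consistency_percentage([doc])
```

Grouping by document matters here: the same source phrase translated differently in two separate documents does not count as an inconsistency under this definition.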

slide-58
SLIDE 58

Translation Model Adaptation Using Genre-Revealing Text Features 15

Translation consistency: results

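[Bar chart: % consistent phrases (y-axis, 40 to 60) for the baseline vs. the adapted system on four test sets: G&T NW, G&T UG, NIST NW, NIST UG. Gains labeled on the chart, in extraction order: +4.2, +2.7, +0.1, +2.6.]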

✤ Adapted system increases translation consistency

slide-63
SLIDE 63

Translation Model Adaptation Using Genre-Revealing Text Features 16

Genre adaptation: some examples

✤ Genre-adapted system favors:

✦ colloquial translation options for UG

Source phrase | Baseline translation | Adapted system’s translation
و هذا يدل | and this indicates | and this shows
مليار دولار سنويا | billion dollars annually | billion dollars a year

✦ formal or concise translation options for NW

Source phrase | Baseline translation | Adapted system’s translation
القطاع الصحي | workers in the health sector | the health sector
عالميا | worldwide | global

slide-67
SLIDE 67

Translation Model Adaptation Using Genre-Revealing Text Features 17

In conclusion: what we did and why

✤ Most approaches to domain adaptation for SMT rely on provenance information

✤ Provenance is not an intrinsic text property and often combines topic and genre

✤ When disentangling topic and genre, we found that genre differences pose the biggest challenge to SMT

✤ We ask: can we address genre adaptation using only intrinsic text features?

slide-73
SLIDE 73

Translation Model Adaptation Using Genre-Revealing Text Features 18

In conclusion: what we learned

✤ We can eliminate the need for manual provenance information in a flexible adaptation framework

✤ Our proposed document-level genre features

✦ are simple but powerful

✦ enhance translation performance

✦ can be projected across languages

✦ encourage translation consistency

slide-74
SLIDE 74

Translation Model Adaptation Using Genre-Revealing Text Features 19

Thank you!