Translation Model Adaptation Using Genre-Revealing Text Features
Marlies van der Wees, Arianna Bisazza, Christof Monz

Domain adaptation for SMT
✤ Prioritize translation candidates that are most relevant to a specific task
✤ What type of domain information to use?
[Figure: heterogeneous training data feeding a specific translation task. Phrase table excerpt with columns source, target, p(f|e), p(e|f): two Arabic source phrases, one with the English candidates "praise be to", "praise for", "thank" and one with "my dear", "my love", "my sweetheart", each with probabilities between 0.1 and 0.2.]

Dimensions of domains
✤ Topic refers to general subject
  ✦ politics, sports, tennis
✤ Genre refers to function, style, text type
  ✦ editorials, newswire, user-generated text
  ✦ orthogonal to topic
✤ Provenance refers to document’s origin
  ✦ LDC2005T13, Europarl, EMEA

The problem with provenance
Provenance information has proven useful for adaptation in SMT, but is it the best representation of a domain?
✤ It’s not an intrinsic text property
✤ We might need manual labeling
  ✦ labor-intensive
  ✦ arbitrary
✤ Often combines particular topic and genre

Disentangling topic and genre in SMT*
✤ Experiments on controlled test set: Gen&Topic
✤ Genre has larger impact on SMT than topic
✤ We want to adapt to different genres in a test corpus!
Example sentences (genre: News or Comment; topic: Culture or Economy):
  News, Culture: The 12 contestants competed during a May 3rd Prime.
  Comment, Culture: You allowed Barwas to represent Iraq while she sings in Kurdish!!!
  Comment, Economy: What development in Yemen are you talking about?
  News, Economy: Yemen is mulling the establishment of 13 industrial zones.
* Van der Wees et al., 2015

What information to use for adaptation?
✤ Provenance information
  ✦ manual grouping of sub-corpora
✤ Topic information
  ✦ unsupervised LDA-inferred topics
✤ Genre information
  ✦ determine and exploit intrinsic genre-revealing text features

Genre adaptation: the task
✤ Arabic-English phrase-based SMT
✤ Two multi-genre evaluation sets:
  ✦ Gen&Topic*:
    - newswire (NW)
    - comments (UG)
  ✦ NIST:
    - newswire (NW)
    - weblogs (UG)
✤ Translation model adaptation
* ilps.science.uva.nl/resources/gen-topic/
Genre adaptation: general framework
✤ Vector space model (VSM) for translation model adaptation*
[Figure: phrase table (columns: source, target, p(f|e), p(e|f), …) for two Arabic source phrases with target options "praise be to", "praise for", "thank" and "my dear", "my love", "my sweetheart". Each phrase pair has a phrase vector <w_1 … w_N>, which is compared with the vector for the development set, <w_1(dev) … w_N(dev)>, to give one similarity score per phrase pair (0.1, 0.2, 0.4, 0.3, 0.4, 0.1 in the example).]
* Following Chen et al., 2013
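The comparison between a phrase vector and the development-set vector can be sketched as follows. The slide does not name the similarity function, so this sketch assumes the Bhattacharyya coefficient over sum-normalized, non-negative weights; cosine similarity would be another plausible choice. The function names are hypothetical.

```python
import math

def normalize(weights):
    """Scale non-negative weights so they sum to 1 (all zeros stay all zeros)."""
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)
    return [w / total for w in weights]

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two distributions of equal length."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def similarity(phrase_vec, dev_vec):
    """Similarity score between a phrase vector and the development-set vector.

    Assumption: both vectors hold non-negative weights w_1 ... w_N.
    """
    return bhattacharyya(normalize(phrase_vec), normalize(dev_vec))
```

The resulting score lies between 0 (no shared mass) and 1 (identical distributions) and could be added to each phrase pair as one extra feature.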
How to construct genre-informed vectors?
✤ Original version: provenance information
  ✦ following Chen et al., 2013
✤ Our version: intrinsic genre information
  ✦ document-level genre features borrowed from text classification literature
  ✦ directly observable in raw text
  ✦ we also test: to what extent can LDA-inferred ‘topics’ distinguish our genres?
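To make "directly observable in raw text" concrete, the sketch below computes a few per-document surface statistics and averages them over the documents in which a phrase pair was seen. The three features, the English-only pronoun list, and the averaging step are assumptions chosen for illustration; they are not the feature set or aggregation presented in the talk.

```python
import re

# Hypothetical, English-only pronoun list used purely for illustration.
FIRST_PERSON = {"i", "me", "my", "we", "our", "us"}

def document_features(text):
    """Return a dict of simple surface statistics for one document."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    words = [t for t in tokens if t[0].isalnum()]
    n = max(len(tokens), 1)  # avoid division by zero on empty documents
    return {
        "first_person": sum(w in FIRST_PERSON for w in words) / n,
        "exclamations": tokens.count("!") / n,
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
    }

def phrase_vector(doc_feature_dicts):
    """Average document-level features over the documents containing a phrase pair.

    Assumption: a phrase pair inherits the mean feature values of its documents.
    Features are returned in alphabetical key order.
    """
    keys = sorted(doc_feature_dicts[0])
    count = len(doc_feature_dicts)
    return [sum(d[k] for d in doc_feature_dicts) / count for k in keys]
```

Because nothing here needs labels or metadata, the same statistics can be computed for training documents and for the development set alike.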
Genre adaptation: genre-revealing features
✤ The seven features that discriminate best between NW and UG are used in the final VSM version
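One way to pick the most discriminative features is to rank them by how far apart the two genres' mean values lie. The slide does not state the selection criterion, so the standardized mean difference used below, the function name `rank_features`, and the input layout (one feature dict per document) are all assumptions.

```python
from statistics import mean, pstdev

def rank_features(nw_docs, ug_docs, top_k=7):
    """Rank features by |mean(NW) - mean(UG)| divided by the pooled spread.

    nw_docs / ug_docs: lists of dicts mapping feature name -> value.
    The scoring criterion is an illustrative assumption, not the talk's method.
    """
    scores = {}
    for name in nw_docs[0]:
        nw = [d[name] for d in nw_docs]
        ug = [d[name] for d in ug_docs]
        spread = pstdev(nw + ug) or 1.0  # constant features get a neutral divisor
        scores[name] = abs(mean(nw) - mean(ug)) / spread
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

With `top_k=7` this returns at most seven feature names, mirroring the number kept in the final VSM version.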
Genre adaptation: three hypotheses
The proposed genre-revealing features…
1. enhance translation performance for NW and UG
  ✦ measured in BLEU
2. can be projected across languages
  ✦ values computed for Arabic and English
3. encourage translation consistency
  ✦ since lexical choice is more tailored towards different genres
Enhanced translation performance
[Figure: bar chart of BLEU gain over baseline (axis 0.2 to 1.0) on G&T NW, G&T UG, NIST NW and NIST UG, comparing manual provenance labels with automatic features (genre+LDA)]
✤ Automatic features can replace manual labels
Projection across languages
✤ Features can be extracted on either side of the bitext
[Figure: bar chart of BLEU gain over baseline (axis 0.2 to 1.0) on G&T NW, G&T UG, NIST NW and NIST UG, comparing source-side genre features with target-side genre features]
Increased translation consistency*
✤ Repeated phrase: any phrase that occurs at least twice in a single document
✤ If all translations are identical (except for punctuation or stopwords): consistent translation
* Following Carpuat and Simard, 2012
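The two definitions above translate fairly directly into code. The sketch below is a minimal version under stated assumptions: the input is a list of (source phrase, translation) pairs for one document, the stopword list is a small hypothetical English one, and a document without repeated phrases is scored 0.0 by convention.

```python
import string
from collections import defaultdict

# Hypothetical stopword list; a real evaluation would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in"}
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def normalize_translation(translation):
    """Lowercase, drop ASCII punctuation and stopwords, return a tuple of words."""
    cleaned = translation.lower().translate(_STRIP_PUNCT)
    return tuple(w for w in cleaned.split() if w not in STOPWORDS)

def consistency(doc_phrase_pairs):
    """Percentage of repeated source phrases that are translated consistently.

    doc_phrase_pairs: list of (source_phrase, translation) for ONE document.
    """
    by_source = defaultdict(list)
    for src, tgt in doc_phrase_pairs:
        by_source[src].append(normalize_translation(tgt))
    # A repeated phrase occurs at least twice in the document.
    repeated = [t for t in by_source.values() if len(t) >= 2]
    if not repeated:
        return 0.0  # convention chosen for this sketch
    consistent = sum(len(set(t)) == 1 for t in repeated)
    return 100.0 * consistent / len(repeated)
```

For example, a phrase translated once as "the health sector" and once as "health sector." counts as consistent, while one translated as "global" and then "worldwide" does not.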
Translation consistency: results
[Figure: bar chart of % consistent phrases (axis 40 to 60) for the baseline and adapted systems on G&T NW, G&T UG, NIST NW and NIST UG; the annotated gains are +4.2, +2.7, +0.1 and +2.6]
✤ Adapted system increases translation consistency
Genre adaptation: some examples
✤ Genre-adapted system favors:
  ✦ colloquial translation options for UG
    Source phrase | Baseline translation | Adapted system’s translation
    و هذا يدل | and this indicates | and this shows
    مليار دولار سنويا | billion dollars annually | billion dollars a year
  ✦ formal or concise translation options for NW
    Source phrase | Baseline translation | Adapted system’s translation
    القطاع الصحي | workers in the health sector | the health sector
    عالميا | worldwide | global
In conclusion: what we did and why
✤ Most approaches to domain adaptation for SMT rely on provenance information
✤ Provenance is not an intrinsic text property and often combines topic and genre
✤ When disentangling topic and genre, we found that genre differences pose the biggest challenge to SMT
✤ We ask: can we address genre adaptation using only intrinsic text features?
In conclusion: what we learned
✤ We can eliminate the need for manual provenance information in a flexible adaptation framework
✤ Our proposed document-level genre features
  ✦ are simple but powerful
  ✦ enhance translation performance
  ✦ can be projected across languages
  ✦ encourage translation consistency