Translation Model Adaptation Using Genre-Revealing Text Features


  1. Translation Model Adaptation Using Genre-Revealing Text Features
     Marlies van der Wees, Arianna Bisazza, Christof Monz

  7. Domain adaptation for SMT
     ✤ Prioritize translation candidates that are most relevant to a specific task
     Heterogeneous training data, specific translation task: phrase table shown for each
       source    target          p(f|e)  p(e|f)  …
       الحمد     praise be to    0.1     0.2     …
       الحمد     praise for      0.2     0.2     …
       الحمد     thank           0.1     0.2     …
       حبيبتي    my dear         0.2     0.1     …
       حبيبتي    my love         0.2     0.1     …
       حبيبتي    my sweetheart   0.1     0.1     …
     ✤ What type of domain information to use?
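One generic way to prioritize task-relevant candidates is to interpolate a general phrase table with an in-domain one. The sketch below is only an illustration of that idea, not the method presented in these slides: the `SRC` placeholder, the in-domain scores, and the interpolation weight are all hypothetical, and only the general-domain p(e|f) values echo the table above.

```python
# Hypothetical sketch: linear interpolation of two phrase tables.
# "SRC" stands in for a source phrase; the in-domain scores and the
# weight are invented for illustration and do not come from the slides.

def interpolate(general, in_domain, weight):
    """Return weight * in_domain + (1 - weight) * general per phrase pair."""
    pairs = set(general) | set(in_domain)
    return {
        pair: weight * in_domain.get(pair, 0.0)
        + (1.0 - weight) * general.get(pair, 0.0)
        for pair in pairs
    }

def best_target(table, source):
    """Pick the highest-scoring target phrase for one source phrase."""
    candidates = {tgt: p for (src, tgt), p in table.items() if src == source}
    return max(candidates, key=candidates.get)

# General-domain scores: all three candidates are tied.
general = {
    ("SRC", "my dear"): 0.1,
    ("SRC", "my love"): 0.1,
    ("SRC", "my sweetheart"): 0.1,
}
# Hypothetical in-domain scores that prefer one candidate.
in_domain = {
    ("SRC", "my dear"): 0.1,
    ("SRC", "my love"): 0.5,
    ("SRC", "my sweetheart"): 0.2,
}

adapted = interpolate(general, in_domain, weight=0.5)
print(best_target(adapted, "SRC"))
```

With tied general-domain scores, the in-domain table alone decides the ranking; a lower `weight` would let the general table dominate when the two disagree.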

  10. Dimensions of domains
      ✤ Topic refers to general subject
        ✦ politics, sports, tennis
      ✤ Genre refers to function, style, text type
        ✦ editorials, newswire, user-generated text
        ✦ orthogonal to topic
      ✤ Provenance refers to document’s origin
        ✦ LDC2005T13, Europarl, EMEA

  14. The problem with provenance
      Provenance information has proven useful for adaptation in SMT, but is it the best representation of a domain?
      ✤ It’s not an intrinsic text property
      ✤ We might need manual labeling
        ✦ labor-intensive
        ✦ arbitrary
      ✤ Often combines particular topic and genre

  17. Disentangling topic and genre in SMT*
      ✤ Experiments on Gen&Topic controlled test set:
        ✦ Culture, News: The 12 contestants competed during a May 3rd Prime.
        ✦ Culture, Comment: You allowed Barwas to represent Iraq while she sings in Kurdish!!!
        ✦ Economy, News: Yemen is mulling the establishment of 13 industrial zones.
        ✦ Economy, Comment: What development in Yemen are you talking about?
      ✤ Genre has larger impact on SMT than topic
      ✤ We want to adapt to different genres in a test corpus!
      * Van der Wees et al., 2015

  23. What information to use for adaptation?
      ✤ Provenance information
        ✦ manual grouping of sub-corpora
      ✤ Topic information
        ✦ unsupervised LDA-inferred topics
      ✤ Genre information
        ✦ determine and exploit intrinsic genre-revealing text features
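To make "intrinsic genre-revealing text features" concrete, here is a minimal sketch that computes a few surface features from raw text. The feature set (pronoun rate, emphatic punctuation rate, average word length) and the `genre_features` helper are assumptions chosen for illustration; the slides in this section do not specify which features the authors use. The two inputs are the Economy examples from the Gen&Topic slide.

```python
import re

# Hypothetical feature set: first/second-person pronouns are assumed to be
# more frequent in user-generated text than in newswire.
PERSONAL_PRONOUNS = {"i", "me", "my", "we", "our", "you", "your"}

def genre_features(text):
    """Compute simple per-word surface features for one text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    n = max(len(words), 1)  # guard against empty input
    return {
        "pronoun_rate": sum(w in PERSONAL_PRONOUNS for w in words) / n,
        "emphatic_punct_rate": (text.count("!") + text.count("?")) / n,
        "avg_word_len": sum(len(w) for w in words) / n,
    }

newswire = "Yemen is mulling the establishment of 13 industrial zones."
comment = "What development in Yemen are you talking about?"

for name, text in (("newswire", newswire), ("comment", comment)):
    print(name, genre_features(text))
```

Both sentences share a topic, so any difference in these features reflects style rather than subject, which is the sense in which genre is described as orthogonal to topic.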

  27. Genre adaptation: the task
      ✤ Arabic-English phrase-based SMT
      ✤ Two multi-genre evaluation sets:
        ✦ Gen&Topic*:
          • newswire (NW)
          • comments (UG)
        ✦ NIST:
          • newswire (NW)
          • weblogs (UG)
      ✤ Translation model adaptation
      * ilps.science.uva.nl/resources/gen-topic/
