
CS11-737: Multilingual Natural Language Processing - Translation



  1. CS11-737: Multilingual Natural Language Processing - Translation. Yulia Tsvetkov

  2. Translation
     English: Mr. and Mrs. Dursley, who lived at number 4 on Privet Drive, were proud to say they were very normal, fortunately.
     Spanish: El señor y la señora Dursley, que vivían en el número 4 de Privet Drive, estaban orgullosos de decir que eran muy normales, afortunadamente.

  3. Plan ● The practice of translation ● Machine translation (MT) ● MT data sources ● MT evaluation

  4. Translation is important and ubiquitous

  5. Why is it difficult to translate? ● Lexical ambiguities and divergences across languages [Examples from Jurafsky & Martin Speech and Language Processing 2nd ed.]

  6. Why is it difficult to translate? ● Cross-lingual lexical and structural divergences 黛玉自在枕上感念寶釵。。。又聽見窗外竹梢焦葉之上，雨聲淅瀝，清寒透幕，不覺又滴下淚來。 dai yu zi zai zhen shang gan nian bao chai... you ting jian chuang wai zhu shao xiang ye zhi shang, yu sheng xi li, qing han tou mu, bu jue you di xia lei lai From “Dream of the Red Chamber”, Cao Xueqin (1792)

  7. Why is it difficult to translate? [Example from Jurafsky & Martin Speech and Language Processing 2nd ed.]

  8. Why is it difficult to translate? ● Ambiguities ○ words ○ morphology ○ semantics ○ pragmatics ● Gaps in data ○ availability of corpora ○ commonsense knowledge ● +Understanding of context, connotation, social norms, etc.

  9. 3 Classical methods for MT ● Direct ● Transfer ● Interlingua

  10. The Vauquois triangle (1968)

  11. Direct translation ● Word-by-word dictionary translation ● Rely on linguistic knowledge for simple reordering or morphological processing [Pipeline figure: source language text → morphological analysis → lexical transfer using a bilingual dictionary → local reordering → morphological generation → target language text]

  12. Direct MT dictionary entry
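
To make the direct approach concrete, here is a minimal illustrative sketch (not from the slides): word-by-word lexical transfer from a toy bilingual dictionary plus one hard-coded local reordering rule. The dictionary entries and the rule are assumptions invented for this example.

```python
# Toy illustration of direct MT: dictionary lookup plus one local reordering rule.
# The dictionary and the rule are invented for this example (English -> Spanish).

BILINGUAL_DICT = {
    "the": "el", "green": "verde", "witch": "bruja",
    "is": "está", "at": "en", "home": "casa",
}

def direct_translate(sentence):
    tokens = sentence.lower().split()

    # Local reordering: English ADJ NOUN -> Spanish NOUN ADJ
    # ("green witch" -> "bruja verde"). A real direct system would rely on
    # many such hand-written rules.
    reordered, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == "green" and tokens[i + 1] == "witch":
            reordered += [tokens[i + 1], tokens[i]]
            i += 2
        else:
            reordered.append(tokens[i])
            i += 1

    # Lexical transfer: word-by-word lookup, keeping unknown words unchanged.
    return " ".join(BILINGUAL_DICT.get(t, t) for t in reordered)

print(direct_translate("The green witch is at home"))
# -> "el bruja verde está en casa": note the gender-agreement error
#    (should be "la bruja verde"), a typical weakness of direct translation.
```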

  13. Transfer approaches ● Levels of transfer

  14. Transfer approaches ● Syntactic transfer

  15. Transfer approaches ● Syntactic transfer

  16. Transfer approaches ● Syntactic transfer

  17. Transfer approaches ● Semantic transfer

  18. Transfer approaches ● Semantic transfer

  19. Transfer approaches
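
As a rough sketch of what a syntactic transfer rule looks like in practice (again illustrative, not from the slides), the code below applies one reordering rule over a tiny bracketed parse: an English adjective-noun NP becomes noun-adjective order, as in Spanish. The nested-tuple tree format and the rule itself are assumptions made for this example.

```python
# Toy syntactic transfer: recursively apply a source-to-target reordering rule
# over a tiny constituency tree represented as nested (label, children...) tuples.

def transfer(tree):
    """Apply transfer rules bottom-up to a (label, children...) tuple tree."""
    if isinstance(tree, str):          # leaf: a word
        return tree
    label, *children = tree
    children = [transfer(c) for c in children]

    # Rule: NP -> ADJ NN (English) becomes NP -> NN ADJ (Spanish-like order).
    if (label == "NP" and len(children) == 2
            and children[0][0] == "ADJ" and children[1][0] == "NN"):
        children = [children[1], children[0]]

    return (label, *children)

english_np = ("NP", ("ADJ", "green"), ("NN", "witch"))
print(transfer(english_np))
# -> ('NP', ('NN', 'witch'), ('ADJ', 'green')); lexical transfer would then map the words.
```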

  20. Interlingua

  21. Learning from data

  22. Parallel corpora

  23. Parallel corpora

  24. Parallel corpora

  25. Parallel corpora

  26. Parallel corpora ● Mining parallel data from microblogs (Ling et al., 2013)

  27. opus.nlpl.eu
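
OPUS (opus.nlpl.eu) distributes many of its corpora as pairs of line-aligned plain-text files (“Moses” format), one sentence per line in each language. A minimal sketch of reading such a pair into sentence pairs follows; the file names are hypothetical placeholders for whatever corpus you download.

```python
# Minimal sketch: read a line-aligned parallel corpus (Moses format),
# e.g. as downloaded from opus.nlpl.eu. File names below are hypothetical.
from itertools import islice

def read_parallel(src_path, tgt_path, limit=None):
    """Yield (source_sentence, target_sentence) pairs from two aligned files."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        for src_line, tgt_line in islice(zip(src, tgt), limit):
            yield src_line.strip(), tgt_line.strip()

# Example usage (hypothetical OPUS download):
# for en, es in read_parallel("OpenSubtitles.en-es.en", "OpenSubtitles.en-es.es", limit=3):
#     print(en, "|||", es)
```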

  28. Is it a good translation?

  29. MT evaluation is hard ● MT evaluation is a research topic in its own right ● Language variability: there is no single correct translation ○ Is system A better than system B? ● Human evaluation is subjective

  30. Human evaluation ● Adequacy and Fluency ○ Usually on a Likert scale (1 “not adequate at all” to 5 “completely adequate”)

  31. Human evaluation ● Ranking of the outputs of different systems at the system level

  32. Human evaluation ● Adequacy and Fluency ○ Usually on a Likert scale (1 “not adequate at all” to 5 “completely adequate”) ● Ranking of the outputs of different systems at the system level ● Post-editing effort: how much effort does it take for a translator (or even a monolingual speaker) to “fix” the MT output so that it is “good”? ● Task-based evaluation: was the performance of the MT system sufficient to perform a task?

  33. Automatic evaluation ● Precision-based ○ BLEU, NIST, ... ● F-score-based ○ METEOR, chrF, ... ● Error rates ○ WER, TER, PER, ... ● Using syntax/semantics ○ PosBLEU, MEANT, DepRef, ... ● Embedding-based ○ BERTScore, YiSi-1, ESIM, ...

  34. Automatic evaluation ● The BLEU score proposed by IBM (Papineni et al., 2002) ○ Count n-gram overlap between the machine translation output and the reference translations ○ Compute precision for n-grams of size 1 to 4 ○ No recall (difficult to define with multiple references) ○ To compensate for the lack of recall: a “brevity penalty” that penalizes translations that are too short ○ Final score is the geometric average of the n-gram precisions, times the brevity penalty ○ Calculate the aggregate score over a large test set
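
The recipe above can be written out directly. Below is a simplified from-scratch sketch of sentence-level BLEU with a single reference and no smoothing, meant only to illustrate the formula (modified n-gram precisions for n = 1..4, their geometric mean, and the brevity penalty), not to replace standard implementations such as sacrebleu.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hypothesis, reference, max_n=4):
    """Simplified single-reference, unsmoothed BLEU (for illustration only)."""
    hyp, ref = hypothesis.split(), reference.split()

    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # "Modified" precision: clip each n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append(overlap / total)

    # Without smoothing, any zero n-gram precision makes the whole score zero.
    if min(precisions) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions.
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)

    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * geo_mean

print(bleu("the quick brown fox jumps over the dog",
           "the quick brown fox jumps over the lazy dog"))   # ≈ 0.77
```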

  35. BLEU vs. human judgments

  36. Automatic evaluation ● Embedding-based ○ BERTScore, YiSi-1, ESIM, ...
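
Embedding-based metrics compare contextual representations rather than surface n-grams. If the bert-score package is installed, a score can be computed roughly as below; this reflects an assumed reading of that package's interface, so treat it as a sketch and check the documentation.

```python
# Sketch of embedding-based evaluation with the bert-score package
# (pip install bert-score). Interface shown as assumed here; verify against
# the package documentation before relying on it.
from bert_score import score

candidates = ["the cat sat on the mat"]
references = ["there is a cat on the mat"]

# Returns precision, recall, and F1 computed from contextual BERT embeddings
# rather than exact n-gram overlap.
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.3f}")
```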

  37. MT venues and competitions ● MT tracks in *CL conferences ● WMT, IWSLT, AMTA, ... ● www.statmt.org

  38. Class discussion ● Pick a 4-line excerpt from a poem in English ● Use Google Translate to back-translate the poem via a pivot language, e.g., ○ English → Spanish → English ○ English → L1 → L2 → English, where L1 and L2 are typologically different from English and from each other ● Compare the original poem and its English back-translation, and share your observations. For example, ○ What information got lost in the process of translation? ○ Are there translation errors associated with linguistic properties of the pivot languages and with linguistic divergences across languages? ○ Try different pivot languages: can you provide insights about the quality of MT for those language pairs?
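
For the exercise itself the Google Translate web interface is sufficient; the sketch below only shows the shape of the pivot chain programmatically, with a hypothetical translate() stub standing in for whatever MT service is actually used.

```python
# Illustration of back-translation through one or more pivot languages.
# `translate` is a hypothetical stub: replace it with a call to a real MT
# system, or carry out the steps by hand in the Google Translate interface.

def translate(text, src, tgt):
    raise NotImplementedError("plug in an actual MT system here")

def back_translate(text, pivots):
    """Translate English -> pivot_1 -> ... -> pivot_n -> English."""
    chain = ["en"] + list(pivots) + ["en"]
    for src, tgt in zip(chain, chain[1:]):
        text = translate(text, src, tgt)
    return text

# e.g. back_translate(poem_excerpt, ["es"])           # English -> Spanish -> English
#      back_translate(poem_excerpt, ["fi", "ja"])     # two typologically different pivots
```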
