SLIDE 1

CS11-737 Multilingual NLP

Data-based Strategies to Low-resource MT

Graham Neubig

Site http://demo.clab.cs.cmu.edu/11737fa20/

Many slides from: Xia, Mengzhou, et al. "Generalized data augmentation for low-resource translation." ACL 2019.

SLIDE 2

Data Challenges in Low-resource MT

  • MT of high-resource languages (HRLs) with large parallel corpora → relatively good translations
  • MT of low-resource languages (LRLs) with small parallel corpora → nonsense!

[Diagram: HRL-TRG and LRL-TRG parallel corpora feeding an MT system]

SLIDE 3

A Concrete Example

Source: Atam balaca boz radiosunda BBC Xəbərlərinə qulaq asırdı.
System translation: So I'm going to became a lot of people.
Reference: My father was listening to BBC News on his small, gray radio.

A system trained with 5,000 Azerbaijani-English sentence pairs does not convey the correct meaning at all.

SLIDE 4

Multilingual Training Approaches

  • Transfer: train an MT system on HRL-TRG parallel data, then adapt it to the LRL-TRG pair (Zoph et al., 2016; Nguyen and Chiang, 2017)
  • Joint training: concatenate the LRL and HRL parallel data and train a single MT system (Johnson et al., 2017; Neubig and Hu, 2018)
  • Problem: Suboptimal lexical/syntactic sharing.
  • Problem: Can't leverage monolingual data.
SLIDE 5

Data Augmentation

[Diagram: available resources (LRL-TRG and HRL-TRG parallel data, TRG monolingual data) combined into augmented LRL-TRG training data]

SLIDE 6

Data Augmentation 101: Back Translation

[Diagram: TRG monolingual data translated by a TRG→LRL system into pseudo-parallel LRL-TRG pairs, added to the LRL-TRG and HRL-TRG training data]

SLIDE 7

Back Translation Idea

Sennrich, Rico, Barry Haddow, and Alexandra Birch. "Improving neural machine translation models with monolingual data." ACL 2016.

  • 1. Train a TRG→LRL system on the available parallel data
  • 2. Back-translate TRG monolingual data into LRL
  • 3. Train the LRL→TRG system on the real plus back-translated pairs
  • Some degree of error in the source data is permissible!
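The three steps above can be sketched end-to-end. This is only a toy illustration: the "models" are word-for-word substitution tables standing in for real NMT systems, and every function name and data item is invented for the example.

```python
def train(pairs):
    """Stand-in 'training': learn a word-for-word substitution table."""
    table = {}
    for src, trg in pairs:
        for s, t in zip(src.split(), trg.split()):
            table.setdefault(s, t)
    return table

def translate(table, sent):
    """Stand-in 'translation': substitute each known word, copy the rest."""
    return " ".join(table.get(w, w) for w in sent.split())

# Tiny real LRL-TRG parallel corpus (invented toy language pair)
parallel = [("bonan matenon", "good morning"), ("dankon amiko", "thank friend")]

# 1. Train TRG->LRL on the reversed parallel data
trg2lrl = train([(t, l) for l, t in parallel])

# 2. Back-translate TRG monolingual data into (possibly noisy) LRL
trg_mono = ["good friend", "thank morning"]
synthetic = [(translate(trg2lrl, t), t) for t in trg_mono]

# 3. Train the final LRL->TRG system on real + synthetic pairs
lrl2trg = train(parallel + synthetic)
print(translate(lrl2trg, "dankon amiko"))  # -> "thank friend"
```

Note that only the source side of the synthetic pairs is machine-generated, which is why some noise there is tolerable: the target side the model learns to produce is always clean.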
SLIDE 8

How to Generate Translations

  • Beam search (Sennrich et al. 2016)
    • Select the highest-scoring output
    • Higher quality, but lower diversity and potential for data bias
  • Sampling (Edunov et al. 2018)
    • Randomly sample from the back-translation model
    • Lower overall quality, but higher diversity
  • Sampling has been shown to be more effective overall

Understanding Back-Translation at Scale. Sergey Edunov, Myle Ott, Michael Auli, David Grangier. EMNLP 2018.
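The quality/diversity trade-off can be illustrated on a toy per-step distribution. Greedy decoding (beam size 1) stands in here for full beam search, and the logits are invented numbers, not real model outputs:

```python
import math
import random

def decode(step_logits, mode="greedy", rng=None):
    """Pick one token per step: greedy (argmax, like beam size 1)
    vs. sampling from the model's softmax distribution."""
    rng = rng or random.Random(0)
    out = []
    for logits in step_logits:
        exps = [math.exp(l) for l in logits]
        probs = [e / sum(exps) for e in exps]
        if mode == "greedy":
            out.append(max(range(len(probs)), key=probs.__getitem__))
        else:  # "sample": lower average quality, but higher diversity
            out.append(rng.choices(range(len(probs)), weights=probs)[0])
    return out

logits = [[2.0, 1.5, 0.1], [0.2, 2.2, 2.1]]
print(decode(logits, "greedy"))  # always [0, 1]: same output every time
print(decode(logits, "sample"))  # varies run to run -> more diverse data
```

Greedy decoding always emits the same back-translation for a given source, which is the data-bias concern above; sampling spreads the synthetic data over the model's whole distribution.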

SLIDE 9

Iterative Back-translation

Vu Cong Duy Hoang, Philipp Koehn, Gholamreza Haffari, Trevor Cohn. "Iterative Back-Translation for Neural Machine Translation" WNGT 2018.

  • 1. Train an initial LRL→TRG system
  • 2. Forward-translate LRL monolingual data into TRG
  • 3. Train a TRG→LRL system (using the real and forward-translated data)
  • 4. Back-translate TRG monolingual data into LRL
  • 5. Train the final LRL→TRG system (optionally repeating the cycle)

SLIDE 10

Back Translation Issues

  • Back-translation fails in low-resource languages or domains
  • Remedies:
    • Use other high-resource languages
    • Combine with monolingual data (maybe with denoising objectives, covered in a following class)
    • Perform other varieties of rule-based augmentation
SLIDE 11

Using HRLs in Augmentation

Xia, Mengzhou, et al. "Generalized data augmentation for low-resource translation." ACL 2019.

SLIDE 12

English -> HRL Augmentation

  • Problem: TRG→LRL back-translation might be low quality
  • Idea: also back-translate into the HRL
    ○ more sentence pairs
    ○ vocabulary sharing on the source side
    ○ syntactic similarity on the source side
    ○ improves the target-side LM

Example back-translations of TRG: Thank you very much. → TUR (HRL): Çok teşekkür ederim. / AZE (LRL): Hə Hə Hə.

SLIDE 13

Available Resources + TRG→LRL and TRG→HRL Back-translation

[Diagram: LRL-TRG and HRL-TRG parallel data, plus TRG monolingual data back-translated by TRG→LRL and TRG→HRL systems]

SLIDE 14

Augmentation via Pivoting

  • Problem: HRL-TRG data might suffer from lack of lexical/syntactic overlap with the LRL
  • Idea: Translate existing HRL-TRG data
    ○ Translate the HRL side into the LRL, keeping the TRG side

Example: (TUR: Çok teşekkür ederim. / TRG: Thank you so much.) → (AZE: Çox sağ olun. / TRG: Thank you so much.)

SLIDE 15

Available Resources + TRG→LRL and TRG→HRL Back-translation + Pivoting

[Diagram: the back-translated data above, plus LRL-TRG pairs created by translating the HRL side of HRL-TRG data into the LRL]

SLIDE 16

Back-Translation by Pivoting

  • Problem: TRG-HRL back-translated data also suffers from lexical or syntactic mismatch
  • Idea: pivot TRG→HRL→LRL
    ○ Large amounts of English monolingual data can be utilized

Example: TRG: Thank you so much. → TUR: Çok teşekkür ederim. → AZE: Çox sağ olun. (paired with the original TRG sentence)
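The TRG→HRL→LRL chain amounts to composing two translation systems. A minimal sketch, where word tables stand in for the two NMT models and the romanized tokens are toy data:

```python
def translate(table, sent):
    """Stand-in word-for-word 'system' (really this would be an NMT model)."""
    return " ".join(table.get(w, w) for w in sent.split())

# Toy stand-ins for the two systems (invented, romanized entries)
trg2hrl = {"thank": "tesekkur", "you": "ederim"}   # e.g. English -> Turkish
hrl2lrl = {"tesekkur": "sag", "ederim": "olun"}    # e.g. Turkish -> Azerbaijani

def pivot(trg_sentences):
    """Build pseudo-parallel LRL-TRG pairs via the HRL pivot."""
    pairs = []
    for t in trg_sentences:
        hrl = translate(trg2hrl, t)    # TRG -> HRL
        lrl = translate(hrl2lrl, hrl)  # HRL -> LRL
        pairs.append((lrl, t))         # keep the original clean TRG side
    return pairs

print(pivot(["thank you"]))  # [('sag olun', 'thank you')]
```

Keeping the original TRG sentence as the target means the large English monolingual corpus supplies unlimited clean target-side text, with only the synthetic LRL source carrying pivot noise.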

SLIDE 17

Data w/ Various Types of Pivoting

[Diagram: all augmented data sources: TRG→LRL and TRG→HRL back-translation, HRL→LRL pivoting of parallel data, and HRL→LRL pivoting of back-translated data]

SLIDE 18

Monolingual Data Copying

SLIDE 19

Monolingual Data Copying

  • Problem: Back-translation may help with structure, but fail at terminology
  • Idea: Use target (TRG) monolingual data as-is, copying it to the source side
    ○ Helps encourage the model to not drop words
    ○ Helps translation of terms that are identical across languages

Example copied pair: SRC: Thank you so much. / TRG: Thank you so much.

Anna Currey, Antonio Valerio Miceli Barone, Kenneth Heafield. Copied Monolingual Data Improves Low-Resource Neural Machine Translation. WMT 2018.

SLIDE 20

Heuristic Augmentation Strategies

SLIDE 21

Dictionary-based Augmentation

  • 1. Find rare words in the source sentences
  • 2. Use a language model to predict another word that could appear in that context
  • 3. Replace the rare word with the prediction, and replace the aligned target word with its translation from the dictionary

Marzieh Fadaee, Arianna Bisazza, Christof Monz. Data Augmentation for Low-Resource Neural Machine Translation. ACL 2017.
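The three steps can be sketched as follows. `lm_suggest` is a stand-in for a real language model, and the alignment, dictionary, and sentences are all invented for illustration:

```python
def augment(src, trg, align, rare_words, lm_suggest, bilingual_dict):
    """Sketch of steps 1-3 above. `align` maps source index -> target index;
    `lm_suggest(sentence, position)` stands in for an LM's substitution."""
    src, trg = src[:], trg[:]
    for i, w in enumerate(src):
        if w in rare_words and i in align:            # 1. find a rare word
            cand = lm_suggest(src, i)                 # 2. LM proposes a substitute
            if cand in bilingual_dict:
                src[i] = cand                         # 3a. replace the source word
                trg[align[i]] = bilingual_dict[cand]  # 3b. fix the aligned target word
    return src, trg

# Toy example: every entry here is invented
src = "I bought a car".split()
trg = "j'ai acheté une voiture".split()
align = {3: 3}                           # "car" aligns to "voiture"
suggest = lambda sent, i: "bicycle"      # stand-in LM suggestion
d = {"bicycle": "vélo"}
print(augment(src, trg, align, {"car"}, suggest, d))
```

The result is a new pseudo-parallel pair ("I bought a bicycle" / "j'ai acheté une vélo"-style) that gives the rare dictionary word a training example in a plausible context.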

SLIDE 22

An Aside: Word Alignment

  • Automatically find alignments between source and target words, for dictionary learning, analysis, supervised attention, etc.
  • Traditional symbolic methods: word-based translation models trained using the EM algorithm
    • GIZA++: https://github.com/moses-smt/giza-pp
    • FastAlign: https://github.com/clab/fast_align
  • Neural methods: use a model like multilingual BERT or a translation model, and find words with similar embeddings
    • SimAlign: https://github.com/cisnlp/simalign
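The neural idea can be sketched with a greedy similarity argmax; this is a much-simplified version of what SimAlign does, and the 2-d "embeddings" are invented (in practice they would come from mBERT or an NMT encoder):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def align_words(src_vecs, trg_vecs):
    """Greedy embedding-based alignment: map each source word to its
    most similar target word by cosine similarity."""
    return {i: max(range(len(trg_vecs)), key=lambda j: cosine(u, trg_vecs[j]))
            for i, u in enumerate(src_vecs)}

# Toy 2-d contextual "embeddings" for a 2-word source and 2-word target
src_emb = [[1.0, 0.0], [0.0, 1.0]]
trg_emb = [[0.0, 1.0], [1.0, 0.1]]
print(align_words(src_emb, trg_emb))  # {0: 1, 1: 0}
```

Real aligners refine this with symmetrization (intersecting source→target and target→source argmaxes), but the core signal is exactly this similarity matrix.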
SLIDE 23

Word-by-word Data Augmentation

  • Even simpler: translate sentences word-by-word into the target language using a dictionary
  • Problem: what about word ordering and syntactic divergence?

FR: J'ai acheté une nouvelle voiture → I bought a new car
JA: 私 は 新しい 車 を 買った → I the new car a bought

Lample, Guillaume, et al. "Unsupervised machine translation using monolingual corpora only." arXiv preprint arXiv:1711.00043 (2017).

SLIDE 24

Word-by-word Augmentation w/ Reordering

Zhou, Chunting, et al. "Handling Syntactic Divergence in Low-resource Machine Translation." arXiv preprint arXiv:1909.00040 (2019).

  • Problem: Source-target word order can differ significantly in methods that use monolingual pre-training
  • Solution: Do re-ordering according to grammatical rules, followed by word-by-word translation to create pseudo-parallel data
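The reorder-then-translate recipe can be sketched with a single toy rule. The SVO→SOV rule and the tiny dictionary below are invented illustrations, not the paper's actual learned reordering grammar:

```python
def svo_to_sov(tokens):
    """Toy reordering rule: move the verb (assumed to be the 2nd token of a
    subject-verb-object sentence) to the end, mimicking SOV word order."""
    return [tokens[0]] + tokens[2:] + [tokens[1]] if len(tokens) >= 3 else tokens

def word_by_word(tokens, dictionary):
    """Dictionary lookup per token; unknown words pass through unchanged."""
    return [dictionary.get(w, w) for w in tokens]

# Invented English->Japanese toy dictionary (particles folded into the words)
d = {"I": "私は", "bought": "買った", "car": "車を"}

src = ["I", "bought", "car"]
pseudo = word_by_word(svo_to_sov(src), d)
print(pseudo)  # ['私は', '車を', '買った'] -- target-like SOV order
```

Reordering first means the resulting pseudo-parallel pair already exhibits the target language's word order, so the MT model trained on it does not have to learn ordering and lexical choice from the same noisy signal.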

SLIDE 25

In-class Assignment

SLIDE 26

In-class Assignment

  • Read one of the cited papers on heuristic data augmentation:
    • Marzieh Fadaee, Arianna Bisazza, Christof Monz. Data Augmentation for Low-Resource Neural Machine Translation. ACL 2017.
    • Zhou, Chunting, et al. "Handling Syntactic Divergence in Low-resource Machine Translation." EMNLP 2019.
  • Try to think of how it would work for one of the languages you're familiar with
  • Are there any potential hurdles to applying such a method? Are there any improvements you can think of?