Meta-Learning for Low-Resource NMT
Introduction
- Historically, statistical machine translation (SMT) dominated
- Neural machine translation (NMT) has recently overtaken it on high-resource pairs
- But statistical models still outperformed NMT on low-resource language pairs
Previous Work in Low-Resource NMT
- Single task
- Mixed datasets
- Monolingual corpora
- Direct transfer learning
Meta Learning in NMT
Idea:
Improve on direct transfer learning by learning an initialisation that fine-tunes better
17 high-resource languages (incl. Danish, French, Spanish, Greek, Polish, Portuguese, Italian); 4 low-resource languages: Turkish, Romanian, Latvian, Finnish
MAML for NMT
Meta-train on the 17 high-resource pairs (incl. Danish, French, Spanish, Greek, Polish, Portuguese, Italian), e.g. Spanish→English.
Meta-test on the 4 low-resource pairs (Turkish, Romanian, Latvian, Finnish), e.g. Turkish→English.
Note: they simulate low-resource settings by sub-sampling the training data.
Gradient update (task-level learning on task k):
    θ'ₖ = θ − α ∇θ Lₖ(θ)
Meta-gradient update (on the shared initialisation):
    θ ← θ − β ∇θ Σₖ Lₖ(θ'ₖ)
1st-order approximate meta-gradient update: treat θ'ₖ as if it were independent of θ, i.e. use the gradient at θ'ₖ directly and drop the second-order terms.
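The two updates above can be sketched in a few lines of first-order MAML. This is a toy sketch, not the paper's NMT setup: a quadratic loss per task stands in for the translation loss, and the step sizes, task targets, and iteration counts are illustrative assumptions.

```python
import numpy as np

def loss(theta, target):
    # Toy stand-in for the per-task translation loss L_k.
    return 0.5 * np.sum((theta - target) ** 2)

def grad(theta, target):
    return theta - target

def maml_step(theta, tasks, alpha=0.1, beta=0.01, inner_steps=3):
    """One first-order meta-gradient update over a batch of simulated tasks."""
    meta_grad = np.zeros_like(theta)
    for target in tasks:
        # Inner loop: task-level gradient updates (fine-tuning on task k).
        theta_k = theta.copy()
        for _ in range(inner_steps):
            theta_k -= alpha * grad(theta_k, target)
        # First-order approximation: use the gradient at theta_k directly,
        # ignoring second derivatives through the inner updates.
        meta_grad += grad(theta_k, target)
    # Outer loop: meta-gradient update on the shared initialisation theta.
    return theta - beta * meta_grad / len(tasks)

theta = np.zeros(4)
tasks = [np.ones(4), -np.ones(4), 2 * np.ones(4)]  # three made-up "languages"
for _ in range(100):
    theta = maml_step(theta, tasks)
```

After meta-training, `theta` drifts toward an initialisation from which every task is reachable in a few inner steps, which is exactly the role the shared NMT initialisation plays here.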
Issue: Meta-train and meta-test input spaces should match!
Meta-train Meta-test
En un lugar de la mancha, de cuyo nombre no puedo... In some place in the Mancha, whose name... Benim adım kırmızı... My name is Red…
The Spanish embedding for "nombre" comes from a Spanish embedding table, and the Turkish embedding for "adım" from a Turkish one. Trained independently, the two live in unrelated vector spaces.
Universal Lexical Representation
Spanish, English, French, and Turkish word embeddings, each trained independently on monolingual corpora.
On top of these, define a shared key–value memory: universal embedding keys and universal embedding values.
Universal Lexical Representation
Query: multiply the Spanish embedding of "nombre" by a transformation matrix, then score the query against the universal embedding keys (transposed); the resulting attention weights select universal embedding values.
Key: we represent "nombre" as a linear combination of tokens in the ULR, and the attention scores are the weights of that linear combination!
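A minimal sketch of this lookup. The dimensions, softmax temperature, and random embeddings are illustrative assumptions, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(0)
d_lang, d_univ, n_univ = 8, 8, 16  # embedding dims and number of universal tokens

universal_keys = rng.normal(size=(n_univ, d_univ))    # shared across languages
universal_values = rng.normal(size=(n_univ, d_univ))  # shared output embeddings
A = rng.normal(size=(d_lang, d_univ))                 # transformation matrix

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ulr_embedding(word_vec, tau=0.05):
    """Represent a word as a weighted mixture of universal embedding values.

    The query is word_vec @ A; attention over the universal keys gives the
    weights of the linear combination."""
    query = word_vec @ A
    weights = softmax(universal_keys @ query / tau)  # keys (transposed) x query
    return weights @ universal_values

# Stand-in for the monolingual Spanish embedding of "nombre" (hypothetical value).
nombre_es = rng.normal(size=d_lang)
e_nombre = ulr_embedding(nombre_es)
```

Because every language's query attends over the same keys and values, the output embeddings of all languages land in one shared space.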
Universal Lexical Representation
The same lookup for the Turkish embedding of "adım": its query (the Turkish embedding times a transformation matrix) attends over the same universal embedding keys (transposed) and mixes the same universal embedding values.
Result: the same embedding space as Spanish!
Training
Same picture for "nombre": the transformation matrix (and the universal embedding values) are trainable, while the pre-trained monolingual word embeddings and the universal embedding keys are kept fixed.
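A tiny sketch of that trainable/fixed split, under the assumption just stated (transformation matrix and universal values updated, monolingual embeddings frozen); all shapes and gradients here are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
monolingual_emb = rng.normal(size=(100, 8))   # fixed: trained on monolingual corpora
A = rng.normal(size=(8, 8))                   # trainable: transformation matrix
universal_values = rng.normal(size=(16, 8))   # trainable: shared output embeddings

# Only parameters listed here ever receive gradient updates.
trainable = {"A": A, "universal_values": universal_values}

def sgd_step(params, grads, lr=0.01):
    # In-place update; anything not in `params` stays frozen.
    for name, p in params.items():
        p -= lr * grads[name]

emb_before = monolingual_emb.copy()
A_before = A.copy()
# Dummy gradients stand in for backprop through the translation loss.
grads = {name: np.ones_like(p) for name, p in trainable.items()}
sgd_step(trainable, grads)
# monolingual_emb is untouched; A and universal_values have moved.
```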
Experiments
Comment: best to leave the decoder alone during fine-tuning! Why? All tasks translate into English, so the decoder is shared across tasks; adapting it on tiny low-resource data invites overfitting.
Experiments
Comment: Gap narrows as more training examples are included
Critique: they don't evaluate on any truly low-resource languages!
Critique: we don't know how many training examples per task; it's k-shot, but what is k?