Meta-Learning for Low Resource NMT - PowerPoint PPT Presentation




SLIDE 1

Meta-Learning for Low Resource NMT

SLIDE 2

Introduction

  • Historically, statistical machine translation (SMT) dominated
  • Neural machine translation (NMT) has recently outperformed SMT
  • However, statistical models still outperform NMT on low-resource language pairs
SLIDE 3

NMT Previous Work

  • Single task
  • Mixed datasets
  • Monolingual corpora
  • Direct transfer learning

SLIDE 4

Meta Learning in NMT

Idea:

Improve on direct transfer learning by better fine-tuning

SLIDE 5

17 high-resource languages, including Danish, French, Spanish, Greek, Polish, Portuguese, and Italian; 4 low-resource languages: Turkish, Romanian, Latvian, and Finnish

MAML for NMT

SLIDE 6

17 high-resource languages, including Danish, French, Spanish, Greek, Polish, Portuguese, and Italian; 4 low-resource languages: Turkish, Romanian, Latvian, and Finnish

Meta-train on the high-resource pairs (e.g. Spanish-English); meta-test on the low-resource pairs (e.g. Turkish-English).

Note: they simulate low-resource conditions by sub-sampling.
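The meta-train/meta-test split with sub-sampling can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the task names, `corpora` dict, and `sample_episode` helper are all hypothetical placeholders.

```python
import random

# Hypothetical task pools mirroring the slide's setup: meta-train on
# high-resource pairs, meta-test on low-resource pairs (names illustrative).
META_TRAIN = ["es-en", "fr-en", "da-en", "el-en"]   # a few of the 17 HR pairs
META_TEST = ["tr-en", "ro-en", "lv-en", "fi-en"]    # the 4 LR pairs

def sample_episode(corpora, pool, k, rng):
    """Pick one task and sub-sample k sentence pairs to simulate low resource."""
    task = rng.choice(pool)
    pairs = corpora[task]
    return task, rng.sample(pairs, min(k, len(pairs)))

# Toy corpora: 1000 dummy sentence pairs per task.
rng = random.Random(0)
corpora = {t: [(f"src{i}", f"tgt{i}") for i in range(1000)]
           for t in META_TRAIN + META_TEST}
task, support = sample_episode(corpora, META_TRAIN, k=16, rng=rng)
```

Sub-sampling to a small `k` is what lets a high-resource pair stand in for a low-resource one during meta-training.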

SLIDE 7

Gradient update vs. meta-gradient update
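The two updates contrasted here are presumably the standard MAML updates (Finn et al., 2017): an inner task-level gradient step, then an outer meta-step through the adapted parameters. In the usual notation, with task $k$'s training set $D_k$ and held-out set $D'_k$:

```latex
% Inner (task-level) gradient update on task k's training set D_k:
\theta'_k = \theta - \alpha \,\nabla_\theta \mathcal{L}_{D_k}(\theta)

% Outer (meta-)gradient update on the held-out set D'_k:
\theta \leftarrow \theta - \beta \,\nabla_\theta \sum_k \mathcal{L}_{D'_k}(\theta'_k)
```

The outer gradient is taken with respect to the original $\theta$, which is what makes it a *meta*-gradient rather than an ordinary one.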

SLIDE 8

Gradient update; meta-gradient update; first-order approximate meta-gradient update
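The first-order approximation presumably drops the Hessian term from the exact meta-gradient, which follows from the chain rule through the inner update $\theta'_k = \theta - \alpha \nabla_\theta \mathcal{L}_{D_k}(\theta)$:

```latex
% Exact meta-gradient involves second derivatives through \theta'_k:
\nabla_\theta \mathcal{L}_{D'_k}(\theta'_k)
  = \bigl(I - \alpha \,\nabla^2_\theta \mathcal{L}_{D_k}(\theta)\bigr)\,
    \nabla_{\theta'_k} \mathcal{L}_{D'_k}(\theta'_k)

% First-order approximation: drop the Hessian term.
\nabla_\theta \mathcal{L}_{D'_k}(\theta'_k)
  \approx \nabla_{\theta'_k} \mathcal{L}_{D'_k}(\theta'_k)
```

This avoids backpropagating through the inner gradient step, at the cost of ignoring second-order curvature information.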

SLIDE 9

Issue: Meta-train and meta-test input spaces should match!

Meta-train (Spanish): "En un lugar de la mancha, de cuyo nombre no puedo..." → "In some place in the Mancha, whose name..."
Meta-test (Turkish): "Benim adım kırmızı..." → "My name is Red..."

The Spanish embedding for "nombre" and the Turkish embedding for "adım" come from Spanish and Turkish word-embedding spaces that were trained independently.

SLIDE 10

Universal Lexical Representation

Spanish Word Embeddings English Word Embeddings French Word Embeddings Turkish Word Embeddings

Word embeddings trained independently on monolingual corpora

SLIDE 11

Universal Lexical Representation

Spanish Word Embeddings English Word Embeddings French Word Embeddings Turkish Word Embeddings

Word embeddings trained independently on monolingual corpora

Universal Embedding Values Universal Embedding Keys

SLIDE 12

Universal Lexical Representation

Universal Embedding Keys (transposed) Universal Embedding Values

nombre

Spanish Word Embeddings Transformation Matrix

Key idea: we represent "nombre" as a linear combination of tokens in the ULR, and the attention weights over the universal keys are the coefficients of that combination.
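The linear-combination lookup can be sketched as follows: a minimal sketch assuming softmax attention over the universal keys, with toy dimensions and an identity-initialized transformation matrix (the function name `ulr_embed`, the temperature `tau`, and the toy data are illustrative assumptions, not the paper's exact formulation).

```python
import numpy as np

def ulr_embed(query, keys, values, A, tau=0.05):
    """Map a language-specific embedding into the universal lexical space.

    query : (d,)   monolingual embedding of a token, e.g. Spanish "nombre"
    keys  : (V, d) universal embedding keys (fixed)
    values: (V, d) universal embedding values (fixed)
    A     : (d, d) trainable matrix aligning the language-specific
                   space with the universal key space
    """
    scores = keys @ (A @ query) / tau        # similarity to every universal token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax attention weights
    return weights @ values                  # linear combination of universal values

# Toy example: 4 universal tokens, 3-dimensional embeddings.
rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 3))
values = rng.normal(size=(4, 3))
A = np.eye(3)                                # identity init, just for the sketch
nombre = rng.normal(size=3)                  # stand-in Spanish embedding
e = ulr_embed(nombre, keys, values, A)
```

Because every language's queries attend over the *same* fixed keys and values, the outputs for "nombre" and "adım" land in one shared embedding space.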

SLIDE 13

Universal Lexical Representation

Universal Embedding Keys (transposed) Universal Embedding Values

adım

Turkish Word Embeddings Transformation Matrix

Same embedding space as Spanish!

SLIDE 14

Training

Universal Embedding Keys (transposed) Universal Embedding Values

nombre

Spanish Word Embeddings Transformation Matrix

The transformation matrix is trainable; the universal embedding keys and values are fixed.

SLIDE 15

Experiments

SLIDE 16

Comment: Best to leave the decoder be! Why?

Experiments

SLIDE 17
SLIDE 18

Comment: Gap narrows as more training examples are included

SLIDE 19

Critique: they don’t evaluate on any real low-resource languages!
Critique: we don’t know how many training examples there are per task; it’s k-shot, but what is k?