Competence-based Curriculum Learning for Neural Machine Translation
Anthony Platanios
e.a.platanios@cs.cmu.edu
Joint work with Otilia Stretcu, Graham Neubig, Barnabas Poczos, and Tom Mitchell
Neural Machine Translation (NMT)
NMT represents the ...
[Popel 2018]
Training Time
Training Example
Easy: Thank you!
Medium: Thank you, for being so patient!
Hard: Thank you, for being so patient today and coming to this talk even though you’re probably tired!
Discrete regimes. Improvements in training time! No improvements in performance!
Training Example: difficulty (e.g., sentence length)
Training Step: competence (e.g., validation set performance)
CURRICULUM LEARNING
Use sample only if: difficulty(sample) ≤ competence(model)
[Diagram labels: DATA, SAMPLE, DIFFICULTY, MODEL, MODEL STATE, COMPETENCE, MODEL TRAINER]
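The selection rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the speaker's implementation: the function name `eligible_samples` and the toy difficulty scores are assumptions made up for the example.

```python
from typing import Callable, List, Sequence


def eligible_samples(
    samples: Sequence[str],
    difficulty: Callable[[str], float],
    competence: float,
) -> List[str]:
    """Keep only the samples whose difficulty does not exceed the competence."""
    return [s for s in samples if difficulty(s) <= competence]


# Hypothetical difficulty scores in [0, 1], chosen only for illustration.
toy_difficulty = {
    "Thank you!": 0.1,
    "Thank you, for being so patient!": 0.4,
}

# With a competence of 0.2, only the easiest sentence may be used.
usable = eligible_samples(list(toy_difficulty), toy_difficulty.get, 0.2)
```

Raising the competence argument to 0.4 or above would admit both sentences, which is how the pool of usable examples grows as training progresses.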
Sentence                  Sentence Length   Sentence Difficulty
Thank you very much!      4                 0.01
Barack Obama loves ...    13                0.15
My name is ...            6                 0.03
What did she say ...      123               0.95
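One way to turn raw sentence lengths into difficulty scores between 0 and 1 is to use the empirical cumulative distribution of the lengths. The sketch below assumes that mapping; the slide itself does not state how its scores were computed, and the helper name `cdf_difficulty` is hypothetical. Because the slide's scores come from a full corpus, this four-sentence toy will not reproduce them.

```python
from bisect import bisect_right
from typing import List, Sequence


def cdf_difficulty(lengths: Sequence[int]) -> List[float]:
    """Map each length to the fraction of all lengths that are <= it."""
    ordered = sorted(lengths)
    n = len(ordered)
    return [bisect_right(ordered, length) / n for length in lengths]


# Sentence lengths taken from the slide's table; the "corpus" here is just
# these four sentences, so the scores differ from the slide's values.
lengths = [4, 13, 6, 123]
scores = cdf_difficulty(lengths)
```

A score defined this way is comparable with a competence value in the same range: a competence of 0.5 admits the easiest half of the data.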
[Figure: distribution of training example difficulty with the competence at the current step marked, shown at step 1000 and at step 10000. Sample uniformly from the blue region.]
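Sampling uniformly from the eligible region can be sketched as follows, assuming the training data has been sorted by difficulty ahead of time. The function name `sample_batch`, the choice to sample with replacement, and the fixed seed are assumptions for illustration; the difficulty values reuse the numbers from the sentence table earlier in the talk.

```python
import random
from bisect import bisect_right
from typing import List, Sequence, Tuple


def sample_batch(
    sorted_data: Sequence[Tuple[float, str]],
    competence: float,
    batch_size: int,
    rng: random.Random,
) -> List[str]:
    """Draw a batch uniformly, with replacement, from examples whose
    difficulty is <= competence. `sorted_data` must be sorted by
    difficulty in ascending order."""
    difficulties = [d for d, _ in sorted_data]
    cutoff = bisect_right(difficulties, competence)  # number of eligible examples
    if cutoff == 0:
        raise ValueError("no example is easy enough for the current competence")
    return [sorted_data[rng.randrange(cutoff)][1] for _ in range(batch_size)]


data = [
    (0.01, "Thank you very much!"),
    (0.03, "My name is ..."),
    (0.15, "Barack Obama loves ..."),
    (0.95, "What did she say ..."),
]
batch = sample_batch(data, competence=0.5, batch_size=8, rng=random.Random(0))
```

Keeping the data sorted means the eligible region is always a prefix, so finding it at each step costs one binary search rather than a pass over the whole dataset.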
[Figure: competence (0.0 to 1.0) as a function of time (0 to 1000), annotated with the initial competence and the time after which the learner is fully competent.]
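A competence schedule with these two parameters can be sketched as a simple ramp. The version below assumes a linear increase from the initial competence to full competence; the names `c0` (initial competence) and `T` (time after which the learner is fully competent) are labels chosen for this example.

```python
def linear_competence(t: int, c0: float, T: int) -> float:
    """Ramp linearly from c0 at t = 0 to 1.0 at t = T, then stay at 1.0."""
    return min(1.0, c0 + (1.0 - c0) * t / T)


# Example values (assumed): start with 1% competence, fully competent at step 1000.
halfway = linear_competence(500, c0=0.01, T=1000)
```

Because the schedule depends only on the step counter, it costs nothing to evaluate during training.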
E.g., validation set performance.
Too Expensive!
Keep the rate at which new examples come in inversely proportional to the training data size:
[Figure: competence (0.0 to 1.0) over time (0 to 1000) for the competence functions labeled clinear, csqrt, and cr.]
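The rate condition above leads to a square-root schedule under one reading, sketched here as an assumption rather than as the speaker's exact formula. If the competence c(t) is taken as the fraction of the training data currently in use, the condition says dc/dt = P / c for some constant P. Integrating gives c(t)^2 = 2Pt + D, and fixing c(0) = c0 and c(T) = 1 yields c(t) = sqrt(t (1 - c0^2) / T + c0^2). The names `c0` and `T` stand for the initial competence and the time after which the learner is fully competent.

```python
import math


def sqrt_competence(t: int, c0: float, T: int) -> float:
    """Square-root schedule: solves dc/dt = P / c with c(0) = c0 and c(T) = 1,
    capped at 1.0 once t exceeds T."""
    return min(1.0, math.sqrt(t * (1.0 - c0 ** 2) / T + c0 ** 2))


# Example values (assumed): c0 = 0.1, fully competent at step 1000.
mid_sqrt = sqrt_competence(500, c0=0.1, T=1000)
```

Compared with a straight-line ramp between the same endpoints, this schedule rises faster early on, so harder examples enter training sooner, and it slows down as the pool of usable data grows.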
[Figure: BLEU vs. training step, RNN panel (marked BLEU values 26.00 and 27.50) and Transformer panel (marked BLEU values 28.00 and 30.00). Curves: Plain, SL Linear, SL Sqrt, SR Linear, SR Sqrt.]
[Figure: BLEU vs. training step, RNN panel (marked BLEU values 31.00 and 32.00) and Transformer panel (marked BLEU values 34.00 and 36.00). Curves: Plain, SL Linear, SL Sqrt, SR Linear, SR Sqrt.]
[Figure: BLEU vs. training step, RNN panel (marked BLEU values 25.50 and 26.50) and Transformer panel (marked BLEU values 28.00 and 30.00). Curves: Plain, SL Linear, SL Sqrt, SR Linear, SR Sqrt.]
[Table: RNN and Transformer results on IWSLT15: En → Vi, IWSLT16: En → De, and WMT16: En → De]
Prior work has not evaluated curriculum learning applied to Transformers.