Transfer learning with neural language models
CS 685, Spring 2020
Advanced Natural Language Processing
Mohit Iyyer
College of Information and Computer Sciences University of Massachusetts Amherst
many slides from Jacob Devlin & Matt Peters
Course logistics: an Overleaf template is provided for assignments; the upcoming assignment will be computationally feasible on Colab, so look out for it next week; post any questions (about the material, etc.) in the chatbox!
What if we don't have much labeled data for the task we need? The transfer learning idea: train a model on a task for which it is easy to generate labels, and adapt it to a different task for which it is harder. In computer vision: train a model on ImageNet, then transfer its representations to every other CV task!
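A minimal sketch of this recipe in PyTorch/torchvision (the model choice and the 10-class target task are illustrative assumptions): download a network pretrained on ImageNet, freeze its representations, and train only a new task-specific head.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretraining already happened: resnet18 comes with ImageNet weights.
model = models.resnet18(pretrained=True)

# Freeze the transferred representations.
for param in model.parameters():
    param.requires_grad = False

# Swap the 1000-way ImageNet classifier for a head suited to our
# (hypothetical) 10-class target task.
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new head is trained on the target task's labels.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

The same division of labor, expensive generic pretraining followed by cheap task-specific adaptation, is what pretrained language models bring to NLP.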
step 1, unsupervised pretraining: a ton of unlabeled text → a huge self-supervised model
step 2, supervised fine-tuning: labeled reviews from IMDB → a sentiment-specialized model
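As a concrete sketch of both steps using the Hugging Face transformers library (the model name, example reviews, and hyperparameters are assumptions for illustration):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Step 1 (unsupervised pretraining) is already done for us: we just
# download a model that was pretrained on a ton of unlabeled text.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # positive / negative sentiment
)

# Step 2 (supervised fine-tuning) on labeled reviews, e.g., from IMDB.
reviews = ["a gripping, beautifully shot film", "two hours I will never get back"]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

batch = tokenizer(reviews, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss on the labels
outputs.loss.backward()
optimizer.step()
```

Note that step 1 is the expensive part, but it is done once and then reused across every downstream task.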
ELMo: Deep contextualized word representations. Peters et al., NAACL 2018.
[Figure: nearest neighbors of "play" in a static embedding space: playing, plays, game, player, games, Play, played, football, players, multiplayer. All of the neighbors reflect the game/sports sense; a single static vector per word type cannot represent the other senses of "play".]
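The failure mode in the figure, in code form (toy vocabulary assumed): a static embedding table returns the identical vector for "play" no matter the context it appears in.

```python
import torch
import torch.nn as nn

vocab = {"play": 0, "football": 1, "shakespeare": 2}
embed = nn.Embedding(len(vocab), 4)   # one fixed vector per word type

sports = embed(torch.tensor([vocab["play"], vocab["football"]]))
theater = embed(torch.tensor([vocab["shakespeare"], vocab["play"]]))

# The vector for "play" is identical in both contexts: static
# embeddings cannot distinguish word senses.
assert torch.equal(sports[0], theater[1])
```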
[Figure: a deep bidirectional language model, built from stacked forward and backward LSTMs, reads "… download new games or play ??" and predicts the missing word using both left and right context.]
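A minimal PyTorch sketch of such a bidirectional language model (vocabulary size, dimensions, and layer count are made up for illustration): two separate stacked LSTMs read the sentence left-to-right and right-to-left, and each predicts the word adjacent to its current position.

```python
import torch
import torch.nn as nn

class BiLM(nn.Module):
    """Forward and backward language models over the same embeddings."""
    def __init__(self, vocab_size=50000, dim=512, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # Two separate stacked LSTMs: one reads left-to-right, one
        # right-to-left (implemented here by flipping the input).
        self.fwd = nn.LSTM(dim, dim, num_layers, batch_first=True)
        self.bwd = nn.LSTM(dim, dim, num_layers, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        x = self.embed(tokens)
        h_fwd, _ = self.fwd(x)                 # predicts each *next* word
        h_bwd, _ = self.bwd(x.flip(1))         # predicts each *previous* word
        return self.out(h_fwd), self.out(h_bwd.flip(1))

# "... download new games or play ??": the forward LM fills in ?? from
# the left context; the backward LM would use the right context.
lm = BiLM()
logits_fwd, logits_bwd = lm(torch.randint(0, 50000, (1, 6)))
next_word = logits_fwd[0, -1].argmax()         # prediction for ??
```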
ELMo (embeddings from language models) uses all layers of the language model, not just the top one. Running the biLSTM over "… games or play online via …" gives one representation per layer for each token, and a learned task-specific combination of layers mixes them with scalar weights (in the figure, $s_1 = 0.15$, $s_2 = 0.25$, $s_3 = 0.6$):

$$\mathrm{ELMo}_k = \gamma \sum_{j} s_j \, \mathbf{h}_{k,j}$$

where $\mathbf{h}_{k,j}$ is the layer-$j$ biLSTM representation of token $k$ and $\gamma$ is a learned task-specific scale.
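A sketch of that learned combination (the class name ScalarMix and the three-layer setup are assumptions based on the figure): softmax-normalized scalar weights mix the layer representations, scaled by a learned γ, in the spirit of Peters et al.

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Task-specific mixture of layer representations:
    ELMo_k = gamma * sum_j softmax(s)_j * h_{k,j}."""
    def __init__(self, num_layers=3):
        super().__init__()
        self.s = nn.Parameter(torch.zeros(num_layers))  # mixing logits
        self.gamma = nn.Parameter(torch.ones(1))        # overall scale

    def forward(self, layers):    # layers: (num_layers, batch, seq, dim)
        weights = torch.softmax(self.s, dim=0)          # e.g., 0.15, 0.25, 0.6
        mixed = (weights.view(-1, 1, 1, 1) * layers).sum(dim=0)
        return self.gamma * mixed

mix = ScalarMix(num_layers=3)
layers = torch.randn(3, 8, 10, 512)  # 3 layers, batch 8, 10 tokens, dim 512
elmo_embeddings = mix(layers)        # (8, 10, 512), fed to the task model
```

The weights are trained jointly with the downstream task model, so each task learns which language-model layers are most useful for it.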
Question: how many different embeddings does ELMo compute for a given word?
Takeaways: large-scale recurrent neural language models learn contextual representations that capture basic elements of semantics and syntax. Adding ELMo to existing state-of-the-art models provides significant performance improvements on a wide range of NLP tasks.
What are the cons of increasing k?
The next-sentence prediction objective turns out to be unimportant (and can be removed, e.g., in RoBERTa).
More on this next week!