intermediate task transfer
CS685 Fall 2020
Advanced Natural Language Processing
Mohit Iyyer
College of Information and Computer Sciences, University of Massachusetts Amherst
(many slides from Tu Vu)
Stuff from last time: too many readings!
[Figure: Company’s Cloud Service and end user. The end user submits a new task (task description, sample data); the service returns a structure among tasks (end user’s task, related tasks). Additional label: efficient supervision policies.]
Single Sentence Classification: CoLA, SST-2, 20 Newsgroups, TREC-6, IMDB, Yelp-2, Yelp-full, AG, DBPedia, Sogou News, …
Sentence Pair Classification: MRPC, STS-B, QQP, MNLI, QNLI, RTE, WNLI, BoolQ, CB, WiC, …
Machine Comprehension: SQuAD, NewsQA, SearchQA, TriviaQA, HotpotQA, CQ, CWQ, ComQA, WikiHOP, DROP, …
Sequence Labeling: CCG, POS, Chunk, NER, ST, GED, PS, EF, Parent, Conj, …
Unsupervised Learning: LM, autoencoding, next sentence, real/fake, discourse relations, …
Probing Tasks: SentLen, WC, TreeDepth, TopConst, BShift, Tense, SubjNum, ObjNum, SOMO, CoordInv, …
[Figure labels: QQP, SST-2, RTE, MRPC, QNLI, WNLI, STS-B, CoLA]
[Figure: base network over tokens Tok 1, Tok 2, …, Tok N; labels: task description, task embedding.]
[Figure: base network over tokens Tok 1, Tok 2, …, Tok N; labels: input text, task-specific classifier layer.]
Data for the given task is passed through the network only once.
Either use training labels or sample from the model’s predictive distribution to compute gradients w.r.t. the model’s parameters (weights) or activations.
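The two labeling options above can be sketched on a toy model. This is a minimal illustration, not the method as implemented in the lecture or in BERT: the logistic regression model, the helper name `task_embedding`, and the choice to summarize gradients by averaging their squares per weight (a diagonal Fisher-style summary) are all assumptions made here for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def task_embedding(weights, examples, use_labels=True, seed=0):
    """Average squared gradient of log p(y | x) w.r.t. each weight.

    Hypothetical helper: `weights` are the parameters of a toy logistic
    regression model; `examples` is a list of (features, label) pairs.
    """
    rng = random.Random(seed)
    sums = [0.0] * len(weights)
    for features, label in examples:  # single pass over the task's data
        p = sigmoid(sum(w * x for w, x in zip(weights, features)))
        if use_labels:
            y = label  # option 1: use the training label
        else:
            y = 1 if rng.random() < p else 0  # option 2: sample y from p(y | x)
        # For logistic regression, d log p(y | x) / d w_i = (y - p) * x_i.
        for i, x in enumerate(features):
            g = (y - p) * x
            sums[i] += g * g  # assumption: summarize by squared gradients
    return [s / len(examples) for s in sums]


toy_data = [([1.0, 0.0], 1), ([0.0, 2.0], 0), ([1.0, 1.0], 1)]
emb_labels = task_embedding([0.0, 0.0], toy_data, use_labels=True)
emb_sampled = task_embedding([0.0, 0.0], toy_data, use_labels=False)
```

With all-zero weights the model predicts 0.5 for every example, so the squared gradients do not depend on which label is used and the two variants coincide. With trained weights they would generally differ.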
[Figure: BERT components. Embedding Layer: word embedding, position embedding, segment embedding, LayerNorm. Encoder Layer (N x): Multi-head Attention (queries, keys, values; heads MH 1, MH 2, …, MH N; dense, LayerNorm; Multi-head Attention Output) and Feed Forward (dense, dense, LayerNorm; Layer Output); layers L 1, L 2, …, L N. Pooler Layer: dense; Pooled Output (P). Gradients are taken w.r.t. activations or weights.]
[Figure: source task selection for the target task WikiHop. Labels: compute a task embedding from BERT’s layer-wise gradients; similar source task embedding from a precomputed library (SQuAD, SST2, DROP, MNLI, QNLI, POS-PTB, CCG, WikiHop); selected source task; resulting model.]
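The lookup step in this pipeline can be sketched as a nearest-neighbor search over a library of precomputed task embeddings. The slide only says "similar", so the use of cosine similarity here is an assumption, as are the helper names. The 3-dimensional vectors are made up for illustration and say nothing about which source task actually suits WikiHop.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_source_task(target_embedding, library):
    """Return the name of the library task most similar to the target."""
    return max(library, key=lambda name: cosine(target_embedding, library[name]))


# Made-up embeddings, for illustration only (hypothetical values).
library = {
    "SQuAD": [0.9, 0.1, 0.2],
    "SST2": [0.1, 0.9, 0.1],
    "POS-PTB": [0.1, 0.2, 0.9],
}
target = [0.8, 0.2, 0.3]  # stand-in for the target task's embedding
best = select_source_task(target, library)
print(best)
```

Because the library is computed once and reused, choosing a source for a new target only needs one embedding computation and one pass over the library, rather than fine-tuning on every candidate source task.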
CR tasks: CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, WNLI, SNLI, SciTail
QA tasks: SQuAD-1.1, SQuAD-2.0, NewsQA, HotpotQA, BoolQ, DROP, WikiHop, DuoRC-p, DuoRC-s, CQ, ComQA
SL tasks: CCG, POS-PTB, POS-EWT, Parent, GParent, GGParent, ST, Chunk, NER, GED, Conj
[Figure: target task performance (axis ticks 20 to 80) for each target task, grouped into CR tasks, QA tasks, and SL tasks. Legend: baseline (no transfer); task chosen by TaskEmb.]
LIMITED → LIMITED
SQuAD-2 is no longer the best source task for any QA targets in this regime.
QA tasks are good sources for CR targets.
[Figure: target task performance in the LIMITED → LIMITED regime, with rows for target task, best source, and TaskEmb choice. Targets are grouped into CR tasks, QA tasks, and SL tasks. Legend: baseline (no transfer); task chosen by TaskEmb.]