


SLIDE 1


Multi-source Meta Transfer for Low Resource MCQA

Ming Yan1, Hao Zhang1,2, Di Jin3, Joey Tianyi Zhou1

1 IHPC, A*STAR, Singapore 2 CSCE, NTU, Singapore 3 CSAIL, MIT, USA

ACL 2020

SLIDE 2

[Bar chart: QA dataset sizes (K), ranging from 2.6K to 140K — SEARCHQA, NEWSQA, SWAG, HOTPOTQA, SQUAD, RACE, SEMEVAL, DREAM, MCTEST]

Background


QA types: extractive/abstractive, multi-hop, MCQA

Corpora from different domains: Wikipedia, Wikipedia snippets, newswire, story, dialogue, narrative text, exam, scenario text

Low-resource MCQA: data size under 100K

SLIDE 3

How does meta learning work?


• Low-resource setting
• Domain discrepancy

Common remedies: transfer learning and multi-task learning, followed by fine-tuning on the target domain

[Finn et al. 17] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, ICML 2017

L: cost function; initialize θ from a backbone model.

Support tasks: τ_s ~ T        Query tasks: τ_q ~ T

Learner, fast adaptation (FF_L, BP_L): run the support examples through the model, y_s = model_θ(x_s), and update a copy of the parameters:
θ' := θ − β ∂L_s/∂θ

Meta model update (FF_m, BP_m): run the query examples through the adapted model, y_q = model_θ'(x_q), and update the meta parameters:
θ := θ − γ ∂L_q/∂θ

FF: feedforward; BP: backpropagation (subscript L: learner, m: meta model).

[Diagram: transfer learning and multi-task learning, with arrows from Source 1, Source 2, and Source 3 to the Target.]
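This support/query loop can be sketched end to end as a first-order MAML step. The toy 1-D regression tasks, function names, and learning rates below are illustrative assumptions, not the paper's code:

```python
import numpy as np

# Toy first-order MAML sketch: each "task" is a 1-D regression y = a * x,
# and the model is a single weight theta with loss L = mean((theta*x - a*x)^2).

def loss_grad(theta, a, xs):
    """Exact gradient of the squared-error loss for task parameter `a`."""
    return np.mean(2.0 * (theta * xs - a * xs) * xs)

def maml_step(theta, tasks, beta=0.1, gamma=0.05):
    """One meta update: fast-adapt on each support set (FF_L/BP_L),
    then update the meta parameter from the query losses (FF_m/BP_m)."""
    meta_grad = 0.0
    for a, support_x, query_x in tasks:
        # Fast adaptation: theta' = theta - beta * dL_support/dtheta
        theta_prime = theta - beta * loss_grad(theta, a, support_x)
        # First-order meta gradient: dL_query/dtheta evaluated at theta'
        meta_grad += loss_grad(theta_prime, a, query_x)
    # Meta update: theta = theta - gamma * (summed query gradients)
    return theta - gamma * meta_grad

rng = np.random.default_rng(0)
theta = 0.0
tasks = [(a, rng.normal(size=8), rng.normal(size=8)) for a in (1.0, 2.0, 3.0)]
for _ in range(200):
    theta = maml_step(theta, tasks)
print(round(theta, 2))  # settles between the per-task optima, 1.0 and 3.0
```

The converged θ is a starting point from which one fast-adaptation step reaches any of the three task optima, which is exactly the "learn to adapt" behavior the slide depicts.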

SLIDE 4

How does meta learning work?


[Finn et al. 17] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, ICML 2017

L: cost function; initialize θ from a pretrained model. Support tasks: τ_s ~ T; query tasks: τ_q ~ T. Learn a model that can generalize over the task distribution.

[Diagram: meta-learning vs. fast adaptation. θ is meta-trained with gradients ∇_θ L_1, ∇_θ L_2, ∇_θ L_3 from tasks in the source domains (dialogue, exam, story; 3–4 answer choices each), then fast-adapts to each domain-specific optimum θ*_i, including the target domain (narrative text, 2 choices). Fast adaptation alone only works within the same domain.]

SLIDE 5

Multi-source Meta Transfer


• Learn knowledge from multiple sources.
• Reduce the discrepancy between sources and the target.

Meta learning builds a meta model from tasks sampled within a single source. Multi-source meta transfer instead builds an MMT model from tasks in sources 1–3 (dialogue with 3 choices, exam with 4 choices, story with 4 choices) and then transfers it to the target (scenario text, 4 choices).

[Diagram: MMT cycles through tasks in sources 1, 2, and 3 before adapting to the target.]

SLIDE 6


Supervised MMT

[Diagram: tasks from sources 1–4 in the input space are mapped into a representation space; MML pulls the MMT representation toward the source representations, and MTL moves it toward the target.]

Multi-source Meta Transfer

Multi-source Meta Learning (MML)

• Learn knowledge from multiple sources.
• Learn a representation close to the target.

Multi-source Transfer Learning (MTL)

• Fine-tune the meta model on the target.

SLIDE 7

How does MMT sample tasks?

[Diagram: each meta task is a batch of embedding vectors for (M, Q, A) triples sampled from source 1, source 2, source 3, and the target.]
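A sketch of how such task sampling could look in code, assuming (as the algorithm's meta-task condition suggests) that a sampled source example must have at least as many answer options as the target; every data structure and name below is hypothetical, not the paper's implementation:

```python
import random

# Hypothetical MMT-style task sampler. Each source is a pool of MCQA
# examples; a meta task is a support/query split drawn from one source.
# Examples with fewer options than the target are skipped, and the rest
# are truncated to the target's option count (keeping the correct answer).

def sample_meta_task(source, target_num_options, k=4, rng=random):
    """Draw a k-shot support set and k query examples from one source."""
    usable = [ex for ex in source if len(ex["options"]) >= target_num_options]
    batch = rng.sample(usable, 2 * k)
    trimmed = []
    for ex in batch:
        # Keep the correct option plus distractors up to the target count.
        correct = ex["options"][ex["answer"]]
        distractors = [o for i, o in enumerate(ex["options"]) if i != ex["answer"]]
        options = [correct] + distractors[: target_num_options - 1]
        trimmed.append({"question": ex["question"], "options": options, "answer": 0})
    return trimmed[:k], trimmed[k:]  # support set, query set

rng = random.Random(0)
source = [
    {"question": f"q{i}", "options": [f"o{j}" for j in range(4)], "answer": i % 4}
    for i in range(20)
]
support, query = sample_meta_task(source, target_num_options=3, k=4, rng=rng)
print(len(support), len(query), len(support[0]["options"]))  # 4 4 3
```

Shuffling the correct option into a random position (rather than index 0) would be needed in practice; it is left out here to keep the sketch short.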

Algorithm 1: The procedure of MMT
Input: task distribution over sources p_S(τ), data distribution over the target Q_T(τ), backbone model f(θ), learning rates β, γ, μ
Output: optimized parameters θ

Initialize θ
while not done do
    for all sources S do
        Sample a batch of tasks τ_i ~ p_S(τ)
        for all τ_i do
            Evaluate ∇_θ L_{τ_i}(f(θ)) with respect to k examples
            Compute the gradient for fast adaptation: θ' := θ − β ∇_θ L_{τ_i}(f(θ))
        end
        Meta model update: θ := θ − γ ∇_θ Σ_{τ_i ~ p_S(τ)} L_{τ_i}(f(θ'))
        Get a batch of target data τ_t ~ Q_T(τ)
        for all τ_t do
            Evaluate ∇_θ L_{τ_t}(f(θ)) with respect to k examples
            Gradient for target fine-tuning: θ := θ − γ ∇_θ L_{τ_t}(f(θ))
        end
    end
end
Get all batches of target data τ_t ~ Q_T(τ)
for all τ_t do
    Evaluate the loss with respect to the batch size
    Gradient for meta transfer learning: θ := θ − μ ∇_θ L_{τ_t}(f(θ))
end

Meta tasks: O_s ≥ O_t for all D_i ∈ τ_s (every sampled source example offers at least as many answer options as the target).
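The procedure above can be sketched with a toy quadratic loss so every gradient is exact; the per-source optima, the single-weight "backbone", and all names are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of Algorithm 1 (MMT) on a toy problem: the "backbone" is
# one weight theta, and each domain's loss is L = (theta - opt)^2 for some
# per-domain optimum. All numbers and names here are made up.

sources = {"race": 1.5, "dream": 2.5, "swag": 3.5}   # per-source optima
target = 2.0                                          # target optimum
beta, gamma, mu = 0.1, 0.05, 0.05                     # learning rates

def grad(theta, opt):
    # dL/dtheta for L = (theta - opt)^2
    return 2.0 * (theta - opt)

theta = 0.0
for step in range(100):                # while not done
    for opt in sources.values():       # for all sources S
        # Fast adaptation on k support examples: theta' = theta - beta * grad
        theta_fast = theta - beta * grad(theta, opt)
        # Meta model update from the query loss (first-order): use grad at theta'
        theta = theta - gamma * grad(theta_fast, opt)
        # Interleaved target fine-tuning keeps the meta model near the target
        theta = theta - mu * grad(theta, target)
# Final meta transfer stage (MTL): fine-tune on all target batches
for _ in range(50):
    theta = theta - mu * grad(theta, target)
print(round(theta, 1))  # 2.0, the target optimum
```

The interleaved target step during meta training is what distinguishes this loop from plain MAML followed by fine-tuning: the meta model is steered toward the target throughout, not only at the end.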

SLIDE 8


Multi-source Meta Transfer


• MMT is agnostic to backbone models.
• MML: support and query tasks are sampled from the same distribution; the learner (θ') is updated on support tasks and the meta model (θ) on query tasks.
• MTL: transfer the meta model to the target; the meta model (θ) is updated on target data.

[Diagram: sources S1–S4 feed the MML stage; the resulting meta model is transferred to the target in the MTL stage.]

SLIDE 9

Results

[Tables: performance of supervised MMT (MCTEST), performance of unsupervised MMT, and an MMT ablation study.]

SLIDE 10

How to select sources?


[Figures: transferability matrix (sources × targets, tested on SemEval 2018) and a t-SNE visualization of BERT features over 100 random samples.]
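As an illustration of transferability-guided source selection, a small sketch with a made-up score matrix (none of these numbers come from the paper, and the dataset pairings are hypothetical):

```python
import numpy as np

# Hypothetical transferability matrix: entry T[i, j] is the accuracy on
# target j after transferring from source i. The values are invented.

sources = ["RACE", "DREAM", "SWAG", "MCTEST"]
targets = ["SemEval", "DREAM", "MCTEST"]
T = np.array([
    [0.82, 0.60, 0.71],
    [0.78, 0.00, 0.66],   # a dataset is never scored against itself
    [0.74, 0.55, 0.63],
    [0.69, 0.52, 0.00],
])

def select_sources(T, target_idx, k=2):
    """Pick the k sources with the highest transfer score for one target."""
    order = np.argsort(T[:, target_idx])[::-1]  # descending by score
    return [sources[i] for i in order[:k]]

print(select_sources(T, targets.index("SemEval"), k=2))  # ['RACE', 'DREAM']
```

In practice each matrix entry would come from a cheap fine-tune-and-evaluate run, so filling the matrix is the expensive part; the selection itself is a column sort.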

SLIDE 11

Takeaways

• MMT extends meta learning to multiple sources for the MCQA task.
• MMT provides an algorithm for both supervised and unsupervised meta training.
• MMT gives a guideline for source selection.