Multi-source Meta Transfer for Low Resource MCQA (ACL 2020)


  1. ACL 2020 — Multi-source Meta Transfer for Low Resource MCQA. Ming Yan (1), Hao Zhang (1,2), Di Jin (3), Joey Tianyi Zhou (1). (1) IHPC, A*STAR, Singapore; (2) CSCE, NTU, Singapore; (3) CSAIL, MIT, USA.

  2. Background. Low-resource MCQA: datasets with data size under 100K. The corpora come from different domains:

     Corpus     Size (K)  Domain          Type
     SEARCHQA   140       Snippets        extractive/abstractive
     NEWSQA     120       Newswire        extractive/abstractive
     SWAG       113.5     Scenario Text   MCQA
     HOTPOTQA   113       Wikipedia       extractive/abstractive (multi-hop)
     SQUAD      108       Wikipedia       extractive/abstractive
     RACE       97.6      Exam            MCQA
     SEMEVAL    13.9      Narrative Text  MCQA
     DREAM      6.1       Dialogue        MCQA
     MCTEST     2.6       Story           MCQA

  3. How does meta-learning work?
     - Low-resource setting: the usual remedies are transfer learning and multi-task learning.
     - Domain discrepancy: fine-tune on the target domain.
     Notation: L is the loss function; the weights θ are initialized from a backbone model; support tasks τ_s ~ T and query tasks τ_q ~ T are drawn from the task distribution T.
     - Transfer learning: train on the source, then model_t := copy(model_s) and fine-tune on the target (feedforward (FF) and backpropagation (BP) on one domain at a time).
     - Multi-task learning: joint FF/BP over source and target tasks, θ := θ − β ∂L/∂θ.
     - Meta-learning: fast adaptation per task (FF/BP of a learner), followed by a meta-update of the shared initialization across Source 1, Source 2, and Source 3 before adapting to the Target.
     [Finn et al. 17] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, ICML 2017
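The inner/outer structure above can be made concrete in a few lines. Below is a minimal first-order sketch (FOMAML-style, dropping the second-order term that full MAML backpropagates through the inner update); `model`, `loss_fn`, and the (support, query) task batches are assumed to be provided by the caller, and `sgd_step` is a hypothetical helper, not code from the paper.

```python
# Minimal first-order MAML sketch (FOMAML) of the FF/BP structure above.
import torch

def sgd_step(model, loss_fn, batch, lr):
    """One plain gradient step: theta := theta - lr * dL/dtheta."""
    x, y = batch
    grads = torch.autograd.grad(loss_fn(model(x), y), list(model.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p -= lr * g

def fomaml_step(model, loss_fn, tasks, inner_lr, meta_lr):
    """tasks: iterable of (support_batch, query_batch) pairs from one source."""
    params = list(model.parameters())
    theta = [p.detach().clone() for p in params]      # meta initialization
    meta_grads = [torch.zeros_like(p) for p in params]
    for support, query in tasks:
        sgd_step(model, loss_fn, support, inner_lr)   # fast adaptation -> theta'
        x_q, y_q = query
        q_grads = torch.autograd.grad(loss_fn(model(x_q), y_q), params)
        with torch.no_grad():
            for mg, g, p, t in zip(meta_grads, q_grads, params, theta):
                mg += g                               # query-task gradient at theta'
                p.copy_(t)                            # restore theta for the next task
    with torch.no_grad():                             # meta update over the task batch
        for p, mg in zip(params, meta_grads):
            p -= meta_lr * mg
```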

  4. How does meta-learning work? (cont.)
     Initialize θ from a pretrained model; L is the loss function; sample support tasks τ_s ~ T and query tasks τ_q ~ T.
     Meta-learning trains the shared initialization θ on the source domains, e.g. Exam (4 choices), Dialogue (3 choices), and Story (4 choices), following the task gradients ∇_θ L_1, ∇_θ L_2, ∇_θ L_3 so that a few fast-adaptation steps reach the task-specific weights θ'_1, θ'_2, θ'_3. At test time, θ is fast-adapted to the target domain, e.g. Narrative Text (2 choices), including unseen tasks from the same domain.
     Goal: learn a model that can generalize over the task distribution.
     [Finn et al. 17] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, ICML 2017
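At deployment time, fast adaptation to the target is just a handful of gradient steps on the target's support data, starting from the meta-learned θ; a hypothetical usage of the `sgd_step` helper above, where `num_adapt_steps`, `target_support`, and `inner_lr` are illustrative names:

```python
# Fast adaptation to the target domain (e.g. Narrative Text, 2 choices).
for _ in range(num_adapt_steps):      # typically only a few steps
    sgd_step(model, loss_fn, target_support, lr=inner_lr)
```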

  5. Multi-source Meta Transfer
     Vanilla meta-learning builds a meta model from tasks drawn from a single source. Multi-source Meta Transfer (MMT) builds the MMT model from tasks drawn from several sources, e.g. (1) Dialogue (3 choices), (2) Exam (4 choices), (3) Story (4 choices), and (4) Scenario Text (4 choices), together with the target.
     - Learn knowledge from multiple sources.
     - Reduce the discrepancy between sources and the target.
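For concreteness, the multi-source setup can be written down as a small configuration. The corpora, domains, answer-choice counts, and sizes come from the slides; the dict layout itself is only an illustration, not an interface from the paper:

```python
# Hypothetical configuration of MMT's sources and target.
sources = {
    "DREAM":  {"domain": "Dialogue",      "choices": 3, "size_k": 6.1},
    "RACE":   {"domain": "Exam",          "choices": 4, "size_k": 97.6},
    "MCTest": {"domain": "Story",         "choices": 4, "size_k": 2.6},
    "SWAG":   {"domain": "Scenario Text", "choices": 4, "size_k": 113.5},
}
target = {"SemEval": {"domain": "Narrative Text", "choices": 2, "size_k": 13.9}}
```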

  6. Supervised MMT
     In the input space, tasks from sources 1-4 and the target are mapped by the MMT model into a shared representation space.
     - Multi-source Meta Learning (MML): learn knowledge from multiple sources, producing a representation close to the target's.
     - Multi-source Transfer Learning (MTL): fine-tune the meta model on the target.

  7. How does MMT sample tasks?
     Algorithm 1: The procedure of MMT
     Input: task distribution over each source p_S(τ); data distribution over the target q_T(τ); backbone model f_θ; MMT learning rates β, γ, μ
     Output: optimized parameters θ
     Initialize θ
     while not done do
         for all sources S do
             Sample a batch of tasks τ_i^S ~ p_S(τ)
             for all τ_i^S do
                 Evaluate ∇_θ L_{τ_i^S}(f_θ) with respect to k examples
                 Compute the gradient for fast adaptation: θ' := θ − β ∇_θ L_{τ_i^S}(f_θ)
             end
             Meta-model update: θ := θ − γ ∇_θ Σ_{τ_i^S ~ p_S(τ)} L_{τ_i^S}(f_{θ'})
             Get a batch of target data τ_i^T ~ q_T(τ)
             for all τ_i^T do
                 Evaluate ∇_θ L_{τ_i^T}(f_θ) with respect to k examples
                 Gradient for target fine-tuning: θ := θ − μ ∇_θ L_{τ_i^T}(f_θ)
             end
         end
         Get all batches of target data τ_i^T ~ q_T(τ)
         for all τ_i^T do
             Evaluate ∇_θ L_{τ_i^T}(f_θ) with respect to the batch size
             Gradient for meta transfer learning: θ := θ − μ ∇_θ L_{τ_i^T}(f_θ)
         end
     end
     [Figure: meta tasks are assembled as batches of examples sampled from source 1, source 2, source 3, and the target.]

  8. Multi-source Meta Transfer (Algorithm 1, annotated)
     - MMT is agnostic to the backbone model.
     - The support task and the query task are sampled from the same distribution.
     - Fast adaptation updates the learner (θ') on the support task.
     - The meta-model update applies to the meta parameters (θ) on the query task.
     - The meta model (θ) is also updated on target data after each source pass (MML).
     - Finally, MTL transfers the meta model to the target.
     [Figure: sources S1-S4 each contribute MML updates pulling toward the Target; MTL fine-tunes on the Target.]
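Putting the pieces together, Algorithm 1 amounts to interleaving a per-source meta step with a target update, then finishing with plain fine-tuning. A schematic reusing the `fomaml_step`/`sgd_step` sketches from above; `sample_tasks`, `sample_batch`, and `all_batches` are hypothetical data helpers, and β/γ/μ match the slide's learning rates:

```python
# Schematic of Algorithm 1 (MMT), built on the earlier FOMAML sketch.
def mmt_train(model, loss_fn, sources, target, beta, gamma, mu, num_rounds):
    for _ in range(num_rounds):              # "while not done"
        for source in sources:               # Multi-source Meta Learning (MML)
            tasks = sample_tasks(source)     # batch of (support, query) tasks
            fomaml_step(model, loss_fn, tasks, inner_lr=beta, meta_lr=gamma)
            # Target fine-tuning after each source keeps theta near the target.
            sgd_step(model, loss_fn, sample_batch(target), lr=mu)
        for batch in all_batches(target):    # Multi-source Transfer Learning (MTL)
            sgd_step(model, loss_fn, batch, lr=mu)
```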

  9. Results
     [Tables: performance of supervised MMT and of unsupervised MMT (e.g. on MCTEST); MMT ablation study.]

  10. How to select sources?
      [Figures: t-SNE visualization of BERT features for 100 random samples from each of the targets and sources; transferability matrix, tested on SemEval 2018.]
      Sources whose BERT features lie close to the target's in the visualization are the better transfer candidates.
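The diagnostic can be reproduced roughly as follows: embed a small sample from each corpus with an off-the-shelf BERT and project the features with t-SNE. This is a hedged sketch, not the authors' script; `texts_by_corpus` is a hypothetical dict holding ~100 sampled passages per corpus.

```python
# BERT [CLS] features for ~100 random samples per corpus, projected to 2-D.
import torch
from sklearn.manifold import TSNE
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return bert(**batch).last_hidden_state[:, 0]   # [CLS] feature per text

features = torch.cat([embed(t) for t in texts_by_corpus.values()])
coords = TSNE(n_components=2).fit_transform(features.numpy())  # one 2-D point each
```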

  11. Takeaways
      - MMT extends meta-learning to multiple sources for the MCQA task.
      - MMT provides an algorithm for both supervised and unsupervised meta-training.
      - MMT gives a guideline for source selection.
