Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation


  1. Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation
     Poorya Zaremoodi, Wray Buntine, Gholamreza (Reza) Haffari
     Monash University
     Slides:

  2. Roadmap
     • Introduction & background
     • Adaptive knowledge sharing in Multi-Task Learning
     • Experiments & analysis
     • Conclusion

  3. Improving NMT in Low-Resource Scenarios
     • NMT is notoriously data-hungry!
     • Bilingually low-resource scenario: large amounts of bilingual training data are not available
     • IDEA: use existing resources from other tasks and train one model for all tasks using multi-task learning
     • This effectively injects inductive biases that help improve the generalisation of NMT
     • Auxiliary tasks: Semantic Parsing, Syntactic Parsing, Named-Entity Recognition

  4. Encoder-Decoders for Individual Tasks
     [Figure: one separate encoder-decoder pair per task]
     • Machine Translation: "I went home" → "من به خانه رفتم"
     • Semantic Parsing: "Obama was elected and his voters celebrated" → semantic (AMR) graph
     • Syntactic Parsing: "The burglar robbed the apartment" → parse tree (S → NP VP, ...)
     • Named-Entity Recognition: "Jim bought 300 shares of Acme Corp. in 2006" → B-PER O O O O B-ORG I-ORG O B-MISC

  5. Sharing Scenario
     [Figure: a single multi-task seq2seq model replaces the per-task encoder-decoder pairs — given a sentence and a task tag, it produces a translation, a semantic graph, a parse tree, or named-entity tags]
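The task tag in the figure is typically realised by simply prepending a task token to the source sentence, as slide 6's "<translation> I went home" input shows. A minimal sketch (the function name is illustrative):

```python
def tag_source(tokens, task):
    # Prepend a task token, e.g. "<translation>" as on slide 6,
    # so one shared model knows which task to perform.
    return [f"<{task}>"] + tokens

tag_source(["I", "went", "home"], "translation")
# -> ['<translation>', 'I', 'went', 'home']
```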

  6. Partial Parameter Sharing
     [Figure: a 3-layer stacked encoder-decoder translating "<translation> I went home" → "من به خانه رفتم"; the hidden states h_t^(1..3) and g_t^(1..3) of some stacked layers are shared across tasks while the rest stay task-specific]
     Zaremoodi & Haffari, NAACL 2018
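The sketch below illustrates the partial-sharing idea in PyTorch. It is a minimal sketch, not the paper's implementation: the module names, the "two shared layers + one private top layer" split, and the dimensions are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class PartiallySharedEncoder(nn.Module):
    """Stacked recurrent encoder: some layers shared across all tasks,
    the top layer private to each task (illustrative split)."""
    def __init__(self, emb_dim, hid_dim, tasks):
        super().__init__()
        # Shared by every task (assumed here: the two bottom layers).
        self.shared = nn.GRU(emb_dim, hid_dim, num_layers=2, batch_first=True)
        # One private top layer per task.
        self.private = nn.ModuleDict(
            {t: nn.GRU(hid_dim, hid_dim, batch_first=True) for t in tasks})

    def forward(self, embedded, task):
        shared_out, _ = self.shared(embedded)        # (batch, time, hid_dim)
        task_out, _ = self.private[task](shared_out)
        return task_out

# Toy usage: one encoder serving all four tasks.
enc = PartiallySharedEncoder(emb_dim=400, hid_dim=400,
                             tasks=["mt", "semantic", "syntactic", "ner"])
x = torch.randn(8, 12, 400)            # batch of 8 embedded 12-token sentences
states_for_mt = enc(x, task="mt")      # feeds the MT decoder
```

The point of the split: the shared layers receive gradients from every task, while each private layer is updated only by its own task.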

  7. Roadmap
     • Introduction & background
     • Adaptive knowledge sharing in Multi-Task Learning
     • Experiments & analysis
     • Conclusion

  8. Adaptive Knowledge Sharing in MTL
     • Sharing the parameters of the recurrent units among all tasks leads to:
       – task interference
       – inability to leverage commonalities among subsets of tasks
     • IDEA: share the knowledge for controlling the information flow in the hidden states
       – multiple experts for handling different kinds of information
       – adaptively share the experts among the tasks

  9. Adaptive Knowledge Sharing in MTL
     • IDEA:
       – multiple experts for handling different kinds of information
       – adaptively share the experts among the tasks
     • Extend the recurrent units with multiple blocks
       – each block has its own information flow through time
     • Routing mechanism: softly directs the input to these blocks

  10. Adaptive Knowledge Sharing
      [Figure: the proposed recurrent unit — a routing network computes a soft assignment τ_t of the input over the blocks; each block maintains its own state through time]
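To make the routing idea concrete, here is a minimal sketch of such a routed recurrent unit. It assumes the soft assignment τ_t is a softmax over the current input and a summary of the previous block states; the paper's exact parameterisation may differ, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoutedRecurrentCell(nn.Module):
    """A recurrent unit made of several GRU 'blocks' (experts).
    A routing network softly directs the input to the blocks;
    each block keeps its own state flowing through time."""
    def __init__(self, input_dim, hid_dim, n_blocks):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.GRUCell(input_dim, hid_dim) for _ in range(n_blocks))
        # Routing network: tau_t = softmax(W [x_t; summary(h_prev)] + b).
        self.router = nn.Linear(input_dim + hid_dim, n_blocks)

    def forward(self, x_t, h_prev):
        # h_prev: one state tensor per block, each (batch, hid_dim).
        summary = torch.stack(h_prev).mean(dim=0)
        tau = F.softmax(self.router(torch.cat([x_t, summary], dim=-1)), dim=-1)
        # Each block sees only its routing-weighted share of the input.
        h_new = [blk(tau[:, i:i + 1] * x_t, h_prev[i])
                 for i, blk in enumerate(self.blocks)]
        return h_new, tau
```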

  11. Adaptive Knowledge Sharing
      We use the proposed recurrent unit inside both the encoder and the decoder.
      [Figure: the encoder-decoder for "<translation> I went home" → "من به خانه رفتم", built from the routed recurrent units, with the routing network producing τ_t at each step]
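A toy run of the cell as an encoder over a sentence, continuing the sketch above (batch size, sentence length, and dimensions are arbitrary):

```python
cell = RoutedRecurrentCell(input_dim=400, hid_dim=400, n_blocks=3)
h = [torch.zeros(8, 400) for _ in range(3)]   # initial per-block states
for x_t in torch.randn(12, 8, 400):           # 12 time steps, batch of 8
    h, tau = cell(x_t, h)                     # tau: soft block usage at each step
```

Averaging τ_t over all of a task's data yields per-task block-usage statistics like those reported on slide 15.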

  12. Roadmap
      • Introduction & background
      • Adaptive knowledge sharing in Multi-Task Learning
      • Experiments & analysis
      • Conclusion

  13. Experiments
      • Language pairs: English to Farsi/Vietnamese
      • Datasets:
        – English to Farsi: TED corpus & LDC2016E93
        – English to Vietnamese: IWSLT 2015 (TED and TEDx talks)
        – Semantic parsing: AMR corpus (newswire, weblogs, web discussion forums and broadcast conversations)
        – Syntactic parsing: Penn Treebank
        – NER: CoNLL NER corpus (newswire articles from the Reuters Corpus)
      • NMT architecture: GRU for the blocks, 400-dimensional RNN hidden states and word embeddings
      • NMT best practice:
        – Optimisation: Adam
        – Byte-pair encoding (BPE) on both source and target
        – Evaluation metrics: PPL, TER and BLEU
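As one concrete piece of this evaluation pipeline, here is a small sketch of scoring a system output with the sacrebleu package (the file names are hypothetical; TER is also available in recent sacrebleu versions, and PPL comes from the NMT model itself):

```python
import sacrebleu

# Hypothetical files: one detokenised sentence per line.
hyps = open("mt_output.fa").read().splitlines()
refs = open("reference.fa").read().splitlines()

# corpus_bleu takes the hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(f"BLEU = {bleu.score:.2f}")
```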

  14. Experiments
      [Figure: BLEU scores, English → Farsi and English → Vietnamese]

  15. Experiments (English to Farsi)
      [Figure: average block usage per task (Blocks 1–3 × MT / Semantic / Syntactic / NER)]
      • Block specialisation — Block 1: MT & semantic parsing; Block 2: syntactic/semantic parsing; Block 3: NER

  16. Conclusion
      • Addressed the task-interference issue in MTL
        – by extending the recurrent units with multiple blocks
        – with a trainable routing network

  17. Questions?
      Paper:
