Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation


  1. Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation
     Poorya Zaremoodi, Wray Buntine, Gholamreza (Reza) Haffari
     Monash University
     Slides:

  2. Roadmap
     • Introduction & background
     • Adaptive knowledge sharing in Multi-Task Learning
     • Experiments & analysis
     • Conclusion

  3. Improving NMT in Low-Resource Scenarios
     • NMT is notoriously data-hungry!
     • Bilingually low-resource scenario: large amounts of bilingual training data are not available
     • IDEA: use existing resources from other tasks and train one model for all tasks using multi-task learning
     • This effectively injects inductive biases that help improve the generalisation of NMT
     • Auxiliary tasks: Semantic Parsing, Syntactic Parsing, Named-Entity Recognition

  4. Encoder-Decoders for Individual Tasks
     [Figure: one separate encoder-decoder pair per task]
     • Machine Translation: "I went home" → "من به خانه رفتم"
     • Semantic Parsing: "Obama was elected and his voters celebrated" → semantic (AMR) graph
     • Syntactic Parsing: "The burglar robbed the apartment" → parse tree (S → NP VP, ...)
     • Named-Entity Recognition: "Jim bought 300 shares of Acme Corp. in 2006" → B-PER O O O O B-ORG I-ORG O B-MISC

  5. Sharing Scenario
     [Figure: a single multi-task seq2seq model replaces the per-task encoder-decoder pairs — given a sentence and a task tag, it produces a translation, a semantic graph, a parse tree, or named-entity tags]
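The task tag in the figure is typically realised by simply prepending a task token to the source sentence, as slide 6's "<translation> I went home" input shows. A minimal sketch (the function name is illustrative):

```python
def tag_source(tokens, task):
    # Prepend a task token, e.g. "<translation>" as on slide 6,
    # so one shared model knows which task to perform.
    return [f"<{task}>"] + tokens

tag_source(["I", "went", "home"], "translation")
# -> ['<translation>', 'I', 'went', 'home']
```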

  6. Partial Parameter Sharing
     [Figure: a 3-layer stacked encoder-decoder translating "<translation> I went home" → "من به خانه رفتم"; the hidden states h_t^(1..3) and g_t^(1..3) of some stacked layers are shared across tasks while the rest stay task-specific]
     Zaremoodi & Haffari, NAACL 2018
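The sketch below illustrates the partial-sharing idea in PyTorch. It is a minimal sketch, not the paper's implementation: the module names, the "two shared layers + one private top layer" split, and the dimensions are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class PartiallySharedEncoder(nn.Module):
    """Stacked recurrent encoder: some layers shared across all tasks,
    the top layer private to each task (illustrative split)."""
    def __init__(self, emb_dim, hid_dim, tasks):
        super().__init__()
        # Shared by every task (assumed here: the two bottom layers).
        self.shared = nn.GRU(emb_dim, hid_dim, num_layers=2, batch_first=True)
        # One private top layer per task.
        self.private = nn.ModuleDict(
            {t: nn.GRU(hid_dim, hid_dim, batch_first=True) for t in tasks})

    def forward(self, embedded, task):
        shared_out, _ = self.shared(embedded)        # (batch, time, hid_dim)
        task_out, _ = self.private[task](shared_out)
        return task_out

# Toy usage: one encoder serving all four tasks.
enc = PartiallySharedEncoder(emb_dim=400, hid_dim=400,
                             tasks=["mt", "semantic", "syntactic", "ner"])
x = torch.randn(8, 12, 400)            # batch of 8 embedded 12-token sentences
states_for_mt = enc(x, task="mt")      # feeds the MT decoder
```

The point of the split: the shared layers receive gradients from every task, while each private layer is updated only by its own task.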

  7. Roadmap
     • Introduction & background
     • Adaptive knowledge sharing in Multi-Task Learning
     • Experiments & analysis
     • Conclusion

  8. Adaptive Knowledge Sharing in MTL
     • Sharing the parameters of the recurrent units among all tasks leads to:
       – task interference
       – inability to leverage commonalities among subsets of tasks
     • IDEA: share the knowledge for controlling the information flow in the hidden states
       – multiple experts for handling different kinds of information
       – adaptively share the experts among the tasks

  9. Adaptive Knowledge Sharing in MTL
     • IDEA:
       – multiple experts for handling different kinds of information
       – adaptively share the experts among the tasks
     • Extend the recurrent units with multiple blocks
       – each block has its own information flow through time
     • Routing mechanism: softly directs the input to these blocks

  10. Adaptive Knowledge Sharing
      [Figure: the proposed recurrent unit — a routing network computes a soft assignment τ_t of the input over the blocks; each block maintains its own state through time]
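To make the routing idea concrete, here is a minimal sketch of such a routed recurrent unit. It assumes the soft assignment τ_t is a softmax over the current input and a summary of the previous block states; the paper's exact parameterisation may differ, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoutedRecurrentCell(nn.Module):
    """A recurrent unit made of several GRU 'blocks' (experts).
    A routing network softly directs the input to the blocks;
    each block keeps its own state flowing through time."""
    def __init__(self, input_dim, hid_dim, n_blocks):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.GRUCell(input_dim, hid_dim) for _ in range(n_blocks))
        # Routing network: tau_t = softmax(W [x_t; summary(h_prev)] + b).
        self.router = nn.Linear(input_dim + hid_dim, n_blocks)

    def forward(self, x_t, h_prev):
        # h_prev: one state tensor per block, each (batch, hid_dim).
        summary = torch.stack(h_prev).mean(dim=0)
        tau = F.softmax(self.router(torch.cat([x_t, summary], dim=-1)), dim=-1)
        # Each block sees only its routing-weighted share of the input.
        h_new = [blk(tau[:, i:i + 1] * x_t, h_prev[i])
                 for i, blk in enumerate(self.blocks)]
        return h_new, tau
```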

  11. Adaptive Knowledge Sharing
      We use the proposed recurrent unit inside both the encoder and the decoder.
      [Figure: the encoder-decoder for "<translation> I went home" → "من به خانه رفتم", built from the routed recurrent units, with the routing network producing τ_t at each step]
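A toy run of the cell as an encoder over a sentence, continuing the sketch above (batch size, sentence length, and dimensions are arbitrary):

```python
cell = RoutedRecurrentCell(input_dim=400, hid_dim=400, n_blocks=3)
h = [torch.zeros(8, 400) for _ in range(3)]   # initial per-block states
for x_t in torch.randn(12, 8, 400):           # 12 time steps, batch of 8
    h, tau = cell(x_t, h)                     # tau: soft block usage at each step
```

Averaging τ_t over all of a task's data yields per-task block-usage statistics like those reported on slide 15.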

  12. Roadmap
      • Introduction & background
      • Adaptive knowledge sharing in Multi-Task Learning
      • Experiments & analysis
      • Conclusion

  13. Experiments
      • Language pairs: English to Farsi/Vietnamese
      • Datasets:
        – English to Farsi: TED corpus & LDC2016E93
        – English to Vietnamese: IWSLT 2015 (TED and TEDx talks)
        – Semantic parsing: AMR corpus (newswire, weblogs, web discussion forums and broadcast conversations)
        – Syntactic parsing: Penn Treebank
        – NER: CoNLL NER corpus (newswire articles from the Reuters Corpus)
      • NMT architecture: GRU for the blocks, 400-dimensional RNN hidden states and word embeddings
      • NMT best practice:
        – Optimisation: Adam
        – Byte-pair encoding (BPE) on both source and target
        – Evaluation metrics: PPL, TER and BLEU
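As one concrete piece of this evaluation pipeline, here is a small sketch of scoring a system output with the sacrebleu package (the file names are hypothetical; TER is also available in recent sacrebleu versions, and PPL comes from the NMT model itself):

```python
import sacrebleu

# Hypothetical files: one detokenised sentence per line.
hyps = open("mt_output.fa").read().splitlines()
refs = open("reference.fa").read().splitlines()

# corpus_bleu takes the hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(f"BLEU = {bleu.score:.2f}")
```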

  14. Experiments
      [Figure: BLEU scores, English → Farsi and English → Vietnamese]

  15. Experiments (English to Farsi)
      [Figure: average block usage per task (Blocks 1–3 × MT / Semantic / Syntactic / NER)]
      • Block specialisation — Block 1: MT & semantic parsing; Block 2: syntactic/semantic parsing; Block 3: NER

  16. Conclusion
      • Addressed the task-interference issue in MTL
        – by extending the recurrent units with multiple blocks
        – with a trainable routing network

  17. Questions?
      Paper:
