Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation
Poorya ZareMoodi, Wray Buntine, Gholamreza (Reza) Haffari
Monash University
Slides:
Tasks and example annotations:
▪ Machine Translation
▪ Syntactic Parsing: (S (NP (DT the) (N burglar)) (VP (V robbed) (NP (DT the) (N apartment))))
▪ Named-Entity Recognition: B-PER O O O O B-ORG I-ORG O B-MISC
▪ Semantic Parsing
[Figure: encoder states h1 to h5, decoder states g1 to g5 for tasks (1), (2) and (3), and a context vector. Example: the input "<translation> I went home" is translated as "من به خانه رفتم".]
Zaremoodi & Haffari, NAACL, 2018
▪ Sharing the parameters of the recurrent units among all tasks
  ▪ Task interference
  ▪ Inability to leverage commonalities among subsets of tasks
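With fully shared recurrent units, the example input on the slides ("<translation> I went home") marks the task with a tag at the start of the source. The sketch below shows that kind of tagging for a mixed-task batch. Only the `<translation>` tag appears in the slides; the other tag strings and the helper names `tag_source` and `build_mixed_batch` are hypothetical.

```python
# Task tags prepended to the source sequence so one shared network can tell tasks apart.
TASK_TAGS = {
    "mt": "<translation>",            # tag shown in the slides
    "syntactic_parsing": "<syntax>",  # hypothetical tag name
    "semantic_parsing": "<semantic>", # hypothetical tag name
    "ner": "<ner>",                   # hypothetical tag name
}


def tag_source(task, sentence):
    """Prefix a source sentence with the tag of the task it belongs to."""
    return f"{TASK_TAGS[task]} {sentence}"


def build_mixed_batch(examples):
    """Turn (task, source, target) triples into tagged (source, target) pairs."""
    return [(tag_source(task, src), tgt) for task, src, tgt in examples]


batch = build_mixed_batch([
    ("mt", "I went home", "من به خانه رفتم"),
    ("ner", "I went home", "O O O"),
])
```

Because every task passes through the same parameters, the tag is the only signal that separates tasks in this setup, which is where the interference listed above comes from.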
IDEA
▪ Multiple experts in handling different kinds of information
▪ Adaptively share experts among the tasks
▪ Extend the recurrent units with multiple blocks
  ▪ Each block has its own information flow through time
  ▪ Routing mechanism: softly directs the input to these blocks
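The idea above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' exact formulation: each block is a simple tanh recurrent cell (the slides do not fix the cell type), the router scores the blocks from the input and a one-hot task vector, and the block states are combined by a routing-weighted sum. The class and variable names (`RoutedRecurrentUnit`, `tau`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class RoutedRecurrentUnit:
    """Recurrent unit with several blocks and a soft router (illustrative only)."""

    def __init__(self, input_dim, hidden_dim, num_blocks, num_tasks, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.num_blocks = num_blocks
        self.num_tasks = num_tasks
        # Each block owns its input and recurrent weights, so each block
        # keeps its own information flow through time.
        self.w_in = [rand_matrix(hidden_dim, input_dim, rng) for _ in range(num_blocks)]
        self.w_rec = [rand_matrix(hidden_dim, hidden_dim, rng) for _ in range(num_blocks)]
        # Assumption: the router sees the input concatenated with a one-hot task id.
        self.w_route = rand_matrix(num_blocks, input_dim + num_tasks, rng)

    def init_state(self):
        return [[0.0] * self.hidden_dim for _ in range(self.num_blocks)]

    def step(self, x, states, task_id):
        task_one_hot = [1.0 if i == task_id else 0.0 for i in range(self.num_tasks)]
        # Soft routing weights over the blocks; they sum to 1.
        tau = softmax(matvec(self.w_route, x + task_one_hot))
        new_states = []
        for k in range(self.num_blocks):
            routed_x = [tau[k] * v for v in x]  # softly direct the input to block k
            pre = [a + b for a, b in zip(matvec(self.w_in[k], routed_x),
                                         matvec(self.w_rec[k], states[k]))]
            new_states.append([math.tanh(p) for p in pre])
        # Assumption: the unit's output is the routing-weighted sum of block states.
        output = [sum(tau[k] * new_states[k][j] for k in range(self.num_blocks))
                  for j in range(self.hidden_dim)]
        return output, new_states, tau


unit = RoutedRecurrentUnit(input_dim=4, hidden_dim=3, num_blocks=3, num_tasks=4)
states = unit.init_state()
output, states, tau = unit.step([0.1, -0.2, 0.3, 0.0], states, task_id=0)
```

Since the routing weights depend on both the input and the task, two tasks can lean on the same block where they have something in common and on different blocks where they would otherwise interfere.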
[Figure: routing in the extended recurrent unit, with labels "Task", "Block" and τ_t. Example: encoder states h1 to h5 read "<translation> I went home <EOS>"; decoder states g1 to g5 for task (1), fed by a context vector, produce "من به خانه رفتم <EOS>".]
▪ English to Farsi: TED corpus & LDC2016E93
▪ English to Vietnamese: IWSLT 2015 (TED and TEDX talks)
▪ Semantic parsing: AMR corpus (newswire, weblogs, web discussion forums and broadcast conversations)
▪ Syntactic parsing: Penn Treebank
▪ NER: CoNLL NER Corpus (newswire articles from the Reuters Corpus)
▪ Optimisation: Adam
▪ Byte Pair Encoding (BPE) on both source/target
▪ Evaluation metrics: PPL, TER and BLEU
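Of the metrics listed, perplexity (PPL) is the simplest to state: the exponential of the average negative log-likelihood per target token. A minimal sketch, assuming natural-log token probabilities are already available from the model; the function name `perplexity` is hypothetical and the slides do not say how PPL was computed in the experiments.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities of the reference."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_neg_log_likelihood = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_neg_log_likelihood)


# A model that assigns probability 0.25 to every reference token
# is as uncertain as a uniform choice among 4 options.
ppl = perplexity([math.log(0.25)] * 4)
```

Lower is better for PPL and TER, while higher is better for BLEU.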
[Results panels: English ➔ Farsi, English ➔ Vietnamese]
▪ Average block usage
▪ Block specialisation: Block 1: MT, Semantic Parsing; Block 2: Syntactic/Semantic
[Figure: average block usage (axis range 0.2 to 0.5) of Block 1, Block 2 and Block 3 for each task: MT, Semantic, Syntactic, NER]
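An average block usage statistic of this kind can be obtained by logging the routing weights at every time step and averaging them per task. The sketch below assumes the weights are collected as `(task, weights)` pairs; the function name `average_block_usage` and the numbers in the example are made up for illustration and are not the reported results.

```python
def average_block_usage(records):
    """Average routing weight per block, grouped by task.

    records: iterable of (task_name, routing_weights) pairs, one per time step.
    """
    sums = {}
    counts = {}
    for task, weights in records:
        if task not in sums:
            sums[task] = [0.0] * len(weights)
            counts[task] = 0
        sums[task] = [s + w for s, w in zip(sums[task], weights)]
        counts[task] += 1
    return {task: [s / counts[task] for s in total] for task, total in sums.items()}


# Illustrative routing weights over three blocks (not real measurements).
usage = average_block_usage([
    ("MT", [0.5, 0.25, 0.25]),
    ("MT", [0.25, 0.5, 0.25]),
    ("NER", [0.25, 0.25, 0.5]),
])
```

A block whose average weight is high for a particular group of tasks can then be read as specialising in what those tasks share.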
▪ Address the task interference issue in MTL
  ▪ by extending the recurrent units with multiple blocks
  ▪ with a trainable routing network
Paper: