

SLIDE 1

Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation

Poorya ZareMoodi, Wray Buntine, Gholamreza (Reza) Haffari Monash University

Slides:

SLIDE 2

Roadmap

  • Introduction & background
  • Adaptive knowledge sharing in Multi-Task Learning
  • Experiments & analysis
  • Conclusion


SLIDE 3

Improving NMT in Low-Resource Scenarios

  • NMT is notoriously data-hungry
  • Bilingually low-resource scenario: large amounts of bilingual training data are not available
  • IDEA: use existing resources from other tasks and train one model for all tasks using multi-task learning
  • This effectively injects inductive biases that help improve the generalisation of NMT

  • Auxiliary tasks: Semantic Parsing, Syntactic Parsing, Named Entity Recognition

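The "one model for all tasks" idea can be illustrated with a toy multi-task SGD loop. Everything below (the scalar model, targets, learning rate) is made up for illustration and is not the paper's setup; the point is simply that shared parameters receive gradient signal from every sampled task, while task-specific parameters learn only from their own task.

```python
import random

class SharedModel:
    """Toy stand-in for a multi-task model: one shared scalar parameter
    plus one task-specific scalar per task, trained with plain SGD on a
    squared loss. Tasks and targets are illustrative only."""

    def __init__(self, tasks):
        self.shared = 0.0
        self.task_specific = {t: 0.0 for t in tasks}

    def loss_and_grads(self, task, x, y):
        err = (self.shared + self.task_specific[task]) * x - y
        return err * err, 2 * err * x, 2 * err * x

tasks = ["mt", "semantic", "syntactic", "ner"]
targets = {"mt": 2.0, "semantic": 1.0, "syntactic": 1.0, "ner": 0.0}
model = SharedModel(tasks)
random.seed(0)
lr = 0.05
for _ in range(2000):
    task = random.choice(tasks)               # sample a main or auxiliary task
    x, y = 1.0, targets[task]
    loss, g_shared, g_task = model.loss_and_grads(task, x, y)
    model.shared -= lr * g_shared             # updated on every task
    model.task_specific[task] -= lr * g_task  # updated on its own task only
```

The shared parameter absorbs what the tasks have in common; the per-task parameters absorb the rest, which is the inductive-bias effect the bullet above refers to.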

SLIDE 4

Encoders-Decoders for Individual Tasks

[Diagram: a separate encoder-decoder model for each task]

  • Machine Translation: "I went home" ➔ "من به خانه رفتم" (Farsi)
  • Semantic Parsing: "Obama was elected and his voter celebrated"
  • Syntactic Parsing: "The burglar robbed the apartment" ➔ (S (NP (DT the) (N burglar)) (VP (V robbed) (NP (DT the) (N apartment))))
  • Named-Entity Recognition: "Jim bought 300 shares of Acme Corp. in 2006" ➔ B-PER O O O O B-ORG I-ORG O B-MISC

SLIDE 5

Sharing Scenario


[Diagram: one multi-task seq2seq model. The encoder reads a sentence together with a task tag; the decoder produces the task-specific output: a translation, named entities, a parse tree, or a semantic graph.]

SLIDE 6

Partial Parameter Sharing

[Diagram: a stacked three-layer recurrent encoder-decoder translating Farsi "من به خانه رفتم" to "I went home". Some recurrent layers are shared across all tasks and the rest are task-specific; the shared parameters are the source of task interference.]

Zaremoodi & Haffari, NAACL, 2018
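The layer-wise bookkeeping behind partial sharing can be sketched as follows. Which layers the original system shares is not recoverable from the slide, so the shared layer chosen below, and all names, are illustrative: shared layers keep a single parameter copy used by every task, while task-specific layers keep one copy per task.

```python
# Partial parameter sharing: one copy of each shared layer's parameters
# serves every task; other layers get one copy per task. The layer count
# and the choice of which layer is shared are assumptions for this sketch.

def build_params(tasks, n_layers=3, shared_layers=(0,)):
    params = {}
    for layer in range(n_layers):
        if layer in shared_layers:
            params[("shared", layer)] = {"W": 0.0}  # one copy for all tasks
        else:
            for t in tasks:
                params[(t, layer)] = {"W": 0.0}     # per-task copy
    return params

def params_for(task, params, n_layers=3):
    """The parameter copies a given task actually uses, layer by layer."""
    return [("shared", l) if ("shared", l) in params else (task, l)
            for l in range(n_layers)]

tasks = ["mt", "semantic", "syntactic", "ner"]
params = build_params(tasks)
```

Because `("shared", 0)` appears in every task's parameter list, gradients from all four tasks flow into the same weights, which is exactly where task interference arises.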

SLIDE 7

Roadmap

  • Introduction & Background
  • Adaptive knowledge sharing in Multi-Task Learning
  • Experiments & analysis
  • Conclusion


SLIDE 8

Adaptive Knowledge Sharing in MTL

  • Sharing the parameters of the recurrent units among all tasks leads to:
      • Task interference
      • Inability to leverage commonalities among subsets of tasks
  • IDEA:
      • Multiple experts handle different kinds of information
      • Adaptively share experts among the tasks: share the knowledge for controlling the information flow in the hidden states

SLIDE 9

Adaptive Knowledge Sharing in MTL

  • IDEA:
      • Multiple experts handle different kinds of information
      • Adaptively share experts among the tasks
  • Extend the recurrent units with multiple blocks:
      • Each block has its own information flow through time
      • Routing mechanism: softly directs the input to these blocks

SLIDE 10

Adaptive Knowledge Sharing

[Diagram: the proposed recurrent unit. A routing network computes soft weights τ_t over the blocks; each block keeps its own hidden state, and the unit's output combines the blocks' outputs according to τ_t.]
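A minimal NumPy sketch of such a routed recurrent unit, under stated assumptions: the paper uses GRU blocks, whereas each block here is a plain tanh-RNN cell, and the router here reads the current input together with the mean of the previous block states; the exact router inputs and all names and shapes are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class RoutedRecurrentUnit:
    """A recurrent unit made of several blocks plus a routing network.

    Each block keeps its own hidden state through time; the router turns
    the input (and a summary of the previous states) into soft weights
    tau_t, and the unit's output is the tau_t-weighted sum of the block
    outputs."""

    def __init__(self, n_blocks, d_in, d_hid, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = rng.normal(0.0, 0.1, (n_blocks, d_hid, d_in))
        self.Wh = rng.normal(0.0, 0.1, (n_blocks, d_hid, d_hid))
        self.Wr = rng.normal(0.0, 0.1, (n_blocks, d_in + d_hid))  # router

    def step(self, x, h_prev):
        # h_prev: (n_blocks, d_hid), one hidden state per block
        tau = softmax(self.Wr @ np.concatenate([x, h_prev.mean(axis=0)]))
        h_new = np.tanh(np.einsum("bij,j->bi", self.Wx, x)
                        + np.einsum("bij,bj->bi", self.Wh, h_prev))
        out = tau @ h_new  # softly combine the blocks' outputs
        return out, h_new, tau

unit = RoutedRecurrentUnit(n_blocks=3, d_in=4, d_hid=5)
h = np.zeros((3, 5))
out, h, tau = unit.step(np.ones(4), h)
```

In the full model, the router is trained jointly with the rest of the network, so each task learns which blocks (experts) to rely on.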

SLIDE 11

Adaptive Knowledge Sharing

[Diagram: the proposed recurrent unit, with its routing network and blocks, used inside a seq2seq model translating Farsi "من به خانه رفتم" to "I went home".]

We use the proposed recurrent unit inside both the encoder and the decoder.

SLIDE 12

Roadmap

  • Introduction & background
  • Adaptive knowledge sharing in Multi-Task Learning
  • Experiments & analysis
  • Conclusion


SLIDE 13

Experiments

  • Language pairs: English to Farsi / Vietnamese
  • Datasets:
      • English to Farsi: TED corpus & LDC2016E93
      • English to Vietnamese: IWSLT 2015 (TED and TEDx talks)
      • Semantic parsing: AMR corpus (newswire, weblogs, web discussion forums and broadcast conversations)
      • Syntactic parsing: Penn Treebank
      • NER: CoNLL NER corpus (newswire articles from the Reuters Corpus)
  • NMT architecture: GRU for blocks; 400-dimensional RNN hidden states and word embeddings
  • NMT best practice:
      • Optimisation: Adam
      • Byte Pair Encoding (BPE) on both source and target
      • Evaluation metrics: PPL, TER and BLEU
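BPE, listed under best practice above, can be illustrated with the standard merge-learning procedure of Sennrich et al.: repeatedly merge the most frequent adjacent symbol pair. The toy vocabulary and merge count below are illustrative.

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace each occurrence of the symbol pair with one merged symbol."""
    merged = {}
    for word, freq in vocab.items():
        symbols, out, i = word.split(), [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[" ".join(out)] = freq
    return merged

def learn_bpe(vocab, n_merges):
    merges = []
    for _ in range(n_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

# Words as space-separated symbols with an end-of-word marker.
toy_vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
             "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, final_vocab = learn_bpe(toy_vocab, 3)
```

Applying the learned merges to both source and target text caps the vocabulary size, which matters in low-resource settings.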

SLIDE 14

Experiments

[Chart: BLEU scores for English ➔ Farsi and English ➔ Vietnamese]

SLIDE 15

Experiments (English to Farsi)

  • Average block usage (English to Farsi)
  • Block specialisation: Block 1: MT and Semantic Parsing; Block 2: Syntactic/Semantic Parsing; Block 3: NER

[Chart: average routing weight (roughly 0.2 to 0.5) of Blocks 1 to 3 for the MT, Semantic, Syntactic and NER tasks]

SLIDE 16

Conclusion

  • Addressed the task interference issue in MTL by:
      • extending the recurrent units with multiple blocks
      • adding a trainable routing network

SLIDE 17


Questions?

Paper: