Parameter Sharing Methods for Multilingual Self-Attentional Translation Models
Devendra Sachan 1, Graham Neubig 2
1 Data Solutions Team, Petuum Inc, USA
2 Language Technologies Institute, Carnegie Mellon University, USA
Conference on Machine
Multilingual Machine Translation System
[Figure: a single system with English, German, Dutch, and Japanese on the source side and English, German, Dutch, and Japanese on the target side]
[Figure: the same system reduced to three languages: English, German, Dutch]
[Figure: Source Language: "En" -> Shared Encoder -> Decoder 1 -> Target Language 1: "De"; Shared Encoder -> Decoder 2 -> Target Language 2: "Nl"]
1 Multi-Task Learning for Multiple Language Translation, ACL 2015
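The shared-encoder layout in the figure can be sketched as an object graph: one encoder instance reused for every direction, and one decoder instance per target language. This is a minimal illustration only; `Module`, `OneToManyModel`, and `modules_for` are hypothetical names, and `Module` is a stand-in for a real trainable network.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Stand-in for a trainable network component (hypothetical)."""
    name: str
    params: dict = field(default_factory=dict)


class OneToManyModel:
    """One shared encoder feeding one decoder per target language."""

    def __init__(self, target_langs):
        # A single encoder object is created once and reused everywhere.
        self.encoder = Module("shared_encoder")
        # Each target language gets its own, separate decoder object.
        self.decoders = {lang: Module(f"decoder_{lang}") for lang in target_langs}

    def modules_for(self, target_lang):
        """Return the (encoder, decoder) pair used for one direction."""
        return self.encoder, self.decoders[target_lang]


model = OneToManyModel(["de", "nl"])
enc_de, dec_de = model.modules_for("de")
enc_nl, dec_nl = model.modules_for("nl")
print(enc_de is enc_nl)  # the encoder is the same object for both directions
print(dec_de is dec_nl)  # the decoders are distinct objects
```

Because both directions hold a reference to the same encoder object, any update made while training En-De is visible when translating En-Nl, which is the point of sharing.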
[Figure: Source Language: "En" -> Shared Encoder -> Shared Decoder -> Target Language 1: "De" and Target Language 2: "Nl"]
2 Google’s Multilingual Neural Machine Translation System: Enabling
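With a fully shared decoder, the model needs some signal for which language to produce. The cited system does this by prepending an artificial target-language token to the source sentence. A minimal sketch follows; the function name `add_target_token` and the exact token spelling `<2de>` are illustrative assumptions, not taken from the slides.

```python
def add_target_token(source_tokens, target_lang):
    """Prepend a target-language tag so one shared model can serve all targets."""
    return [f"<2{target_lang}>"] + list(source_tokens)


# The same English sentence, routed to two different target languages.
batch = [(["Hello", "world"], "de"), (["Hello", "world"], "nl")]
tagged = [add_target_token(tokens, lang) for tokens, lang in batch]
for sentence in tagged:
    print(" ".join(sentence))
```

The tag is the only per-language input; every encoder and decoder parameter is shared across both directions.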
[Figure: Source Language: "En" -> Shared Encoder -> Decoder 1 (Target Language 1: "De") and Decoder 2 (Target Language 2: "Nl"), with a set of Shareable Parameters common to Decoder 1 and Decoder 2]
[Figure: Transformer decoder with its weight matrices labeled 3]
Embedding Layer (W_E) and Position Encoding
N× layers, each containing:
  Self-Attention Sublayer (Masked Self-Attention): W^1_K, W^1_V, W^1_Q, W^1_F
  Encoder-Decoder Attention Sublayer (Enc-Dec Inter Attention over the Encoder Hidden State): W^2_K, W^2_V, W^2_Q, W^2_F
  Feed-Forward Network Sublayer: W_{L1}, ReLU, W_{L2}
Tied Linear Layer (W_E^⊺)
Other labeled blocks: Layer Norm (four instances)
Signals labeled in the figure: x_i, k_i, v_i, q_i, a_i, z_i, h_i
3 Attention is all you need, NIPS 2017
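To get a feel for how much each labeled weight group contributes, the sketch below counts weight-matrix entries in one decoder layer. It rests on several assumptions not stated in the slides: every attention matrix (W_K, W_V, W_Q, W_F) is d_model x d_model, W_F is read as the attention output projection, biases and layer-norm parameters are ignored, and the sizes 512 and 2048 are illustrative values only. The function name is hypothetical.

```python
def decoder_layer_weight_counts(d_model, d_ff):
    """Count weight-matrix entries per named matrix in one decoder layer.

    Biases and layer-norm parameters are ignored (simplifying assumption).
    """
    square = d_model * d_model
    counts = {}
    # Both attention sublayers carry the same four projection matrices.
    for sublayer in ("self_attn", "enc_dec_attn"):
        for name in ("W_K", "W_V", "W_Q", "W_F"):
            counts[f"{sublayer}.{name}"] = square
    # The feed-forward sublayer expands to d_ff and projects back.
    counts["ffn.W_L1"] = d_model * d_ff
    counts["ffn.W_L2"] = d_ff * d_model
    return counts


counts = decoder_layer_weight_counts(d_model=512, d_ff=2048)
per_layer = sum(counts.values())
print(per_layer)  # 4194304
```

Under these assumptions the two attention sublayers together hold as many entries as the feed-forward sublayer, so the choice of which group to share has a comparable effect on model size either way.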
[Figure: Source Language: "En" -> Encoder (Embedding Layer, Self-Attention, Feed-Forward Network). Decoder 1 (Target Language 1: "De") and Decoder 2 (Target Language 2: "Nl") each contain an Embedding Layer, Masked Self-Attention, Enc-Dec Attention, a Feed-Forward Network, and a Tied Linear Layer, with Shareable Parameters marked across the two decoders]
[Figure: two separate models. Encoder 1 and Decoder 1 for Source Language: "En", Target Language: "De"; Encoder 2 and Decoder 2 for Source Language: "En", Target Language: "Nl"]
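Partial sharing can be thought of as choosing, per named weight, whether the two decoders point at the same object or at private copies. The sketch below illustrates this with plain Python lists standing in for tensors. The weight names follow the labels in the decoder diagram, but `DECODER_WEIGHTS`, `build_decoders`, `count_distinct`, and the particular subset shared here are illustrative assumptions, not the configuration reported by the authors.

```python
DECODER_WEIGHTS = (
    "W_E",
    "self_attn.W_K", "self_attn.W_V", "self_attn.W_Q", "self_attn.W_F",
    "enc_dec_attn.W_K", "enc_dec_attn.W_V", "enc_dec_attn.W_Q", "enc_dec_attn.W_F",
    "ffn.W_L1", "ffn.W_L2",
)


def build_decoders(target_langs, shared_names):
    """Build one parameter dict per target language, sharing selected weights."""
    unknown = set(shared_names) - set(DECODER_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown weight names: {sorted(unknown)}")
    # One placeholder object per shared weight, reused by every decoder.
    shared = {name: [f"{name} (shared)"] for name in shared_names}
    decoders = {}
    for lang in target_langs:
        decoders[lang] = {
            name: shared[name] if name in shared else [f"{name} ({lang})"]
            for name in DECODER_WEIGHTS
        }
    return decoders


def count_distinct(decoders):
    """Number of distinct weight objects across all decoders."""
    return len({id(w) for params in decoders.values() for w in params.values()})


# Arbitrary example subset: share only the self-attention weights.
self_attn = {n for n in DECODER_WEIGHTS if n.startswith("self_attn.")}
partial = build_decoders(["de", "nl"], self_attn)
none_shared = build_decoders(["de", "nl"], set())
all_shared = build_decoders(["de", "nl"], set(DECODER_WEIGHTS))
print(count_distinct(none_shared), count_distinct(partial), count_distinct(all_shared))
```

The three calls correspond to the no-sharing, partial-sharing, and full-sharing decoder configurations; only the `shared_names` argument changes between them.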
4 When and Why are Pre-trained Word Embeddings Useful for Neural
[Figure: bar charts of BLEU by model and sharing method. Bars: GNMT NS, GNMT FS, TF NS, TF FS, and, in later builds, TF PS (NS, FS, PS: no, full, and partial sharing; TF: Transformer). Panels, grouped by training set:
  En→Ro+Fr: En→Ro (BLEU axis 20 to 34), En→Fr (BLEU axis 36 to 50)
  En→De+Nl: En→De (BLEU axis 20 to 34), En→Nl (BLEU axis 24 to 38)
  En→De+Tr: En→De (BLEU axis 20 to 34), En→Tr (BLEU axis 14 to 21)
  En→De+Ja: En→De (BLEU axis 20 to 34), En→Ja (BLEU axis 14 to 21)]