Saliency-driven Word Alignment Interpretation for NMT
Shuoyang Ding Hainan Xu Philipp Koehn The Fourth Conference on Machine Translation Florence, Italy August 1st, 2019
Revisiting Six Challenges: poor out-of-domain performance
[Koehn and Knowles 2017]
[Jain and Wallace NAACL 2019]
[Serrano and Smith ACL 2019]
[Simonyan et al. 2013; Springenberg et al. 2014; Smilkov et al. 2017]
[Aubakirova and Bansal 2016; Li et al. 2016; Ding et al. 2017; Arras et al. 2016, 2017; Mudrakarta et al. 2018]
Plain gradients can be noisy at some points in the input space. SmoothGrad: create multiple copies of the same input corrupted with Gaussian noise, then average the saliency over the copies.
[Smilkov et al. 2017]
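A minimal sketch of this averaging trick on a toy scalar function (the function, noise level, and sample count are my own illustration, not the paper's NMT setup):

```python
import numpy as np

def f(x):
    """Toy scalar model: f(x) = sum(x^3)."""
    return np.sum(x ** 3)

def grad_f(x):
    """Analytic gradient of f: 3 * x^2."""
    return 3 * x ** 2

def smoothgrad(x, n_samples=50, sigma=0.1, seed=0):
    """SmoothGrad: average the gradient over copies of x
    corrupted with Gaussian noise of std sigma."""
    rng = np.random.default_rng(seed)
    grads = [grad_f(x + rng.normal(0.0, sigma, size=x.shape))
             for _ in range(n_samples)]
    return np.mean(grads, axis=0)

x = np.array([1.0, -2.0, 0.5])
sg = smoothgrad(x)
```

For a smooth f the averaged gradient stays close to the plain gradient; the benefit shows up when the raw gradient is jagged around x.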
Photo Credit: Hainan Xu
It’s straightforward to compute saliency for a single dimension of the word embedding.
But how do we compose the saliency of each dimension into the saliency of a word?
Consider word embedding look-up as a dot product between the embedding matrix and a one-hot vector.
The 1 in the one-hot vector denotes the identity of the input word.
Let’s perturb that 1 like a real value, i.e. take gradients with respect to the 1.
saliency(word) = ∑ᵢ eᵢ · ∂y/∂eᵢ    range: (−∞, ∞)
Transformer (with fairseq default hyperparameters). Evaluation covers de-en, fr-en and ro-en.
Smoothed attention: perturb the input samples, then average the attention weights over samples. Gradient baseline (Li et al.): take embedding gradients, then average over embedding dimensions.
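A toy contrast of the two ways of composing per-dimension gradients into a word score (all numbers invented for illustration):

```python
import numpy as np

grad_e = np.array([0.5, -0.5, 2.0])   # dy/de for one source word (made up)
e = np.array([1.0, 1.0, -1.0])        # its embedding (made up)

# Baseline composition: average the gradient over embedding dimensions.
li_saliency = float(np.mean(grad_e))

# Our composition: weight each dimension's gradient by the embedding value,
# i.e. e . dy/de, hence the (-inf, inf) range.
ours_saliency = float(e @ grad_e)
```

The two can disagree in both magnitude and sign, since the baseline ignores the embedding values while ours weights each dimension by them.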
[Figure: AER (15–45) for Attention, Smoothed Attention, Li+Grad, Li+SmoothGrad, Ours+Grad and Ours+SmoothGrad, compared against fast-align, Zenkel et al. (2019) and GIZA++]
[Figure: AER (15–65) for Conv, LSTM and Transformer models, compared against fast-align, Zenkel et al. (2019) and GIZA++]
Takeaways: saliency is a better interpretation method than attention, and NMT models do learn word alignments; we just need to properly uncover them!
Paper, code and slides: https://github.com/shuoyangd/meerkat