Attention and its (mis)interpretation
Danish Pruthi
Acknowledgements
Mansi Gupta, Bhuwan Dhingra, Graham Neubig, Zachary C. Lipton
Outline
1. What is the attention mechanism?
2. Attention-as-explanations
3. Manipulating attention weights
4.
Figure: encoder-decoder translation. An LSTM encoder reads the Hindi source sentence यह है एक उदाहरण </s> ("This is an example"). An LSTM decoder takes <s> This is an example as input and produces This is an example </s>.
Problem: “You can’t cram the meaning of a whole %&!$ing sentence into a single $&!*ing vector!” (Ray Mooney)
Solution: Use attention (Bahdanau et al. 2015), which lets the decoder use a combination of the encoder's vectors, weighted by “attention weights”.
Figure: one decoder step with attention, translating यह है एक उदाहरण after producing "<s> This is an".
Key vectors: one per source token
Query vector: the current decoder state
Compute attention scores between the query and each key (values shown include 0.3 and 2.1)
Softmax over the scores gives the attention weights 0.08, 0.13, 0.03, 0.76
Context vector: the sum of the key vectors, each multiplied by its attention weight
Next word predicted: example
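The key, query, softmax and context-vector steps can be sketched in a few lines of NumPy. This is a minimal illustration and not the speaker's code: it assumes plain dot-product scoring, and the function names, dimensions and random toy vectors are hypothetical.

```python
import numpy as np

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()

def attend(query, keys):
    """One decoder step of dot-product attention (scoring choice is an assumption).

    query: shape (d,), the decoder state
    keys:  shape (n, d), one vector per source token
    """
    scores = keys @ query        # one attention score per source token
    weights = softmax(scores)    # attention weights: non-negative, sum to 1
    context = weights @ keys     # context vector: weighted sum of the keys
    return weights, context

# Hypothetical toy input: 4 source tokens with 3-dimensional states.
rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 3))
query = rng.normal(size=3)
weights, context = attend(query, keys)
```

The weights play the role of the 0.08, 0.13, 0.03, 0.76 row on the slide: one number per source token, summing to one.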
Additive attention: $\mathrm{score}(s_t, h_i) = v_a^\top \tanh(W_a [s_t; h_i])$
Dot-product attention: $\mathrm{score}(s_t, h_i) = s_t^\top h_i$
Scaled dot-product attention: $\mathrm{score}(s_t, h_i) = \frac{s_t^\top h_i}{\sqrt{n}}$
Bilinear attention: $\mathrm{score}(s_t, h_i) = s_t^\top W_a h_i$
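The four scoring functions differ only in how the decoder state s_t and the encoder state h_i are combined. A minimal NumPy sketch, assuming s_t and h_i share the same dimension n, that the scaled variant divides by √n, and with hypothetical parameter shapes for W_a and v_a:

```python
import numpy as np

def additive_score(s_t, h_i, W_a, v_a):
    # v_a^T tanh(W_a [s_t; h_i]); W_a has shape (k, 2n), v_a has shape (k,)
    return v_a @ np.tanh(W_a @ np.concatenate([s_t, h_i]))

def dot_score(s_t, h_i):
    # s_t^T h_i
    return s_t @ h_i

def scaled_dot_score(s_t, h_i):
    # s_t^T h_i / sqrt(n), with n taken to be the vector dimension (assumption)
    return (s_t @ h_i) / np.sqrt(h_i.shape[0])

def bilinear_score(s_t, h_i, W_a):
    # s_t^T W_a h_i; W_a has shape (n, n)
    return s_t @ W_a @ h_i

# Hypothetical toy values with n = 4 and k = 5.
rng = np.random.default_rng(0)
s_t, h_i = rng.normal(size=4), rng.normal(size=4)
W_add, v_add = rng.normal(size=(5, 8)), rng.normal(size=5)
```

With W_a set to the identity, the bilinear score reduces to the dot-product score, which is one way to see the dot product as a special case.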
Attention visualized as explanation:
Image captioning (Xu et al., 2015)
Entailment (Rocktäschel et al., 2015)
BERTViz (Vig et al., 2019)
Document classification (Yang et al., 2016)
and many others…
"By inspecting the network’s attention, for instance by visually highlighting attention weights, one could attempt to investigate and understand the outcome of neural networks. Hence, weight visualization is now common practice."
Galassi et al., 2019
Fairness, accountability and transparency (FAT*): attention has been used to examine high-stakes job recommendation models in "Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting" (De-Arteaga et al., 2019):
"The attention weights indicate which tokens are the most predictive"
We question this assumption: does attention necessarily indicate the most predictive tokens?
be useful for prediction
Task | Input Example
Occupation Prediction (Physician vs Surgeon) | … speaks English and Spanish.
Gender Identification | After that, Austen was educated at home until she went to boarding school early in 1785
Sentiment Analysis (SST + Wikipedia) | Good acting, good dialogue, good cinematography. Helen Reddy is an Australian singer and activist.
Acceptance Prediction (Reference Letters) | It is with pleasure that I am writing this letter...I highly recommend her for your institution. Percentile:99.0 Rank:Extraordinary.
Figure (bar chart, axis 25 to 100): With vs. Without, by task
Task | With | Without
Occupation Prediction | 96.4 | 93.8
Gender Identification | 100 | 72.8
SST + Wiki | 90.8 | 50.4
Reference Letters | 77.5 | 74.7
Task | Example
Bigram Flipping | {w1, w2, …, w2n-1, w2n} → {w2, w1, …, w2n, w2n-1}
Sequence Copying | {w1, w2, …, wn-1, wn} → {w1, w2, …, wn-1, wn}
Sequence Reversal | {w1, w2, …, wn-1, wn} → {wn, wn-1, …, w2, w1}
English - German MT | This is an example. → Dies ist ein Beispiel.
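The three synthetic tasks are simple enough to state as functions. A minimal sketch of the target each task expects for a given input; the function names are hypothetical, and copying is taken to return the input unchanged:

```python
def bigram_flip(tokens):
    # Swap each adjacent pair: (w1, w2, w3, w4) -> (w2, w1, w4, w3).
    if len(tokens) % 2 != 0:
        raise ValueError("bigram flipping expects an even number of tokens")
    flipped = []
    for i in range(0, len(tokens), 2):
        flipped.extend([tokens[i + 1], tokens[i]])
    return flipped

def copy_sequence(tokens):
    # The target is the input itself.
    return list(tokens)

def reverse_sequence(tokens):
    # The target is the input read back to front.
    return list(reversed(tokens))

source = ["a", "b", "c", "d"]
targets = {
    "bigram_flip": bigram_flip(source),
    "copy": copy_sequence(source),
    "reverse": reverse_sequence(source),
}
```

In each task the output token at a given position depends on exactly one known input position, which is what makes these tasks convenient for checking where a model's attention goes.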
𝖩
impermissible tokens
Total attention mass
Penalty coefficient that modulates attention on impermissible tokens
Wiegreffe and Pinter (2019) propose a different penalty term
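One way to turn "total attention mass" and a "penalty coefficient" into a training penalty is sketched below. It assumes the penalty has the form −λ log(1 − αᵀm), where α are the attention weights and m is a binary mask marking impermissible tokens. That form, the function name and the clamping constant are assumptions for illustration, not something stated on the slide.

```python
import numpy as np

def attention_penalty(alpha, mask, lam, eps=1e-12):
    """Penalty added to the task loss (assumed form: -lam * log(1 - alpha . mask)).

    alpha: attention weights over the input tokens, summing to 1
    mask:  1 for impermissible tokens, 0 otherwise
    lam:   penalty coefficient
    """
    mass = float(np.dot(alpha, mask))      # total attention mass on impermissible tokens
    remaining = max(1.0 - mass, eps)       # clamp to avoid log(0) when mass == 1
    return -lam * np.log(remaining)

alpha = np.array([0.1, 0.2, 0.7])
mask = np.array([0, 0, 1])
penalty = attention_penalty(alpha, mask, lam=1.0)
```

The penalty is zero when no attention falls on impermissible tokens and grows without bound as that mass approaches one, so a larger λ pushes the model harder to move attention elsewhere.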
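Figure: biLSTM + attention model. Inputs x1, x2, x3, …, xn pass through a biLSTM; attention weights α1, α2, α3, …, αn combine the resulting states to predict y.
Figure: the same model without the biLSTM layer. Attention weights α1, α2, α3, …, αn are applied over the inputs x1, x2, x3, …, xn to predict y.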
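The simpler of the two architectures (attention over token vectors, then a prediction y) can be sketched as a single forward pass. This is an illustrative sketch under assumptions: the scoring vector w_score, the output weights w_out and the sigmoid output are hypothetical choices, and x could equally hold biLSTM states instead of embeddings.

```python
import numpy as np

def attention_classifier(x, w_score, w_out):
    """Forward pass of a toy attention classifier (hypothetical parameterization).

    x:       shape (n, d), one vector per token (embeddings or biLSTM states)
    w_score: shape (d,), scores each token
    w_out:   shape (d,), maps the pooled vector to a logit
    """
    scores = x @ w_score
    exp = np.exp(scores - scores.max())
    alpha = exp / exp.sum()                        # attention weights α1..αn
    pooled = alpha @ x                             # attention-weighted sum of token vectors
    y = 1.0 / (1.0 + np.exp(-(pooled @ w_out)))    # probability of the positive class
    return alpha, y

# Hypothetical toy input: 5 tokens with 8-dimensional vectors.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
alpha, y = attention_classifier(x, rng.normal(size=8), rng.normal(size=8))
```

The returned alpha is what attention-based explanations visualize: one weight per input token.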
Devlin et al.
Figure: BERT layers L0, L1, …, L12 with predictions read from the top layer, in two panels: Original and Restricted. Tokens shown: [CLS], Good, Movie, [SEP], Capital, Delhi, [SEP]. The Restricted panel separates Impermissible from Permissible tokens.
Figure (bar chart, axis 25 to 100): accuracy and attention mass by attention type
Attention type | Accuracy | Attention Mass
Original | 97.2 | 99.7
Manipulated (λ = 0.1) | 97.1 | (no value shown)
Manipulated (λ = 1.0) | 97.4 | (no value shown)
At inference time, what if we hard set the corresponding attention mass to ZERO?
50 % 100%
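The inference-time intervention in the question above can be sketched as follows: zero the attention weights on impermissible tokens and, optionally, renormalize what is left. Whether the remaining weights are renormalized is not stated on the slide, so the `renormalize` flag and the function name are assumptions.

```python
import numpy as np

def zero_impermissible(alpha, mask, renormalize=False):
    """Hard-set attention on impermissible tokens to zero.

    alpha: attention weights over the input tokens
    mask:  1 for impermissible tokens, 0 otherwise
    renormalize: if True, rescale the surviving weights to sum to 1 (assumption)
    """
    kept = alpha * (1.0 - mask)          # weights on impermissible tokens become 0
    total = kept.sum()
    if renormalize and total > 0:
        kept = kept / total
    return kept

alpha = np.array([0.1, 0.2, 0.7])
mask = np.array([0, 0, 1])
zeroed = zero_impermissible(alpha, mask)
rescaled = zero_impermissible(alpha, mask, renormalize=True)
```

If a model's accuracy collapses under this intervention, it was still relying on the impermissible tokens through attention, whatever the reported attention mass suggested.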
Figure (bar chart, axis 25 to 100): accuracy and attention mass by attention type
Attention type | Accuracy | Attention Mass
Original | 100 | 94.5
None | 96.5 | -
Uniform | 97.9 | 5.2
Manipulated | 99.9 | 0.4
Figure panels: Original; Manipulated; A different seed
Figure (bar chart, axis 25 to 100): accuracy and attention mass by attention type
Attention type | Accuracy | Attention Mass
Original | 100 | 98.8
None | 84.1 | -
Uniform | 93.8 | 5.2
Manipulated | 99.9 | 0.01
Figure panels: Original; Manipulated; A different seed
Figure (bar chart, axis 25 to 100): accuracy and attention mass by attention type
Attention type | Accuracy | Attention Mass
Original | 100 | 94.1
None | 84.1 | -
Uniform | 88.1 | 4.7
Manipulated | 99.8 | 0.02
Figure panels: Original; Manipulated; A different seed
Figure (bar chart, axis 7.5 to 30): BLEU and attention mass by attention type
Attention type | BLEU | Attention Mass
Original | 24.4 | 20.7
None | 14.9 | -
Uniform | 18.5 | 5.9
Manipulated (λ = 1.0) | 20.6 | 1.1
Manipulated (λ = 0.1) | 23.7 | 7
corresponding to impermissible tokens.
the predicted outputs (Physician or Surgeon) by one of the models
highlighted on the basis of an “explanation method” (attention weights)
the gender of the individual?
Manipulation type | Input example (predicted label: Physician) | Percentage of sentences (yes)
No manipulation | in urological surgery. ms. UNK is affiliated with menorah medical center | 66%
Ours | in urological surgery. ms. UNK is affiliated with menorah medical center | 0%
Wiegreffe and Pinter, 2019 | in urological surgery. ms. UNK is affiliated with menorah medical center | 0%
model’s prediction?
Manipulation type | Input example (predicted label: Physician) | Rating (1 to 4)
No manipulation | in urological surgery. ms. UNK is affiliated with menorah medical center | 3.0 / 4
Ours | in urological surgery. ms. UNK is affiliated with menorah medical center | 2.67 / 4
Wiegreffe and Pinter, 2019 | in urological surgery. ms. UNK is affiliated with menorah medical center | 1.0 / 4
in accuracy.
compared against models with no or uniform attention.
are not consistent with one another.
compute the reliability of attention for an explanation, for a general model"
attention is conceived, it does not make any effort to offer a solution."
that if [accuracy] scores were retained even after changing the attention weights, then what exactly is the model focussing on for its predictions"