How recurrent networks implement contextual processing in sentiment analysis (PowerPoint presentation transcript)

SLIDE 1

How recurrent networks implement contextual processing in sentiment analysis
Niru Maheswaranathan and David Sussillo
Google Research, ICML 2020
@niru_m

SLIDES 2–5

Sentiment classification using RNNs

“That restaurant is amazing! I love it!” ➞ positive
“I cannot stand that place. Terrible food.” ➞ negative

RNNs solve the task, but it’s hard to understand how they do it.

SLIDES 6–8

Understanding RNN dynamics through linearization

[Figure: examples of linearized dynamics in state space (units n1, n2, n3): a line attractor, oscillations, and a saddle point.]
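To make the linearization idea concrete, here is a minimal numpy sketch (a hypothetical toy, not the authors' code): linearize a vanilla tanh RNN around a fixed point and inspect the Jacobian's eigenvalues, which classify the local dynamics shown on the slide.

```python
import numpy as np

def rnn_step(h, W, b):
    """One step of a simple tanh RNN with no input: h' = tanh(W h + b)."""
    return np.tanh(W @ h + b)

def jacobian_at(h, W, b):
    """Jacobian of the step function at state h: diag(1 - tanh^2(W h + b)) @ W."""
    pre = W @ h + b
    return np.diag(1.0 - np.tanh(pre) ** 2) @ W

# Toy recurrent weights; with b = 0 the origin is a fixed point (tanh(0) = 0).
rng = np.random.default_rng(0)
n = 3
W = 0.9 * np.eye(n) + 0.05 * rng.standard_normal((n, n))
b = np.zeros(n)

h_star = np.zeros(n)            # fixed point
J = jacobian_at(h_star, W, b)   # linearized dynamics: delta_h' ~ J delta_h
eigvals = np.linalg.eigvals(J)

# Eigenvalues of J classify the local dynamics:
#   |lambda| ~ 1  -> a slow "integration" direction (a line attractor when exactly 1)
#   complex pair  -> oscillations
#   one |lambda| > 1 with others < 1 -> a saddle point
print(np.sort(np.abs(eigvals)))
```

Note that at this particular fixed point the nonlinearity is inactive, so the Jacobian reduces to the recurrent weight matrix itself.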

SLIDES 9–16

Line attractor dynamics in trained RNNs

Maheswaranathan*, Williams*, et al., NeurIPS 2019

Approximate line attractor dynamics explain most of the RNN’s performance.

[Figure: a line attractor in state space, with the readout varying from −1 to +1 along the attractor.]
SLIDES 17–21

A remaining puzzle…

[Figure: model prediction (logit) over time (t) for three probe sentences. Baseline: “This movie is awesome, i like it”; Negation: “This movie is not awesome, i don’t like it”; Intensifier: “This movie is extremely awesome, i definitely like it”.]

Contextual processing in RNNs

SLIDES 22–25

Contextual processing in RNNs

Contributions of our work:
1. A data-driven method to identify contextual inputs
2. Analysis of the strength and timing of modifier effects
3. Experiments that demonstrate the identified mechanisms are necessary and sufficient for RNN performance
SLIDES 26–28

Identifying contextual processing

Use the change in input sensitivity, measured by the Frobenius norm of the change in the input Jacobian (||ΔJinp||F), as a measure of contextual processing.

[Figure: histogram of ||ΔJinp||F across tokens (log scale); modifier tokens sit in the tail of large values.]

Allows us to identify modifier inputs.
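As a sketch of this measurement (assuming a simple tanh RNN with inputs rather than the paper's GRU, and made-up states and weights), the input Jacobian and its change between two contexts can be computed as:

```python
import numpy as np

def input_jacobian(h, x, W, U, b):
    """Jacobian of h' = tanh(W h + U x + b) with respect to the input x."""
    pre = W @ h + U @ x + b
    return np.diag(1.0 - np.tanh(pre) ** 2) @ U

# Hypothetical setup: compare the input Jacobian after a "baseline" prefix
# vs. after a prefix containing a modifier word. A large ||dJ_inp||_F flags
# tokens that change how the network responds to later inputs.
rng = np.random.default_rng(1)
n_h, n_x = 8, 4
W = 0.5 * rng.standard_normal((n_h, n_h)) / np.sqrt(n_h)
U = rng.standard_normal((n_h, n_x)) / np.sqrt(n_x)
b = 0.1 * rng.standard_normal(n_h)

h_baseline = 0.1 * rng.standard_normal(n_h)   # state after a neutral prefix
h_modified = 2.0 * rng.standard_normal(n_h)   # state after a modifier token
x = np.zeros(n_x)                             # probe the same next input

dJ = input_jacobian(h_modified, x, W, U, b) - input_jacobian(h_baseline, x, W, U, b)
change = np.linalg.norm(dJ, ord="fro")
print(change)
```

Running this per token over a corpus would yield the kind of histogram shown on the slide, with modifier-like tokens producing the largest changes.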

SLIDES 29–32

Modifier subspace

[Figure: tokens projected onto modifier components 1 and 2, colored by the change in input Jacobian (||ΔJinp||F); modifier tokens such as “not”, “extremely”, “never”, and “but” stand apart from the rest of the vocabulary.]
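One plausible way to recover such a subspace (a sketch on synthetic data, not the paper's exact pipeline): run PCA on the hidden-state deflections caused by candidate modifier tokens and keep the top components.

```python
import numpy as np

# Hypothetical sketch: given hidden-state deflections caused by candidate
# modifier tokens (state with the modifier minus state without), PCA on those
# deflections yields a low-dimensional "modifier subspace".
rng = np.random.default_rng(2)
n_h, n_tokens = 16, 50

# Simulate deflections that mostly live in a planted 2D subspace plus noise.
basis = np.linalg.qr(rng.standard_normal((n_h, 2)))[0]   # planted 2D subspace
coeffs = 3.0 * rng.standard_normal((n_tokens, 2))
deflections = coeffs @ basis.T + 0.1 * rng.standard_normal((n_tokens, n_h))

# PCA via SVD of the centered deflection matrix.
X = deflections - deflections.mean(axis=0)
_, s, Vt = np.linalg.svd(X, full_matrices=False)
var_explained = s ** 2 / np.sum(s ** 2)
modifier_subspace = Vt[:2]      # top-2 "modifier components"

print(var_explained[:3])
```

Projecting each token's deflection onto `modifier_subspace` gives coordinates like the "modifier component 1 / 2" axes on the slide.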

SLIDES 33–37

Modifier dynamics

[Figure: (a) hidden-state trajectories for the modifiers “not” and “extremely” in the plane of principal component #1 and modifier component #1; (b) distance from the line attractor over time (t), showing a transient excursion that relaxes back, with the effect of “not” persisting longer than that of “extremely”.]
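The decaying excursions in panel (b) can be caricatured with a one-dimensional linear relaxation (a hypothetical toy; the eigenvalues and kick sizes here are invented, not fit to the trained network):

```python
import numpy as np

# Sketch of modifier dynamics: a state on the line attractor gets kicked into
# the modifier subspace by a modifier input, then relaxes back, with the
# relaxation timescale set by an eigenvalue lam of the linearized dynamics.
def simulate_excursion(lam, kick, steps):
    """Distance from the line attractor at each step after a kick."""
    d = np.empty(steps)
    x = kick
    for t in range(steps):
        d[t] = abs(x)
        x *= lam          # linear relaxation back toward the attractor
    return d

d_not = simulate_excursion(lam=0.9, kick=3.0, steps=15)        # slow decay ("not")
d_extremely = simulate_excursion(lam=0.6, kick=3.0, steps=15)  # fast decay ("extremely")

# A larger eigenvalue means the modifier's effect persists over more tokens.
print(d_not[10], d_extremely[10])
```

This is the mechanism behind the slide's observation that negation outlasts intensification: a slower-decaying mode carries the modifier's effect forward over more of the sentence.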

SLIDES 38–39

Synthesizing our new understanding

[Figure: schematic of the line attractor (readout from −1 to +1) together with the modifier subspace.]

SLIDES 40–41

Synthesizing our new understanding

Augment Bag of Words to recover RNN performance.

[Figure: schematic comparing a Bag of Words model and a Recurrent Neural Network.]

SLIDES 42–43

Augmented bag-of-words model recovers RNN performance

Model                                                Accuracy
Bag of Words (Baseline)                              93.6%
Augmented Bag-of-Words (includes modifier effects)   95.5%
RNN (GRU)                                            95.8%
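A minimal sketch of the two model families in the table (the valences, gains, and decay rate here are invented for illustration; the paper's augmented model is fit to the trained RNN): a plain bag-of-words sums per-word valences, while the augmented variant also applies a decaying multiplicative gain from recent modifier words, with negation flipping sign and intensifiers scaling magnitude.

```python
# Hypothetical per-word valences and modifier gains (illustrative values only).
VALENCE = {"awesome": 3.0, "terrible": -3.0, "movie": 0.0, "this": 0.0, "is": 0.0}
MODIFIER_GAIN = {"not": -1.0, "extremely": 2.0}

def bag_of_words(tokens):
    """Baseline: sum per-word valences, ignoring all context."""
    return sum(VALENCE.get(t, 0.0) for t in tokens)

def augmented_bag_of_words(tokens, decay=0.5):
    """Augmented: modifiers set a gain that scales later words, decaying back to 1."""
    score, gain = 0.0, 1.0
    for t in tokens:
        if t in MODIFIER_GAIN:
            gain = MODIFIER_GAIN[t]             # modifier sets the current gain
        else:
            score += gain * VALENCE.get(t, 0.0)
            gain = 1.0 + decay * (gain - 1.0)   # gain relaxes back toward 1
    return score

print(bag_of_words(["this", "movie", "is", "not", "awesome"]))            # 3.0
print(augmented_bag_of_words(["this", "movie", "is", "not", "awesome"]))  # -3.0
```

The baseline misreads "not awesome" as positive, while the gain mechanism flips its sign, which is exactly the kind of contextual effect the table credits for closing most of the gap to the RNN.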

SLIDES 44–45

Synthesizing our new understanding

Augment Bag of Words to recover RNN performance.
Perturb the RNN to remove modifier effects.

SLIDES 46–48

Perturbation experiment removes modifier effects

[Figure: model prediction (logit) over time (t) for baseline, intensifier, and negation probes, shown for the full (original) network, the network with the modifier subspace projected out (perturbed network), and a control with a random subspace projected out.]
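The projection used in this experiment can be sketched as follows (assuming the modifier subspace has already been identified; the bases here are random placeholders): remove the component of the hidden state lying in the subspace at each step, and compare against projecting out a random subspace of equal dimension as a control.

```python
import numpy as np

rng = np.random.default_rng(3)
n_h, k = 16, 2

# Orthonormal bases: one standing in for the identified modifier subspace,
# one random control of the same dimension.
modifier_basis = np.linalg.qr(rng.standard_normal((n_h, k)))[0]
random_basis = np.linalg.qr(rng.standard_normal((n_h, k)))[0]

def project_out(h, basis):
    """Remove the component of h lying in span(basis); columns are orthonormal."""
    return h - basis @ (basis.T @ h)

h = rng.standard_normal(n_h)
h_perturbed = project_out(h, modifier_basis)

# After the perturbation, no component remains in the modifier subspace.
print(np.linalg.norm(modifier_basis.T @ h_perturbed))
```

Applying this at every RNN step ablates contextual processing if (and, per the slide, only if) the projected-out directions are the modifier subspace, which is the sense in which the mechanism is shown to be necessary.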

SLIDE 49

Thank you!

Paper: arxiv.org/abs/2004.08013
Niru Maheswaranathan
@niru_m, nirum@google.com

[Figure: summary schematic of the line attractor and modifier subspace.]