How recurrent networks implement contextual processing in sentiment analysis (PowerPoint presentation transcript)

SLIDE 1

How recurrent networks implement contextual processing in sentiment analysis
Niru Maheswaranathan and David Sussillo
Google Research, ICML 2020
@niru_m

SLIDES 2–5

Sentiment classification using RNNs

“That restaurant is amazing! I love it!” ➞ positive
“I cannot stand that place. Terrible food.” ➞ negative

RNNs solve the task, but it’s hard to understand how they do it.

SLIDES 6–8

Understanding RNN dynamics through linearization

[Figure: examples of linearized dynamics in state space (units n1, n2, n3): a line attractor, oscillations, and a saddle point.]
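To make the linearization idea concrete, here is a minimal numpy sketch (a hypothetical toy, not the authors' code): linearize a vanilla tanh RNN around a fixed point and inspect the Jacobian's eigenvalues, which classify the local dynamics shown on the slide.

```python
import numpy as np

def rnn_step(h, W, b):
    """One step of a simple tanh RNN with no input: h' = tanh(W h + b)."""
    return np.tanh(W @ h + b)

def jacobian_at(h, W, b):
    """Jacobian of the step function at state h: diag(1 - tanh^2(W h + b)) @ W."""
    pre = W @ h + b
    return np.diag(1.0 - np.tanh(pre) ** 2) @ W

# Toy recurrent weights; with b = 0 the origin is a fixed point (tanh(0) = 0).
rng = np.random.default_rng(0)
n = 3
W = 0.9 * np.eye(n) + 0.05 * rng.standard_normal((n, n))
b = np.zeros(n)

h_star = np.zeros(n)            # fixed point
J = jacobian_at(h_star, W, b)   # linearized dynamics: delta_h' ~ J delta_h
eigvals = np.linalg.eigvals(J)

# Eigenvalues of J classify the local dynamics:
#   |lambda| ~ 1  -> a slow "integration" direction (a line attractor when exactly 1)
#   complex pair  -> oscillations
#   one |lambda| > 1 with others < 1 -> a saddle point
print(np.sort(np.abs(eigvals)))
```

Note that at this particular fixed point the nonlinearity is inactive, so the Jacobian reduces to the recurrent weight matrix itself.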

SLIDES 9–16

Line attractor dynamics in trained RNNs

Maheswaranathan*, Williams*, et al., NeurIPS 2019

Approximate line attractor dynamics explain most of the RNN’s performance.

[Figure: a line attractor in state space, with the readout varying from −1 to +1 along the attractor.]
SLIDES 17–21

A remaining puzzle…

[Figure: model prediction (logit) over time (t) for three probe sentences. Baseline: “This movie is awesome, i like it”; Negation: “This movie is not awesome, i don’t like it”; Intensifier: “This movie is extremely awesome, i definitely like it”.]

Contextual processing in RNNs

SLIDES 22–25

Contextual processing in RNNs

Contributions of our work:
1. A data-driven method to identify contextual inputs
2. Analysis of the strength and timing of modifier effects
3. Experiments that demonstrate the identified mechanisms are necessary and sufficient for RNN performance
SLIDES 26–28

Identifying contextual processing

Use the change in input sensitivity, measured by the Frobenius norm of the change in the input Jacobian (||ΔJinp||F), as a measure of contextual processing.

[Figure: histogram of ||ΔJinp||F across tokens (log scale); modifier tokens sit in the tail of large values.]

Allows us to identify modifier inputs.
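As a sketch of this measurement (assuming a simple tanh RNN with inputs rather than the paper's GRU, and made-up states and weights), the input Jacobian and its change between two contexts can be computed as:

```python
import numpy as np

def input_jacobian(h, x, W, U, b):
    """Jacobian of h' = tanh(W h + U x + b) with respect to the input x."""
    pre = W @ h + U @ x + b
    return np.diag(1.0 - np.tanh(pre) ** 2) @ U

# Hypothetical setup: compare the input Jacobian after a "baseline" prefix
# vs. after a prefix containing a modifier word. A large ||dJ_inp||_F flags
# tokens that change how the network responds to later inputs.
rng = np.random.default_rng(1)
n_h, n_x = 8, 4
W = 0.5 * rng.standard_normal((n_h, n_h)) / np.sqrt(n_h)
U = rng.standard_normal((n_h, n_x)) / np.sqrt(n_x)
b = 0.1 * rng.standard_normal(n_h)

h_baseline = 0.1 * rng.standard_normal(n_h)   # state after a neutral prefix
h_modified = 2.0 * rng.standard_normal(n_h)   # state after a modifier token
x = np.zeros(n_x)                             # probe the same next input

dJ = input_jacobian(h_modified, x, W, U, b) - input_jacobian(h_baseline, x, W, U, b)
change = np.linalg.norm(dJ, ord="fro")
print(change)
```

Running this per token over a corpus would yield the kind of histogram shown on the slide, with modifier-like tokens producing the largest changes.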

SLIDES 29–32

Modifier subspace

[Figure: tokens projected onto modifier components 1 and 2, colored by the change in input Jacobian (||ΔJinp||F); modifier tokens such as “not”, “extremely”, “never”, and “but” stand apart from the rest of the vocabulary.]
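One plausible way to recover such a subspace (a sketch on synthetic data, not the paper's exact pipeline): run PCA on the hidden-state deflections caused by candidate modifier tokens and keep the top components.

```python
import numpy as np

# Hypothetical sketch: given hidden-state deflections caused by candidate
# modifier tokens (state with the modifier minus state without), PCA on those
# deflections yields a low-dimensional "modifier subspace".
rng = np.random.default_rng(2)
n_h, n_tokens = 16, 50

# Simulate deflections that mostly live in a planted 2D subspace plus noise.
basis = np.linalg.qr(rng.standard_normal((n_h, 2)))[0]   # planted 2D subspace
coeffs = 3.0 * rng.standard_normal((n_tokens, 2))
deflections = coeffs @ basis.T + 0.1 * rng.standard_normal((n_tokens, n_h))

# PCA via SVD of the centered deflection matrix.
X = deflections - deflections.mean(axis=0)
_, s, Vt = np.linalg.svd(X, full_matrices=False)
var_explained = s ** 2 / np.sum(s ** 2)
modifier_subspace = Vt[:2]      # top-2 "modifier components"

print(var_explained[:3])
```

Projecting each token's deflection onto `modifier_subspace` gives coordinates like the "modifier component 1 / 2" axes on the slide.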

SLIDES 33–37

Modifier dynamics

[Figure: (a) hidden-state trajectories for the modifiers “not” and “extremely” in the plane of principal component #1 and modifier component #1; (b) distance from the line attractor over time (t), showing a transient excursion that relaxes back, with the effect of “not” persisting longer than that of “extremely”.]
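The decaying excursions in panel (b) can be caricatured with a one-dimensional linear relaxation (a hypothetical toy; the eigenvalues and kick sizes here are invented, not fit to the trained network):

```python
import numpy as np

# Sketch of modifier dynamics: a state on the line attractor gets kicked into
# the modifier subspace by a modifier input, then relaxes back, with the
# relaxation timescale set by an eigenvalue lam of the linearized dynamics.
def simulate_excursion(lam, kick, steps):
    """Distance from the line attractor at each step after a kick."""
    d = np.empty(steps)
    x = kick
    for t in range(steps):
        d[t] = abs(x)
        x *= lam          # linear relaxation back toward the attractor
    return d

d_not = simulate_excursion(lam=0.9, kick=3.0, steps=15)        # slow decay ("not")
d_extremely = simulate_excursion(lam=0.6, kick=3.0, steps=15)  # fast decay ("extremely")

# A larger eigenvalue means the modifier's effect persists over more tokens.
print(d_not[10], d_extremely[10])
```

This is the mechanism behind the slide's observation that negation outlasts intensification: a slower-decaying mode carries the modifier's effect forward over more of the sentence.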

SLIDES 38–39

Synthesizing our new understanding

[Figure: schematic of the line attractor (readout from −1 to +1) together with the modifier subspace.]

SLIDES 40–41

Synthesizing our new understanding

Augment Bag of Words to recover RNN performance.

[Figure: schematic comparing a Bag of Words model and a Recurrent Neural Network.]

SLIDES 42–43

Augmented bag-of-words model recovers RNN performance

Model                                                Accuracy
Bag of Words (Baseline)                              93.6%
Augmented Bag-of-Words (includes modifier effects)   95.5%
RNN (GRU)                                            95.8%
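A minimal sketch of the two model families in the table (the valences, gains, and decay rate here are invented for illustration; the paper's augmented model is fit to the trained RNN): a plain bag-of-words sums per-word valences, while the augmented variant also applies a decaying multiplicative gain from recent modifier words, with negation flipping sign and intensifiers scaling magnitude.

```python
# Hypothetical per-word valences and modifier gains (illustrative values only).
VALENCE = {"awesome": 3.0, "terrible": -3.0, "movie": 0.0, "this": 0.0, "is": 0.0}
MODIFIER_GAIN = {"not": -1.0, "extremely": 2.0}

def bag_of_words(tokens):
    """Baseline: sum per-word valences, ignoring all context."""
    return sum(VALENCE.get(t, 0.0) for t in tokens)

def augmented_bag_of_words(tokens, decay=0.5):
    """Augmented: modifiers set a gain that scales later words, decaying back to 1."""
    score, gain = 0.0, 1.0
    for t in tokens:
        if t in MODIFIER_GAIN:
            gain = MODIFIER_GAIN[t]             # modifier sets the current gain
        else:
            score += gain * VALENCE.get(t, 0.0)
            gain = 1.0 + decay * (gain - 1.0)   # gain relaxes back toward 1
    return score

print(bag_of_words(["this", "movie", "is", "not", "awesome"]))            # 3.0
print(augmented_bag_of_words(["this", "movie", "is", "not", "awesome"]))  # -3.0
```

The baseline misreads "not awesome" as positive, while the gain mechanism flips its sign, which is exactly the kind of contextual effect the table credits for closing most of the gap to the RNN.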

SLIDES 44–45

Synthesizing our new understanding

Augment Bag of Words to recover RNN performance.
Perturb the RNN to remove modifier effects.

SLIDES 46–48

Perturbation experiment removes modifier effects

[Figure: model prediction (logit) over time (t) for baseline, intensifier, and negation probes, shown for the full (original) network, the network with the modifier subspace projected out (perturbed network), and a control with a random subspace projected out.]
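The projection used in this experiment can be sketched as follows (assuming the modifier subspace has already been identified; the bases here are random placeholders): remove the component of the hidden state lying in the subspace at each step, and compare against projecting out a random subspace of equal dimension as a control.

```python
import numpy as np

rng = np.random.default_rng(3)
n_h, k = 16, 2

# Orthonormal bases: one standing in for the identified modifier subspace,
# one random control of the same dimension.
modifier_basis = np.linalg.qr(rng.standard_normal((n_h, k)))[0]
random_basis = np.linalg.qr(rng.standard_normal((n_h, k)))[0]

def project_out(h, basis):
    """Remove the component of h lying in span(basis); columns are orthonormal."""
    return h - basis @ (basis.T @ h)

h = rng.standard_normal(n_h)
h_perturbed = project_out(h, modifier_basis)

# After the perturbation, no component remains in the modifier subspace.
print(np.linalg.norm(modifier_basis.T @ h_perturbed))
```

Applying this at every RNN step ablates contextual processing if (and, per the slide, only if) the projected-out directions are the modifier subspace, which is the sense in which the mechanism is shown to be necessary.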

SLIDE 49

Thank you!

Paper: arxiv.org/abs/2004.08013
Niru Maheswaranathan
@niru_m, nirum@google.com

[Figure: summary schematic of the line attractor and modifier subspace.]