SLIDE 1

Low-Dimensional Dynamics of Encoding and Learning in Recurrent Neural Networks

Stefan Horoi1,2, Victor Geadah1,2, Guy Wolf1,2,* and Guillaume Lajoie1,2,*

33rd Canadian Conference on Artificial Intelligence – 12-15 May 2020

1 Department of Mathematics and Statistics

Université de Montréal

2 Mila – Quebec Artificial Intelligence Institute

* Equal senior authorship contributions

SLIDE 2

Understanding RNNs as dynamical systems


  • Intra-layer connections preserve information across timesteps
  • Well-suited for the analysis of sequential data

(Pascanu, 2012)

We need to analyse the network's internal representations. RNNs can be analysed as nonlinear dynamical systems

(Sussillo, 2012) (Poole, 2016)

SLIDE 3

Geometry of internal representations


In RNNs, the internal dynamics are linked to the geometry of the internal representations

(Marquez, 2018) (Sussillo, 2012)

We analyse the geometry of internal states and the network dynamics in parallel. In DNNs, the geometry of internal representations has been linked to classification accuracy

(Cohen, 2020)

Further relations between the geometry of internal states and performance have not yet been established in RNNs

SLIDE 4


Experimental setup


Task: Sequential MNIST classification – each 28×28 image is presented to the network as a sequence of 28 lines of 28 pixels each.

Architecture: input layer (28 units) → recurrent layer (200 units, tanh) → output layer (10 units, linear).

Model: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h), y_t = W_hy h_t + b_y

Weight matrices: W_xh ∈ ℝ^(200×28), W_hh ∈ ℝ^(200×200), W_hy ∈ ℝ^(10×200)
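For concreteness, a minimal PyTorch sketch of this architecture (module and variable names are ours; the actual implementation used in the paper may differ in details):

```python
import torch
import torch.nn as nn

class SequentialMNISTRNN(nn.Module):
    """Vanilla RNN: 28 inputs per step, 200 tanh hidden units, 10-way linear readout."""
    def __init__(self, input_size=28, hidden_size=200, num_classes=10):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, nonlinearity="tanh", batch_first=True)
        self.readout = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, 28 lines, 28 pixels per line)
        hidden_states, _ = self.rnn(x)                # (batch, seq_len, hidden_size)
        logits = self.readout(hidden_states[:, -1])   # classify from the final hidden state
        return logits, hidden_states
```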

SLIDE 5

Training and implementation


  • Implementation: PyTorch
  • Optimizer: Adam
  • Loss function: Cross-entropy
  • Number of Epochs: 30
  • Trained on the MNIST training dataset
  • Performance: 93% classification accuracy
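A minimal training-loop sketch consistent with these settings, using the model class sketched above; the batch size and learning rate are our assumptions, not taken from the slides:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Each 28x28 MNIST image is treated as a 28-step sequence of 28-pixel lines.
train_set = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

model = SequentialMNISTRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed learning rate
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(30):
    for images, labels in train_loader:
        sequences = images.squeeze(1)      # (batch, 28, 28): one image line per timestep
        logits, _ = model(sequences)
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```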
SLIDE 6

Experimental validation datasets


[Figure: example input sequences – original image, cut image, extended image, hidden image; the modified versions are parameterised by the number of lines n, 0 ≤ n ≤ 28]

  • Ten networks were trained on the original training dataset and were tested on the modified versions of the validation datasets.
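One plausible way to construct the modified sequences, under our reading of the figure (cut: keep only the first n lines; extended: append blank lines after the full image; hidden: keep the full 28-step length but zero out everything after the first n lines); the exact constructions used in the experiments may differ:

```python
import torch

def cut_sequence(image, n):
    """Keep only the first n lines; the sequence becomes shorter (length n)."""
    return image[:n]

def extended_sequence(image, n):
    """Append n blank (all-zero) lines after the full image (length 28 + n)."""
    blank = torch.zeros(n, image.shape[1])
    return torch.cat([image, blank], dim=0)

def hidden_sequence(image, n):
    """Keep the full 28-step length but zero out every line after the first n."""
    masked = image.clone()
    masked[n:] = 0.0
    return masked
```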

SLIDE 7

Effect of sequence length on classification accuracy (1)


[Figure: classification accuracy as a function of sequence length for cut images and for extended images]

SLIDE 8

Effect of sequence length on classification accuracy (2)


Regardless of the amount of information given, the sequence length is the main factor determining classification accuracy.

[Figure: classification accuracy for hidden images and for cut images]

SLIDE 9

Development of a task-relevant structure


Task-relevant structure develops as soon as information is provided.

Class (digit) coloured clustering of internal representations at three different timesteps using t-SNE: early in the sequence, some structure begins to form; mid-sequence, digits are in separate clusters; late in the sequence, clusters begin to degrade.
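A sketch of this kind of analysis with scikit-learn's t-SNE applied to the hidden states collected at a fixed timestep (function name and embedding hyperparameters are ours):

```python
import torch
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_hidden_state_tsne(model, sequences, labels, timestep):
    """2D t-SNE embedding of recurrent hidden states at a given timestep, coloured by digit class."""
    with torch.no_grad():
        _, hidden_states = model(sequences)           # (batch, seq_len, hidden_size)
    states = hidden_states[:, timestep].numpy()       # (batch, hidden_size)
    embedding = TSNE(n_components=2, perplexity=30).fit_transform(states)
    plt.scatter(embedding[:, 0], embedding[:, 1], c=labels.numpy(), cmap="tab10", s=5)
    plt.title(f"t-SNE of hidden states at timestep {timestep}")
    plt.show()
```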

SLIDE 10

Classifying the RNNs as dynamical systems


The autonomous dynamical system associated with our network equation (input set to zero): h_{t+1} = tanh(W_hh h_t + b_h)

  • λ0 > 0: exponential volume expansion – initially close trajectories will diverge with time
  • λ0 = 0: volume preservation – trajectories start showing periodicity
  • λ0 < 0: exponential volume compression – all trajectories converge to fixed points

The spectrum of Lyapunov exponents can be used to classify the geometry of the system's attractors (Crisanti, 2018) (Marquez, 2018). [Figure: Lyapunov spectra before training and after training]

MLE (maximal Lyapunov exponent): λ0 = lim_{T→∞} (1/T) ln |λ_max(J_T)|, where J_T is the Jacobian of h(T) with respect to h(0) and λ_max denotes its largest (in norm) eigenvalue.
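As a rough numerical sketch (ours, not the paper's code), λ0 can be estimated by propagating a small perturbation through the step-wise Jacobians of the autonomous map and averaging its log-growth rate:

```python
import torch

def maximal_lyapunov_exponent(W_hh, b_h, h0, n_steps=5000, n_transient=500):
    """Estimate lambda_0 for the autonomous map h_{t+1} = tanh(W_hh @ h_t + b_h)."""
    h = h0.clone()
    v = torch.randn_like(h)
    v /= v.norm()
    log_growth = 0.0
    for t in range(n_steps):
        pre = W_hh @ h + b_h
        h = torch.tanh(pre)
        # One-step Jacobian: diag(1 - tanh(pre)^2) @ W_hh, applied to the perturbation v.
        v = (1.0 - torch.tanh(pre) ** 2) * (W_hh @ v)
        norm = v.norm()
        v = v / norm                       # renormalise to avoid overflow/underflow
        if t >= n_transient:               # discard the transient before averaging
            log_growth += torch.log(norm).item()
    return log_growth / (n_steps - n_transient)
```

For the trained network, W_hh and b_h would be taken from the recurrent layer (e.g. model.rnn.weight_hh_l0 and the sum of the recurrent and input biases, since the input is set to zero).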

SLIDE 11

Formation of limit cycles


[Figure: internal representations over time – digits are in separate clusters, the clusters degrade into two limit cycles, and trajectories settle into two limit cycles]
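One way to visualise this settling (our sketch, not the paper's exact procedure): iterate the autonomous dynamics from the hidden states reached at the end of the input and project the visited states onto their first two principal components:

```python
import torch
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_autonomous_trajectories(W_hh, b_h, initial_states, n_steps=2000):
    """Iterate h_{t+1} = tanh(W_hh h_t + b_h) and plot a 2D PCA projection of the trajectories."""
    trajectories = []
    with torch.no_grad():
        h = initial_states.clone()                   # (n_trajectories, hidden_size)
        for _ in range(n_steps):
            h = torch.tanh(h @ W_hh.T + b_h)
            trajectories.append(h)
    points = torch.cat(trajectories).numpy()         # all visited states, stacked
    projected = PCA(n_components=2).fit_transform(points)
    plt.scatter(projected[:, 0], projected[:, 1], s=1)
    plt.title("Autonomous trajectories projected onto the first two principal components")
    plt.show()
```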

SLIDE 12

Conclusion


  • Different perspectives on how information is processed in standard RNNs;
  • Information is kept as multi-dimensional clusters corresponding to different classes;
  • The information is not interpretable by the final layer unless the sequence length of the input is the same as for the training images;
  • As the dynamics settle, the internal representations are compressed into a non-trivial attractor;
  • The attractor is composed of two limit cycles of intrinsic dimension far smaller than the space.

SLIDE 13

Future work


Short term
  • We hypothesize that information about digit classes might be encoded as phases on the limit cycles (preliminary results);
  • Extend the analysis to tasks that have intrinsic sequential structure, such as HAR or NLP.

Long term
  • All recurrent models (RNNs, LSTMs, GRUs, etc.) are defined by their dynamics;
  • We expect the same analytical framework to be effective in less artificial scenarios.

SLIDE 14

Acknowledgments


We would like to thank Aude Forcione-Lambert and Giancarlo Kerg for useful discussions. The work was partially funded by the following organisations: