SLIDE 1

Low-Dimensional Dynamics of Encoding and Learning in Recurrent Neural Networks

Stefan Horoi1,2, Victor Geadah1,2, Guy Wolf1,2,* and Guillaume Lajoie1,2,*

33rd Canadian Conference on Artificial Intelligence – 12-15 May 2020

1 Department of Mathematics and Statistics

Université de Montréal

2 Mila – Quebec Artificial Intelligence Institute

* Equal senior authorship contributions

SLIDE 2

Understanding RNNs as dynamical systems


  • Intra-layer connections preserve information across timesteps
  • Well-suited for the analysis of sequential data

(Pascanu, 2012)

We need to analyse the network's internal representations. RNNs can be analysed as nonlinear dynamical systems

(Sussillo, 2012) (Poole, 2016)

SLIDE 3

Geometry of internal representations


In RNNs, the internal dynamics are linked to the geometry of the internal representations

(Marquez, 2018) (Sussillo, 2012)

We analyse the geometry of internal states and the network dynamics in parallel. In DNNs, the geometry of internal representations has been linked to classification accuracy

(Cohen, 2020)

Further relations between the geometry of internal states and performance have not yet been established in RNNs

SLIDE 4


Experimental setup


Task: Sequential MNIST classification – each 28×28 image is presented to the network as a sequence of 28 lines of 28 pixels each.

Architecture: input layer (28 units) → recurrent layer (200 units, tanh) → output layer (10 units, linear).

Model: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h), y_t = W_hy h_t + b_y

Weight matrices: W_xh ∈ ℝ^(200×28), W_hh ∈ ℝ^(200×200), W_hy ∈ ℝ^(10×200)
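For concreteness, a minimal PyTorch sketch of this architecture (module and variable names are ours; the actual implementation used in the paper may differ in details):

```python
import torch
import torch.nn as nn

class SequentialMNISTRNN(nn.Module):
    """Vanilla RNN: 28 inputs per step, 200 tanh hidden units, 10-way linear readout."""
    def __init__(self, input_size=28, hidden_size=200, num_classes=10):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, nonlinearity="tanh", batch_first=True)
        self.readout = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, 28 lines, 28 pixels per line)
        hidden_states, _ = self.rnn(x)                # (batch, seq_len, hidden_size)
        logits = self.readout(hidden_states[:, -1])   # classify from the final hidden state
        return logits, hidden_states
```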

SLIDE 5

Training and implementation


  • Implementation: PyTorch
  • Optimizer: Adam
  • Loss function: Cross-entropy
  • Number of Epochs: 30
  • Trained on the MNIST training dataset
  • Performance: 93% classification accuracy
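A minimal training-loop sketch consistent with these settings, using the model class sketched above; the batch size and learning rate are our assumptions, not taken from the slides:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Each 28x28 MNIST image is treated as a 28-step sequence of 28-pixel lines.
train_set = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

model = SequentialMNISTRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed learning rate
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(30):
    for images, labels in train_loader:
        sequences = images.squeeze(1)      # (batch, 28, 28): one image line per timestep
        logits, _ = model(sequences)
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```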
SLIDE 6

Experimental validation datasets


[Figure: example input sequences – original image, cut image, extended image, hidden image; the modified versions are parameterised by the number of lines n, 0 ≤ n ≤ 28]

  • Ten networks were trained on the original training dataset and were tested on the modified versions of the validation datasets.
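One plausible way to construct the modified sequences, under our reading of the figure (cut: keep only the first n lines; extended: append blank lines after the full image; hidden: keep the full 28-step length but zero out everything after the first n lines); the exact constructions used in the experiments may differ:

```python
import torch

def cut_sequence(image, n):
    """Keep only the first n lines; the sequence becomes shorter (length n)."""
    return image[:n]

def extended_sequence(image, n):
    """Append n blank (all-zero) lines after the full image (length 28 + n)."""
    blank = torch.zeros(n, image.shape[1])
    return torch.cat([image, blank], dim=0)

def hidden_sequence(image, n):
    """Keep the full 28-step length but zero out every line after the first n."""
    masked = image.clone()
    masked[n:] = 0.0
    return masked
```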

SLIDE 7

Effect of sequence length on classification accuracy (1)


[Figure: classification accuracy as a function of sequence length for cut images and for extended images]

SLIDE 8

Effect of sequence length on classification accuracy (2)


Regardless of the amount of information given, the sequence length is the main factor determining classification accuracy.

[Figure: classification accuracy for hidden images and for cut images]

SLIDE 9

Development of a task-relevant structure


Task-relevant structure develops as soon as information is provided.

Class (digit) coloured clustering of internal representations at three different timesteps using t-SNE: early in the sequence, some structure begins to form; mid-sequence, digits are in separate clusters; late in the sequence, clusters begin to degrade.
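A sketch of this kind of analysis with scikit-learn's t-SNE applied to the hidden states collected at a fixed timestep (function name and embedding hyperparameters are ours):

```python
import torch
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_hidden_state_tsne(model, sequences, labels, timestep):
    """2D t-SNE embedding of recurrent hidden states at a given timestep, coloured by digit class."""
    with torch.no_grad():
        _, hidden_states = model(sequences)           # (batch, seq_len, hidden_size)
    states = hidden_states[:, timestep].numpy()       # (batch, hidden_size)
    embedding = TSNE(n_components=2, perplexity=30).fit_transform(states)
    plt.scatter(embedding[:, 0], embedding[:, 1], c=labels.numpy(), cmap="tab10", s=5)
    plt.title(f"t-SNE of hidden states at timestep {timestep}")
    plt.show()
```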

SLIDE 10

Classifying the RNNs as dynamical systems


The autonomous dynamical system associated with our network equation (input set to zero): h_{t+1} = tanh(W_hh h_t + b_h)

  • λ0 > 0: exponential volume expansion – initially close trajectories will diverge with time
  • λ0 = 0: volume preservation – trajectories start showing periodicity
  • λ0 < 0: exponential volume compression – all trajectories converge to fixed points

The spectrum of Lyapunov exponents can be used to classify the geometry of the system's attractors (Crisanti, 2018) (Marquez, 2018). [Figure: Lyapunov spectra before training and after training]

MLE (maximal Lyapunov exponent): λ0 = lim_{T→∞} (1/T) ln |λ_max(J_T)|, where J_T is the Jacobian of h(T) with respect to h(0) and λ_max denotes its largest (in norm) eigenvalue.
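As a rough numerical sketch (ours, not the paper's code), λ0 can be estimated by propagating a small perturbation through the step-wise Jacobians of the autonomous map and averaging its log-growth rate:

```python
import torch

def maximal_lyapunov_exponent(W_hh, b_h, h0, n_steps=5000, n_transient=500):
    """Estimate lambda_0 for the autonomous map h_{t+1} = tanh(W_hh @ h_t + b_h)."""
    h = h0.clone()
    v = torch.randn_like(h)
    v /= v.norm()
    log_growth = 0.0
    for t in range(n_steps):
        pre = W_hh @ h + b_h
        h = torch.tanh(pre)
        # One-step Jacobian: diag(1 - tanh(pre)^2) @ W_hh, applied to the perturbation v.
        v = (1.0 - torch.tanh(pre) ** 2) * (W_hh @ v)
        norm = v.norm()
        v = v / norm                       # renormalise to avoid overflow/underflow
        if t >= n_transient:               # discard the transient before averaging
            log_growth += torch.log(norm).item()
    return log_growth / (n_steps - n_transient)
```

For the trained network, W_hh and b_h would be taken from the recurrent layer (e.g. model.rnn.weight_hh_l0 and the sum of the recurrent and input biases, since the input is set to zero).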

SLIDE 11

Formation of limit cycles


[Figure: internal representations over time – digits are in separate clusters, the clusters degrade into two limit cycles, and trajectories settle into two limit cycles]
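One way to visualise this settling (our sketch, not the paper's exact procedure): iterate the autonomous dynamics from the hidden states reached at the end of the input and project the visited states onto their first two principal components:

```python
import torch
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_autonomous_trajectories(W_hh, b_h, initial_states, n_steps=2000):
    """Iterate h_{t+1} = tanh(W_hh h_t + b_h) and plot a 2D PCA projection of the trajectories."""
    trajectories = []
    with torch.no_grad():
        h = initial_states.clone()                   # (n_trajectories, hidden_size)
        for _ in range(n_steps):
            h = torch.tanh(h @ W_hh.T + b_h)
            trajectories.append(h)
    points = torch.cat(trajectories).numpy()         # all visited states, stacked
    projected = PCA(n_components=2).fit_transform(points)
    plt.scatter(projected[:, 0], projected[:, 1], s=1)
    plt.title("Autonomous trajectories projected onto the first two principal components")
    plt.show()
```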

SLIDE 12

Conclusion


  • Different perspectives on how information is processed in standard RNNs;
  • Information is kept as multi-dimensional clusters corresponding to different classes;
  • The information is not interpretable by the final layer unless the sequence length of the input is the same as for the training images;
  • As the dynamics settle, the internal representations are compressed into a non-trivial attractor;
  • The attractor is composed of two limit cycles of intrinsic dimension far smaller than the space.

SLIDE 13

Future work


Short term
  • We hypothesize that information about digit classes might be encoded as phases on the limit cycles (preliminary results);
  • Extend the analysis to tasks that have intrinsic sequential structure, such as HAR or NLP.

Long term
  • All recurrent models (RNNs, LSTMs, GRUs, etc.) are defined by their dynamics;
  • We expect the same analytical framework to be effective in less artificial scenarios.

SLIDE 14

Acknowledgments


We would like to thank Aude Forcione-Lambert and Giancarlo Kerg for useful discussions. The work was partially funded by the following organisations: