SLIDE 1

Learning Hierarchical Information Flow with Recurrent Neural Modules

Danijar Hafner¹, Alex Irpan¹, James Davidson¹, Nicolas Heess²

¹ Google Brain, ² DeepMind

NIPS 2017 #3374

SLIDE 2
  • 1. Contribution

Brain-inspired modular sequence model outperforming stacked GRUs.

Learns connectivity rather than relying on a manually defined layer structure for the task: it learns skip-connections and feedback loops, and discovers novel connectivity patterns.


SLIDE 5
  • 2. Motivation

The neocortex is often described as a hierarchy, but there are many side-connections and feedback loops: areas communicate both directly and indirectly via the thalamus. We focus on the latter here. Modules communicating via a routing center include hierarchy as a special case.

[Figure: connectivity between visual areas (V1, V2, V3, V4, MT, MST/FST, TEO, IT, PR/PH, ER, A). Adapted from Gross et al. 1993, "Inferior temporal cortex as a pattern recognition device".]

SLIDE 6
  • 2. Motivation

[Figure: image credit: user udaix, Shutterstock.]
SLIDE 7
  • 2. Motivation

[Figure: from Oh et al., "A mesoscale connectome of the mouse brain", Nature, 2014, Figure 6.]
SLIDE 8
  • 3. Method: ThalNet

Multiple recurrent modules share their features via a routing center.

[Figure: modules A–D connected through a central routing center; task input and task output attach to the modules.]

SLIDE 9
  • 3. Method: ThalNet

The center concatenates the features and lets modules read from it at the next time step.

[Figure: the model unrolled over time steps; at each step the modules write their features to the center and read their context from the previous center state, with task inputs x_t and outputs y_t attached to individual modules.]
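To make the data flow concrete, here is a minimal NumPy sketch of one such step (not the authors' implementation). It assumes four modules, simple tanh recurrences in place of the paper's GRU modules, plain linear reading weights, and the (assumed) convention that the first module receives the task input while the last module emits the task output.

    import numpy as np

    rng = np.random.default_rng(0)
    num_modules, feature_size, context_size, input_size, output_size = 4, 16, 8, 10, 5
    center_size = num_modules * feature_size

    # Per-module parameters: read weights into the center, recurrent weights, output head.
    read_weights = [rng.normal(0, 0.1, (center_size, context_size)) for _ in range(num_modules)]
    module_weights = [
        rng.normal(0, 0.1, (context_size + (input_size if i == 0 else 0) + feature_size, feature_size))
        for i in range(num_modules)
    ]
    output_weights = rng.normal(0, 0.1, (feature_size, output_size))

    def thalnet_step(x, center, features_prev):
        """One time step: read contexts from the previous center, update each module,
        then rebuild the center by concatenating all module features."""
        features = []
        for i in range(num_modules):
            context = center @ read_weights[i]               # c_i = Phi^{t-1} W_i  (linear read)
            parts = [context, features_prev[i]]
            if i == 0:                                       # first module also receives the task input
                parts.insert(1, x)
            features.append(np.tanh(np.concatenate(parts) @ module_weights[i]))
        y = features[-1] @ output_weights                    # last module emits the task output
        new_center = np.concatenate(features)                # Phi^t = [phi_1, ..., phi_I]
        return y, new_center, features

    center = np.zeros(center_size)
    features = [np.zeros(feature_size) for _ in range(num_modules)]
    for x in rng.normal(size=(7, input_size)):               # toy sequence of 7 steps
        y, center, features = thalnet_step(x, center, features)

Because modules only read the previous center state, information travels one module hop per time step, which is what lets the routing center mediate all inter-module communication.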


SLIDE 12
  • 3. Method: ThalNet

Reading mechanisms can be static or dynamic, allowing the read locations to change over time.

[Figure: module A reads its context vector from the center by multiplying the concatenated features of modules A–D with a read weight matrix; task input and output attach to the modules.]

SLIDE 13
  • 3. Method: ThalNet

Linear reading

  • Can be unstable to train
  • Less interpretable reading weights

Weight normalization

  • Static reading from the same locations
  • Works well in practice

Fast softmax

  • Dynamic weights based on the current RNN state
  • Many parameters (features × center × context)

Fast Gaussian

  • Dynamic and fewer parameters, but unstable to train

[Figure: module A reading its context vector from the center via a read weight matrix, as on the previous slide.]
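As a rough illustration of the difference between the first two mechanisms above, here is a sketch of a plain linear read versus a weight-normalized read for a single module. The dimension names and initialization are assumptions for illustration, not the paper's code.

    import numpy as np

    rng = np.random.default_rng(1)
    center_size, context_size = 64, 8
    center = rng.normal(size=center_size)            # concatenated module features from the previous step

    # Linear reading: an unconstrained learned matrix mapping the center to the context.
    linear_weights = rng.normal(0, 0.1, (center_size, context_size))
    context_linear = center @ linear_weights

    # Weight-normalized reading: decouple the direction and magnitude of each read
    # vector (w = g * v / ||v||); per the slide, this behaves like static reading
    # from the same center locations and trains more stably.
    directions = rng.normal(size=(center_size, context_size))
    gains = np.ones(context_size)
    read_weights = directions / np.linalg.norm(directions, axis=0, keepdims=True) * gains
    context_normalized = center @ read_weights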



SLIDE 17
  • 4. Findings

[Figure: learned connectivity among modules A–D.]

SLIDE 20
  • 4. Findings

[Figure: learned connectivity among modules A–D from input x to output y, with skip connections and feedback connections annotated.]


SLIDE 24
  • 4. Findings

ThalNet learns hierarchical information flow, skip-connections, and long feedback loops:

  • Hierarchical connections are known from feed-forward neural networks.
  • Skip-connections are known from ResNet architectures.
  • Long feedback loops could be beneficial for recurrent machine learning models.
  • Similar connectivity is learned for the same task across runs.
  • Static weight-normalized reading is fast and performs well; fast reading mechanisms can be explored further in the future.

SLIDE 25
  • 5. Performance

Outperforms stacked GRU in test performance on several sequential tasks.

SLIDE 29
  • 6. Conclusion

Brain-inspired modular sequence model outperforming stacked GRUs.

Modularity and the reading bottleneck regularize the model and improve generalization. Other recurrent models might benefit from the long feedback loops learned by ThalNet. Provides a framework for multi-task learning and online architecture search.

Project page: https://danijar.com/thalnet
Contact: mail@danijar.com

SLIDE 30

Bonus: more reading masks

SLIDE 31

Reading mechanisms: fully connected tanh layer

Almost no connection pattern visible. Similar performance on MNIST, slightly worse on text8 (fewer parameters).
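For comparison with the other mechanisms, a fully connected tanh read might look like the following sketch; the exact form is an assumption, not the paper's code.

    import numpy as np

    rng = np.random.default_rng(2)
    center_size, context_size = 64, 8
    center = rng.normal(size=center_size)

    weights = rng.normal(0, 0.1, (center_size, context_size))
    bias = np.zeros(context_size)
    context = np.tanh(center @ weights + bias)   # nonlinear read: harder to interpret as a mask over the center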

SLIDE 32

Reading mechanisms: fast softmax weights

Selection based on a softmax mask computed as a function of the module features. Too many parameters to compute the fast weights as activations!
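A minimal sketch of what such a fast softmax read could look like, assuming the mask is produced from the module's feature vector by a single linear map; the parameter count noted above (features × center × context) comes from the size of that map. This is illustrative, not the paper's code.

    import numpy as np

    rng = np.random.default_rng(3)
    feature_size, center_size, context_size = 16, 64, 8
    features = rng.normal(size=feature_size)         # module's current features
    center = rng.normal(size=center_size)            # concatenated features from the previous step

    # One softmax mask over center positions per context element; this single tensor
    # already holds feature_size * center_size * context_size = 8192 fast-weight parameters.
    mask_weights = rng.normal(0, 0.1, (feature_size, center_size, context_size))
    logits = np.einsum('f,fnc->nc', features, mask_weights)
    logits -= logits.max(axis=0, keepdims=True)      # for numerical stability
    mask = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    context = center @ mask                          # weighted read from the center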

SLIDE 33

Reading mechanisms: fast softmax weights

[Figure: learned connectivity among modules 1–3 between input x and output y under fast softmax reading.]

SLIDE 34

Reading mechanisms: softmax weights

Hierarchical information flow with feedback cycles and skip connections emerges. Slightly worse performance than the linear mapping.

SLIDE 35

Reading mechanisms: softmax weights

[Figure: two connectivity diagrams among modules 1–3 between input x and output y, annotated with a feedback weight and a skip connection plus feedback weight.]

SLIDE 36

Reading mechanisms: Gaussian kernel

Very few parameters, so we can afford fast weights again. Experimented with a soft kernel and a sampled version. I couldn't make this work well, tips appreciated.
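One way such a Gaussian-kernel read could be set up, assuming each context element is described only by a mean position and a width over the center, so the fast weights shrink to two scalars per element. This is a sketch of the soft-kernel variant, not the paper's code.

    import numpy as np

    rng = np.random.default_rng(4)
    center_size, context_size = 64, 8
    center = rng.normal(size=center_size)
    positions = np.arange(center_size)

    means = rng.uniform(0, center_size, size=context_size)   # could themselves be fast weights from module features
    widths = np.full(context_size, 4.0)

    # One normalized Gaussian bump over center positions per context element (the "soft kernel").
    mask = np.exp(-0.5 * ((positions[:, None] - means[None, :]) / widths[None, :]) ** 2)
    mask /= mask.sum(axis=0, keepdims=True)
    context = center @ mask                                   # (center_size,) @ (center_size, context_size)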

SLIDE 37

Reading mechanisms: softmax weights

[Figure: learned connection patterns for MNIST (4 × FF10-GRU10 modules) and text8 (4 × FF32-GRU32 modules).]

Forms similar connection patterns on the same task. Clearer read boundaries on text8 (the larger task) than on MNIST.