SLIDE 1

Learning Hierarchical Information Flow with Recurrent Neural Modules

Danijar Hafner¹, Alex Irpan¹, James Davidson¹, Nicolas Heess²

¹ Google Brain, ² DeepMind

NIPS 2017 #3374

SLIDE 2
  • 1. Contribution

Brain-inspired modular sequence model outperforming stacked GRUs.

Learns connectivity rather than relying on a manually defined layer structure for the task: it learns skip-connections and feedback loops, and discovers novel connectivity patterns.


SLIDE 5
  • 2. Motivation

The neocortex is often described as a hierarchy, but there are many side-connections and feedback loops: areas communicate both directly and indirectly via the thalamus. We focus on the latter here. Modules communicating via a routing center include hierarchy as a special case.

[Figure: connectivity between visual areas (V1, V2, V3, V4, MT, MST/FST, TEO, IT, PR/PH, ER, A). Adapted from Gross et al. 1993, "Inferior temporal cortex as a pattern recognition device".]

SLIDE 6
  • 2. Motivation

[Figure: image credit: user udaix, Shutterstock.]
SLIDE 7
  • 2. Motivation

[Figure: from Oh et al., "A mesoscale connectome of the mouse brain", Nature, 2014, Figure 6.]
SLIDE 8
  • 3. Method: ThalNet

Multiple recurrent modules share their features via a routing center.

[Figure: modules A–D connected through a central routing center; task input and task output attach to the modules.]

SLIDE 9
  • 3. Method: ThalNet

The center concatenates the features and lets modules read from it at the next time step.

[Figure: the model unrolled over time steps; at each step the modules write their features to the center and read their context from the previous center state, with task inputs x_t and outputs y_t attached to individual modules.]
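To make the data flow concrete, here is a minimal NumPy sketch of one such step (not the authors' implementation). It assumes four modules, simple tanh recurrences in place of the paper's GRU modules, plain linear reading weights, and the (assumed) convention that the first module receives the task input while the last module emits the task output.

    import numpy as np

    rng = np.random.default_rng(0)
    num_modules, feature_size, context_size, input_size, output_size = 4, 16, 8, 10, 5
    center_size = num_modules * feature_size

    # Per-module parameters: read weights into the center, recurrent weights, output head.
    read_weights = [rng.normal(0, 0.1, (center_size, context_size)) for _ in range(num_modules)]
    module_weights = [
        rng.normal(0, 0.1, (context_size + (input_size if i == 0 else 0) + feature_size, feature_size))
        for i in range(num_modules)
    ]
    output_weights = rng.normal(0, 0.1, (feature_size, output_size))

    def thalnet_step(x, center, features_prev):
        """One time step: read contexts from the previous center, update each module,
        then rebuild the center by concatenating all module features."""
        features = []
        for i in range(num_modules):
            context = center @ read_weights[i]               # c_i = Phi^{t-1} W_i  (linear read)
            parts = [context, features_prev[i]]
            if i == 0:                                       # first module also receives the task input
                parts.insert(1, x)
            features.append(np.tanh(np.concatenate(parts) @ module_weights[i]))
        y = features[-1] @ output_weights                    # last module emits the task output
        new_center = np.concatenate(features)                # Phi^t = [phi_1, ..., phi_I]
        return y, new_center, features

    center = np.zeros(center_size)
    features = [np.zeros(feature_size) for _ in range(num_modules)]
    for x in rng.normal(size=(7, input_size)):               # toy sequence of 7 steps
        y, center, features = thalnet_step(x, center, features)

Because modules only read the previous center state, information travels one module hop per time step, which is what lets the routing center mediate all inter-module communication.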


SLIDE 12
  • 3. Method: ThalNet

Reading mechanisms can be static or dynamic, allowing the read locations to change over time.

[Figure: module A reads its context vector from the center by multiplying the concatenated features of modules A–D with a read weight matrix; task input and output attach to the modules.]

SLIDE 13
  • 3. Method: ThalNet

Linear reading

  • Can be unstable to train
  • Less interpretable reading weights

Weight normalization

  • Static reading from the same locations
  • Works well in practice

Fast softmax

  • Dynamic weights based on the current RNN state
  • Many parameters (features × center × context)

Fast Gaussian

  • Dynamic and fewer parameters, but unstable to train

[Figure: module A reading its context vector from the center via a read weight matrix, as on the previous slide.]
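As a rough illustration of the difference between the first two mechanisms above, here is a sketch of a plain linear read versus a weight-normalized read for a single module. The dimension names and initialization are assumptions for illustration, not the paper's code.

    import numpy as np

    rng = np.random.default_rng(1)
    center_size, context_size = 64, 8
    center = rng.normal(size=center_size)            # concatenated module features from the previous step

    # Linear reading: an unconstrained learned matrix mapping the center to the context.
    linear_weights = rng.normal(0, 0.1, (center_size, context_size))
    context_linear = center @ linear_weights

    # Weight-normalized reading: decouple the direction and magnitude of each read
    # vector (w = g * v / ||v||); per the slide, this behaves like static reading
    # from the same center locations and trains more stably.
    directions = rng.normal(size=(center_size, context_size))
    gains = np.ones(context_size)
    read_weights = directions / np.linalg.norm(directions, axis=0, keepdims=True) * gains
    context_normalized = center @ read_weights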



SLIDE 17
  • 4. Findings

[Figure: learned connectivity among modules A–D.]

SLIDE 20
  • 4. Findings

[Figure: learned connectivity among modules A–D from input x to output y, with skip connections and feedback connections annotated.]


SLIDE 24
  • 4. Findings

ThalNet learns hierarchical information flow, skip-connections, and long feedback loops:

  • Hierarchical connections are known from feed-forward neural networks.
  • Skip-connections are known from ResNet architectures.
  • Long feedback loops could be beneficial for recurrent machine learning models.
  • Similar connectivity is learned for the same task across runs.
  • Static weight-normalized reading is fast and performs well; fast reading mechanisms can be explored further in the future.

SLIDE 25
  • 5. Performance

Outperforms stacked GRU in test performance on several sequential tasks.

SLIDE 29
  • 6. Conclusion

Brain-inspired modular sequence model outperforming stacked GRUs.

Modularity and the reading bottleneck regularize the model and improve generalization. Other recurrent models might benefit from the long feedback loops learned by ThalNet. Provides a framework for multi-task learning and online architecture search.

Project page: https://danijar.com/thalnet
Contact: mail@danijar.com

SLIDE 30

Bonus: more reading masks

SLIDE 31

Reading mechanisms: fully connected tanh layer

Almost no connection pattern visible. Similar performance on MNIST, slightly worse on text8 (fewer parameters).
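For comparison with the other mechanisms, a fully connected tanh read might look like the following sketch; the exact form is an assumption, not the paper's code.

    import numpy as np

    rng = np.random.default_rng(2)
    center_size, context_size = 64, 8
    center = rng.normal(size=center_size)

    weights = rng.normal(0, 0.1, (center_size, context_size))
    bias = np.zeros(context_size)
    context = np.tanh(center @ weights + bias)   # nonlinear read: harder to interpret as a mask over the center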

SLIDE 32

Reading mechanisms: fast softmax weights

Selection based on a softmax mask computed as a function of the module features. Too many parameters to compute the fast weights as activations!
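A minimal sketch of what such a fast softmax read could look like, assuming the mask is produced from the module's feature vector by a single linear map; the parameter count noted above (features × center × context) comes from the size of that map. This is illustrative, not the paper's code.

    import numpy as np

    rng = np.random.default_rng(3)
    feature_size, center_size, context_size = 16, 64, 8
    features = rng.normal(size=feature_size)         # module's current features
    center = rng.normal(size=center_size)            # concatenated features from the previous step

    # One softmax mask over center positions per context element; this single tensor
    # already holds feature_size * center_size * context_size = 8192 fast-weight parameters.
    mask_weights = rng.normal(0, 0.1, (feature_size, center_size, context_size))
    logits = np.einsum('f,fnc->nc', features, mask_weights)
    logits -= logits.max(axis=0, keepdims=True)      # for numerical stability
    mask = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    context = center @ mask                          # weighted read from the center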

SLIDE 33

Reading mechanisms: fast softmax weights

[Figure: learned connectivity among modules 1–3 between input x and output y under fast softmax reading.]

SLIDE 34

Reading mechanisms: softmax weights

Hierarchical information flow with feedback cycles and skip connections emerges. Slightly worse performance than the linear mapping.

SLIDE 35

Reading mechanisms: softmax weights

[Figure: two connectivity diagrams among modules 1–3 between input x and output y, annotated with a feedback weight and a skip connection plus feedback weight.]

SLIDE 36

Reading mechanisms: Gaussian kernel

Very few parameters, so we can afford fast weights again. Experimented with a soft kernel and a sampled version. I couldn't make this work well, tips appreciated.
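One way such a Gaussian-kernel read could be set up, assuming each context element is described only by a mean position and a width over the center, so the fast weights shrink to two scalars per element. This is a sketch of the soft-kernel variant, not the paper's code.

    import numpy as np

    rng = np.random.default_rng(4)
    center_size, context_size = 64, 8
    center = rng.normal(size=center_size)
    positions = np.arange(center_size)

    means = rng.uniform(0, center_size, size=context_size)   # could themselves be fast weights from module features
    widths = np.full(context_size, 4.0)

    # One normalized Gaussian bump over center positions per context element (the "soft kernel").
    mask = np.exp(-0.5 * ((positions[:, None] - means[None, :]) / widths[None, :]) ** 2)
    mask /= mask.sum(axis=0, keepdims=True)
    context = center @ mask                                   # (center_size,) @ (center_size, context_size)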

SLIDE 37

Reading mechanisms: softmax weights

[Figure: learned connection patterns for MNIST (4 × FF10-GRU10 modules) and text8 (4 × FF32-GRU32 modules).]

Forms similar connection patterns on the same task. Clearer read boundaries on text8 (the larger task) than on MNIST.