Learning Hierarchical Information Flow with Recurrent Neural Modules
Danijar Hafner¹, Alex Irpan¹, James Davidson¹, Nicolas Heess²
¹Google Brain, ²DeepMind
NIPS 2017, #3374

1. Contribution
Brain-inspired modular sequence model
Learns connectivity rather than relying on a manually defined layer structure for the task: it learns skip connections and feedback loops, and discovers novel connectivity patterns.
The neocortex is often described as a hierarchy, but it contains many side connections and feedback loops: areas communicate both directly and indirectly via the thalamus. We focus on the latter here. Modules communicating via a routing center include hierarchy as a special case.
[Figure: visual cortical areas V1, V2, V3, V4, MT, MST/FST, TEO, IT, PR/PH, ER, A and their interconnections. Adapted from Gross et al. 1993, Inferior temporal cortex as a pattern recognition device.]
[Brain illustration: user udaix, Shutterstock.]
[Figure: Oh et al., A mesoscale connectome of the mouse brain, Nature 2014, Figure 6.]
Multiple recurrent modules share their features via a routing center.
[Figure: recurrent modules A–D exchange features through a central routing center; the task input enters at one module and the task output is read from another.]
The center concatenates the features and lets modules read from it at the next time step.
[Figure: the model unrolled over time; at each step, modules A–C write their features to the center and read contexts from the previous center, mapping inputs x1, x2, x3 to outputs y1, y2, y3.]
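To make this flow concrete, here is a minimal NumPy sketch of one such step. It is an illustration under assumed shapes, not the paper's implementation: the recurrent GRU modules are replaced by single tanh layers, and the reading weights are plain matrices standing in for the mechanisms described below.

```python
import numpy as np

rng = np.random.default_rng(0)

num_modules = 4        # modules A-D
feature_size = 10      # features each module writes to the center
center_size = num_modules * feature_size
context_size = 16      # size of the context each module reads
input_size = 8         # task input, fed to the first module only

# Hypothetical parameters; the paper uses recurrent (GRU) modules, replaced
# here by single tanh layers to keep the sketch short.
read_weights = [0.1 * rng.normal(size=(center_size, context_size))
                for _ in range(num_modules)]
module_weights = [0.1 * rng.normal(
    size=(context_size + (input_size if i == 0 else 0), feature_size))
    for i in range(num_modules)]

def step(center, task_input):
    """One time step: read contexts, compute features, re-concatenate."""
    features = []
    for i in range(num_modules):
        context = center @ read_weights[i]   # read from the center
        if i == 0:                           # input module sees the task input
            context = np.concatenate([context, task_input])
        features.append(np.tanh(context @ module_weights[i]))
    output = features[-1]                    # task output from the last module
    new_center = np.concatenate(features)    # center for the next time step
    return new_center, output

center = np.zeros(center_size)
for x in rng.normal(size=(5, input_size)):   # toy sequence of length 5
    center, y = step(center, x)
```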
Reading mechanisms can be static or dynamic, allowing read locations to change over time:
Linear reading
Weight normalization
Fast softmax
Fast Gaussian
[Figure: module A obtains its context by multiplying the center (the concatenated features of modules A–D) with reading weights; task input and task output connect to the modules directly.]
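A rough sketch of the two static mechanisms under assumed sizes (center of 40, context of 16; the exact parameterization may differ from the paper): linear reading uses a plain learned matrix, while weight-normalized reading (Salimans & Kingma, 2016) separates each reading direction from a learned scale.

```python
import numpy as np

rng = np.random.default_rng(0)
center_size, context_size = 40, 16
center = rng.normal(size=center_size)

# Linear reading: a plain learned matrix maps the center to the context.
W = 0.1 * rng.normal(size=(center_size, context_size))
context_linear = center @ W

# Weight-normalized reading: each context dimension reads along a direction
# of unit norm with a separately learned scale (Salimans & Kingma, 2016).
V = rng.normal(size=(center_size, context_size))  # directions
g = rng.normal(size=context_size)                 # scales
W_norm = g * V / np.linalg.norm(V, axis=0, keepdims=True)
context_weightnorm = center @ W_norm
```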
[Figure: learned reading weights for modules A–D over successive time steps, showing hierarchical paths from input x to output y along with skip connections and feedback connections.]
ThalNet learns hierarchical information flow, skip connections, and long feedback loops. Hierarchical connections are known from feed-forward neural networks; skip connections are known from ResNet architectures; long feedback loops could be beneficial for recurrent machine learning models. Similar connectivity is learned for the same task. Static weight-normalized reading is fast and performs well. Fast reading mechanisms can be explored further in the future.
Outperforms stacked GRU in test performance on several sequential tasks.
Brain-inspired modular sequence model: modularity and the reading bottleneck regularize the model and improve generalization. Other recurrent models might benefit from the long feedback loops learned by ThalNet. Provides a framework for multi-task learning.
Project page: https://danijar.com/thalnet
Contact: mail@danijar.com
Almost no connection pattern is visible. Similar performance on MNIST, slightly worse on text8 (fewer parameters).
Selection is based on a softmax mask computed as a function of the module features. Too many parameters to compute the fast weights as activations!
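A hedged sketch of that idea with illustrative shapes: the module features predict a row of logits per context dimension, a softmax over center positions turns each row into a read mask, and the context becomes a convex combination of center elements. The feature-to-logits map is exactly where the parameter count blows up.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_size, center_size, context_size = 10, 40, 16

features = rng.normal(size=feature_size)  # current module features
center = rng.normal(size=center_size)     # concatenated module outputs

# Predicting a logit per (context, center) pair is what costs so many
# parameters: feature_size * context_size * center_size weights.
W_logits = 0.1 * rng.normal(size=(feature_size, context_size * center_size))
logits = (features @ W_logits).reshape(context_size, center_size)

# Softmax over center positions: each context element picks where to read.
mask = np.exp(logits - logits.max(axis=1, keepdims=True))
mask /= mask.sum(axis=1, keepdims=True)
context = mask @ center  # shape (context_size,)
```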
Hierarchical information flow with feedback cycles and skip connections emerges. Slightly worse performance than the linear mapping.
[Figure: learned connectivity among modules 1–3 from input x to output y, highlighting feedback weights and skip connections.]
Very few parameters, so we can afford fast weights again. Experimented with a soft kernel and a sampled version; I couldn't make this work well, tips appreciated.
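A sketch of the soft-kernel variant under the same illustrative shapes (the sampled version is omitted): the module predicts only a mean position and a width per context dimension, and the read weights are a Gaussian kernel over center positions, so the fast weights need just 2 × context_size activations instead of a full matrix.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
feature_size, center_size, context_size = 10, 40, 16

features = rng.normal(size=feature_size)
center = rng.normal(size=center_size)
positions = np.arange(center_size)

# Only 2 * context_size fast values: a mean and a width per context element.
W_mu = 0.1 * rng.normal(size=(feature_size, context_size))
W_sigma = 0.1 * rng.normal(size=(feature_size, context_size))
mu = center_size * sigmoid(features @ W_mu)  # mean positions in [0, center_size]
sigma = np.exp(features @ W_sigma)           # positive widths

# Gaussian kernel over center positions gives the read weights.
weights = np.exp(-0.5 * ((positions[None, :] - mu[:, None]) / sigma[:, None]) ** 2)
weights /= weights.sum(axis=1, keepdims=True)
context = weights @ center  # shape (context_size,)
```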
Models: MNIST 4 × FF10-GRU10; text8 4 × FF32-GRU32.
Forms similar connection patterns on the same task. Clearer read boundaries on text8 (the larger task) than on MNIST.