SLIDE 1 Adam Marblestone CS 379c Stanford 2019 (slides based on pre-DeepMind work, much of it with Ken Hayworth)
Speculations on possible brain substrates of symbolic processing and structured I/O from memory
SLIDE 2
A tentative high-level template for AI cognitive architectures, based on some interpretations of modern neuroscience (such as it is)
SLIDE 3 A tentative high-level template for AI cognitive architectures, based on some interpretations of modern neuroscience (such as it is)
????????????
But raises more questions than it answers...
SLIDE 4 Working memory: reverberating activity? qualitatively similar to ongoing activity in an LSTM?
- - but, in cortex? cortico-thalamic loops? unstructured versus pre-structured? variables/slots?
- - gating / routing of access to/from working memory
Episodic memory: rapid plasticity in hippocampus, supports pattern completion, linked to diverse cortical representations
- - many open questions… temporal, spatial, predictive and other relational organizing principles?
- - how is it consolidated into semantic memory or other cortically-encoded knowledge?
- - free association, chunking, hierarchical contexts...
- - how are memory recall, offline replay + prospective planning linked with RL?
- - interplay of feature-based generalization and sparse, arbitrary pattern-separated codes?
- - ...
Semantic memory: knowledge-graph like representations in cortical association areas?
- - distinct from episodic memory? distinct from “unstructured” cortical weights?
- - is this a distinct architecture, or something that emerges from the other systems?
Procedural memory: cortico-striatal synapses governing basal-ganglia action selection? selectable cortical programs?
Other: how is the information encoded (e.g., based on which loss functions) before entering any of the above systems? are VAE-like “latent vectors” able to capture enough structure, when trained with the right loss functions, e.g., see MERLIN predictive losses? or does one need something more like “capsules” or other architectural features?
Psychological inspirations for knowledge representation in AI cognitive architectures... and assumptions to question
SLIDE 5 Neural Turing Machine: originally framed as an extension to LSTM “working memory”
SLIDE 6 NTM arguably addresses long-standing complaints about the lack of symbolic “variable binding” in NNs (e.g., Gary Marcus)
SLIDE 7
Can we forge tighter links with neuroscience to constrain architectural choices for working + episodic memory analogs, symbolic structures, dynamic routing, and training procedures in ANNs?
SLIDE 8
neural attractors/assemblies/ensembles
http://fourier.eng.hmc.edu/e161/lectures/figures/energylandscape.gif
(cf., Hopfield…)
SLIDE 9 (cf., Hopfield…)
https://github.com/adammarblestone/AssociativeMemories
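As a rough illustration of the attractor picture these slides point to (cf. Hopfield), here is a minimal sketch of a Hopfield-style auto-associative memory in plain Python. It is not taken from the linked repository. The names `hadamard_row`, `train_hopfield`, and `recall`, the network size, and the choice of mutually orthogonal Hadamard rows as stored patterns are all assumptions made so that the example is small and deterministic.

```python
def hadamard_row(row, n):
    """Row of a Sylvester Hadamard matrix as a +1/-1 list.

    Distinct rows are mutually orthogonal, which keeps this toy example exact.
    (Assumption for illustration; real assemblies need not be orthogonal.)
    """
    return [1 if bin(row & j).count("1") % 2 == 0 else -1 for j in range(n)]


def train_hopfield(patterns):
    """Hebbian outer-product rule with a zero diagonal."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n
             for j in range(n)] for i in range(n)]


def recall(weights, cue, max_sweeps=10):
    """Asynchronous sign updates until a sweep changes no unit (a fixed point)."""
    state = list(cue)
    for _ in range(max_sweeps):
        changed = False
        for i, row in enumerate(weights):
            field = sum(w * s for w, s in zip(row, state))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


N_UNITS = 64
stored = [hadamard_row(r, N_UNITS) for r in (3, 5, 9)]
weights = train_hopfield(stored)

# Corrupt 8 of the 64 units of the first pattern, then let the network settle.
cue = list(stored[0])
for i in range(0, N_UNITS, 8):
    cue[i] = -cue[i]

completed = recall(weights, cue)
overlap = sum(a * b for a, b in zip(completed, stored[0])) / N_UNITS
```

With these orthogonal patterns and this level of corruption, the state should fall back into the stored pattern, which is the "pattern completion" behavior that the energy-landscape figure depicts.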
SLIDE 10
Information represented via assemblies/attractors
SLIDE 11
Information represented via assemblies/attractors
SLIDE 12
Sequences of point attractors in the hippocampus?
SLIDE 13
Sequences of point attractors in the hippocampus?
SLIDE 14
The attractors may be in cortico-thalamo-cortical loops
SLIDE 15 Thalamic Latches and Working Memory Buffers
McFarland & Haber; Murray Sherman
SLIDE 16
Assumption: Information necessary to select an assembly passes through thalamus between cortical buffers
Thalamic Latches and Working Memory Buffers
SLIDE 17 Idea: Thalamic relay + attractor implementation of “dynamically partitionable auto-associative neural network” (Hayworth 2012)
- Global attractors/assemblies/ensembles shared across source > thalamic relay > destination buffers
- Gating the thalamic relay off allows “partitioning” of the buffers
- Gating the thalamic relay on allows information to be “copied” from a source buffer to a destination buffer, forcing the destination buffer to occupy an attractor globally shared with that of the source
Gated communication using thalamic relay of attractors
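The three bullets above can be sketched as a toy simulation. This is only one possible reading, not the model from Hayworth 2012: each buffer is a small Hopfield-style network storing the same set of patterns (the "globally shared" attractors), and the thalamic relay is reduced to a one-to-one input from source to destination that is either present (gate on) or absent (gate off). The names `pattern`, `hebbian_weights`, `settle`, and `decode`, the buffer size, and the relay gain of 3.0 are all assumptions.

```python
N = 32  # units per buffer (assumed)


def pattern(row, n=N):
    """Orthogonal +1/-1 pattern (row of a Sylvester Hadamard matrix)."""
    return [1 if bin(row & j).count("1") % 2 == 0 else -1 for j in range(n)]


def hebbian_weights(patterns):
    n = len(patterns[0])
    return [[0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n
             for j in range(n)] for i in range(n)]


def settle(weights, state, relay_input=None, relay_gain=3.0, sweeps=5):
    """Let one buffer settle. relay_input=None means the relay gate is off."""
    s = list(state)
    for _ in range(sweeps):
        for i, row in enumerate(weights):
            field = sum(w * sj for w, sj in zip(row, s))
            if relay_input is not None:
                # Gate on: the relayed source activity dominates the local field.
                field += relay_gain * relay_input[i]
            s[i] = 1 if field >= 0 else -1
    return s


SYMBOLS = {"A": pattern(1), "B": pattern(2), "C": pattern(4)}
W = hebbian_weights(list(SYMBOLS.values()))  # same attractors in every buffer


def decode(state):
    """Name of the stored symbol the buffer currently occupies, if any."""
    for name, p in SYMBOLS.items():
        if p == state:
            return name
    return None


source, dest = SYMBOLS["A"], SYMBOLS["B"]

dest = settle(W, dest)                       # gate off: buffers are partitioned
while_partitioned = decode(dest)             # destination keeps its own symbol

dest = settle(W, dest, relay_input=source)   # gate on: copy source -> destination
after_copy = decode(dest)

source = SYMBOLS["C"]                        # source moves on to another symbol
dest = settle(W, dest)                       # gate off: destination latches
after_latch = decode(dest)
```

In this sketch the destination holds "B" while partitioned, is forced into "A" while the gate is open, and keeps "A" after the gate closes even though the source has changed.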
SLIDE 18
Cortico-thalamic latched memory buffer
SLIDE 19
Cortico-thalamic latched memory buffer
Assembly/attractor/ensemble shared across connected cortical and thalamic areas…
SLIDE 20
Hayworth and Marblestone 2018 “Copy and paste” of symbols using partitionable attractors
SLIDE 21
Hayworth 2009 “Copy and paste” of symbols using partitionable attractors
Sequence of gating operations for copy-and-paste of assemblies (cf., symbolic variable binding)
During training / symbol allocation...
SLIDE 22
Hayworth 2009 “Copy and paste” of symbols using partitionable attractors
Sequence of gating operations for copy-and-paste of assemblies (cf., symbolic variable binding)
Later, executing a routing operation...
SLIDE 23
Hayworth and Marblestone 2018 “Copy and paste” of symbols using partitionable attractors
Sequence of gating operations for copy-and-paste of assemblies (cf., symbolic variable binding)
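At a purely symbolic level, a "sequence of gating operations" can be read as a short program of source-to-destination copies. The sketch below abstracts each buffer's attractor state to a symbol name and shows a swap of two variables through a temporary buffer. The function `run_gating_program`, the buffer names, and the swap program are hypothetical illustrations, not the circuit from the cited papers.

```python
def run_gating_program(buffers, program):
    """Apply a sequence of relay-gate openings.

    Each step (src, dst) opens the gate from src to dst, forcing dst into the
    attractor that src currently occupies. Between steps all gates are closed,
    so every buffer simply holds (latches) its content.
    """
    state = dict(buffers)
    trace = []
    for src, dst in program:
        state[dst] = state[src]      # gate on: "paste" src's symbol into dst
        trace.append(dict(state))    # gate off again: snapshot of latched state
    return state, trace


# Three buffers acting as variables; the contents stand in for attractor states.
buffers = {"x": "dog", "y": "cat", "tmp": None}

# Swap x and y using tmp, expressed purely as gating steps.
swap_program = [("x", "tmp"), ("y", "x"), ("tmp", "y")]

final, trace = run_gating_program(buffers, swap_program)
# final == {"x": "cat", "y": "dog", "tmp": "dog"}
```

The point of the abstraction is that the same three gating steps work for any symbols held in `x` and `y`, which is the sense in which routing of this kind resembles variable binding.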
SLIDE 24 Lisman 2015 “Copy and paste” of symbols using partitionable attractors
“Latch” and “relay” control via basal ganglia discrete outputs
- Evolutionarily ancient (homologies to simplest vertebrate brains, e.g., zebrafish)
- Does RL
- BG and superior colliculus may also contain innate control structures that could drive “training routines” / “internal curricula” / “bootstrap cost functions”...
discrete inhibitory/disinhibitory control over target thalamic areas/relays/latches?
SLIDE 25
Hayworth and Marblestone 2018 Clamping in target patterns for “contrastive” learning
SLIDE 26 Clamping in target patterns for “contrastive” learning
Explicit basal ganglia directed control over the learning of invariances (not just unsupervised “slow feature” finding)?
Example:
- Basal ganglia recognizes boundaries of “episode” with a given object (BG learns this policy via reinforcement learning?)
- BG “clamps” target patterns into thalamo-cortical target buffer
- BG trains upstream sensory hierarchy to map varying input to clamped target
- Target pattern may be retrieved from memory on subsequent episode?
Hayworth and Marblestone 2018
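One very simple reading of the clamping example, offered only as a sketch: a target pattern is held fixed in the target buffer for the duration of an "episode", and an upstream map is adjusted so that every varying view of the object drives the buffer toward that clamped pattern. The update below is a plain delta rule driven by the difference between the clamped target and the free-running output. It is not claimed to be the learning rule of the cited work. The views (one object with a different feature hidden each time), the target, the learning rate, and all function names are assumptions.

```python
def free_output(weights, x):
    """Output of the upstream linear map when the target buffer is not clamped."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def clamped_update(weights, x, target, lr=0.1):
    """Nudge the map so the free output moves toward the clamped target."""
    y = free_output(weights, x)
    for k, row in enumerate(weights):
        error = target[k] - y[k]          # clamped phase minus free phase
        for j, xj in enumerate(x):
            row[j] += lr * error * xj


def mean_squared_error(weights, views, target):
    total = 0.0
    for x in views:
        y = free_output(weights, x)
        total += sum((t - yk) ** 2 for t, yk in zip(target, y))
    return total / (len(views) * len(target))


# One "object" seen in five views, each with a different feature hidden.
base = [1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0]
views = [[0.0 if j == hidden else b for j, b in enumerate(base)]
         for hidden in range(5)]

# Pattern clamped into the target buffer for the whole episode.
target = [1.0, -1.0, 1.0, -1.0]

weights = [[0.0] * len(base) for _ in target]
error_before = mean_squared_error(weights, views, target)

for _ in range(200):                      # repeated exposure within the episode
    for x in views:
        clamped_update(weights, x, target)

error_after = mean_squared_error(weights, views, target)
```

After training, all five views map to nearly the same clamped pattern, which is the kind of invariance the slide suggests could be taught explicitly, as opposed to discovered through unsupervised slow-feature learning.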
SLIDE 27
Hayworth and Marblestone 2018 Structured I/O from an associative memory
Unstructured associative code
SLIDE 28
Hayworth and Marblestone 2018 Structured I/O from an associative memory
Structured representation across multiple buffers
SLIDE 29
Hayworth and Marblestone 2018
A crude, very partial, and speculative “integrative picture”
SLIDE 30
Returning to the current situation re integrated memory-based RL architectures in AI
SLIDE 31
Basically “soft attention” over a set of memory “slots”, with cosine-distance based similarity lookup…
Returning to the current situation re integrated memory-based RL architectures in AI
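For concreteness, the "soft attention over memory slots" read described on this slide can be sketched in a few lines. This is a generic content-based read, not the code of any particular architecture. The function names and the `sharpness` parameter (a softmax temperature) are assumptions.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)   # small constant avoids division by zero


def soft_read(memory, key, sharpness=10.0):
    """Content-based read: softmax over cosine similarities, then a weighted sum."""
    sims = [cosine_similarity(key, slot) for slot in memory]
    top = max(sims)                          # subtract the max for numerical stability
    exps = [math.exp(sharpness * (s - top)) for s in sims]
    total = sum(exps)
    attention = [e / total for e in exps]
    read_vector = [sum(a * slot[d] for a, slot in zip(attention, memory))
                   for d in range(len(key))]
    return read_vector, attention


memory = [
    [1.0, 0.0, 0.0, 0.0],   # slot 0
    [0.0, 1.0, 0.0, 0.0],   # slot 1
    [0.0, 0.0, 1.0, 0.0],   # slot 2
]
key = [0.9, 0.1, 0.0, 0.0]  # noisy query, closest to slot 0

read_vector, attention = soft_read(memory, key)
best_slot = attention.index(max(attention))
```

Every slot contributes to the read vector in proportion to its attention weight, so the whole operation is differentiable. Note also that the returned value is a single blended vector, with no built-in notion of separate buffers or routing between them.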
SLIDE 32
What about structured routing / potential thalamus analogs?
SLIDE 33
What about structured routing / potential thalamus analogs?