SLIDE 1 Complementary Learning Systems in Natural and Artificial Intelligence
James L. McClelland
Department of Psychology & Center for Mind, Brain and Computation
Stanford University
SLIDE 2 Tom’s questions for me
- What sort of NN architectures could serve an
automated programmer in constructing a program?
- How do you imagine different memory
systems working in a human programmer?
SLIDE 3 Outline for the session
- Complementary learning systems
– The basic theory
– Rapid schema-consistent learning
– Comparison of the two learning systems
- Deep learning and complementary learning
systems
– Rehearsal buffer in the DQN
– Memory-based parameter adaptation
- Revisiting Tom’s prompt and a response
SLIDE 4 Your knowledge is in your connections!
- An experience is a pattern of activation
over neurons in one or more brain
regions.
- The trace left in memory is the set of
adjustments to the strengths of the connections.
– Each experience leaves such a trace, but the traces are not separable or distinct.
– Rather, they are superimposed in the same set of connection weights.
- Recall involves the recreation of a
pattern of activation, using a part or associate of it as a cue.
- The reinstatement depends on the
knowledge in the connection weights, which in general will reflect influences
of many different experiences.
- Thus, memory is always a constructive
process, dependent on contributions from many different experiences.
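These points can be illustrated with a minimal Hopfield-style sketch (an illustration of the idea, not a model from the talk): several experiences leave superimposed Hebbian traces in a single weight matrix, and recall recreates a full pattern of activation from a partial cue.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three "experiences": random +/-1 activation patterns over 200 units.
n_units, n_patterns = 200, 3
patterns = rng.choice([-1, 1], size=(n_patterns, n_units))

# Each experience adjusts the SAME connection weights (Hebbian rule);
# the traces are superimposed rather than stored separately.
W = np.zeros((n_units, n_units))
for p in patterns:
    W += np.outer(p, p)
np.fill_diagonal(W, 0)

# Cue with a degraded copy of pattern 0: half the units silenced.
cue = patterns[0].astype(float)
cue[n_units // 2:] = 0.0

# Recall: iteratively recreate the full pattern from the partial cue,
# using only the knowledge in the shared connection weights.
state = cue
for _ in range(5):
    state = np.where(W @ state >= 0, 1.0, -1.0)

recovered = np.mean(state == patterns[0])  # fraction of units reinstated
print(recovered)
```

The reconstruction draws on the same weights that encode every stored pattern, which is the sense in which recall is constructive rather than a lookup.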
SLIDE 5 Effect of a Hippocampal Lesion
- Intact performance on tests of
intelligence, general knowledge, language, other acquired skills
- Dramatic deficits in formation of
some types of new memories:
– Explicit memories for episodes and events
– Paired-associate learning
– Arbitrary new factual information acquisition
- Temporally graded retrograde
amnesia: – lesion impairs recent memories leaving remote memories intact.
Note: HM’s lesion was bilateral
SLIDE 6
SLIDE 7 Key Points
- We learn about the general pattern of experiences,
not just specific things
- Gradual learning in the cortex builds implicit
semantic and procedural knowledge that forms much of the basis of our cognitive abilities
- The Hippocampal system complements the cortex
by allowing us to learn specific things without interference with existing structured knowledge
- In general these systems must be thought of as
working together rather than being alternative sources of information.
SLIDE 8 Effect of Prior Association on Paired-Associate Learning in Control and Amnesic Populations
Cutting (1978), Expt. 1
[Figure: percent correct by ease-of-association category (Very Easy to Very Hard) for Control (Expt) and Amnesic (Expt) groups, with base rates shown.]
SLIDE 9 Kwok & McClelland Model of Semantic and Episodic Memory
- Model includes slow learning cortical
system and a fast-learning hippocampal system.
- Cortex contains units representing both
content and context of an experience.
- Semantic memory is gradually built up
through repeated presentations of the same content in different contexts.
- Formation of new episodic memory
depends on hippocampus and the relevant cortical areas, including context.
– Loss of hippocampus would prevent initial rapid binding of content and context.
- Episodic memories benefit from prior
cortical learning when they involve meaningful materials.
[Figure: model architecture with Context, Relation, Cue, and Target layers in the neocortex and hippocampus.]
SLIDE 10 Simulation Results From KM Model
Cutting (1978), Expt. 1
[Figure: simulated percent correct by ease-of-association category for Control (Model) and Amnesic (Model), plotted with the Cutting (1978) experimental data and model base rates.]
SLIDE 11 Emergence of Meaning in Learned Distributed Representations through Gradual Interleaved Learning
- Distributed representations (what ML calls
embeddings) that capture aspects of meaning emerge through a gradual learning process
- The progression of learning and the representations
formed capture many aspects of cognitive development
- Progressive differentiation
– Sensitivity to coherent covariation across contexts
– Reorganization of conceptual knowledge
SLIDE 12
SLIDE 13
The Rumelhart Model
SLIDE 14 The Training Data:
All propositions true of items at the bottom level
Robin can {grow, move, fly}
SLIDE 15
SLIDE 16
[Figure: learned representations at three stages of experience: Early, Later, Later Still.]
SLIDE 17
What happens in this system if we try to learn something new?
Such as a Penguin
SLIDE 18 Learning Something New
- Used network already trained
with eight items and their properties.
fully connected to the representation layer
the following pairs of items: – penguin-isa living thing-animal-bird – penguin-can grow-move-swim
SLIDE 19
Rapid Learning Leads to Catastrophic Interference
SLIDE 20
A Complementary Learning System in the Medial Temporal Lobes
[Figure: cortical areas coding color, form, motion, action, valence, and names, converging via the temporal pole on the medial temporal lobe.]
SLIDE 21
Avoiding Catastrophic Interference with Interleaved Learning
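The contrast between rapid focused learning (catastrophic interference) and interleaved learning can be reproduced in a toy linear network with distributed, overlapping input patterns; sizes, learning rate, and step counts here are illustrative, and this is not the Rumelhart network itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Distributed (overlapping) input patterns for 8 "known" items + 1 new item.
n_items, dim = 8, 20
X_old = rng.standard_normal((n_items, dim))
y_old = rng.standard_normal((n_items, 1))
x_new = rng.standard_normal((1, dim))
y_new = np.array([[1.0]])

def sgd(W, X, y, lr=0.02, steps=4000):
    """Plain SGD on a linear readout; all items share the same weights."""
    for _ in range(steps):
        i = rng.integers(len(X))
        err = X[i] @ W - y[i]
        W -= lr * np.outer(X[i], err)
    return W

def mse(W, X, y):
    return float(np.mean((X @ W - y) ** 2))

# Pre-train on the old items until they are well learned.
W = sgd(np.zeros((dim, 1)), X_old, y_old)

# Focused training on ONLY the new item: shared weights get overwritten.
W_focused = sgd(W.copy(), x_new, y_new)

# Interleaved training: the new item mixed in with the old ones.
X_all = np.vstack([X_old, x_new])
y_all = np.vstack([y_old, y_new])
W_inter = sgd(W.copy(), X_all, y_all)

print(mse(W_focused, X_old, y_old))  # old items disturbed (interference)
print(mse(W_inter, X_old, y_old))    # old knowledge preserved
```

Because the patterns overlap, every weight change for the new item also shifts the responses to the old ones; interleaving keeps correcting those shifts, which is the point the slide makes.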
SLIDE 22
Initial Storage in the Hippocampus Followed by Repeated Replay Leads to the Consolidation of New Learning in Neocortex, Avoiding Catastrophic Interference
[Figure: cortical areas coding color, form, motion, action, valence, and names, converging via the temporal pole on the medial temporal lobe.]
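In AI terms, this storage-then-replay scheme is what the rehearsal (experience replay) buffer in the DQN provides; a minimal sketch, with illustrative names and capacity:

```python
import random
from collections import deque

class ReplayBuffer:
    """Hippocampus analog: one-shot storage of recent experiences,
    later replayed in interleaved batches to a slow-learning network."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest traces fade out

    def store(self, experience):
        self.buffer.append(experience)  # rapid, single-exposure storage

    def sample(self, batch_size):
        # A random mix of older and newer experiences; training the slow
        # learner on such batches interleaves new material with replay.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

buf = ReplayBuffer()
for step in range(100):
    buf.store(("state", "action", "reward", "next_state", step))
batch = buf.sample(32)  # one interleaved batch for a gradient step
```

The buffer itself learns nothing; like the hippocampal store in the theory, it holds specifics so the slow system can absorb them gradually without catastrophic interference.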
SLIDE 23
Rapid Consolidation of Schema Consistent Information
Richard Morris
SLIDE 24
Tse et al (Science, 2007, 2011)
During training, 2 wells uncovered on each trial
SLIDE 25 Schemata and Schema Consistent Information
- What is a schema?
– An organized knowledge structure into which existing knowledge is organized.
- What is schema-consistent information?
– Information that can be added to a schema without disturbing it.
– Partially consistent
– Partially inconsistent
– a trout
– a cardinal
SLIDE 26 New Simulations
- Initial training with eight
items and their properties as before.
- New input unit fully connected to the representation layer, also as before.
- Trained the network on one of the following pairs of items:
– penguin-isa & penguin-can
– trout-isa & trout-can
– cardinal-isa & cardinal-can
SLIDE 27
New Learning of Consistent and Partially Inconsistent Information
[Figure: panels showing LEARNING of the new items and INTERFERENCE with prior knowledge.]
SLIDE 28
Connection Weight Changes after Simulated NPA, OPA and NM Analogs
Tse et al. (2011)
SLIDE 29
How Does It Work?
SLIDE 30
How Does It Work?
SLIDE 31
Comparison of the two learning systems
SLIDE 32 Dense vs Sparse Coding
– Sparse random conjunctive coding
SLIDE 33
Similarity Based Representations in Cortex
SLIDE 34 In more detail…
- Input from neocortex comes into EC;
EC projects to DG, CA3, and CA1
- Drastic pattern separation occurs in
DG
- Downsampling in CA3 assigns an
arbitrary code
- Invertible, somewhat sparsified representation in CA1
- Few-shot learning in DG, CA3, and CA3->CA1 allows reconstruction of the EC pattern from partial input.
- Other connections shown in black are
part of the slow-learning neocortical network.
- Recurrence within CA3, through the
hippocampal circuit shown, and through the outer loop also involving the rest of the neocortex
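The pattern-separation step in DG can be sketched as an expansion followed by k-winners-take-all sparsification (a sketch with illustrative sizes and sparsity, not anatomical values): two highly overlapping EC-like inputs map to DG-like codes that overlap far less.

```python
import numpy as np

rng = np.random.default_rng(2)

def k_winners(x, k):
    """Keep the k most active units; silence the rest (sparse code)."""
    out = np.zeros_like(x)
    out[np.argsort(x)[-k:]] = 1.0
    return out

def overlap(a, b):
    """Fraction of a's active units that are also active in b."""
    return float(np.sum(a * b) / np.sum(a))

# Two similar EC-like input patterns: ~80% of active units shared.
n_in, n_dg = 100, 2000
a = (rng.random(n_in) < 0.3).astype(float)
b = a.copy()
flip = rng.choice(np.where(a == 1)[0], size=int(a.sum() * 0.2), replace=False)
b[flip] = 0
b[rng.choice(np.where(a == 0)[0], size=len(flip), replace=False)] = 1

# DG analog: expansion onto many units plus strong sparsification.
P = rng.standard_normal((n_dg, n_in))  # random conjunctive projection
a_dg = k_winners(P @ a, k=40)
b_dg = k_winners(P @ b, k=40)

print(overlap(a, b))        # input overlap (high)
print(overlap(a_dg, b_dg))  # DG overlap (lower): pattern separation
```

Reducing overlap between the codes for similar events is what lets the fast-learning system store them without the traces blending together.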
SLIDE 35 Two modes of generalization
- Similarity-based
- When embeddings are already known, these modes can both support generalization, and do so without requiring interleaved learning
- Otherwise generalization may be relatively hard
[Figure: Context, Relation, Cue, and Target layers in the neocortex and hippocampus.]
SLIDE 36 How might hippocampus support inference and generalization?
– ‘Inference’ in the transitive inference task
SLIDE 37 Complementary Learning Systems in AI
SLIDE 38 Tom’s questions for me
- What sort of NN architectures could serve an
automated programmer in constructing a program?
- How do you imagine different memory
systems working in a human programmer?
- My version of the question:
– What additional form of memory do intelligent agents need?
SLIDE 39 Working Memory
- Is there a special working memory system in
the brain?
- Or do we learn connection weights that
sustain information in an active state in memory?
- RNNs and LSTMs provide forms of working
memory
- What is exciting about these models is that
they learn what to retain – We learn to retain the information that will be useful later
SLIDE 40
The Differentiable Neural Computer
SLIDE 41
Learning what to store – in two senses
SLIDE 42
Memory Augmented Neural Networks
Santoro et al (2016) One-shot learning with MANNs
SLIDE 43 Some closing comments
- Cognitive Science, Neuroscience, and AI now have increasingly
powerful ideas that we can use to help us understand learning and memory
- AI has expanded the space of what we can consider to be learned
rather than innate
- But currently, AI breakthroughs are drastically
over-compartmentalized
- We can use meta-learning to teach a neural network just about
anything
- But there’s little generalization outside of a limited meta-task
space
- And there’s very little fully integrative work going on that would allow a
single integrated learner to acquire a range of skills, all of which can be
brought together to solve the problem of general artificial intelligence