SLIDE 1

Complementary Learning Systems in Natural and Artificial Intelligence

James L. McClelland

Department of Psychology & Center for Mind, Brain and Computation

Stanford University

SLIDE 2

Tom’s questions for me

  • What sort of NN architectures could serve an automated programmer in constructing a program?

  • How do you imagine different memory systems working in a human programmer?

SLIDE 3

Outline for the session

  • Complementary learning systems
    – The basic theory
    – Rapid schema-consistent learning
    – Comparison of the two learning systems

  • Deep learning and complementary learning systems
    – Rehearsal buffer in the DQN
    – Memory-based parameter adaptation

  • Revisiting Tom’s prompt and a response

SLIDE 4

Your knowledge is in your connections!

  • An experience is a pattern of activation over neurons in one or more brain regions.

  • The trace left in memory is the set of adjustments to the strengths of the connections.
    – Each experience leaves such a trace, but the traces are not separable or distinct.
    – Rather, they are superimposed in the same set of connection weights.

  • Recall involves the recreation of a pattern of activation, using a part or associate of it as a cue.

  • The reinstatement depends on the knowledge in the connection weights, which in general will reflect influences of many different experiences.

  • Thus, memory is always a constructive process, dependent on contributions from many different experiences.

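The superposition-and-reconstruction idea can be made concrete with a toy autoassociative network. The sketch below is illustrative only and is not from the slides: it assumes nothing beyond NumPy, and a simple Hebbian/Hopfield-style rule stands in for whatever learning rule the brain actually uses. Several experiences share one weight matrix, and recall recreates a whole pattern from a partial cue.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                   # number of "neurons"

# Three experiences, each a +/-1 activation pattern over the same neurons.
patterns = rng.choice([-1, 1], size=(3, n))

# All traces are superimposed in ONE weight matrix (Hebbian outer products).
W = np.zeros((n, n))
for p in patterns:
    W += np.outer(p, p) / n
np.fill_diagonal(W, 0)

def recall(cue, steps=10):
    """Reconstruct a full pattern of activation from a partial or noisy cue."""
    x = cue.copy()
    for _ in range(steps):
        x = np.sign(W @ x)
        x[x == 0] = 1
    return x

# Cue with only ~60% of pattern 0 (the rest zeroed out); recover the whole thing.
cue = patterns[0].astype(float)
cue[rng.random(n) < 0.4] = 0
print("overlap with stored trace:", recall(cue) @ patterns[0] / n)   # close to 1.0
```
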
SLIDE 5

Effects of a Hippocampal Lesion

  • Intact performance on tests of intelligence, general knowledge, language, and other acquired skills

  • Dramatic deficits in the formation of some types of new memories:
    – Explicit memories for episodes and events
    – Paired-associate learning
    – Arbitrary new factual information

  • Spared priming and skill acquisition

  • Temporally graded retrograde amnesia:
    – The lesion impairs recent memories while leaving remote memories intact.

Note: HM’s lesion was bilateral

SLIDE 6

SLIDE 7

Key Points

  • We learn about the general pattern of experiences, not just specific things

  • Gradual learning in the cortex builds implicit semantic and procedural knowledge that forms much of the basis of our cognitive abilities

  • The hippocampal system complements the cortex by allowing us to learn specific things without interference with existing structured knowledge

  • In general, these systems must be thought of as working together rather than as alternative sources of information.

SLIDE 8

Effect of Prior Association on Paired-Associate Learning in Control and Amnesic Populations

Cutting (1978), Expt. 1

[Chart: percent correct as a function of ease of prior association (Very Easy to Very Hard) for Control (Expt) and Amnesic (Expt) groups, with base rates shown.]

SLIDE 9

Kwok & McClelland Model of Semantic and Episodic Memory

  • The model includes a slow-learning cortical system and a fast-learning hippocampal system.

  • Cortex contains units representing both the content and the context of an experience.

  • Semantic memory is gradually built up through repeated presentations of the same content in different contexts.

  • Formation of a new episodic memory depends on the hippocampus and the relevant cortical areas, including context.
    – Loss of the hippocampus would prevent the initial rapid binding of content and context.

  • Episodic memories benefit from prior cortical learning when they involve meaningful materials.

[Diagram: Context, Relation, Cue, and Target layers in neocortex, linked to the hippocampus.]

SLIDE 10

Simulation Results From KM Model

Cutting (1978), Expt. 1

[Chart: percent correct by ease of prior association (Very Easy to Very Hard) for Control (Model), Amnesic (Model), Control (Expt), and Amnesic (Expt), with base rates in the model.]

SLIDE 11

Emergence of Meaning in Learned Distributed Representations through Gradual Interleaved Learning

  • Distributed representations (what ML calls embeddings) that capture aspects of meaning emerge through a gradual learning process

  • The progression of learning and the representations formed capture many aspects of cognitive development

  • Progressive differentiation
    – Sensitivity to coherent covariation across contexts
    – Reorganization of conceptual knowledge

SLIDE 12

SLIDE 13

The Rumelhart Model

SLIDE 14

The Training Data:

All propositions true of items at the bottom level of the tree, e.g.:

Robin can {grow, move, fly}

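For readers who want something runnable, here is a minimal sketch of a Rumelhart-style network: an item input feeds a learned representation layer, which is joined with a relation input and drives attribute outputs. The item list, relation names, attribute set, layer sizes, and the handful of training propositions below are illustrative stand-ins, not the full training corpus from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

items      = ["robin", "canary", "salmon", "sunfish", "oak", "pine", "rose", "daisy"]
relations  = ["isa", "is", "can", "has"]
attributes = ["living thing", "animal", "plant", "bird", "fish", "tree", "flower",
              "grow", "move", "fly", "swim", "pretty", "red", "leaves", "roots"]

# Toy training set: (item, relation, set of true attributes). Only a few
# propositions are listed here; the real training data covers the whole tree.
data = [
    ("robin",  "isa", {"living thing", "animal", "bird"}),
    ("robin",  "can", {"grow", "move", "fly"}),
    ("salmon", "isa", {"living thing", "animal", "fish"}),
    ("salmon", "can", {"grow", "move", "swim"}),
    ("oak",    "isa", {"living thing", "plant", "tree"}),
    ("oak",    "has", {"leaves", "roots"}),
]

def one_hot(name, vocab):
    v = np.zeros(len(vocab)); v[vocab.index(name)] = 1.0; return v

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Weights: item -> representation, (representation + relation) -> hidden -> attributes
n_rep, n_hid = 8, 15
W_ir = rng.normal(0, .5, (len(items), n_rep))
W_rh = rng.normal(0, .5, (n_rep + len(relations), n_hid))
W_ha = rng.normal(0, .5, (n_hid, len(attributes)))

def forward(item, relation):
    rep = sigmoid(one_hot(item, items) @ W_ir)
    hid = sigmoid(np.concatenate([rep, one_hot(relation, relations)]) @ W_rh)
    return rep, hid, sigmoid(hid @ W_ha)

def train(epochs=3000, lr=0.5):
    global W_ir, W_rh, W_ha
    for _ in range(epochs):
        for item, rel, true_attrs in data:
            target = np.array([a in true_attrs for a in attributes], float)
            rep, hid, out = forward(item, rel)
            # Backprop of the sigmoid cross-entropy error through both layers
            d_out = out - target
            d_hid = (W_ha @ d_out) * hid * (1 - hid)
            d_in  = (W_rh @ d_hid)[:n_rep] * rep * (1 - rep)
            W_ha -= lr * np.outer(hid, d_out)
            W_rh -= lr * np.outer(np.concatenate([rep, one_hot(rel, relations)]), d_hid)
            W_ir -= lr * np.outer(one_hot(item, items), d_in)

train()
_, _, out = forward("robin", "can")
print({a: round(float(p), 2) for a, p in zip(attributes, out) if p > 0.5})  # expect grow, move, fly
```
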
SLIDE 15

SLIDE 16

[Figure: representations at three points along the Experience axis: Early, Later, and Later Still.]

SLIDE 17

What happens in this system if we try to learn something new?

Such as a Penguin

SLIDE 18

Learning Something New

  • Used a network already trained with eight items and their properties.

  • Added one new input unit, fully connected to the representation layer.

  • Trained the network with the following pairs of items:
    – penguin-isa: living thing, animal, bird
    – penguin-can: grow, move, swim

SLIDE 19

Rapid Learning Leads to Catastrophic Interference

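A toy version of this result, continuing the Rumelhart-style sketch after the training-data slide (it reuses the `items`, `data`, `attributes`, `forward`, `train`, and weight variables defined there; the penguin facts mirror the slide). Rapid, focused training on the new item alone tends to degrade what the network knew before.

```python
import numpy as np

def accuracy(rows):
    """Fraction of attribute bits the network gets right on the given propositions."""
    hits = total = 0
    for item, rel, true_attrs in rows:
        target = np.array([a in true_attrs for a in attributes], float)
        hits += ((forward(item, rel)[2] > 0.5) == target).sum()
        total += target.size
    return hits / total

# One new input unit, fully connected to the representation layer.
items.append("penguin")
W_ir = np.vstack([W_ir, np.random.default_rng(1).normal(0, .5, (1, W_ir.shape[1]))])

penguin_facts = [
    ("penguin", "isa", {"living thing", "animal", "bird"}),
    ("penguin", "can", {"grow", "move", "swim"}),
]

# Remember the pre-penguin state so later sketches can start from it again.
data_backup = list(data)
W_ir0, W_rh0, W_ha0 = W_ir.copy(), W_rh.copy(), W_ha.copy()

print("old items before:", accuracy(data_backup))
data[:] = penguin_facts                    # rapid, focused learning of the new item only
train(epochs=500)
print("penguin after   :", accuracy(penguin_facts))
print("old items after :", accuracy(data_backup))   # typically drops noticeably
```
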
SLIDE 20

A Complementary Learning System in the Medial Temporal Lobes

[Diagram: neocortical areas for color, form, motion, action, valence, and name; the temporal pole; and the medial temporal lobe.]

SLIDE 21

Avoiding Catastrophic Interference with Interleaved Learning

SLIDE 22

Initial Storage in the Hippocampus Followed by Repeated Replay Leads to the Consolidation of New Learning in Neocortex, Avoiding Catastrophic Interference

[Diagram: the same medial temporal lobe figure as before, with areas for color, form, motion, action, valence, and name, the temporal pole, and the medial temporal lobe.]

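The complementary-learning-systems remedy, in the same toy setting: hold the new facts in a fast "hippocampal" store and replay them interleaved with the old corpus while the slow "cortical" network learns. This continues the previous sketches and reuses their variables; the replay schedule and learning rate are arbitrary choices for illustration.

```python
# Back to the pre-penguin cortical weights saved in the previous sketch.
W_ir, W_rh, W_ha = W_ir0.copy(), W_rh0.copy(), W_ha0.copy()

hippocampal_store = list(penguin_facts)       # rapid initial storage of the new experience
data[:] = data_backup + hippocampal_store     # repeated, interleaved replay
train(epochs=3000, lr=0.1)                    # slow consolidation in the cortical network

print("penguin  :", accuracy(penguin_facts))
print("old items:", accuracy(data_backup))    # typically stays high this time
```
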
SLIDE 23

Rapid Consolidation of Schema Consistent Information

Richard Morris

SLIDE 24

Tse et al. (Science, 2007, 2011)

During training, two wells are uncovered on each trial

SLIDE 25

Schemata and Schema Consistent Information

  • What is a ‘schema’?
    – An organized knowledge structure into which existing knowledge is organized.

  • What is schema-consistent information?
    – Information that can be added to a schema without disturbing it.

  • What about a penguin?
    – Partially consistent
    – Partially inconsistent

  • In contrast, consider
    – a trout
    – a cardinal

SLIDE 26

New Simulations

  • Initial training with eight items and their properties, as before.

  • Added one new input unit fully connected to the representation layer, also as before.

  • Trained the network on one of the following pairs of items:
    – penguin-isa & penguin-can
    – trout-isa & trout-can
    – cardinal-isa & cardinal-can

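A toy analogue of these simulations, continuing the earlier sketches (it assumes `forward`, `train`, `accuracy`, `data`, `data_backup`, and the saved weights `W_ir0`, `W_rh0`, `W_ha0` from those blocks). Each run restores the trained network, adds one new item unit, trains on that item's facts alone, and reports how well the old knowledge survives.

```python
import numpy as np

def probe_new_item(name, isa, can, epochs=500):
    """Restore the trained network, add one new item unit, train on the new
    item's facts only, and report accuracy on the original eight items."""
    global items, W_ir, W_rh, W_ha
    items = items[:8] + [name]
    W_ir = np.vstack([W_ir0[:8],
                      np.random.default_rng(2).normal(0, .5, (1, W_ir0.shape[1]))])
    W_rh, W_ha = W_rh0.copy(), W_ha0.copy()
    data[:] = [(name, "isa", isa), (name, "can", can)]   # focused learning only
    train(epochs=epochs)
    return accuracy(data_backup)

print("penguin :", probe_new_item("penguin",  {"living thing", "animal", "bird"},
                                  {"grow", "move", "swim"}))
print("trout   :", probe_new_item("trout",    {"living thing", "animal", "fish"},
                                  {"grow", "move", "swim"}))
print("cardinal:", probe_new_item("cardinal", {"living thing", "animal", "bird"},
                                  {"grow", "move", "fly"}))
# In line with the slides, the schema-consistent items (trout, cardinal) should
# disturb old knowledge less than the partially inconsistent penguin.
```
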
SLIDE 27

New Learning of Consistent and Partially Inconsistent Information

[Charts: panels showing Learning and Interference.]

SLIDE 28

Connection Weight Changes after Simulated NPA, OPA and NM Analogs

Tse et al. (2011)

SLIDE 29

How Does It Work?

SLIDE 30

How Does It Work?

SLIDE 31

Comparison of the two learning systems

SLIDE 32

Dense vs Sparse Coding

  • Pattern separation:
    – Sparse random conjunctive coding

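One way to make the pattern-separation idea concrete: project an input through fixed random weights and keep only the k most active units, giving a sparse random conjunctive code. The k-winners-take-all mechanism and the sizes below are illustrative assumptions, but they show the key effect: similar inputs get far less similar codes.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_code(x, W, k):
    """k-winners-take-all over a random projection: a sparse conjunctive code."""
    h = W @ x
    code = np.zeros_like(h)
    code[np.argsort(h)[-k:]] = 1.0            # only the k most active units fire
    return code

def overlap(a, b):
    return (a @ b) / np.maximum(np.count_nonzero(a), 1)

n_in, n_out, k = 100, 2000, 20                # many units, few active: sparse coding
W = rng.normal(size=(n_out, n_in))

x1 = rng.normal(size=n_in)
x2 = x1 + 0.5 * rng.normal(size=n_in)         # a similar (correlated) input

print("input similarity:", np.corrcoef(x1, x2)[0, 1])           # high
print("code overlap    :", overlap(sparse_code(x1, W, k),
                                   sparse_code(x2, W, k)))      # typically much lower
```
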
SLIDE 33

Similarity Based Representations in Cortex

SLIDE 34

In more detail…

  • Input from neocortex comes into EC; EC projects to DG, CA3, and CA1

  • Drastic pattern separation occurs in DG

  • Downsampling in CA3 assigns an arbitrary code

  • An invertible, somewhat sparsified representation in CA1

  • Few(ish)-shot learning in DG, CA3, and CA3->CA1 allows reconstruction of the EC pattern from partial input.

  • Other connections shown in black are part of the slow-learning neocortical network.

  • Recurrence occurs within CA3, through the hippocampal circuit shown, and through the outer loop also involving the rest of the neocortex.

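A compressed sketch of that loop (illustrative, not the slide's actual model): an EC pattern gets a sparse, pattern-separated code via a fixed random projection with k-winners-take-all (standing in for DG/CA3), a single Hebbian update binds that code back to the EC pattern (standing in for the return path through CA1), and a partial cue then reconstructs the whole pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ec, n_dg, k = 100, 2000, 20

W_sep = rng.normal(size=(n_dg, n_ec))         # EC -> DG random projection (fixed)
W_out = np.zeros((n_ec, n_dg))                # sparse code -> EC, learned in one shot

def dg_code(ec):
    """Sparse, pattern-separated code (k-winners-take-all)."""
    h = W_sep @ ec
    c = np.zeros(n_dg); c[np.argsort(h)[-k:]] = 1.0
    return c

def store(ec):
    """One-shot Hebbian learning: bind the sparse code back to the EC pattern."""
    global W_out
    W_out += np.outer(ec, dg_code(ec)) / k

def recall(partial_ec):
    """Reconstruct the full EC pattern from a partial cue."""
    return np.sign(W_out @ dg_code(partial_ec))

ec = rng.choice([-1.0, 1.0], size=n_ec)       # an "experience" arriving in EC
store(ec)

cue = ec.copy(); cue[60:] = 0                 # only part of the pattern as a cue
print("recovered fraction:", np.mean(recall(cue) == ec))   # close to 1.0
```
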
SLIDE 35

Two modes of generalization

  • Parametric vs. item-based

  • As long as the embeddings are already known, these modes can both support generalization

  • The hippocampus can do so without requiring interleaved learning

  • Adapting the embeddings may be relatively hard

[Diagram: Context, Relation, Cue, and Target layers in neocortex, linked to the hippocampus.]

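A small illustration of the two modes, assuming the embeddings are already learned and fixed (the 2-D embeddings, the "can fly" property, and the items are all made up for the example): a parametric readout generalizes through fitted weights, while an item-based store generalizes at retrieval time by similarity over stored exemplars, with no weight changes at all.

```python
import numpy as np

# Fixed, already-learned embeddings (illustrative 2-D points).
emb = {"robin": [0.9, 0.1], "canary": [0.8, 0.2], "sparrow": [0.85, 0.15],
       "salmon": [0.1, 0.9], "trout": [0.2, 0.8]}
can_fly = {"robin": 1.0, "canary": 1.0, "sparrow": 1.0, "salmon": 0.0, "trout": 0.0}

train_items = ["robin", "canary", "salmon", "trout"]
X = np.array([emb[i] for i in train_items])
y = np.array([can_fly[i] for i in train_items])

# (a) Parametric: fit readout weights (here by least squares). Generalization
#     lives in the weights, and changing them requires (interleaved) training.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# (b) Item-based: just store the exemplars; generalize at retrieval time by
#     similarity-weighted lookup over the stored items.
def item_based(query, temp=0.1):
    sims = X @ query / (np.linalg.norm(X, axis=1) * np.linalg.norm(query))
    p = np.exp(sims / temp); p /= p.sum()
    return p @ y

q = np.array(emb["sparrow"])                 # a "new" item whose embedding is known
print("parametric :", float(q @ w))          # close to 1 (flies)
print("item-based :", item_based(q))         # close to 1 (flies)
```
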
SLIDE 36

How might the hippocampus support inference and generalization?

‘Inference’

  • Finding missing links in the transitive inference task

SLIDE 37

Complementary Learning Systems in AI

  • DQN (rehearsal/replay buffer)
  • MbPA (memory-based parameter adaptation)
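
The DQN's rehearsal buffer is essentially interleaved learning implemented in software: fast, item-specific storage of recent transitions, with slow parametric learning driven by random minibatches drawn from that store. A minimal sketch follows; the `q_update` call and the surrounding training loop are assumptions for illustration, not DeepMind's code.

```python
import random
from collections import deque

class ReplayBuffer:
    """Rehearsal buffer in the style of the DQN: store recent transitions and
    sample random minibatches, so gradient updates are interleaved over many
    past experiences rather than driven only by the most recent one."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def __len__(self):
        return len(self.buffer)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(list(self.buffer), batch_size)
        return map(list, zip(*batch))          # states, actions, rewards, next_states, dones

# Sketch of the surrounding loop (environment and q_update assumed):
#   buffer.add(s, a, r, s_next, done)                  # hippocampus-like fast storage
#   if len(buffer) >= 32:
#       states, actions, rewards, next_states, dones = buffer.sample(32)
#       q_update(states, actions, rewards, next_states, dones)   # slow, interleaved learning
```
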
SLIDE 38

Tom’s questions for me

  • What sort of NN architectures could serve an automated programmer in constructing a program?

  • How do you imagine different memory systems working in a human programmer?

  • My version of the question:
    – What additional form of memory do intelligent agents need?

SLIDE 39

Working Memory

  • Is there a special working memory system in the brain?

  • Or do we learn connection weights that sustain information in an active state in memory?

  • RNNs and LSTMs provide forms of working memory

  • What is exciting about these models is that they learn what to retain
    – We learn to retain the information that will be useful later

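A toy illustration of "learning what to retain", assuming PyTorch: an LSTM is trained to report the first symbol of each sequence after seeing the whole sequence, so the useful thing to hold in its state is exactly that first symbol. The task, sizes, and training schedule are all illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy task: after seeing a sequence of 10 digits, report the FIRST one.
n_symbols, seq_len, hidden = 10, 10, 32

class Reporter(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(n_symbols, 16)
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, n_symbols)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.readout(h[:, -1])          # predict from the final hidden state

model = Reporter()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    x = torch.randint(0, n_symbols, (64, seq_len))
    loss = loss_fn(model(x), x[:, 0])          # the target is the first symbol
    opt.zero_grad(); loss.backward(); opt.step()

x = torch.randint(0, n_symbols, (5, seq_len))
print(model(x).argmax(dim=1))
print(x[:, 0])                                 # should match after training
```
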
SLIDE 40

The Differentiable Neural Computer

SLIDE 41

Learning what to store – in two senses

SLIDE 42

Memory Augmented Neural Networks

Santoro et al. (2016): One-shot learning with MANNs

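The core operation in these memory-augmented models is a content-based read: compare a query key against every memory row, softmax the similarities, and return a weighted sum. A minimal sketch of that read step follows; the memory contents and temperature are arbitrary, and real MANN/NTM controllers also learn what and when to write.

```python
import numpy as np

def cosine_read(memory, key, temp=0.5):
    """Content-based read in the style of NTM/MANN read heads: softmax over
    cosine similarities between the key and every memory row, then a weighted sum."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w = np.exp(sims / temp); w /= w.sum()
    return w @ memory, w

rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 16))              # 8 slots, written on previous steps
query = memory[3] + 0.1 * rng.normal(size=16)  # noisy version of slot 3's content

value, weights = cosine_read(memory, query)
print("most-attended slot:", int(weights.argmax()))   # expect 3
```
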
SLIDE 43

Some closing comments

  • Cognitive Science, Neuroscience, and AI now have increasingly powerful ideas that we can use to help us understand learning and memory

  • AI has expanded the space of what we can consider to be learned rather than innate

  • But currently, AI breakthroughs are drastically over-compartmentalized

  • We can use meta-learning to teach a neural network just about anything

  • But there’s little generalization outside of a limited meta-task space

  • And there’s very little fully integrative work going on that would allow a single integrated learner to acquire a range of skills, all of which can be brought together to solve the problem of general artificial intelligence