Michael J. Frank Laboratory for Neural Computation and Cognition - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Clustering and generalization of abstract structures in reinforcement learning 


Michael J. Frank

Laboratory for Neural Computation and Cognition Brown University

slide-2
SLIDE 2

Reinforcement learning in neural nets and AI

Mnih et al, 2015, Nature

slide-3
SLIDE 3

Breakout trained on Asynchronous Advantage Actor-Critic (A3C); “Offset Paddle” Breakout variant (Kansky et al, 2017; see also Witty et al 2018, arXiv)

But nets show failure to transfer learned knowledge

slide-4
SLIDE 4
What can we learn from limitations of models and humans? Trade-offs

  • Limited WM capacity (curse of dimensionality)
  • Multi-tasking (shared representations enhance learning & generalization)
  • Robustness to task contingencies (OpAL vs RL)
  • (Catastrophic) interference in episodic memory
  • Unsupervised (Hebbian) vs supervised learning
  • Motor learning – hierarchical structure
  • In defense of “small” problems:
  • need to understand key elements
  • link to neural data / experiments
  • Theory of Everything before Theory of Anything!

slide-5
SLIDE 5

Why does motor learning develop so slowly in humans?

  • Standard story: infants born early due to large head, small birth canal
  • ‘Fourth trimester’
  • But 3-month-old infants are still pretty incompetent (from babycenter.com):

‘you no longer need to support his head. When he’s on his stomach he can lift his head and chest. He can open and close his hands..’

  • Hypothesis: the human brain is wired to discover latent generalizable structure, which is initially inefficient – see Werchan et al 2016!

slide-11
SLIDE 11

Humans learn contextualized rule structures

Driving rules: UK; driving rules: Montreal; and…

slide-12
SLIDE 12

A key structure: Task-sets (TS)

[Diagram: Cue 1 maps stimuli to actions]

slide-13
SLIDE 13

Task-sets (TS)

[Diagram: context C1 maps stimuli S1–S3 to actions A1–A3]

slide-14
SLIDE 14

Task-sets (TS)

[Diagram: contexts C1–C6, each mapping stimuli Si to actions Ai]

slide-15
SLIDE 15

Abstracting Task-set rules

[Diagram: contexts C1–C4 → TS1 (Si1 → Ai1); contexts C5, C6 → TS2 (Si2 → Ai2)]

Latent task-set space

Collins & Frank 2013

slide-16
SLIDE 16

Popularity Prior on Task-set rules

[Diagram: contexts C1–C4 → TS1 (Si1 → Ai1); C5, C6 → TS2 (Si2 → Ai2); new context C7 → ?]

Collins & Frank 2013

CRP prior on TS in a new context:
P0(TS = TSj | Cnew) = N(TSj | C*) / [α + Σi N(TSi | C*)]
P0(TS = new | Cnew) = α / [α + Σi N(TSi | C*)]
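The CRP prior above can be sketched in a few lines of Python (a minimal illustration; `crp_prior` and its variable names are mine, not from the published model code):

```python
from collections import Counter

def crp_prior(ts_counts, alpha):
    """CRP prior over task-sets for a new context.

    ts_counts: Counter mapping each known task-set to the number of
    previous contexts assigned to it. alpha: concentration parameter.
    Returns P0(TS | new context), including a 'new' task-set option.
    """
    total = alpha + sum(ts_counts.values())
    prior = {ts: n / total for ts, n in ts_counts.items()}
    prior["new"] = alpha / total
    return prior

# Example: TS1 reused in 4 prior contexts, TS2 in 2, alpha = 1
prior = crp_prior(Counter({"TS1": 4, "TS2": 2}), alpha=1.0)
# Popular task-sets are proportionally more likely to be reused.
```

This is the "popularity" property: the probability of reusing a task-set in a new context grows with how many contexts already use it, while α controls the willingness to create a new one.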

slide-17
SLIDE 17

Ability to create new Task-set rules

[Diagram: contexts C1–C4 → TS1; C5, C6 → TS2; new context C7 → TSnew (Si → Ai)]

Latent task-set space: Unknown size

Collins & Frank 2013

slide-18
SLIDE 18

Linking algorithmic model and neural network model

Both models are approximations of the same process: TS space building

Collins & Frank, Psych Review, 2013

[Diagram: C-TS model and neural network model; DA-modulated BG gating selects TSi, then Ai]

slide-19
SLIDE 19

Clustering vs partitioning task space in frontostriatal circuits via RL

[Figure: old TS vs new TS – generalization & transfer via RL]

Collins & Frank 2013; 2016; Frank & Badre, 2012

slide-20
SLIDE 20

Clustering vs partitioning task space in frontostriatal circuits via RL

[Figure: fitted clustering prior vs C-PFC sparseness; old TS vs new TS generalization & transfer via RL]

Collins & Frank 2013; 2016; Frank & Badre, 2012

Model mimicry: C-TS and hierarchical neural net are approximations of same structure building process

slide-21
SLIDE 21

Vector reward prediction errors:
 “actor-specific” computations

  • DA signals are tailored to computations of underlying FC-BG circuit
  • “Mixture of Experts” (Frank & Badre 2012; fMRI: Badre & Frank 2012; Collins & Frank 2013…)
  • Vector RPEs

[Figure: “Mixture of Experts” architecture; hierarchical task vs flat task]

slide-22
SLIDE 22

Appending to latent task structures: beyond the identity mapping…

[Table: initial-phase stimulus–action mappings for S1, S2 across contexts C0–C4]

slide-23
SLIDE 23

Appending to latent task structures: extrapolating beyond the identity mapping

[Table: initial and transfer phase 1 stimulus–action mappings for S1–S4 across contexts C0–C4]

slide-24
SLIDE 24

[Figure: task structure (C0, C1 → TS1; C2 → TS2; actions A1–A4; chance = 1/4 or 1/2) and learning curves: proportion correct vs trial # per input pattern, contexts C0, C1 vs C2, for subjects (N=34, *) and model, initial phase and phase 2]

slide-25
SLIDE 25

[Figure: learning curves (proportion correct vs trial # per input pattern) for subjects (N=34) and model, initial phase and phase 2]

slide-26
SLIDE 26

Can subjects generalize learned rules to new contexts?

[Diagram: C0, C1 → TS1; C2 → TS2; new contexts C3, C4 → old TS or new TS3]

slide-27
SLIDE 27

[Table: initial phase and transfer phases 1–2; stimuli S1–S4; C0, C1 → TS1; C2 → TS2; C3 → TS old; C4 → TS new]

Can subjects generalize learned rules to new contexts?

slide-28
SLIDE 28

[Figure: task structure (C0–C2 → TS1/TS2; C3 → old TS; C4 → new TS3) and learning curves: C3 (TS old) vs C4 (TS new) for subjects (N=34, *) and model]

slide-29
SLIDE 29

Prediction error: PE = reward − expectation

[Diagram: reward PEs at each correct feedback, plus a structure-learning PE at the task-set level]

slide-30
SLIDE 30

Prediction error (PE) in EEG signal

Collins & Frank (2016), Cognition

EEG(trial) ~ β0 + βPE·PE(trial) + βStr·StructurePE(trial)
For each subject: βPE(electrodes, time), βStr(electrodes, time)
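The per-subject trial-wise regression on this slide can be sketched as follows (an illustrative Python example on simulated data; the effect sizes and variable names are assumptions, not values from Collins & Frank 2016):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated single-subject data: one EEG amplitude per trial at a given
# electrode/time point, plus the two model-derived regressors.
n_trials = 200
pe = rng.normal(size=n_trials)       # reward prediction error per trial
str_pe = rng.normal(size=n_trials)   # structure-learning PE per trial
eeg = 0.5 * pe + 0.3 * str_pe + rng.normal(scale=0.1, size=n_trials)

# EEG(trial) ~ b0 + bPE * PE(trial) + bStr * StructurePE(trial)
X = np.column_stack([np.ones(n_trials), pe, str_pe])
b0, b_pe, b_str = np.linalg.lstsq(X, eeg, rcond=None)[0]
# Repeating this fit at every electrode x time point yields the
# bPE(electrodes, time) and bStr(electrodes, time) maps per subject.
```

Including both regressors in one model is what licenses the slide's "unique effect" claim: each β is estimated controlling for the other PE signal.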

slide-31
SLIDE 31

Prediction error (PE) in EEG signal
 Structure PE in EEG signal

[Figure: PE effect, average βPE over time from feedback (ms)]

Collins & Frank (2016) Cognition

EEG(trial) ~ β0 + βPE·PE(trial) + βStr·StructurePE(trial)

[Figure: unique effect of structure-learning PE in ROI1 and ROI2 (**, *, ns)]

slide-32
SLIDE 32

Structure PE signal predicts transfer.

[Figure: P(correct) vs iteration # for C3 (TS old) and C4 (TS new), split by unique effect of structure-learning PE in ROI1+2]

[Figure: % choices in a new context matching the TS1 action, TS2 action, or other, relative to the prior]

Collins & Frank, Cognition, accepted

slide-33
SLIDE 33

Structure learning:
  • It affords transfer
  • It depends on clustering priors
  • It informs neural representations of reward predictions

[Recap figures: unique effect of structure-learning PE; PE effect; learning curves (proportion correct vs trial # per input pattern, C0, C1 vs C2)]

slide-34
SLIDE 34

Neural model & EEG: TS switch effects


slide-36
SLIDE 36

[Table: initial phase and transfer phases 1–2; stimuli S1–S4; C0, C1 → TS1; C2 → TS2; C3 → TS old; C4 → TS new]

Structure learning affords transfer

  • Transfer of new information within learned clusters
  • No early clustering benefit – early structure learning is costly
  • Structure learning affords transfer of known rules to new contexts, with a popularity clustering prior
  • Neural signatures of hierarchical prediction errors predict structure learning/transfer:

Badre & Frank 2012; Collins et al 2014, 2016

slide-37
SLIDE 37

[Figure: new context, old TS vs new TS; significant whole-group positive transfer (N = 33, *)]

Do we build structure a priori?

Werchan et al, 2016, JNeurosci

slide-38
SLIDE 38

Share: physical movements (mappings from sounds to notes). Share: chord progression, rhythm, etc. (desired sound/song)
slide-39
SLIDE 39
Piccolo: need compositionality – reuse flute mappings to play a song usually played on guitar

slide-40
SLIDE 40

[Diagram: context C → a single cluster over rewards and transitions]

Rewards: What do you want to do? Transitions: How can you do it?

Clustering goals and transitions jointly will not allow generalization of each independent of the other

slide-41
SLIDE 41

Joint Clustering vs Independent Clustering

[Diagram: joint – context C → one cluster over (rewards, transitions); independent – context C → ClusterR over rewards and ClusterT over transitions]

Nick Franklin

slide-42
SLIDE 42

Joint Clustering vs Independent Clustering

[Diagram: contexts → clusters → functions (S, ϱ) → policies]

Franklin & Frank, 2018 PLOS Computational Biology
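The difference between the two schemes can be illustrated with CRP-style counts over hypothetical context labels (a sketch; the label names and `crp_predict` helper are invented for illustration, not taken from the published model code):

```python
from collections import Counter

def crp_predict(counts, alpha=1.0):
    """Predictive CRP probability for each known label plus 'new'."""
    total = alpha + sum(counts.values())
    p = {k: n / total for k, n in counts.items()}
    p["new"] = alpha / total
    return p

# Contexts seen so far, each labeled by its reward function R and its
# transition function T (hypothetical labels).
contexts = [("R1", "T1"), ("R1", "T1"), ("R2", "T2")]

# Joint clustering: one CRP over (R, T) pairs -- rewards and transitions
# generalize together or not at all.
joint = crp_predict(Counter(contexts))

# Independent clustering: separate CRPs -- a reward rule can transfer to
# a new context even when the transition rule does not.
indep_R = crp_predict(Counter(r for r, _ in contexts))
indep_T = crp_predict(Counter(t for _, t in contexts))
```

The joint agent effectively conditions goal generalization on transitions, while the independent agent marginalizes over them, which is exactly the contrast the following slides explore.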

slide-43
SLIDE 43

Model-based RL: What-to-do vs How-to-do-it?

Actions: b = {a1, …, a8}; cardinal movements: B = {N, S, E, W}
Reduced transition function ϱ: b → B – How to do it?
Reward function S: y → {0, +1}, over locations y ∈ {⟨yj, yk⟩ : j = 1, …, 6; k = 1, …, 6} – What to do?
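The two functions on this slide can be written out concretely (a hypothetical encoding; the specific key assignments and goal location are mine, chosen only to make the types explicit):

```python
# Reduced transition function rho: action (key press) -> cardinal movement.
# "How to do it?" -- the same movement can be produced by several keys.
rho = {
    "a1": "N", "a2": "S", "a3": "E", "a4": "W",
    "a5": "N", "a6": "S", "a7": "E", "a8": "W",
}

# Hypothetical goal square on the 6x6 grid of locations y = (yj, yk).
goal = (3, 5)

def S(y):
    """Reward function, 'What to do?': +1 at the goal location, 0 elsewhere."""
    return 1 if y == goal else 0
```

Separating ϱ from S is what makes compositional reuse possible: the same key mapping can serve many goals, and the same goal can be pursued under many mappings.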
slide-44
SLIDE 44

Agent Performance

[Figure: mapping (ϱ) and reward (S) structure across contexts C1–C4]

  • Independent clustering: fewer steps to solve task domain
  • Independent clustering: better estimates of ϱ and S (lower KL-divergence)

Franklin & Frank, 2018 PLOS Computational Biology

slide-45
SLIDE 45
  • Joint clustering: fewer steps to solve task domain
  • Joint clustering: better estimates of ϱ(b, B) and S(y) (lower KL-divergence)

Function Estimates; Agent Performance

[Figure: transitions shared across contexts C]

Franklin & Frank, 2018 PLOS Computational Biology

slide-46
SLIDE 46
Meta Model

[Diagram: meta-agent arbitrates between a joint actor (weight x) and an independent actor (weight 1 − x), each clustering ϱ(b, B) and S(y) across contexts C1–C4]

  • Meta-agent uses RL processes to learn to select independent/joint clustering as actor
  • Chooses Indep./Joint for a single trial
  • Normative: better than the worst of Joint/Independent (i.e. minimax)
  • Why is doing better than the worst a good idea?

Franklin & Frank, 2018 PLOS Computational Biology

Generalizing to generalize
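The meta-agent's arbitration can be sketched as a mixture weight nudged by reward (a toy illustration only; the published Franklin & Frank 2018 model updates actor responsibilities with more principled machinery than this simple rule):

```python
import random

def meta_choose(x):
    """Pick the joint actor with probability x, else the independent actor."""
    return "joint" if random.random() < x else "independent"

def update_weight(x, actor, reward, lr=0.1):
    """RL-style update: shift the mixture weight toward a rewarded actor."""
    target = 1.0 if actor == "joint" else 0.0
    return x + lr * reward * (target - x)

x = 0.5  # start indifferent between the two actors
actor = meta_choose(x)
# After feedback, weight moves toward whichever actor earned reward:
x = update_weight(x, actor, reward=1.0)
```

Because the weight tracks which actor is paying off in the current task domain, the meta-agent is guaranteed to do no worse than the worse of its two actors, which is the minimax point made on the slide.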

slide-47
SLIDE 47
  • New room = new context
  • Different transitions in each room
  • Exploration costs: potentially non-linear
  • Generalization benefit:
  • Joint: ~2x faster than flat
  • Indep: ~2.5x faster than joint
  • Advantage grows exponentially with size of grid-world
  • Advantage grows linearly with more rooms

“Diabolical Rooms”

2. Theory

[Diagram: rooms with Back-to-Start traps and an End state]

slide-48
SLIDE 48
[Diagram: joint clusters vs independent clusters over ϱ(b, B) and S(y) – which should an agent use?]

slide-49
SLIDE 49

Generalization via Bayesian Inference

Posterior probability a context is in a cluster ∝ Likelihood of the data given the context is in that cluster × Prior probability the context is in that cluster

The likelihood is task specific; the prior is task general.

2. Theory

How good a guess about cluster assignment is the prior? Formally, how good an estimator is the prior of the generative process, regardless of task specifics?
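The posterior ∝ likelihood × prior computation can be written out directly (a minimal Python sketch; the cluster names and numbers are made up for illustration):

```python
def cluster_posterior(likelihoods, prior):
    """Posterior over cluster assignments: posterior ∝ likelihood x prior.

    likelihoods: P(data | context in cluster k) -- task specific.
    prior: P(context in cluster k) -- task general (e.g. a CRP prior).
    """
    unnorm = {k: likelihoods[k] * prior[k] for k in prior}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}

# Hypothetical numbers: the prior favors the popular cluster 'c1', but
# the observed data are more likely under 'c2'.
post = cluster_posterior(
    likelihoods={"c1": 0.2, "c2": 0.8},
    prior={"c1": 0.7, "c2": 0.3},
)
```

Early in a new context, before much data arrives, the task-general prior dominates this product, which is why the quality of the prior as an estimator of the generative process matters.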

slide-50
SLIDE 50

Joint vs Independent Clustering

Joint Clustering

  • Considers transitions when generalizing goals (conditional)
  • More complex
  • Bias/variance: better asymptotically

Independent clustering

  • Generalizes goals independent of transitions (marginal)
  • Less complex
  • Bias/variance: robust to noise

[Diagram: joint vs independent clustering of rewards and transitions across contexts]

slide-51
SLIDE 51

[Diagram: grid-worlds with actions a1–a8; contexts labeled with a reward symbol (A, B, C) and a transition symbol (1, 2)]

Only consider the prior: don’t need statistics of specific transitions and rewards; replace them with arbitrary symbols

slide-52
SLIDE 52

[Diagram: contexts reduced to symbol pairs – an R-sequence (A, B, A, C, …) and a T-sequence (1, 1, 2, 2, …)]

Update the CRP prior in each context and ask how well it can guess the next reward:
  * CRP with independent clustering
  * CRP with joint clustering

slide-53
SLIDE 53

G Sequence: AAAAAAAAAABBBBBBBBBB  T Sequence: 11111222222222211111
G Sequence: AAAAAAAAAABBBBBBBBBB  T Sequence: 11111122222222221111
G Sequence: AAAAAAAAAABBBBBBBBBB  T Sequence: 11111112222222222111

Hold the structure of each sequence constant; increase the mutual information between the sequences by progressively switching pairs. Independent clustering is better with low mutual information.

2. Theory

[Figure: Independent vs Joint advantage (bits) as a function of mutual information]
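The mutual information being manipulated here can be computed directly for aligned symbol sequences (a small Python sketch; the example sequences follow the slide's pattern, with switch points either misaligned or aligned across G and T):

```python
from collections import Counter
from math import log2

def mutual_information(g_seq, t_seq):
    """I(G;T) in bits between two aligned symbol sequences."""
    n = len(g_seq)
    pg = Counter(g_seq)              # marginal counts of G symbols
    pt = Counter(t_seq)              # marginal counts of T symbols
    pgt = Counter(zip(g_seq, t_seq)) # joint counts of (G, T) pairs
    return sum(
        (c / n) * log2((c / n) / ((pg[g] / n) * (pt[t] / n)))
        for (g, t), c in pgt.items()
    )

g = "AAAAAAAAAABBBBBBBBBB"
t_low = "11111222222222211111"   # switch points misaligned with G
t_high = "11111111112222222222"  # switch points aligned with G
assert mutual_information(g, t_high) > mutual_information(g, t_low)
```

With fully aligned switch points T determines G (1 bit here), while the misaligned sequence carries no information about G; shifting pairs interpolates between these extremes.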

slide-54
SLIDE 54
  • Previous analysis assumed T fully observable and noiseless
  • Relax this assumption?

2. Theory

slide-55
SLIDE 55

G Sequence: AAAAAAAAAAAAAAAAAAAABCD  T Sequence: 11111111111111111111234
G Sequence: AAAAAAAAAAAAAAAAAAAABCD  T Sequence: 11111111111111111115234
G Sequence: AAAAAAAAAAAAAAAAAAAABCD  T Sequence: 11111111111111111156234

Hold mutual information constant; T perfectly predicts G (H(G|T) = 0); increase the chance of misidentifying T, i.e. noise. Even when T perfectly predicts G, independent clustering is useful when T observations are noisy.

2. Theory

[Figure: Independent vs Joint advantage (bits)]

slide-56
SLIDE 56

G Sequence: AAAAAAAAAAAAAAAAAAAABCDBCD  T Sequence: 11111111111111111111234342
G Sequence: AAAAAAAAAAAAAAAAAAAABCDBCD  T Sequence: 11111111111111111115234342
G Sequence: AAAAAAAAAAAAAAAAAAAABCDBCD  T Sequence: 11111111111111111156234342

Same manipulation, with repeats appended to reduce entropy: a larger compositionality benefit. Hold mutual information constant; T perfectly predicts G; increase the chance of misidentifying T, i.e. noise. Even when T perfectly predicts G, independent clustering is useful when T observations are noisy.

2. Theory

[Figure: Independent vs Joint advantage (bits); mutual information ≈ 0.2 bits]

slide-57
SLIDE 57

1. Normative behavior depends on task domain
2. Meaningful differences:
   a) Joint better asymptotically
   b) Independent better in noisy and/or independent environments
3. Independent is the simpler statistical model

2. Theory

slide-58
SLIDE 58
  • 1. Background
  • 2. Theory
  • 3. Human Behavior
slide-59
SLIDE 59

[Task display: rooms A, B, C; keys f s a ; l k j d mapped to cardinal movements N S E W]

Transitions: relationship between keypresses and cardinal movements
Goal values

3. Human Behavior

slide-60
SLIDE 60

[Task display: rooms A, B, C; keys f s a ; l k j d mapped to cardinal movements N S E W]

Transitions: relationship between keypresses and cardinal movements – How do you get there?
Goal values – Where do you want to go?

Do people cluster goals and transitions together (jointly)? Do people cluster goals and transitions independently?

3. Human Behavior

slide-61
SLIDE 61
3. Human Behavior

slide-62
SLIDE 62

[Task display: rooms A, B, C with goal values; keys f s a ; l k j d (transitions)]

3. Human Behavior

slide-63
SLIDE 63–67

[Training displays repeated across rooms A, B, C with goal values and key mappings (f s a ; l k j d); the final display introduces a new context (?)]

3. Human Behavior

slide-68
SLIDE 68

3. Human Behavior

Test Contexts (1–4)

Joint Clustering
  • Sensitive to transitions

Independent Clustering
  • Insensitive to transitions

[Figure: predicted goal popularity across rooms A, B, C under each clustering scheme]

slide-69
SLIDE 69
Model and Human Behavior: Ambiguous Structure

  • When training structure is ambiguous, generalization is a combination of independent & joint: meta-generalization

slide-70
SLIDE 70

Follow-up:

  • 1. Can we encourage independent clustering?
  • 2. Can we encourage joint clustering?
slide-71
SLIDE 71

Test Contexts

Model and Human Behavior: Independent Structure

slide-72
SLIDE 72

Model and Human Behavior: Independent Structure

[Figure: test contexts 1–4 across rooms A, B, C, D]

slide-73
SLIDE 73

Model and Human Behavior: Joint Structure

slide-74
SLIDE 74

Follow-up:

  • 1. Can we encourage independent clustering? Yes!
  • 2. Can we encourage joint clustering? Yes!
  • 3. Meta Model?
slide-83
SLIDE 83

Summary

  • Learning involves decision making at multiple levels: choose task structure, choose action, etc.
  • Hierarchical PFC-BG interactions support structure learning
  • Slower to acquire correct action contingencies: have to learn world structure (what is C, what is S?) simultaneously with actions
  • But structure learning affords generalization:
  • Expanding a rule in a known context-cluster
  • Transferring a rule to a new context
  • Role in development?
  • Joint & independent clustering are dissociable; the normative strategy reflects task structure

slide-84
SLIDE 84

Thanks!

LNCC: Anne Collins (Berkeley), Nick Franklin (Harvard), Denise Werchan, Dima Amso, Jim Cavanagh