Prefrontal cortex as a meta-reinforcement learning system
Matthew Botvinick
DeepMind, London, UK; Gatsby Computational Neuroscience Unit, UCL
Mnih et al., Nature (2015)
Yamins & DiCarlo, 2016
Schultz et al, Science (1997)
Jaderberg et al., 2016
Mante et al., Nature, 2013; Song et al., eLife, 2017
Lake et al, BBS (2017)
“Learning to learn”
Harlow, Psychological Review, 1949
[Figure: performance curves across training episodes in Harlow's learning-set task]
Mnih et al, Nature (2015)
Jaderberg et al., 2016
https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/
[Figure: meta-RL architecture. A recurrent network (PFC) receives the previous action a_{t-1} and previous reward r_{t-1} as input and outputs the current action a_t and value estimate v_t; a dopamine-like (DA) signal carries the reward-prediction error δ that trains the network]
Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016); Duan et al., arXiv (2016)
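The input scheme in the schematic can be made concrete. A minimal illustrative sketch (the function name and encoding details are my assumptions, not the paper's code), assuming a two-armed bandit where the recurrent network receives the previous action as a one-hot vector concatenated with the previous reward:

```python
import numpy as np

def meta_rl_input(a_prev, r_prev, n_actions=2):
    """Per-timestep input to the recurrent (PFC-like) network:
    one-hot previous action a_{t-1} concatenated with previous reward r_{t-1}."""
    x = np.zeros(n_actions + 1)
    if a_prev is not None:      # no previous action at the start of an episode
        x[a_prev] = 1.0
    x[-1] = r_prev
    return x

print(meta_rl_input(a_prev=1, r_prev=1.0))    # [0. 1. 1.]
print(meta_rl_input(a_prev=None, r_prev=0.0)) # [0. 0. 0.]
```

Feeding action and reward back as inputs is what lets the recurrent dynamics, rather than the slow weight updates, carry the fast within-episode learning.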
[Figure: example bandit reward probabilities across episodes: 0.7, 0.4, 0.6, 0.9, 0.3, 0.1, 0.8, 0.7]
Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
[Figure: cumulative regret per trial within an episode, compared against Gittins indices, UCB, and Thompson sampling; example left/right choice sequences shown across episodes]
Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
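As a point of reference for the regret comparison, Thompson sampling on a Bernoulli bandit can be sketched in a few lines. This is an illustrative baseline of my own, not the implementation behind the figure:

```python
import numpy as np

def thompson_regret(probs, n_trials, seed=0):
    """Run Thompson sampling on a Bernoulli bandit and return the
    cumulative-regret curve (best arm's probability minus chosen arm's)."""
    rng = np.random.default_rng(seed)
    n_arms = len(probs)
    alpha = np.ones(n_arms)   # Beta posterior: observed successes + 1
    beta = np.ones(n_arms)    # Beta posterior: observed failures + 1
    best = max(probs)
    curve, regret = [], 0.0
    for _ in range(n_trials):
        a = int(np.argmax(rng.beta(alpha, beta)))  # sample posterior, act greedily
        r = float(rng.random() < probs[a])
        alpha[a] += r
        beta[a] += 1.0 - r
        regret += best - probs[a]
        curve.append(regret)
    return curve

curve = thompson_regret([0.7, 0.3], n_trials=100)
```

The regret curve flattens as the posterior concentrates on the better arm; the trained meta-RL agent's within-episode behavior is compared against baselines of this kind.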
Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
[Figure: per-episode arm reward probabilities: 0.7/0.3, 0.6/0.4, 0.3/0.7, 0.8/0.2]
Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
[Figure: performance as a function of training episodes]
Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
Volkmann et al., Nature Reviews Neurology, 2010
[Figure: choice log-odds log2(C_R/C_L) plotted against reward log-odds log2(R_R/R_L), axes from -4 to 4, for monkey behavior and the meta-RL agent]
Tsutsui et al., Nature Comms, 2016
Wang et al., Nature Neuroscience (2018)
Wang et al., Nature Neuroscience (2018)
[Figure: proportion of recorded units (Tsutsui et al., Nature Comms, 2016) and correlation of model units (Wang et al., Nature Neuroscience, 2018) coding the previous action a_{t-1}, previous reward r_{t-1}, their interaction a_{t-1} x r_{t-1}, and value v_t]
Wang et al., Nature Neuroscience (2018)
[Figure: panels A and B show reward probability, inferred/decoded volatility, learning rate, and action feedback over 200 steps]
Behrens et al., Nature Neuroscience, 2007; Wang et al., Nature Neuroscience (2018)
[Figure: dopamine responses during reversal]
Bromberg-Martin et al., J. Neurophys., 2010
Wang et al., Nature Neuroscience (2018)
[Figure: simulated reward-prediction errors when the left vs. right option is rewarded]
Wang et al., Nature Neuroscience (2018)
Miller, Botvinick & Brody, Nat. Neuro., 2017; Daw et al., Neuron, 2011
Model-based RL (from model-free RL)
[Figure: meta-RL RPE plotted against model-based RPE at stage 2 and at reward delivery, both axes from -1 to 1; r² = 0.89]
Wang et al., Nature Neuroscience (2018)
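The model-based RPEs on one axis of that comparison can be sketched for a two-step task. This is a simplified formulation of my own (the function name, the two-state layout, and the convention that action a commonly leads to state a are all assumptions, not the paper's code):

```python
import numpy as np

def model_based_rpes(p_common, v_stage2, a, s2, r):
    """Two RPEs a model-based learner emits on one two-step trial:
    at stage 2 (reached state's value vs. expected value of the stage-1
    action under the transition model) and at reward delivery."""
    probs = np.full(2, 1.0 - p_common)
    probs[a] = p_common                 # action a commonly leads to state a
    v1 = probs @ v_stage2               # model-based value of the stage-1 action
    rpe_stage2 = v_stage2[s2] - v1      # state-transition prediction error
    rpe_reward = r - v_stage2[s2]       # reward prediction error
    return rpe_stage2, rpe_reward

# common transition (a=0 reaches s2=0), reward delivered
print(model_based_rpes(0.8, np.array([0.9, 0.1]), a=0, s2=0, r=1.0))
```

A model-free learner's RPEs would ignore the transition structure, so the high correlation between meta-RL and model-based RPEs (r² = 0.89) is the signature being reported.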
Optogenetic manipulation of dopamine
- DA blocked upon food reward from the large/risky option
- DA blocked upon food reward from the small/certain option
- DA triggered upon food omission from the large/risky option
Stopper et al., Neuron, 2014; Wang et al., arXiv, 2018
Mnih et al, Nature (2015)
- Richer environments / abstractions (Espeholt et al., arXiv, 2018)
- Architectural biases (e.g., Raposo et al., NIPS, 2017)
- Complementary forms of meta-learning (e.g., Fernando et al., under review)
- Episodic reinstatement (Ritter et al., in press)