

  1. RIKEN–Osaka–OIST Joint Workshop 2016, Big Waves of Theoretical Science in Okinawa, 2016.6.21. Machine Learning and Brain Science. Kenji Doya (doya@oist.jp), Neural Computation Unit, Okinawa Institute of Science and Technology.

  2. Okinawa Institute of Science & Technology

  3. Our Research Interests: how to build adaptive, autonomous systems (robot experiments), and how the brain realizes robust, flexible adaptation (neurobiology).

  4. Outline: Machine Learning and Brain Science; Reinforcement Learning and Basal Ganglia; Delayed Reward and Serotonin; What's Next.

  5. Machine Learning and Brain Science. To make intelligent machines with electronics, we need not be bound by biological constraints. Yet there is a superb implementation in the brain, so we should learn from it. Currently, brain-like implementations such as deep learning give the best performance.

  6. Coevolution in Pattern Recognition. Brain science: feature detectors in cat striate cortex (Hubel & Wiesel 1959), experience-dependent development (Blakemore & Cooper 1970), hippocampal place cells (O'Keefe 1976), face cells (Bruce, Desimone, Gross 1981; Sugase et al. 1999). Artificial intelligence: Perceptron (Rosenblatt 1962), multi-layer learning (Amari 1967), Neocognitron (Fukushima 1980), ConvNet (Krizhevsky, Sutskever, Hinton 2012), GoogleBrain (2012).

  7. What is Machine Learning?
     Supervised learning: from input-output pairs {(x1, y1), (x2, y2), ...}, learn an input-output model y = f(x) + ε; for a new input x, predict the output y (figure: a degree M = 3 polynomial fit to noisy data).
     Reinforcement learning: from state-action-reward triplets {(x1, y1, r1), (x2, y2, r2), ...}, learn an action policy y = f(x) that maximizes reward.
     Unsupervised learning: from input data {x1, x2, x3, ...}, learn a statistical model of P(x) and discover the structure behind the data (figure: histogram of data).
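     As a concrete illustration of the supervised case, a minimal Python sketch (assuming NumPy; the sinusoidal data-generating function and noise level are made up for the example, and the cubic fit mirrors the M = 3 curve on the slide):

        import numpy as np

        rng = np.random.default_rng(0)

        # Input-output pairs {(x_i, y_i)}: an assumed toy target plus noise
        x = rng.uniform(0.0, 1.0, 50)
        y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)

        # Fit an input-output model y = f(x) + noise (cubic polynomial, M = 3)
        coeffs = np.polyfit(x, y, deg=3)

        # Predict the output for a new input
        x_new = 0.25
        y_pred = np.polyval(coeffs, x_new)
        print(f"predicted y at x={x_new}: {y_pred:.3f}")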

  8. Specialization by Learning Algorithms (Doya, 1999). Cerebral cortex: unsupervised learning (input-output transformation). Basal ganglia: reinforcement learning (reward signal via SN dopamine, output through the thalamus). Cerebellum: supervised learning (target and error signal via the inferior olive, IO).

  9. Learning by Trial and Error (Doya & Nakano, 1985): explore actions (a cycle of four postures) and learn from performance feedback (a speed sensor).

  10. Reinforcement Learning. An agent interacts with an environment: it observes state s, takes action a, and receives reward r. The goal is to learn an action policy s → a that maximizes rewards.
     Value function (expected future rewards): V(s(t)) = E[r(t) + γ r(t+1) + γ² r(t+2) + γ³ r(t+3) + ...], where 0 ≤ γ ≤ 1 is the discount factor.
     Temporal difference (TD) error: δ(t) = r(t) + γ V(s(t+1)) - V(s(t)).
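     A minimal tabular TD(0) sketch of the value update implied by these equations, in Python (the number of states, the learning rate, and the example transition are assumptions for illustration):

        import numpy as np

        n_states = 5
        gamma = 0.9    # discount factor
        alpha = 0.1    # learning rate (assumed)
        V = np.zeros(n_states)

        def td_update(s, r, s_next):
            """One TD(0) step: delta = r + gamma*V(s') - V(s), then V(s) += alpha*delta."""
            delta = r + gamma * V[s_next] - V[s]
            V[s] += alpha * delta
            return delta

        # Example transition: from state 2 to state 3 with reward 1
        td_update(2, 1.0, 3)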

  11. Pendulum Swing-Up. Reward function: the potential energy of the pendulum. The value function V(s) is learned over the state s = (angle, angular velocity).
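     A minimal Python sketch of the task setup (the mass, length, time step, and Euler integration are assumptions, not the original experimental settings):

        import numpy as np

        m, l, g, dt = 1.0, 1.0, 9.8, 0.02   # assumed pendulum parameters

        def step(theta, omega, torque):
            """One Euler step; theta = 0 is the upright position.
            Reward is the potential energy of the tip, maximal when upright."""
            ang_acc = (g / l) * np.sin(theta) + torque / (m * l ** 2)
            omega = omega + ang_acc * dt
            theta = theta + omega * dt
            reward = m * g * l * np.cos(theta)
            return theta, omega, reward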

  12. Reinforcement Learning (Morimoto & Doya, 2000). Learning from reward and punishment: the reward is the height of the head; the punishment is a bump on the floor.

  13. Learning to Survive and Reproduce (Elfwing et al., 2011, 2014). Robots catch battery packs for survival and copy 'genes' through IR ports for reproduction and evolution.

  14. Reinforcement Learning.
     Predict reward with value functions:
     V(s) = E[r(t) + γ r(t+1) + γ² r(t+2) + ... | s(t) = s]
     Q(s, a) = E[r(t) + γ r(t+1) + γ² r(t+2) + ... | s(t) = s, a(t) = a]
     Select an action:
     greedy: a = argmax_a Q(s, a)
     Boltzmann: P(a|s) ∝ exp[β Q(s, a)]
     Update the prediction with the TD error δ(t) = r(t) + γ V(s(t+1)) - V(s(t)):
     ΔV(s(t)) = α δ(t), ΔQ(s(t), a(t)) = α δ(t)
     How to implement these steps, and how to tune these parameters?
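     A minimal Python sketch of Boltzmann action selection and a TD-style update on Q (here a SARSA-style variant that uses Q(s', a') in place of V(s'); the table sizes and the values of α, β, γ are assumptions):

        import numpy as np

        gamma, alpha, beta = 0.9, 0.1, 2.0   # discount, learning rate, inverse temperature (assumed)
        Q = np.zeros((10, 2))                # Q(s, a): 10 states, 2 actions (assumed sizes)
        rng = np.random.default_rng(0)

        def boltzmann_action(s):
            """P(a|s) proportional to exp(beta * Q(s, a))."""
            logits = beta * Q[s]
            p = np.exp(logits - logits.max())
            p /= p.sum()
            return rng.choice(len(p), p=p)

        def q_update(s, a, r, s_next, a_next):
            """delta = r + gamma*Q(s', a') - Q(s, a); Q(s, a) += alpha*delta."""
            delta = r + gamma * Q[s_next, a_next] - Q[s, a]
            Q[s, a] += alpha * delta
            return delta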

  15. Basal Ganglia: the locus of Parkinson's and Huntington's diseases. Key structures: striatum, globus pallidus, substantia nigra, thalamus. What is their normal function?

  16. Dopamine-dependent Plasticity. Medium spiny neurons in the striatum receive glutamate from the cortex and dopamine from the midbrain. Three-factor learning rule (Wickens et al.): cortical input + spike → LTD; cortical input + spike + dopamine → LTP; that is, plasticity depends on input × output × reward. Time window of plasticity (Yagishita et al., 2014).
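     A minimal Python sketch of a three-factor (input × output × reward) weight update of the kind described above; the rate-coded variables, the dopamine baseline, and the learning rate are assumptions for illustration:

        import numpy as np

        eta = 0.01            # learning rate (assumed)
        w = np.zeros(100)     # corticostriatal weights (assumed size)

        def three_factor_update(pre, post, dopamine, baseline=0.1):
            """dw = eta * pre * post * (dopamine - baseline):
            input + spike with dopamine at or below baseline -> depression (LTD-like),
            input + spike + phasic dopamine -> potentiation (LTP-like)."""
            dw = eta * pre * post * (dopamine - baseline)
            w[:] = w + dw
            return dw

        # Example: presynaptic input vector, postsynaptic spike, phasic dopamine burst
        pre = np.random.default_rng(0).random(100)
        three_factor_update(pre, post=1.0, dopamine=0.5)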

  17. Basal Ganglia for Reinforcement Learning? (Doya 2000, 2007). Cerebral cortex: state/action coding. Striatum: reward prediction, Q(s, a) and V(s). Pallidum: action selection (output to the cortex via the thalamus). Dopamine neurons: the TD signal δ.

  18. Gambling Rats (Ito & Doya, 2015). Trial sequence: center poke, cue tone (0.5-1 s), left or right poke, reward tone (1-2 s), then a pellet or no reward from the dish.
     Cue tone and reward probability (Left, Right):
     Left tone (900 Hz), fixed: (50%, 0%)
     Right tone (6500 Hz), fixed: (0%, 50%)
     Free-choice tone (white noise), varied: (90%, 50%), (50%, 90%), (50%, 10%), (10%, 50%)
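     A minimal Python sketch of the free-choice blocks as a simulated environment (the block length, Bernoulli reward draws, and random seed are assumptions; the probabilities are the ones listed above):

        import numpy as np

        # Free-choice reward probabilities for (Left, Right)
        BLOCKS = [(0.9, 0.5), (0.5, 0.9), (0.5, 0.1), (0.1, 0.5)]

        def run_block(choose, block, n_trials=100, seed=0):
            """Run one block; choose() returns 0 for Left or 1 for Right."""
            rng = np.random.default_rng(seed)
            p = BLOCKS[block]
            outcomes = []
            for _ in range(n_trials):
                a = choose()
                r = int(rng.random() < p[a])   # Bernoulli reward
                outcomes.append((a, r))
            return outcomes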

  19. Neural Activity in the Striatum (Ito & Doya, 2015): recordings from the dorsolateral, dorsomedial, and ventral striatum.

  20. State/Action/Reward Coding: information (bits/sec) about the state, action, and reward carried by neurons in the DLS, DMS, and VS across the trial phases from cue to left/right choice.

  21. Generalized Q-learning Model (Ito & Doya, 2009).
     Action selection: P(a(t) = L) = exp Q_L(t) / (exp Q_L(t) + exp Q_R(t))
     Action value update, for i ∈ {L, R}:
     Q_i(t+1) = (1 - α1) Q_i(t) + α1 κ1   if a(t) = i, r(t) = 1
     Q_i(t+1) = (1 - α1) Q_i(t) - α1 κ2   if a(t) = i, r(t) = 0
     Q_i(t+1) = (1 - α2) Q_i(t)           if a(t) ≠ i (rewarded or not)
     Parameters: α1, learning rate; α2, forgetting rate; κ1, reward reinforcement; κ2, no-reward aversion.
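     A minimal Python sketch of this model (the parameter values are illustrative assumptions, not fitted values):

        import numpy as np

        alpha1, alpha2 = 0.3, 0.05   # learning rate, forgetting rate (assumed values)
        kappa1, kappa2 = 1.0, 0.5    # reward reinforcement, no-reward aversion (assumed values)
        Q = np.zeros(2)              # Q[0] = Q_L, Q[1] = Q_R
        rng = np.random.default_rng(0)

        def choose():
            """P(a = L) = exp(Q_L) / (exp(Q_L) + exp(Q_R))."""
            p_left = np.exp(Q[0]) / (np.exp(Q[0]) + np.exp(Q[1]))
            return 0 if rng.random() < p_left else 1

        def update(a, r):
            """Learn about the chosen action, forget the unchosen one."""
            for i in range(2):
                if i == a:
                    Q[i] = (1 - alpha1) * Q[i] + alpha1 * (kappa1 if r == 1 else -kappa2)
                else:
                    Q[i] = (1 - alpha2) * Q[i]

     Run together with the block environment sketched under slide 18, choose() and update(a, r) generate trial-by-trial choice sequences of the kind the model fitting on the next slides operates on.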

  22. Model Fitting by Particle Filter: trial-by-trial choices and rewards across free-choice blocks with reward probabilities (90, 50), (50, 90), and (50, 10), together with the estimated action values Q_L and Q_R and the time courses of the parameters α1 and α2 over trials.
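     A minimal bootstrap particle filter sketch in Python for tracking time-varying parameters of the model above (the random-walk noise scale, particle count, fixed κ values, and the choice to track only α1 and α2 are assumptions; the actual estimation procedure in the paper may differ):

        import numpy as np

        def particle_filter(actions, rewards, n_particles=1000, sigma=0.02,
                            kappa1=1.0, kappa2=0.5, seed=0):
            """Track trial-by-trial alpha1 (learning) and alpha2 (forgetting)."""
            rng = np.random.default_rng(seed)
            z = rng.standard_normal((n_particles, 2))    # latent states mapped to (0, 1)
            Q = np.zeros((n_particles, 2))               # per-particle (Q_L, Q_R)
            estimates = []
            for a, r in zip(actions, rewards):
                z += sigma * rng.standard_normal(z.shape)      # parameter random walk
                alpha = 1.0 / (1.0 + np.exp(-z))               # (alpha1, alpha2) per particle
                p_left = np.exp(Q[:, 0]) / (np.exp(Q[:, 0]) + np.exp(Q[:, 1]))
                w = p_left if a == 0 else 1.0 - p_left         # likelihood of observed choice
                w = w / w.sum()
                estimates.append((alpha[:, 0] @ w, alpha[:, 1] @ w))   # posterior means
                idx = rng.choice(n_particles, n_particles, p=w)        # resample
                z, Q, alpha = z[idx], Q[idx], alpha[idx]
                # deterministic Q update per particle, given the observed (a, r)
                target = kappa1 if r == 1 else -kappa2
                Q[:, a] = (1 - alpha[:, 0]) * Q[:, a] + alpha[:, 0] * target
                Q[:, 1 - a] = (1 - alpha[:, 1]) * Q[:, 1 - a]
            return np.array(estimates)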

  23. Model Fitting: comparison of models by normalized likelihood. Markov models of increasing order (1st: 4 parameters, 2nd: 16, 3rd: 64, 4th: 256) versus generalized Q-learning variants with constant or variable parameters: standard Q (constant, 2 parameters), FQ (constant, 3), DFQ (constant, 4), local matching law (1), and standard Q, FQ, DFQ with variable parameters (2 each). Here α1 is the learning rate, α2 the forgetting rate, κ1 the reinforcement, and κ2 the aversion; the standard model sets α2 = κ2 = 0, and the forgetting (FQ) model sets κ2 = 0.
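     Assuming "normalized likelihood" here means the per-trial geometric mean of the model's choice probabilities (a common normalization for comparing fits across sessions of different length), a one-function Python sketch:

        import numpy as np

        def normalized_likelihood(choice_probs):
            """exp((1/T) * sum(log p_t)): geometric mean of per-trial choice probabilities."""
            choice_probs = np.asarray(choice_probs, dtype=float)
            return float(np.exp(np.mean(np.log(choice_probs))))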

  24. Action/State Values in the Striatum (Ito & Doya, 2015). Action value coding: DLS and DMS neuron firing (Hz) differs on trials with higher versus lower Q_L across the trial phases from cue to left/right choice. State value coding: VS neuron firing varies with both Q_L and Q_R (higher versus lower), consistent with state value rather than action value.
