Learning and Meta-learning

• computation
  – making predictions
  – choosing actions
  – acquiring episodes
  – statistics
• algorithm
  – gradient ascent (e.g. of the likelihood)
  – correlation
  – Kalman filtering
• implementation
  – Hebbian synaptic plasticity
  – neuromodulation

Types of Learning

• supervised ($v \mid \mathbf{u}$): inputs $\mathbf{u}$ and desired or target outputs $v$ both provided, e.g. prediction → outcome
• reinforcement ($\max r \mid \mathbf{u}$): input $\mathbf{u}$ and scalar evaluation $r$, often with a temporal credit assignment problem
• unsupervised or self-supervised ($\mathbf{u}$): learn structure from statistics

These are closely related:
• supervised: learn $P[v \mid \mathbf{u}]$
• unsupervised: learn $P[v, \mathbf{u}]$

Hebb

Famously suggested: if cell A consistently contributes to the activity of cell B, then the synapse from A to B should be strengthened.

• strong element of causality
• what about weakening (LTD)?
• multiple timescales: STP to protein synthesis
• multiple biochemical mechanisms
• systems:
  – hippocampus: multiple sub-areas
  – neocortex: layer and area differences
  – cerebellum: LTD is the norm

Neural Rules

[Figure: field potential amplitude (mV) against time (min), showing LTP and LTD. Marked levels: control level; potentiated level; depressed, partially depotentiated level. Stimulation annotations: 100 Hz (1 s) and 2 Hz (10 min).]

Stability and Competition

Hebbian learning involves positive feedback. Control by:

• LTD: usually not enough (covariance versus correlation)
• saturation: prevent synaptic weights from getting too big (or too small); triviality beckons
• competition: spike-time dependent learning rules
• normalization over pre-synaptic or post-synaptic arbors:
  – subtractive: decrease all synapses by the same amount whether large or small
  – multiplicative: decrease large synapses by more than small synapses

Preamble

Linear firing rate model:

$$\tau_r \frac{dv}{dt} = -v + \mathbf{w}\cdot\mathbf{u} = -v + \sum_{b=1}^{N_u} w_b u_b$$

Assume that $\tau_r$ is small compared with the timescale on which the weights change; then $v = \mathbf{w}\cdot\mathbf{u}$ during plasticity. Then have

$$\tau_w \frac{d\mathbf{w}}{dt} = f(v, \mathbf{u}, \mathbf{w})$$

Supervised rules use targets to specify $v$ (neural basis in ACh?).

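The Basic Hebb Rule

$$\tau_w \frac{d\mathbf{w}}{dt} = \mathbf{u}\, v$$

Averaged over input statistics, this gives

$$\tau_w \frac{d\mathbf{w}}{dt} = \langle \mathbf{u}\, v \rangle = \langle \mathbf{u}\,\mathbf{u}\cdot\mathbf{w} \rangle = Q\cdot\mathbf{w}$$

where $Q = \langle \mathbf{u}\mathbf{u} \rangle$ is the input correlation matrix.

Positive feedback instability:

$$\tau_w \frac{d}{dt}|\mathbf{w}|^2 = 2\tau_w\, \mathbf{w}\cdot\frac{d\mathbf{w}}{dt} = 2v^2$$

Also have a discretised version, integrating over time and presenting patterns for $T$ seconds:

$$\mathbf{w} \to \mathbf{w} + \frac{T}{\tau_w}\, Q\cdot\mathbf{w}$$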
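The instability of the averaged Hebb rule can be checked numerically. The following is a minimal sketch, assuming a hypothetical 2×2 correlation matrix `Q`, an arbitrary initial weight vector, and an Euler step `dt`; none of these values come from the slides. It iterates the discretised update and records the weight norm, which should grow without bound while the direction approaches the principal eigenvector of `Q`.

```python
import math

def hebb_step(w, Q, dt, tau_w):
    """One Euler step of tau_w dw/dt = Q . w (input-averaged Hebb rule)."""
    Qw = [sum(Q[a][b] * w[b] for b in range(len(w))) for a in range(len(w))]
    return [w[a] + (dt / tau_w) * Qw[a] for a in range(len(w))]

def norm(w):
    return math.sqrt(sum(x * x for x in w))

# Hypothetical correlation matrix: symmetric, eigenvalues 1.5 and 0.5,
# principal eigenvector (1, 1) / sqrt(2).
Q = [[1.0, 0.5],
     [0.5, 1.0]]
w = [0.1, -0.05]          # arbitrary starting weights (assumption)

norms = [norm(w)]
for _ in range(200):
    w = hebb_step(w, Q, dt=0.05, tau_w=1.0)
    norms.append(norm(w))

# Unit vector along the final weights; the length itself keeps growing.
direction = [x / norm(w) for x in w]
```

The norm increases at every step because both eigenvalues of this `Q` are positive, which is the positive feedback that the normalisation schemes are meant to control.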

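Covariance Rule

Since LTD really exists, contra Hebb:

$$\tau_w \frac{d\mathbf{w}}{dt} = (v - \theta_v)\,\mathbf{u} \quad\text{or}\quad \tau_w \frac{d\mathbf{w}}{dt} = (\mathbf{u} - \boldsymbol{\theta}_u)\, v$$

If $\theta_v = \langle v \rangle$ or $\boldsymbol{\theta}_u = \langle \mathbf{u} \rangle$, then

$$\tau_w \frac{d\mathbf{w}}{dt} = C\cdot\mathbf{w}$$

where $C = \langle (\mathbf{u} - \langle\mathbf{u}\rangle)(\mathbf{u} - \langle\mathbf{u}\rangle) \rangle$ is the input covariance matrix.

Still unstable:

$$\tau_w \frac{d}{dt}|\mathbf{w}|^2 = 2v\,(v - \langle v \rangle)$$

which averages to $2(\langle v^2\rangle - \langle v\rangle^2)$, twice the (non-negative) variance of $v$.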

BCM Rule

Odd to have LTD with $v = 0$ or $\mathbf{u} = \mathbf{0}$. Evidence for

$$\tau_w \frac{d\mathbf{w}}{dt} = v\,\mathbf{u}\,(v - \theta_v).$$

[Figure: weight change/u against $v$.]

If $\theta_v$ slides to match a high power of $v$,

$$\tau_\theta \frac{d\theta_v}{dt} = v^2 - \theta_v$$

with a fast $\tau_\theta$, then get competition between synapses: intrinsic stabilization.
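A single Euler step of the BCM rule with its sliding threshold can be written directly from the two equations above. This is only a sketch: the function name `bcm_step`, the step size, the time constants, and the example inputs are illustrative assumptions, not values from the slides.

```python
def bcm_step(w, u, theta_v, dt, tau_w, tau_theta):
    """One Euler step of the BCM rule and its sliding threshold.

    tau_w dw/dt = v u (v - theta_v),  tau_theta dtheta_v/dt = v^2 - theta_v
    """
    v = sum(wb * ub for wb, ub in zip(w, u))          # linear output v = w . u
    new_w = [wb + (dt / tau_w) * v * ub * (v - theta_v)
             for wb, ub in zip(w, u)]
    new_theta = theta_v + (dt / tau_theta) * (v * v - theta_v)
    return new_w, new_theta

w0 = [0.5, 0.5]
u0 = [1.0, 1.0]   # gives v = 1 for these weights

# Output above threshold: potentiation. Output below threshold: depression.
w_low, theta_low = bcm_step(w0, u0, theta_v=0.5, dt=0.1, tau_w=10.0, tau_theta=1.0)
w_high, theta_high = bcm_step(w0, u0, theta_v=2.0, dt=0.1, tau_w=10.0, tau_theta=1.0)
```

In both calls the threshold moves towards $v^2 = 1$, so sustained high activity raises the bar for further potentiation.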

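Subtractive Normalisation

Could normalise $|\mathbf{w}|^2$ or $\sum_b w_b = \mathbf{n}\cdot\mathbf{w}$, with $\mathbf{n} = (1, 1, \ldots, 1)$.

For subtractive normalisation of $\mathbf{n}\cdot\mathbf{w}$:

$$\tau_w \frac{d\mathbf{w}}{dt} = v\,\mathbf{u} - \frac{v\,(\mathbf{n}\cdot\mathbf{u})}{N_u}\,\mathbf{n}$$

with dynamic subtraction, since

$$\tau_w \frac{d\,\mathbf{n}\cdot\mathbf{w}}{dt} = v\,\mathbf{n}\cdot\mathbf{u}\left(1 - \frac{\mathbf{n}\cdot\mathbf{n}}{N_u}\right) = 0$$

as $\mathbf{n}\cdot\mathbf{n} = N_u$.

Strongly competitive: typically all the weights bar one go to 0. Therefore use an upper saturating limit.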
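The conservation of $\mathbf{n}\cdot\mathbf{w}$ under the subtractive rule can be confirmed with a short simulation. The sketch below assumes uniformly random inputs, four synapses, and an Euler step, all chosen for illustration; saturation limits are deliberately left out so that the sum of the weights is the only constraint at work.

```python
import random

def subtractive_step(w, u, dt, tau_w):
    """One Euler step of tau_w dw/dt = v u - v (n . u) / N_u n."""
    n_u = len(w)
    v = sum(wb * ub for wb, ub in zip(w, u))   # v = w . u
    mean_u = sum(u) / n_u                      # (n . u) / N_u
    # Every synapse has the same amount, v * mean_u, subtracted.
    return [wb + (dt / tau_w) * v * (ub - mean_u) for wb, ub in zip(w, u)]

rng = random.Random(0)                         # fixed seed for repeatability
w = [0.25, 0.25, 0.25, 0.25]
total_before = sum(w)
for _ in range(100):
    u = [rng.random() for _ in w]              # hypothetical input pattern
    w = subtractive_step(w, u, dt=0.01, tau_w=1.0)
total_after = sum(w)
```

Up to floating-point rounding, `total_after` equals `total_before`, while the individual weights drift apart.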

The Oja Rule

A multiplicative way to ensure $|\mathbf{w}|^2$ is constant:

$$\tau_w \frac{d\mathbf{w}}{dt} = v\,\mathbf{u} - \alpha v^2\,\mathbf{w}$$

gives

$$\tau_w \frac{d|\mathbf{w}|^2}{dt} = 2v^2\,(1 - \alpha|\mathbf{w}|^2)$$

so $|\mathbf{w}|^2 \to 1/\alpha$.

Dynamic normalisation; could also enforce normalisation all the time.
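A quick numerical check of the Oja rule is sketched below. It assumes the input-averaged form, in which $\langle v\,\mathbf{u}\rangle = Q\cdot\mathbf{w}$ and $\langle v^2\rangle = \mathbf{w}\cdot Q\cdot\mathbf{w}$, together with a hypothetical 2×2 matrix `Q`, $\alpha = 4$, and an Euler step; these choices are for illustration only.

```python
def oja_step(w, Q, alpha, dt, tau_w):
    """One Euler step of the averaged Oja rule:
    tau_w dw/dt = Q . w - alpha (w . Q . w) w
    """
    n = len(w)
    Qw = [sum(Q[a][b] * w[b] for b in range(n)) for a in range(n)]
    v2 = sum(w[a] * Qw[a] for a in range(n))   # <v^2> = w . Q . w
    return [w[a] + (dt / tau_w) * (Qw[a] - alpha * v2 * w[a]) for a in range(n)]

# Hypothetical correlation matrix with principal eigenvector (1, 1) / sqrt(2).
Q = [[1.0, 0.5],
     [0.5, 1.0]]
alpha = 4.0
w = [0.1, -0.05]
for _ in range(2000):
    w = oja_step(w, Q, alpha, dt=0.05, tau_w=1.0)

squared_norm = sum(x * x for x in w)
```

With these settings `squared_norm` settles at $1/\alpha = 0.25$ and the two weights become equal, that is, the weight vector lies along the principal eigenvector with length $1/\sqrt{\alpha}$.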

Timing-Based Rules

[Figure: (A) EPSP amplitude (% of control) against time (min) for pairings at +10 ms, ±100 ms and −10 ms, in slice cortical pyramidal cells; (B) percent potentiation against $t_\text{post} - t_\text{pre}$ (ms), in the Xenopus retinotectal system.]

• window of 50 ms
• gets Hebbian causality right
• rate description:

$$\tau_w \frac{d\mathbf{w}}{dt} = \int_0^\infty d\tau\, \big( H(\tau)\, v(t)\, \mathbf{u}(t-\tau) + H(-\tau)\, v(t-\tau)\, \mathbf{u}(t) \big)$$

• spike-based description necessary if an input spike can have a measurable impact on an output spike
• critical factor is the overall integral: net LTD with 'local' LTP
• partially self-stabilizing

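Timing-Based Rules

Gütig et al; van Rossum et al:

$$\Delta w_i = \begin{cases} -\lambda f_-(w_i)\, K(\Delta t) & \text{if } \Delta t \le 0 \\ \lambda f_+(w_i)\, K(\Delta t) & \text{if } \Delta t > 0 \end{cases}$$

$$K(\Delta t) = e^{-|\Delta t|/\tau} \qquad f_+(w) = (1 - w)^\mu \qquad f_-(w) = \alpha w^\mu$$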
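The weight-dependent update translates almost line for line into code. The sketch below assumes weights confined to $[0, 1]$ and $\Delta t = t_\text{post} - t_\text{pre}$ in milliseconds; the default values of `lam`, `alpha`, `mu`, and `tau` are placeholders picked for illustration, not parameters taken from the cited papers.

```python
import math

def stdp_update(w, delta_t, lam=0.01, alpha=1.05, mu=0.5, tau=20.0):
    """Weight change for one pre/post spike pair.

    w       : current weight, assumed to lie in [0, 1]
    delta_t : t_post - t_pre in ms; positive means pre before post
    """
    k = math.exp(-abs(delta_t) / tau)            # temporal window K
    if delta_t > 0:
        return lam * (1.0 - w) ** mu * k         # potentiation, scaled by f_+
    return -lam * alpha * (w ** mu) * k          # depression, scaled by f_-

dw_causal = stdp_update(0.5, 10.0)      # pre leads post
dw_acausal = stdp_update(0.5, -10.0)    # post leads pre
```

With $\mu > 0$ the update vanishes at the boundaries: potentiation goes to zero as $w \to 1$ and depression goes to zero as $w \to 0$, which keeps the weights inside the unit interval for small `lam`.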

FP Analysis

How can we predict the weight distribution?

$$\frac{1}{\rho_\text{in}} \frac{\partial P(w, t)}{\partial t} = -p_p P(w, t) - p_d P(w, t) + p_p P(w - w_p, t) + p_d P(w + w_d, t)$$

Taylor-expanding about $P(w, t)$ leads to a Fokker-Planck equation.

Need to work out $p_d$ and $p_p$; assume steady firing.

• Depression: $p_d = t_\text{window} / t_\text{isi}$
• Potentiation (I affects O): $p_p = \int_0^{t_w} P(\delta t)\, d\delta t$
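The Taylor expansion can be written out explicitly in the simplest case. The sketch below assumes that the step sizes $w_p$, $w_d$ and the probabilities $p_p$, $p_d$ do not depend on $w$, which is a simplification made here for illustration; weight-dependent steps would add further terms.

```latex
% Second-order expansion of the shifted distributions
P(w - w_p, t) \approx P - w_p \frac{\partial P}{\partial w}
                        + \tfrac{1}{2} w_p^2 \frac{\partial^2 P}{\partial w^2},
\qquad
P(w + w_d, t) \approx P + w_d \frac{\partial P}{\partial w}
                        + \tfrac{1}{2} w_d^2 \frac{\partial^2 P}{\partial w^2}

% Substituting into the master equation, the terms in P cancel, leaving
\frac{1}{\rho_\text{in}} \frac{\partial P}{\partial t}
  = -\left( p_p w_p - p_d w_d \right) \frac{\partial P}{\partial w}
    + \tfrac{1}{2} \left( p_p w_p^2 + p_d w_d^2 \right)
      \frac{\partial^2 P}{\partial w^2}
```

The first coefficient acts as a drift (net potentiation minus depression per input spike) and the second as a diffusion term.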

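Single Postsynaptic Neuron

Basic Hebb rule:

$$\tau_w \frac{d\mathbf{w}}{dt} = Q\cdot\mathbf{w}$$

Analyse using an eigendecomposition of $Q$:

$$Q\cdot\mathbf{e}_\mu = \lambda_\mu \mathbf{e}_\mu \qquad \lambda_1 \ge \lambda_2 \ge \ldots$$

Since $Q$ is symmetric and positive (semi-)definite, it has

• a complete set of real orthonormal eigenvectors
• with non-negative eigenvalues
• whose growth is decoupled

Write

$$\mathbf{w}(t) = \sum_{\mu=1}^{N_u} c_\mu(t)\, \mathbf{e}_\mu$$

then

$$c_\mu(t) = c_\mu(0) \exp\left( \lambda_\mu \frac{t}{\tau_w} \right)$$

and $\mathbf{w}(t) \to \alpha(t)\, \mathbf{e}_1$ as $t \to \infty$.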
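The step from the eigendecomposition to the exponential solution can be spelled out as follows. The derivation uses only the orthonormality of the eigenvectors stated on the slide; the conditions $\lambda_1 > \lambda_\mu$ and $c_1(0) \neq 0$ in the last line are assumptions needed for the limit to hold.

```latex
% Substitute w(t) = \sum_\mu c_\mu(t) e_\mu into \tau_w dw/dt = Q . w
\tau_w \sum_{\mu} \frac{dc_\mu}{dt}\, \mathbf{e}_\mu
  = \sum_{\mu} \lambda_\mu c_\mu\, \mathbf{e}_\mu

% Project onto e_\nu, using e_\mu . e_\nu = \delta_{\mu\nu}
\tau_w \frac{dc_\nu}{dt} = \lambda_\nu c_\nu
\quad\Longrightarrow\quad
c_\nu(t) = c_\nu(0) \exp\left( \lambda_\nu \frac{t}{\tau_w} \right)

% Relative size of any other mode compared with the principal one
\frac{c_\mu(t)}{c_1(t)}
  = \frac{c_\mu(0)}{c_1(0)}
    \exp\left( -(\lambda_1 - \lambda_\mu) \frac{t}{\tau_w} \right)
  \to 0
  \quad \text{for } \lambda_1 > \lambda_\mu,\ c_1(0) \neq 0
```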

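Constraints

$\alpha(t) \propto \exp(\lambda_1 t/\tau_w) \to \infty$.

• Oja makes $\mathbf{w}(t) \to \mathbf{e}_1/\sqrt{\alpha}$
• saturation can disturb outcome

[Figure: panels A and B, weight trajectories in the $(w_1, w_2)$ plane, both axes from 0 to 1.]

• subtractive constraint:

$$\tau_w \dot{\mathbf{w}} = Q\cdot\mathbf{w} - \frac{(\mathbf{w}\cdot Q\cdot\mathbf{n})\,\mathbf{n}}{N_u}$$

Sometimes $\mathbf{e}_1 \propto \mathbf{n}$, so its growth is stunted; and $\mathbf{e}_\mu\cdot\mathbf{n} = 0$ for $\mu \neq 1$, so

$$\mathbf{w}(t) = (\mathbf{w}(0)\cdot\mathbf{e}_1)\,\mathbf{e}_1 + \sum_{\mu=2}^{N_u} \exp\left( \frac{\lambda_\mu t}{\tau_w} \right) (\mathbf{w}(0)\cdot\mathbf{e}_\mu)\,\mathbf{e}_\mu$$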

Translation Invariance

Particularly important case for development has

$$Q_{bb'} = Q(b - b') \qquad \langle u_b \rangle = \langle u \rangle$$

Write $\mathbf{n} = (1, \ldots, 1)$ and $J = \mathbf{n}\mathbf{n}^T$, then

$$Q' = Q - N \langle u \rangle^2 J$$

1. $\mathbf{e}_\mu\cdot\mathbf{n} = 0$: AC modes are unaffected
2. $\mathbf{e}_\mu\cdot\mathbf{n} \neq 0$: DC modes are affected
3. $Q$ has discrete sines and cosines as eigenvectors
4. the Fourier spectrum of $Q$ gives the eigenvalues

PCA

What is the significance of $\mathbf{e}_1$?

[Figure: panels A, B and C, plotted on axes labelled $u_1, w_1$ and $u_2, w_2$.]

• optimal linear reconstruction: minimise $E(\mathbf{w}, \mathbf{g}) = \langle |\mathbf{u} - \mathbf{g} v|^2 \rangle$
• information maximisation: $I[v, \mathbf{u}] = H[v] - H[v \mid \mathbf{u}]$ under a linear model
• assume $\langle \mathbf{u} \rangle = \mathbf{0}$ or use $C$ instead of $Q$

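Linear Reconstruction

$$E(\mathbf{w}, \mathbf{g}) = \langle |\mathbf{u} - \mathbf{g} v|^2 \rangle = K - 2\,\mathbf{w}\cdot Q\cdot\mathbf{g} + \|\mathbf{g}\|^2\, \mathbf{w}\cdot Q\cdot\mathbf{w}$$

This is quadratic in $\mathbf{w}$, with minimum at

$$\mathbf{w}^* = \frac{\mathbf{g}}{\|\mathbf{g}\|^2}$$

making

$$E(\mathbf{w}^*, \mathbf{g}) = K - \frac{\mathbf{g}\cdot Q\cdot\mathbf{g}}{\|\mathbf{g}\|^2}.$$

Look for a solution with $\mathbf{g} = \sum_k (\mathbf{e}_k\cdot\mathbf{g})\,\mathbf{e}_k$ and $\|\mathbf{g}\|^2 = 1$:

$$E(\mathbf{w}^*, \mathbf{g}) = K - \sum_{k=1}^{N} (\mathbf{e}_k\cdot\mathbf{g})^2 \lambda_k$$

whose minimum clearly has $\mathbf{e}_1\cdot\mathbf{g} = 1$ and $\mathbf{e}_2\cdot\mathbf{g} = \mathbf{e}_3\cdot\mathbf{g} = \ldots = 0$.

Therefore $\mathbf{g}$ and $\mathbf{w}$ both point along the principal component.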
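The intermediate steps of the reconstruction argument can be filled in as follows. The sketch assumes the linear model $v = \mathbf{w}\cdot\mathbf{u}$ used throughout, identifies the constant as $K = \langle |\mathbf{u}|^2 \rangle$, and assumes $Q$ is invertible when solving for the minimum.

```latex
% Expand the squared error with v = w . u and Q = <u u>
E(\mathbf{w}, \mathbf{g})
  = \langle |\mathbf{u}|^2 \rangle
    - 2 \langle v\, \mathbf{u} \rangle \cdot \mathbf{g}
    + \|\mathbf{g}\|^2 \langle v^2 \rangle
  = K - 2\, \mathbf{w} \cdot Q \cdot \mathbf{g}
    + \|\mathbf{g}\|^2\, \mathbf{w} \cdot Q \cdot \mathbf{w}

% Set the gradient with respect to w to zero
\frac{\partial E}{\partial \mathbf{w}}
  = -2\, Q \cdot \mathbf{g} + 2 \|\mathbf{g}\|^2\, Q \cdot \mathbf{w} = 0
\quad\Longrightarrow\quad
\mathbf{w}^* = \frac{\mathbf{g}}{\|\mathbf{g}\|^2}

% Substitute back
E(\mathbf{w}^*, \mathbf{g})
  = K - \frac{2\, \mathbf{g} \cdot Q \cdot \mathbf{g}}{\|\mathbf{g}\|^2}
      + \frac{\mathbf{g} \cdot Q \cdot \mathbf{g}}{\|\mathbf{g}\|^2}
  = K - \frac{\mathbf{g} \cdot Q \cdot \mathbf{g}}{\|\mathbf{g}\|^2}
```

With $\|\mathbf{g}\|^2 = 1$, the subtracted term is a weighted average of the eigenvalues $\lambda_k$ with weights $(\mathbf{e}_k\cdot\mathbf{g})^2$ summing to one, so it is largest when all the weight sits on $\lambda_1$.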

Infomax (Linsker)

$$\operatorname{argmax}_{\mathbf{w}}\; I[v, \mathbf{u}] = H[v] - H[v \mid \mathbf{u}]$$

Very general unsupervised learning suggestion:

• $H[v \mid \mathbf{u}]$ is not quite well defined unless $v = \mathbf{w}\cdot\mathbf{u} + \eta$, where $\eta$ is arbitrarily deterministic
• $H[v] = \tfrac{1}{2} \log 2\pi e \sigma^2$ for a Gaussian

If $P[\mathbf{u}] \sim N[\mathbf{0}, Q]$ then $v \sim N[0, \mathbf{w}\cdot Q\cdot\mathbf{w} + \upsilon^2]$: maximise $\mathbf{w}\cdot Q\cdot\mathbf{w}$ subject to $\|\mathbf{w}\|^2 = 1$.

Same problem as above: implies that $\mathbf{w} \propto \mathbf{e}_1$. Note the normalisation.

If non-Gaussian, only maximising an upper bound on $I[v, \mathbf{u}]$.

Ocular Dominance

[Figure: cortical units $v(a)$ receive left (L) and right (R) thalamic inputs $u(b)$ through weights $W(a; b)$ and arbors $A(a; b)$, with competitive interaction in cortex; further panels are labelled by ocularity (L, R) over positions $a$ and $b$.]

• retina-thalamus-cortex
• OD develops around eye-opening
• interaction with refinement of topography
• interaction with orientation
• interaction with ipsi/contra-innervation
• effect of manipulations to input

Start Simple

Consider one input from each eye:

$$v = w_R u_R + w_L u_L.$$

Then

$$Q = \langle \mathbf{u}\mathbf{u} \rangle = \begin{pmatrix} q_S & q_D \\ q_D & q_S \end{pmatrix}$$

has

$$\mathbf{e}_1 = (1, 1)/\sqrt{2}, \quad \lambda_1 = q_S + q_D \qquad \mathbf{e}_2 = (1, -1)/\sqrt{2}, \quad \lambda_2 = q_S - q_D$$

so if $w_+ = w_R + w_L$ and $w_- = w_R - w_L$, then

$$\tau_w \frac{dw_+}{dt} = (q_S + q_D)\, w_+ \qquad \tau_w \frac{dw_-}{dt} = (q_S - q_D)\, w_-.$$

Since $q_D \ge 0$, $w_+$ dominates, so use subtractive normalisation:

$$\tau_w \frac{dw_+}{dt} = 0 \qquad \tau_w \frac{dw_-}{dt} = (q_S - q_D)\, w_-$$

so $w_- \to \pm\omega$ and one eye dominates.
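The two-input model is small enough to simulate directly. The sketch below assumes the input-averaged Hebb rule with subtractive normalisation, hard saturation of each weight to the range `[0, w_max]`, and hypothetical values for `q_s`, `q_d`, the step size, and the initial weights; the saturation bound is an illustrative stand-in for whatever limit sets the final size of $w_-$.

```python
def od_step(w_r, w_l, q_s, q_d, dt, tau_w, w_max=1.0):
    """One Euler step of the two-eye model with subtractive normalisation."""
    # Averaged Hebbian drive Q . w for each eye.
    drive_r = q_s * w_r + q_d * w_l
    drive_l = q_d * w_r + q_s * w_l
    # Subtract the mean drive, so w_R + w_L is unchanged by the update.
    mean_drive = 0.5 * (drive_r + drive_l)
    w_r = w_r + (dt / tau_w) * (drive_r - mean_drive)
    w_l = w_l + (dt / tau_w) * (drive_l - mean_drive)
    # Saturation limits (assumed): keep each weight in [0, w_max].
    w_r = min(max(w_r, 0.0), w_max)
    w_l = min(max(w_l, 0.0), w_max)
    return w_r, w_l

def run(w_r, w_l, steps=500):
    for _ in range(steps):
        w_r, w_l = od_step(w_r, w_l, q_s=1.0, q_d=0.4, dt=0.05, tau_w=1.0)
    return w_r, w_l

right_biased = run(0.51, 0.49)   # slight initial advantage to the right eye
left_biased = run(0.49, 0.51)    # slight initial advantage to the left eye
```

A small initial imbalance is amplified at rate $(q_S - q_D)/\tau_w$ until the weights hit the saturation limits, at which point one eye holds all the weight.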

Orientation Selectivity

Model is exactly the same; input correlations come from ON/OFF cells.

[Figure: (C) $Q^-(b)$ against $b$; (D) $\tilde{Q}^-(\tilde{b})$ against $\tilde{b}$.]

Now the dominant mode of $Q^-$ has spatial structure: a centre-surround version is also possible, but is usually dominated because of non-linear effects.
