Learning frameworks: Associative reinforcement learning


  1. Learning frameworks

     Supervised learning
     – Assumes the environment specifies the correct output (targets) for each input

     Unsupervised learning
     – Assumes the environment only provides input; learning is based on capturing the statistical structure of that input (efficient coding)

     Reinforcement learning
     – Assumes the environment provides evaluative feedback on actions (how good or bad the outcome was) but not what the correct/best action would have been

     Reinforcement learning and the adaptive critic
     – Optimal/effective actions are not provided to the learner; they must be discovered
     – Feedback (the reinforcement signal) reflects the overall consequences of the action (and other things) in the environment
     – Feedback can be intermittent, probabilistic, temporally delayed, and dependent on things outside the learner's control
     – Tension between exploration and exploitation

     Associative reinforcement learning
     – Given an input, learn to produce the output (action) that maximizes immediate reward
     – Modified associative reward-penalty (A_R-P) rule for a binary stochastic unit:

           p(a_j = 1) = 1 / (1 + exp(−n_j))

           ∆w_ij = ρ (a_j − n_j) a_i              if success
           ∆w_ij = λ ρ ((1 − a_j) − n_j) a_i      if failure

     – Reinforcement is broadcast within a multilayer network
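A minimal sketch of the A_R-P rule above for a single binary stochastic output unit, assuming a NumPy setup in which x holds the input activities a_i, n is the net input n_j, and a caller-supplied success_fn stands in for the environment's evaluative feedback (all names and parameter values are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def arp_step(w, x, success_fn, rho=0.1, lam=0.05):
        """One associative reward-penalty (A_R-P) update for a single binary
        stochastic unit, following the update rule quoted above."""
        n = float(w @ x)                        # net input n_j
        p = 1.0 / (1.0 + np.exp(-n))            # p(a_j = 1)
        a = float(rng.random() < p)             # stochastic binary action a_j
        if success_fn(x, a):                    # success: reward case
            w += rho * (a - n) * x
        else:                                   # failure: penalty case, scaled by lambda
            w += lam * rho * ((1.0 - a) - n) * x
        return w, a

The penalty branch is scaled by λ (typically small), so failures push the unit toward the opposite action more gently than successes reinforce the chosen one.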

  2. Sequential reinforcement learning

     Execute a sequence of actions that maximizes the expected discounted sum of future rewards:

         E { Σ_{k=0}^{∞} γ^k r(t+k) } = E { r(t) + γ r(t+1) + γ² r(t+2) + · · · }

     Temporal difference (TD) methods
     – Learn to predict the expected discounted reward:

           a_j(t+1) = E { r(t+1) + γ r(t+2) + γ² r(t+3) + · · · }

           a_j(t) = E { r(t) + γ r(t+1) + γ² r(t+2) + γ³ r(t+3) + · · · }
                  = E { r(t) } + γ a_j(t+1)

           E { r(t) } = a_j(t) − γ a_j(t+1)

           ∆w_ij(t) = ρ ( r(t) − E { r(t) } ) a_i
                    = ρ ( r(t) − ( a_j(t) − γ a_j(t+1) ) ) a_i

     – Use this prediction error as internal reinforcement for learning actions

     Dopamine and reward prediction (Schultz et al., 1997)
     – Classical conditioning
     – Response of dopaminergic neurons in the substantia nigra (a subcortical nucleus)
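A minimal sketch of the TD rule above, assuming a linear predictor in which the prediction a_j is a weighted sum of the input activities a_i (variable names and parameter values are illustrative):

    import numpy as np

    def td_update(w, x_t, x_next, r_t, gamma=0.9, rho=0.1, terminal=False):
        """One TD(0) update of a linear predictor of discounted future reward.
        The TD error r(t) + gamma*a_j(t+1) - a_j(t) is the quantity the slide
        describes as internal reinforcement for learning actions."""
        v_t = float(w @ x_t)                              # a_j(t)
        v_next = 0.0 if terminal else float(w @ x_next)   # a_j(t+1)
        td_error = r_t + gamma * v_next - v_t             # r(t) − (a_j(t) − γ a_j(t+1))
        w += rho * td_error * x_t                         # ∆w_ij = ρ · TD error · a_i
        return w, td_error

On the dopamine slide, this TD error is what the phasic response of the dopaminergic neurons is taken to signal: positive when reward arrives unpredicted, and shifted to the predictive cue once the association is learned (Schultz et al., 1997).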

  3. Strengths and limitations of reinforcement learning

     Strengths
     – No need for explicit behavioral targets
     – Can be applied to networks of binary stochastic units
     – TD learning is consistent with some physiological evidence (Schultz)
     – Associative reinforcement learning (e.g., A_R-P) can be used to learn actions based on the prediction of reinforcement learned by TD

     Limitations
     – Learning is often very slow (not enough information)
     – Application to large/continuous state spaces requires some mechanism for function approximation, e.g., a multilayer back-propagation network, as in deep reinforcement learning (a sketch follows at the end of this item)
     – Associative and TD learning have been combined only in very simple domains (but deep learning can also be applied to state representations, e.g., with an auto-encoder)

     Forward models
     – Feedback from the world is in terms of distal error (observable consequences) rather than proximal error (motor commands)
     – We would like to compute the proximal error from the distal error (to improve the motor commands used to achieve goals)
     – The relationship between motor commands and observable consequences involves processes in the external world (e.g., physics)
     – Learn an internal (forward) model of the world that can be inverted (e.g., back-propagated through) to convert distal error into proximal error
     – Such a model can also provide online outcome prediction to detect errors during execution
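Where function approximation is needed (the deep-reinforcement-learning point above), the linear TD predictor can be replaced by a small multilayer network trained by back-propagating the TD error. The following is a minimal sketch of that idea; the class name, layer sizes, and learning rate are assumptions for illustration, not taken from the slides.

    import numpy as np

    class TDValueNet:
        """Tiny two-layer network used as a value-function approximator,
        trained with a semi-gradient TD(0) step (a sketch of using a
        back-propagation network over large/continuous state spaces)."""

        def __init__(self, n_in, n_hidden=16, lr=0.01, seed=0):
            rng = np.random.default_rng(seed)
            self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))  # input -> hidden
            self.W2 = rng.normal(0.0, 0.1, n_hidden)          # hidden -> value
            self.lr = lr

        def value(self, x):
            h = np.tanh(self.W1 @ x)          # hidden representation of the state
            return float(self.W2 @ h), h

        def td_step(self, x_t, x_next, r_t, gamma=0.9, terminal=False):
            v_t, h = self.value(x_t)
            v_next = 0.0 if terminal else self.value(x_next)[0]
            delta = r_t + gamma * v_next - v_t                 # TD error
            # Back-propagate the TD error through both layers (semi-gradient step)
            grad_h = delta * self.W2 * (1.0 - h ** 2)
            self.W2 += self.lr * delta * h
            self.W1 += self.lr * np.outer(grad_h, x_t)
            return delta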

  4. Training the forward and inverse models

     Forward model (error = predicted − actual)
     – Generate an action randomly and predict its outcome
     – Use the discrepancy between the predicted outcome and the actual outcome as the error signal

     Inverse (action) model (error = desired − actual)
     – Generate an action from an "intention" (desired outcome) in the current context
     – Use the discrepancy between the desired outcome and the actual outcome as the error signal
     – Back-propagate this error through the forward model to derive error derivatives for the action representation
     – Back-propagate the resulting action error to improve the inverse model

     The forward and inverse models can be trained at the same time (see the sketch below).
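A minimal sketch of this training scheme under strong simplifying assumptions: a made-up one-dimensional world that maps a motor command linearly to an observable outcome, and linear forward and inverse models, so that "back-propagating through the forward model" reduces to multiplying the distal error by the forward model's slope. All names and constants here are illustrative, not from the slides.

    import numpy as np

    rng = np.random.default_rng(0)

    def world(action):
        # Unknown to the learner: maps a motor command (proximal) to an
        # observable outcome (distal). Made up for this sketch.
        return 2.0 * action + 0.5

    F = np.zeros(2)   # forward model:  predicted outcome = F @ [action, 1]
    I = np.zeros(2)   # inverse model:  action            = I @ [goal, 1]
    lr = 0.05

    for step in range(2000):
        # Forward-model phase: act randomly, predict, compare to the world
        a = rng.uniform(-1, 1)
        predicted = F @ np.array([a, 1.0])
        actual = world(a)
        F += lr * (actual - predicted) * np.array([a, 1.0])   # predicted vs. actual

        # Inverse-model phase: act from a desired outcome ("intention")
        goal = rng.uniform(-1, 1)
        a = I @ np.array([goal, 1.0])
        outcome = world(a)
        distal_error = goal - outcome                          # desired vs. actual
        # Back-propagate through the forward model: its slope F[0] converts
        # the distal (outcome) error into a proximal (action) error
        proximal_error = distal_error * F[0]
        I += lr * proximal_error * np.array([goal, 1.0])

With these assumptions the inverse model should converge toward a ≈ 0.5·goal − 0.25, the command that actually achieves the goal in this made-up world, even though the learner only ever observes distal consequences. Both models are updated in the same loop, matching the note above that they can be trained at the same time.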

