A REINFORCEMENT LEARNING PERSPECTIVE ON AGI
Itamar Arel, Machine Intelligence Lab (http://mil.engr.utk.edu) The University of Tennessee
AGI 2009 UT Machine Intelligence Lab http://mil.engr.utk.edu
Tutorial outline:
- What makes an AGI system?
- A quick-and-dirty intro to RL
- Making the connection: RL → AGI
- Challenges ahead
- Closing thoughts
It is difficult to define "AGI" or "cognitive architectures." Potential "must haves":
- Application-domain independence
- Fusion of multimodal, high-dimensional inputs
- Spatiotemporal pattern recognition/inference
- "Strategic thinking" – long- and short-term impact
What is RL?
- Experience-driven learning
- Decision-making under uncertainty
- Goal: maximize a reward signal over the long term
- Unique to RL: it addresses the complete problem of a goal-directed agent, rather than an isolated subproblem

[Diagram: the agent exchanges observations, actions, and rewards with a stochastic, dynamic environment.]
- A form of unsupervised learning: no labeled examples, only evaluative feedback
- Two primary components: trial-and-error search and delayed rewards
- Origins of RL: dynamic programming
The environment is modeled as a Markov Decision Process (MDP):
- $S$ – state space
- $A(s)$ – set of actions possible in state $s \in S$
- $P^a_{ss'} = \Pr\{s_{t+1}=s' \mid s_t=s,\ a_t=a\}$ – state transition probabilities
- $R^a_{ss'} = E\{r_{t+1} \mid s_t=s,\ a_t=a,\ s_{t+1}=s'\}$ – expected rewards
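To make the notation concrete, here is a minimal sketch (an invented two-state toy MDP – none of the numbers or names come from the tutorial) of how $P^a_{ss'}$ and $R^a_{ss'}$ might be tabulated:

```python
# Toy MDP with states {0, 1} and actions {"stay", "go"} (invented example).
# P[s][a] maps each successor state s' to its probability P^a_{ss'};
# R[s][a][s'] holds the expected reward R^a_{ss'}.
P = {
    0: {"stay": {0: 1.0}, "go": {0: 0.2, 1: 0.8}},
    1: {"stay": {1: 1.0}, "go": {0: 0.9, 1: 0.1}},
}
R = {
    0: {"stay": {0: 0.0}, "go": {0: 0.0, 1: 1.0}},
    1: {"stay": {1: 0.5}, "go": {0: 0.0, 1: 0.0}},
}

def expected_reward(s, a):
    """E[r_{t+1} | s_t=s, a_t=a] = sum over s' of P^a_{ss'} * R^a_{ss'}."""
    return sum(p * R[s][a][s2] for s2, p in P[s][a].items())
```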
Example: a board game –
- Fully observable
- Huge state set (board configurations)
- Finite action set – the legal moves in each position
- Rewards: win +1
An MDP is defined by the state transition probabilities $P^a_{ss'} = \Pr\{s_{t+1}=s' \mid s_t=s,\ a_t=a\}$ and the expected rewards $R^a_{ss'} = E\{r_{t+1} \mid s_t=s,\ a_t=a,\ s_{t+1}=s'\}$.

The agent's goal is to maximize the prospect of future rewards, i.e., the return

$$R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1},$$

where $0 \le \gamma < 1$ is a discount factor.
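A minimal sketch of the discounted return over a finite reward sequence (truncating the infinite sum; the function name is mine):

```python
def discounted_return(rewards, gamma):
    """R_t = sum_k gamma^k * r_{t+k+1}, for a finite list of rewards."""
    return sum(gamma**k * r for k, r in enumerate(rewards))
```

For example, rewards [1, 1, 1] with gamma = 0.5 give 1 + 0.5 + 0.25 = 1.75.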
The state-value function for policy $\pi$ is

$$V^\pi(s) = E_\pi\!\left\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\middle|\, s_t = s\right\}.$$

Alternatively, we may deal with the state-action value function

$$Q^\pi(s,a) = E_\pi\!\left\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\middle|\, s_t = s,\ a_t = a\right\}.$$

The latter is often easier to work with, since it ranks actions directly.
Bellman equations express a consistency condition that the value function must satisfy:

$$V^\pi(s) = \sum_{a} \pi(s,a) \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma V^\pi(s') \right]$$

Temporal difference learning: when the model is unknown, the same relation can be enforced from sampled transitions,

$$V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]$$
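The TD update fits in a few lines. A sketch (names are mine), assuming a tabular value function stored in a dict:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: V(s) += alpha * (r + gamma * V(s') - V(s)).

    V is a dict mapping states to value estimates; the dict is
    updated in place with the TD error for the observed transition.
    """
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V
```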
We're looking for an optimal policy $\pi^*$ that would maximize the value function.

Policy evaluation – for a given policy, solve the MDP when the environment model is known, by iterating

$$V_{k+1}(s) = \sum_{a} \pi(s,a) \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma V_k(s') \right].$$

When the dynamics are unknown, the key idea is to use samples obtained by interaction with the environment.
For a given policy $\pi$ with value function $V^\pi(s)$, greedy policy improvement selects

$$\pi'(s) = \arg\max_{a} \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma V^\pi(s') \right].$$

The new policy is always at least as good, and alternating evaluation and improvement is a converging iterative process (under reasonable assumptions).
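Evaluation and greedy improvement can be sketched together on an invented two-state MDP (all numbers and names below are mine, for illustration only):

```python
# Invented deterministic two-state, two-action MDP (not from the tutorial):
# P[s][a] = list of (s', prob); R[s][a][s'] = expected reward.
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: {0: 0.0}, 1: {1: 1.0}},
     1: {0: {0: 0.0}, 1: {1: 2.0}}}
GAMMA = 0.9

def evaluate(policy, sweeps=200):
    """Iterative policy evaluation for a deterministic policy (dict s -> a):
    V_{k+1}(s) = sum_{s'} P^a_{ss'} [R^a_{ss'} + gamma * V_k(s')]."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        V = {s: sum(p * (R[s][policy[s]][s2] + GAMMA * V[s2])
                    for s2, p in P[s][policy[s]])
             for s in P}
    return V

def improve(V):
    """Greedy improvement: pi'(s) = argmax_a sum_{s'} P [R + gamma * V(s')]."""
    return {s: max(P[s], key=lambda a: sum(p * (R[s][a][s2] + GAMMA * V[s2])
                                           for s2, p in P[s][a]))
            for s in P}
```

Here the always-"go" policy {0: 1, 1: 1} earns reward 2 forever from state 1, so evaluation converges to V(1) = 2/(1 - 0.9) = 20, and improvement leaves the policy unchanged.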
A fundamental trade-off: exploitation of actions that worked in the past vs. exploration of new, alternative action paths, so as to learn whether they yield higher rewards.
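One common way to balance the two is an ε-greedy rule. A sketch (names mine), assuming a tabular Q stored in a dict keyed by (state, action):

```python
import random

def epsilon_greedy(Q, s, actions, epsilon=0.1, rng=random):
    """With probability epsilon, explore a random action;
    otherwise exploit the greedy action argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))
```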
In real-world settings:
- No "state" signal is provided; instead, we have (partial) observations, and the agent needs to infer state
- No model – the dynamics need to be learned
- No tabular-form solutions (they don't scale):
  - Huge/continuous state spaces
  - Huge/continuous action spaces
  - Multi-dimensional reward signals
Each time the agent sees a "car," the same state signal should be generated. States are individual to the agent, and state inference can occur only through the agent's observations of the environment.
The environment dynamics are unknown. What is a model? Any system that helps us predict the environment's behavior. In model-based RL, a model is not available a priori but is learned from experience: given the current observation and action, it produces the predicted next observation and reward.
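One hypothetical way to learn such a model is a simple empirical table over (observation, action) pairs – a sketch under my own assumptions, not the tutorial's method:

```python
from collections import defaultdict

class TabularModel:
    """Learns an empirical model from experience: given (observation, action),
    predicts the most frequently seen next observation and the average reward."""

    def __init__(self):
        self.next_counts = defaultdict(lambda: defaultdict(int))
        self.reward_sum = defaultdict(float)
        self.visits = defaultdict(int)

    def update(self, obs, action, next_obs, reward):
        # Record one observed transition.
        key = (obs, action)
        self.next_counts[key][next_obs] += 1
        self.reward_sum[key] += reward
        self.visits[key] += 1

    def predict(self, obs, action):
        # Return (most common next observation, mean observed reward).
        key = (obs, action)
        counts = self.next_counts[key]
        next_obs = max(counts, key=counts.get)
        return next_obs, self.reward_sum[key] / self.visits[key]
```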
Function approximation (FA) – a must for large problems, and the key to generalization. Good news: many FA technologies are out there:
- Radial basis functions
- Neural networks
- Bayesian networks
- Fuzzy logic
- …
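As an illustration of FA in this setting, here is a sketch of semi-gradient TD(0) with a hand-picked linear feature map (the feature choice and all names are invented):

```python
def features(s):
    """Hypothetical feature map for a 1-D state: phi(s) = [1, s, s^2]."""
    return [1.0, s, s * s]

def v_hat(s, w):
    """Linear value estimate: V(s) ~= w . phi(s)."""
    return sum(wi * xi for wi, xi in zip(w, features(s)))

def sgd_td_step(w, s, r, s_next, alpha=0.01, gamma=0.9):
    """Semi-gradient TD(0) for linear FA: w += alpha * delta * phi(s),
    where delta is the TD error for the sampled transition."""
    delta = r + gamma * v_hat(s_next, w) - v_hat(s, w)
    return [wi + alpha * delta * xi for wi, xi in zip(w, features(s))]
```

The weight vector generalizes across states: updating from one transition changes the value estimate of every state sharing similar features.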
Historically, ML has been on CS turf – is the von Neumann architecture the right substrate? The brain operates at roughly 150 Hz yet hosts ~100 billion processors. Software limits scalability:
- 256 cores is still not enough
- Vast memory bandwidth is needed
- Analog circuitry is a possible alternative
- Don't care for an "optimal policy"
- Stay away from reverse engineering
- Learning takes time!
- Value function definition needs work: internal ("intrinsic") vs. external rewards; exploration vs. exploitation
- Hardware realization
- Scalable function-approximation engines
[Diagram: agent–environment loop – the environment emits observations to a state-action value estimator, which issues actions, subject to an action-correction signal.]
The general RL framework is promising for AGI:
- Offers elegance
- Biologically-inspired approach
Scaling model-based RL is the challenge – and the VLSI technology exists today (>2B transistors on a chip)!