1 Feature-Based Representations How to use features? Solution: - PDF document

Logistics
• PS3 – due 11/12

CS 473: Artificial Intelligence
Reinforcement Learning III
Travis Mandel (filling in for Dan) / University of Washington
[Most slides were taken from Dan Klein and Pieter Abbeel / CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Approximate Q-Learning

Reinforcement Learning Recap
• Model-based approach
• Model-free approaches
• TD-learning
• Tabular Q-Learning
• Epsilon-Greedy, Exploration Functions
• TODAY: Approximate Linear Q-Learning

Generalizing Across States
• Basic Q-Learning keeps a table of all q-values
• In realistic situations, we cannot possibly learn about every single state!
  • Too many states to visit them all in training
  • Too many states to hold the q-tables in memory
• Instead, we want to generalize:
  • Learn about some small number of training states from experience
  • Generalize that experience to new, similar situations
  • This is a fundamental idea in machine learning, and we’ll see it over and over again
[Demo: Q-learning – pacman – tiny – watch all (L11D5)]
[Demo: Q-learning – pacman – tiny – silent train (L11D6)]
[Demo: Q-learning – pacman – tricky – watch all (L11D7)]

Example: Pacman
• Let’s say we discover through experience that this state is bad: [figure: Pacman state]
• In naïve q-learning, we know nothing about this state: [figure: Pacman state]
• Or even this one! [figure: Pacman state]
[demo – RL pacman]
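The limitation described on the Pacman example slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's project code: the state encoding (pairs of grid positions), the reward of -500, and the values of `ALPHA` and `GAMMA` are all assumptions made for the example.

```python
from collections import defaultdict

ALPHA = 0.5   # learning rate (assumed value)
GAMMA = 0.9   # discount factor (assumed value)

# The q-table: one independent entry per (state, action) pair, default 0.0.
q_table = defaultdict(float)

def q_update(state, action, reward, next_state, next_actions):
    """One tabular Q-learning update for the transition (s, a, r, s')."""
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    sample = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (sample - q_table[(state, action)])

# Hypothetical states written as (pacman_position, ghost_position).
# In both, a ghost sits right next to Pacman.
trapped_left = ((1, 1), (2, 1))
trapped_right = ((5, 1), (4, 1))

# Pacman walks into the ghost in the first state and dies (assumed reward -500).
q_update(trapped_left, "EAST", -500.0, "TERMINAL", [])

print(q_table[(trapped_left, "EAST")])    # updated from experience
print(q_table[(trapped_right, "WEST")])   # never visited, still the default
```

The second lookup returns the default value because the table shares nothing between entries, even though the two situations look alike to a human.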

Feature-Based Representations
• Solution: describe a state using a vector of features (aka “properties”)
  • Features are functions from states to real numbers (often 0/1) that capture important properties of the state
  • Example features:
    • Distance to closest ghost
    • Distance to closest dot
    • Number of ghosts
    • 1 / (dist to dot)^2
    • Is Pacman in a tunnel? (0/1)
    • …… etc.
    • Is it the exact state on this slide?
  • Can also describe a q-state (s, a) with features (e.g. action moves closer to food)

How to use features?
• Using a feature representation, we can write a q function (or value function) for any state:
  $V(s) = g(f_1(s), f_2(s), \ldots, f_n(s))$
  $Q(s, a) = g(f_1(s, a), f_2(s, a), \ldots, f_n(s, a))$

How to use features?
• Using a feature representation, we can write a q function (or value function) for any state using a few weights:
  $V(s) = w_1 f_1(s) + w_2 f_2(s) + \ldots + w_n f_n(s)$
  $Q(s, a) = w_1 f_1(s, a) + w_2 f_2(s, a) + \ldots + w_n f_n(s, a)$
• Advantage: our experience is summed up in a few powerful numbers
• Disadvantage: states may share features but actually be very different in value!

Approximate Q-Learning
• Q-learning with linear Q-functions:
  transition $= (s, a, r, s')$
  difference $= [r + \gamma \max_{a'} Q(s', a')] - Q(s, a)$
  Exact Q’s: $Q(s, a) \leftarrow Q(s, a) + \alpha\,[\text{difference}]$
  Approximate Q’s: $w_i \leftarrow w_i + \alpha\,[\text{difference}]\, f_i(s, a)$
• Intuitive interpretation:
  • Adjust weights of active features
  • E.g., if something unexpectedly bad happens, blame the features that were on: disprefer all states with that state’s features
• Formal justification: in a few slides!

Example: Pacman Features
  $Q(s, a) = w_1 f_{DOT}(s, a) + w_2 f_{GST}(s, a)$
  $f_{DOT}(s, a) = 1 / (\text{distance to closest food after taking } a)$
  $f_{DOT}(s, \text{NORTH}) = 0.5$
  $f_{GST}(s, a) = \text{distance to closest ghost after taking } a$
  $f_{GST}(s, \text{NORTH}) = 1.0$

Example: Q-Pacman
[Demo: approximate Q-learning pacman (L11D10)]
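The approximate Q-learning update with a linear Q-function can be sketched as follows. This is a minimal illustration under stated assumptions: the feature values for NORTH come from the Pacman Features slide, while `ALPHA`, `GAMMA`, the reward of -10, the zero initial weights, and the second feature vector are hypothetical values chosen for the example.

```python
ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.9   # discount factor (assumed value)

def q_value(weights, features):
    """Linear Q-function: the weighted sum of the feature values."""
    return sum(weights[name] * value for name, value in features.items())

def approx_q_update(weights, features, reward, next_q_values):
    """Update the weights in place for one transition; return the difference."""
    best_next = max(next_q_values, default=0.0)   # 0.0 for a terminal s'
    difference = (reward + GAMMA * best_next) - q_value(weights, features)
    for name, value in features.items():
        # Each weight moves in proportion to how active its feature was.
        weights[name] += ALPHA * difference * value
    return difference

weights = {"f_DOT": 0.0, "f_GST": 0.0}            # assumed starting weights
features_north = {"f_DOT": 0.5, "f_GST": 1.0}     # values from the slide

# Hypothetical experience: moving NORTH ends the game with reward -10.
approx_q_update(weights, features_north, -10.0, [])
print(weights)

# A different, never-visited q-state with similar features (hypothetical values)
# already gets a negative estimate, because the weights are shared.
features_other = {"f_DOT": 0.25, "f_GST": 1.0}
print(q_value(weights, features_other))
```

Unlike a q-table entry, one bad experience here changes the estimate for every q-state whose features were active.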

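Video of Demo Approximate Q-Learning -- Pacman

Sidebar: Q-Learning and Least Squares

Linear Approximation: Regression
[Figure: two regression plots, each captioned “Prediction:”]

Optimization: Least Squares
[Figure: a fitted line with one point’s “Observation”, its “Prediction”, and the “Error or ‘residual’” between them]

Minimizing Error
Imagine we had only one point x, with features f(x), target value y, and weights w:
  $\text{error}(w) = \frac{1}{2} \left( y - \sum_k w_k f_k(x) \right)^2$
  $\frac{\partial\, \text{error}(w)}{\partial w_m} = -\left( y - \sum_k w_k f_k(x) \right) f_m(x)$
  $w_m \leftarrow w_m + \alpha \left( y - \sum_k w_k f_k(x) \right) f_m(x)$
Approximate q update explained:
  $w_m \leftarrow w_m + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] f_m(s, a)$
  where $r + \gamma \max_{a'} Q(s', a')$ is the “target” and $Q(s, a)$ is the “prediction”

Overfitting: Why Limiting Capacity Can Help
[Figure: a degree 15 polynomial fit to data points]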
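The single-point case from the Minimizing Error slide can be sketched as repeated gradient steps on one squared error. This is a rough illustration: the feature vector, the target, and the step size `alpha` are hypothetical values, and the helper names are introduced here rather than taken from the slides.

```python
def predict(weights, features):
    """Linear prediction: the sum over k of w_k * f_k(x)."""
    return sum(w * f for w, f in zip(weights, features))

def gradient_step(weights, features, target, alpha):
    """Move each weight by alpha * residual * feature, for one point."""
    residual = target - predict(weights, features)   # "target" minus "prediction"
    return [w + alpha * residual * f for w, f in zip(weights, features)]

features = [1.0, 2.0]   # hypothetical f(x)
target = 10.0           # hypothetical y
weights = [0.0, 0.0]    # assumed starting weights
alpha = 0.1             # assumed step size

residuals = []
for _ in range(5):
    residuals.append(target - predict(weights, features))
    weights = gradient_step(weights, features, target, alpha)

print(residuals)
```

Each step multiplies the residual by (1 - alpha * ||f(x)||^2), which is 0.5 for these values, so the prediction closes half of the remaining gap to the target every time. In the approximate q update, the same step is taken with the sampled value r + γ max Q(s', a') standing in for y.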

Simple Problem
Given: Features of current state
Predict: Will Pacman die on the next step?

Just one feature. See a pattern?
• Ghost one step away, pacman dies
• Ghost one step away, pacman dies
• Ghost one step away, pacman dies
• Ghost one step away, pacman dies
• Ghost one step away, pacman lives
• Ghost more than one step away, pacman lives
• Ghost more than one step away, pacman lives
• Ghost more than one step away, pacman lives
• Ghost more than one step away, pacman lives
• Ghost more than one step away, pacman lives
• Ghost more than one step away, pacman lives
Learn: Ghost one step away → pacman dies!

What if we add more features?
• Ghost one step away, score 211, pacman dies
• Ghost one step away, score 341, pacman dies
• Ghost one step away, score 231, pacman dies
• Ghost one step away, score 121, pacman dies
• Ghost one step away, score 301, pacman lives
• Ghost more than one step away, score 205, pacman lives
• Ghost more than one step away, score 441, pacman lives
• Ghost more than one step away, score 219, pacman lives
• Ghost more than one step away, score 199, pacman lives
• Ghost more than one step away, score 331, pacman lives
• Ghost more than one step away, score 251, pacman lives
Learn: Ghost one step away AND score is NOT 301 → pacman dies!

Normal Programming now resuming…
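The effect of the extra score feature can be reproduced with a toy learner. The sketch below is only an illustration: it uses the eleven observations listed on the slides, while the majority-vote learner (`fit_majority`) and the unseen score of 305 are assumptions made for the example.

```python
from collections import Counter, defaultdict

# (ghost_one_step_away, score, pacman_dies): the eleven observations from the slides.
observations = [
    (True, 211, True), (True, 341, True), (True, 231, True), (True, 121, True),
    (True, 301, False),
    (False, 205, False), (False, 441, False), (False, 219, False),
    (False, 199, False), (False, 331, False), (False, 251, False),
]

def fit_majority(rows, key):
    """Hypothetical learner: predict the most common outcome for each feature key."""
    votes = defaultdict(Counter)
    for ghost_near, score, dies in rows:
        votes[key(ghost_near, score)][dies] += 1
    return {k: counts.most_common(1)[0][0] for k, counts in votes.items()}

def training_accuracy(model, key):
    hits = sum(model[key(g, s)] == d for g, s, d in observations)
    return hits / len(observations)

def one_feature(ghost_near, score):
    return ghost_near

def two_features(ghost_near, score):
    return (ghost_near, score)

simple = fit_majority(observations, one_feature)
memorizer = fit_majority(observations, two_features)

print(training_accuracy(simple, one_feature))      # misses only the score-301 case
print(training_accuracy(memorizer, two_features))  # fits every training row
print(simple[True])                                # rule: ghost one step away, dies
print(memorizer.get((True, 305)))                  # unseen score: no rule at all
```

The two-feature model fits the training data perfectly, yet it has learned the one lucky escape as a rule and has nothing to say about a score it has not seen. That is the trade-off the overfitting slide points at.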

That’s all for Reinforcement Learning!
[Diagram: Data (experiences with environment) → Reinforcement Learning Agent → Policy (how to act in the future)]
• Very tough problem: How to perform any task well in an unknown, noisy environment!
• Traditionally used mostly for robotics, but becoming more widely used
• Lots of open research areas:
  • How to best balance exploration and exploitation?
  • How to deal with cases where we don’t know a good state/feature representation?

CS 473: Artificial Intelligence
Probability
Instructor: Travis Mandel, University of Washington
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Next
• Probability
  • Random Variables
  • Joint and Marginal Distributions
  • Conditional Distribution
  • Product Rule, Chain Rule, Bayes’ Rule
  • Inference
  • Independence
• You’ll need all this stuff A LOT for the next few weeks, so make sure you go over it now!

Inference in Ghostbusters
• A ghost is in the grid somewhere
• Sensor readings tell how close a square is to the ghost
  • On the ghost: red
  • 1 or 2 away: orange
  • 3 or 4 away: yellow
  • 5+ away: green
• Sensors are noisy, but we know P(Color | Distance)

  P(red | 3)   P(orange | 3)   P(yellow | 3)   P(green | 3)
  0.05         0.15            0.5             0.3

[Demo: Ghostbuster – no probability (L12D1)]

Video of Demo Ghostbuster – No probability

Uncertainty
• General situation:
  • Observed variables (evidence): Agent knows certain things about the state of the world (e.g., sensor readings or symptoms)
  • Unobserved variables: Agent needs to reason about other aspects (e.g. where an object is or what disease is present)
  • Model: Agent knows something about how the known variables relate to the unknown variables
• Probabilistic reasoning gives us a framework for managing our beliefs and knowledge
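The Ghostbusters sensor table is a small example of such a model: it relates an unobserved variable (distance to the ghost) to an observed one (the color reading). The sketch below simulates noisy readings at distance 3 using the four probabilities from the slide. The names `SENSOR_MODEL_AT_3` and `sample_reading`, the seed, and the sample count are assumptions made for the example.

```python
import random

# P(Color | Distance = 3), copied from the slide.
SENSOR_MODEL_AT_3 = {"red": 0.05, "orange": 0.15, "yellow": 0.5, "green": 0.3}

def sample_reading(rng):
    """Draw one noisy color reading for a square 3 steps from the ghost."""
    colors = list(SENSOR_MODEL_AT_3)
    probs = [SENSOR_MODEL_AT_3[c] for c in colors]
    return rng.choices(colors, weights=probs, k=1)[0]

rng = random.Random(0)   # fixed seed so the run is repeatable
readings = [sample_reading(rng) for _ in range(10_000)]
frequencies = {c: readings.count(c) / len(readings) for c in SENSOR_MODEL_AT_3}

print(sum(SENSOR_MODEL_AT_3.values()))   # a conditional distribution sums to 1
print(frequencies)                       # empirical frequencies approach the table
```

Yellow is the most likely reading at distance 3, but about half of the readings are some other color, which is why a single reading cannot pin down the ghost’s position.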
