SLIDE 1

Motivation Value Function Approximation Related Work Summary

Model-Selection for Non-Parametric Function Approximation: A Case Study in a Smart Energy System

Daniel Urieli Peter Stone

Department of Computer Science The University of Texas at Austin {urieli,pstone}@cs.utexas.edu

ECML 2013

Daniel Urieli, Peter Stone Model-Selection for Non-Parametric Function Approximation

SLIDE 2

Motivation

A smart energy problem: controlling a thermostat to reduce energy consumption in an HVAC (Heating, Ventilation, and Air-Conditioning) system while maintaining comfort requirements

General motivation: applying value-function-based reinforcement learning (RL) to discrete-time, continuous-control problems

SLIDE 3

Discrete-Time, Continuous Control Problems

- The system's state space is continuous
- Control actions are taken at discrete times
- Further assuming that the action set is small and discrete

SLIDE 4

Value-Function based RL

- In theory, value-function based RL can solve such problems optimally
- In practice, it is often unclear how to approximate the value function well enough
- Indeed, recent successes used direct policy search

SLIDE 7

Value-Function based RL

Still, value-function based RL has desirable advantages:

- Aiming for a global optimum
- Bootstrapping ⇒ fewer interactions with the real world

SLIDE 8

Case Study: Smart Thermostat Control

Minimize energy consumption while satisfying a given comfort specification

SLIDE 9

Case Study: Smart Thermostat Control

A straightforward turn-off strategy fails to satisfy both requirements (energy and comfort)

SLIDE 10

Smart Thermostat Control as an MDP

We model the problem as an MDP:

- S: {Tin, Tout, Time}
- A: {COOL, OFF, HEAT, AUX}
- P: computed by the simulator, initially unknown
- R: −energyConsumedByLastAction − C6pm
- T: {s ∈ S | s.time == 23:59pm}
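The MDP components above can be sketched in code. This is an illustrative encoding only; the type and function names (State, is_terminal, reward) are my own, not the authors' implementation.

```python
from dataclasses import dataclass

# Illustrative encoding of the slide's MDP; names are assumptions.
ACTIONS = ("COOL", "OFF", "HEAT", "AUX")  # A

@dataclass
class State:          # S: {Tin, Tout, Time}
    t_in: float       # indoor temperature
    t_out: float      # outdoor temperature
    minute: int       # time of day, in minutes since midnight

def is_terminal(s: State) -> bool:
    # T: episodes end at the last minute of the day (23:59)
    return s.minute == 23 * 60 + 59

def reward(energy_consumed: float, c_6pm: float) -> float:
    # R = -energyConsumedByLastAction - C6pm (comfort penalty at 6pm)
    return -energy_consumed - c_6pm
```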

SLIDE 11

Motivation Value Function Approximation Related Work Summary Function Approximation Methods FVI Model-selection Main Results

Plan

For the value-function (VF) approximation part, we need to:

1. Choose a function approximator
2. Choose an algorithm to compute the approximate VF
3. Tune the function approximator's parameters through model-selection

SLIDE 12

The Challenge of Value-Function Approximation

[Figure: the value function plotted over the state space (axes: state, value)]

- Must differentiate optimal from suboptimal actions
- Non-trivial with "small" action effects + a smooth value function ⇒ losses accumulate over time

SLIDE 15

Function Approximation Methods

[Figure: discretized approximation of the value function (axes: state, value)]

Discretization: suffers from the curse of dimensionality at the required resolution levels

SLIDE 16

Function Approximation Methods

[Figure: linear approximation of the value function (axes: state, value)]

Linear function approximation: depends on choosing good features, and it is frequently not clear how to do that

SLIDE 17

Function Approximation Methods

[Figure: non-parametric approximation of the value function]

Non-parametric: can represent any function, using lots of data...

SLIDE 18

Non-Parametric Value Function Approximation

[Figure: three panels of non-parametric fits of the value function (axes: state, value)]

To minimize the assumptions about the VF representation we use a smooth, non-parametric function approximator: Locally Weighted Linear Regression (LWR)
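As a concrete illustration, here is a minimal 1-d LWR sketch with a Gaussian kernel. The function name and NumPy-based implementation are my own; the paper's LWR operates over the multi-dimensional state space.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=1.0):
    """Locally weighted linear regression at one query point (1-d sketch).
    Points near x_query receive large weights; tau is the kernel bandwidth."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.exp(-(X - x_query) ** 2 / (2.0 * tau ** 2))  # Gaussian weights
    A = np.column_stack([np.ones_like(X), X])           # intercept + slope
    sw = np.sqrt(w)
    # Weighted least squares: minimize sum_i w_i * (b0 + b1*x_i - y_i)^2
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta[0] + beta[1] * x_query
```

On exactly linear data the local fit recovers the line for any bandwidth; the bandwidth only starts to matter once the target function curves.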

SLIDE 19

Compute an Approximate VF Using FVI

To compute the approximate VF, we use Fitted Value Iteration (FVI):

S_FVI := {s(1), s(2), ..., s(m)}
Repeat until convergence {
    ∀i ∈ 1, ..., m:  y(i) := max_a ( R(s(i), a) + γ E_{s′|s(i),a}[ V̂(s′) ] )
    V̂ := LWR( {(s(i), y(i)) | i ∈ 1, ..., m} )
}
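The loop above can be sketched as follows. Here `fit` stands in for the LWR regressor, transitions are deterministic for brevity (the slide takes an expectation over s′), and all names are illustrative.

```python
def fitted_value_iteration(states, actions, reward, next_state, fit,
                           gamma=0.9, iters=50):
    """Sketch of FVI: compute Bellman backup targets y(i) at the sampled
    states, refit the function approximator, and repeat."""
    V = lambda s: 0.0  # initial value estimate
    for _ in range(iters):
        targets = [max(reward(s, a) + gamma * V(next_state(s, a))
                       for a in actions)
                   for s in states]
        V = fit(states, targets)  # e.g. LWR in the paper
    return V
```

As a sanity check: with a single self-loop action, constant reward 1, and γ = 0.5, the backups converge to the fixed point 1/(1 − γ) = 2.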

SLIDE 20

Model-Selection for LWR

LWR needs tuning, for instance the kernel bandwidth in 1-d:

[Figure: an LWR fit of the value function (axes: state, value) and the corresponding kernel, for one bandwidth setting]

SLIDE 27

Model-Selection for LWR in N-dimensions

In N dimensions, it is common to tune N+1 parameters:

- 1 bandwidth parameter: τ
- N attribute-scaling parameters: c1, ..., cn

Tuning these parameters is a form of model-selection
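A sketch of how the N+1 parameters enter the kernel: each attribute is scaled by its c before distances are computed, and τ sets the bandwidth. The function name and exact kernel form are illustrative assumptions.

```python
import numpy as np

def scaled_gaussian_weights(X, x_query, c, tau):
    """LWR kernel weights in N dimensions: attribute j is scaled by c[j]
    before the squared distance to the query point is taken."""
    X = np.asarray(X, dtype=float)
    c = np.asarray(c, dtype=float)
    d2 = np.sum((c * (X - x_query)) ** 2, axis=1)  # scaled squared distances
    return np.exp(-d2 / (2.0 * tau ** 2))
```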

SLIDE 28

Model Selection - How to Evaluate A Model?

Model-evaluation measure?

- In supervised learning: prediction performance on held-out sets
- In reinforcement learning? We don't have the true values (labels) of states

A naive evaluation would be:

1: for i = 1 → numModels do
2:     run the agent for 1 year with model i
3:     record the total reward
4: end for
5: choose the best model

Performance is accumulated reward, which is often too expensive to evaluate this way

SLIDE 29

Model Selection - How to Evaluate A Model?

We use the fact that the optimal value function must satisfy Bellman's optimality equation:

V̂ ≡ V^π* ⟺ ∀s ∈ S: BE_V̂(s) = 0

where

BE_V̂(s) := | V̂(s) − max_a ( R(s, a) + γ E_{s′|s,a}[ V̂(s′) ] ) |

This already holds for s ∈ S_FVI (FVI's convergence condition), but not necessarily for s ∉ S_FVI

SLIDE 30

The Resulting Model Evaluation Measure

Therefore, to evaluate a model, we:

1. Sample random states T := {t(1), ..., t(m′)}, with t(i) ∉ S_FVI and |T| ≫ |S_FVI|
2. Use ||BE_V̂(T)||∞ as the model-evaluation measure

Model-selection becomes minimizing F: R^{n+1} → R, where (c1, ..., cn, τ) ↦ ||BE_V̂(T)||∞

No need to evaluate an agent in the environment

SLIDE 31

Practical Model-Selection: 2 conditions

To have a practical model-selection algorithm, we need to show that:

1. The Bellman error is correlated with actual performance
2. Finding the minimum can be done efficiently

SLIDE 32

Correlation Between the Bellman Errors and Performance

[Figure: six scatter plots of the L1, L2, and L∞ Bellman errors (log-scale x-axes) against yearly energy consumption in kWh]

SLIDE 33

The MSNP Algorithm

We use these two conditions to define the following model-selection algorithm, named MSNP:
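A sketch of the resulting outer loop: minimize F(c1, ..., cn, τ), the max Bellman error over sampled states, with a derivative-free optimizer. The paper compares Brent, Amoeba (Nelder-Mead), and Powell; the simple coordinate search below is my own stand-in for illustration, not the authors' implementation.

```python
import numpy as np

def msnp_outer_loop(objective, p0, step=1.0, shrink=0.5, iters=40):
    """Derivative-free minimization of a model-evaluation objective
    (e.g. the max Bellman error over held-out states) via coordinate
    search: probe each parameter up and down, keep improvements,
    and shrink the step when no probe helps."""
    p = np.asarray(p0, dtype=float)
    best = objective(p)
    for _ in range(iters):
        improved = False
        for j in range(len(p)):              # probe each parameter in turn
            for delta in (step, -step):
                q = p.copy()
                q[j] += delta
                val = objective(q)
                if val < best:
                    p, best, improved = q, val, True
        if not improved:
            step *= shrink                   # no progress: refine the step
    return p, best
```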

SLIDE 34

Efficiently Optimizing the Bellman Errors

[Figure: Max Bellman Error vs. iteration # for three derivative-free optimizers: Brent, Amoeba, and Powell]

SLIDE 35

Basins of Convergence of the Max Bellman Error

Plotting p_i ↦ ||BE_V̂(T)||∞ for each p_i ∈ {c1, c2, c3, τ} (for j ≠ i, the p_j are held fixed at default values)

[Figure: Max Bellman Error as a function of each parameter: the LWR bandwidth τ, the Tin scaling parameter, the Tout scaling parameter, and the Time scaling parameter]

SLIDE 36

Temperature Graphs

[Figure: indoor temperature over the day under the MSNP, Default, and Turn-off strategies]

SLIDE 37

Performance of MSNP

Comparing yearly energy consumption (lower is better). Default is the strategy deployed in practice; MSNP, our model-selection algorithm, is:

1. better than LargeSample
2. close to CMA-ES

City            Default (kWh)   LargeSample (kWh)   MSNP (kWh)   CMA-ES (kWh)   % Energy-Savings
New York City   11084.8         10923.5             9859.3       9816.3         11.0%
Boston          12277.1         12480.7             11433.6      11052.8        6.9%
Chicago         15172.5         14778.2             14186        13778.4        6.5%

SLIDE 41

Related Work

- Bellman error for generalized policy iteration (Antos et al. 2008; Lagoudakis and Parr 2003)
- Bellman error for tuning basis functions in linear architectures (Keller et al. 2006; Menache et al. 2005; Parr et al. 2007)
- LWR model selection for learning a transition function (Ng et al. 2004)
- Abstract model-selection algorithm for RL (Farahmand and Szepesvári 2011)

SLIDE 42

Summary

Introduced MSNP, a practical model-selection algorithm for RL

MSNP is based on two main ideas:

[Figure: L∞ Bellman Error vs. Energy (kWh) scatter plot]
[Figure: Max Bellman Error vs. iteration # for Brent, Amoeba, and Powell]

Value-function based RL for thermostat control

Outlook:

- Theoretical analysis, the Bellman error's basin of convergence
- High-dimensional problems