SLIDE 1

Motivation Value Function Approximation Related Work Summary

Model-Selection for Non-Parametric Function Approximation: A Case Study in a Smart Energy System

Daniel Urieli Peter Stone

Department of Computer Science The University of Texas at Austin {urieli,pstone}@cs.utexas.edu

ECML 2013

Daniel Urieli, Peter Stone Model-Selection for Non-Parametric Function Approximation

SLIDE 2

Motivation

A smart energy problem: controlling a thermostat to reduce energy consumption in an HVAC (Heating, Ventilation, and Air-Conditioning) system while maintaining comfort requirements

General motivation: applying value-function-based reinforcement learning (RL) to discrete-time, continuous-control problems

SLIDE 3

Discrete-Time, Continuous Control Problems

- The system's state space is continuous
- Control actions are taken at discrete times
- Further assuming that the action set is small and discrete

SLIDE 4

Value-Function based RL

- In theory, value-function based RL can solve such problems optimally
- In practice, it is often unclear how to approximate the value function well enough
- Indeed, recent successes used direct policy search

SLIDE 7

Value-Function based RL

Still, value-function based RL has desirable advantages:

- Aiming for a global optimum
- Bootstrapping ⇒ fewer interactions with the real world

SLIDE 8

Case Study: Smart Thermostat Control

Minimize energy consumption while satisfying a given comfort specification

SLIDE 9

Case Study: Smart Thermostat Control

A straightforward turn-off strategy fails to satisfy both requirements (energy and comfort)

SLIDE 10

Smart Thermostat Control as an MDP

We model the problem as an MDP:

- S: {Tin, Tout, Time}
- A: {COOL, OFF, HEAT, AUX}
- P: computed by the simulator, initially unknown
- R: −energyConsumedByLastAction − C6pm
- T: {s ∈ S | s.time == 23:59pm}
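The MDP components above can be sketched in code. This is an illustrative encoding only; the type and function names (State, is_terminal, reward) are my own, not the authors' implementation.

```python
from dataclasses import dataclass

# Illustrative encoding of the slide's MDP; names are assumptions.
ACTIONS = ("COOL", "OFF", "HEAT", "AUX")  # A

@dataclass
class State:          # S: {Tin, Tout, Time}
    t_in: float       # indoor temperature
    t_out: float      # outdoor temperature
    minute: int       # time of day, in minutes since midnight

def is_terminal(s: State) -> bool:
    # T: episodes end at the last minute of the day (23:59)
    return s.minute == 23 * 60 + 59

def reward(energy_consumed: float, c_6pm: float) -> float:
    # R = -energyConsumedByLastAction - C6pm (comfort penalty at 6pm)
    return -energy_consumed - c_6pm
```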

SLIDE 11

Motivation Value Function Approximation Related Work Summary Function Approximation Methods FVI Model-selection Main Results

Plan

For the value-function (VF) approximation part, we need to:

1. Choose a function approximator
2. Choose an algorithm to compute the approximate VF
3. Tune the function approximator's parameters through model-selection

SLIDE 12

The Challenge of Value-Function Approximation

[Figure: the value function plotted over the state space (axes: state, value)]

- Must differentiate optimal from suboptimal actions
- Non-trivial with "small" action effects + a smooth value function ⇒ losses accumulate over time

SLIDE 15

Function Approximation Methods

[Figure: discretized approximation of the value function (axes: state, value)]

Discretization: suffers from the curse of dimensionality at the required resolution levels

SLIDE 16

Function Approximation Methods

[Figure: linear approximation of the value function (axes: state, value)]

Linear function approximation: depends on choosing good features, and it is frequently not clear how to do that

SLIDE 17

Function Approximation Methods

[Figure: non-parametric approximation of the value function]

Non-parametric: can represent any function, using lots of data...

SLIDE 18

Non-Parametric Value Function Approximation

[Figure: three panels of non-parametric fits of the value function (axes: state, value)]

To minimize the assumptions about the VF representation we use a smooth, non-parametric function approximator: Locally Weighted Linear Regression (LWR)
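As a concrete illustration, here is a minimal 1-d LWR sketch with a Gaussian kernel. The function name and NumPy-based implementation are my own; the paper's LWR operates over the multi-dimensional state space.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=1.0):
    """Locally weighted linear regression at one query point (1-d sketch).
    Points near x_query receive large weights; tau is the kernel bandwidth."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.exp(-(X - x_query) ** 2 / (2.0 * tau ** 2))  # Gaussian weights
    A = np.column_stack([np.ones_like(X), X])           # intercept + slope
    sw = np.sqrt(w)
    # Weighted least squares: minimize sum_i w_i * (b0 + b1*x_i - y_i)^2
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta[0] + beta[1] * x_query
```

On exactly linear data the local fit recovers the line for any bandwidth; the bandwidth only starts to matter once the target function curves.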

SLIDE 19

Compute an Approximate VF Using FVI

To compute the approximate VF, we use Fitted Value Iteration (FVI):

S_FVI := {s(1), s(2), ..., s(m)}
Repeat until convergence {
    ∀i ∈ 1, ..., m:  y(i) := max_a ( R(s(i), a) + γ E_{s′|s(i),a}[ V̂(s′) ] )
    V̂ := LWR( {(s(i), y(i)) | i ∈ 1, ..., m} )
}
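The loop above can be sketched as follows. Here `fit` stands in for the LWR regressor, transitions are deterministic for brevity (the slide takes an expectation over s′), and all names are illustrative.

```python
def fitted_value_iteration(states, actions, reward, next_state, fit,
                           gamma=0.9, iters=50):
    """Sketch of FVI: compute Bellman backup targets y(i) at the sampled
    states, refit the function approximator, and repeat."""
    V = lambda s: 0.0  # initial value estimate
    for _ in range(iters):
        targets = [max(reward(s, a) + gamma * V(next_state(s, a))
                       for a in actions)
                   for s in states]
        V = fit(states, targets)  # e.g. LWR in the paper
    return V
```

As a sanity check: with a single self-loop action, constant reward 1, and γ = 0.5, the backups converge to the fixed point 1/(1 − γ) = 2.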

SLIDE 20

Model-Selection for LWR

LWR needs tuning, for instance the kernel bandwidth in 1-d:

[Figure: an LWR fit of the value function (axes: state, value) and the corresponding kernel, for one bandwidth setting]

SLIDE 27

Model-Selection for LWR in N-dimensions

In N dimensions, it is common to tune N+1 parameters:

- 1 bandwidth parameter: τ
- N attribute-scaling parameters: c1, ..., cn

Tuning these parameters is a form of model-selection
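A sketch of how the N+1 parameters enter the kernel: each attribute is scaled by its c before distances are computed, and τ sets the bandwidth. The function name and exact kernel form are illustrative assumptions.

```python
import numpy as np

def scaled_gaussian_weights(X, x_query, c, tau):
    """LWR kernel weights in N dimensions: attribute j is scaled by c[j]
    before the squared distance to the query point is taken."""
    X = np.asarray(X, dtype=float)
    c = np.asarray(c, dtype=float)
    d2 = np.sum((c * (X - x_query)) ** 2, axis=1)  # scaled squared distances
    return np.exp(-d2 / (2.0 * tau ** 2))
```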

SLIDE 28

Model Selection - How to Evaluate A Model?

Model-evaluation measure?

- In supervised learning: prediction performance on held-out sets
- In reinforcement learning? We don't have the true values (labels) of states

A naive evaluation would be:

1: for i = 1 → numModels do
2:     run the agent for 1 year with model i
3:     record the total reward
4: end for
5: choose the best model

Performance is accumulated reward, which is often too expensive to evaluate this way

SLIDE 29

Model Selection - How to Evaluate A Model?

We use the fact that the optimal value function must satisfy Bellman's optimality equation:

V̂ ≡ V^π* ⟺ ∀s ∈ S: BE_V̂(s) = 0

where

BE_V̂(s) := | V̂(s) − max_a ( R(s, a) + γ E_{s′|s,a}[ V̂(s′) ] ) |

This already holds for s ∈ S_FVI (FVI's convergence condition), but not necessarily for s ∉ S_FVI

SLIDE 30

The Resulting Model Evaluation Measure

Therefore, to evaluate a model, we:

1. Sample random states T := {t(1), ..., t(m′)}, with t(i) ∉ S_FVI and |T| ≫ |S_FVI|
2. Use ||BE_V̂(T)||∞ as the model-evaluation measure

Model-selection becomes minimizing F: R^{n+1} → R, where (c1, ..., cn, τ) ↦ ||BE_V̂(T)||∞

No need to evaluate an agent in the environment

SLIDE 31

Practical Model-Selection: 2 conditions

To have a practical model-selection algorithm, we need to show that:

1. The Bellman error is correlated with actual performance
2. Finding the minimum can be done efficiently

SLIDE 32

Correlation Between the Bellman Errors and Performance

[Figure: six scatter plots of the L1, L2, and L∞ Bellman errors (log-scale x-axes) against yearly energy consumption in kWh]

SLIDE 33

The MSNP Algorithm

We use these two conditions to define the following model-selection algorithm, named MSNP:
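A sketch of the resulting outer loop: minimize F(c1, ..., cn, τ), the max Bellman error over sampled states, with a derivative-free optimizer. The paper compares Brent, Amoeba (Nelder-Mead), and Powell; the simple coordinate search below is my own stand-in for illustration, not the authors' implementation.

```python
import numpy as np

def msnp_outer_loop(objective, p0, step=1.0, shrink=0.5, iters=40):
    """Derivative-free minimization of a model-evaluation objective
    (e.g. the max Bellman error over held-out states) via coordinate
    search: probe each parameter up and down, keep improvements,
    and shrink the step when no probe helps."""
    p = np.asarray(p0, dtype=float)
    best = objective(p)
    for _ in range(iters):
        improved = False
        for j in range(len(p)):              # probe each parameter in turn
            for delta in (step, -step):
                q = p.copy()
                q[j] += delta
                val = objective(q)
                if val < best:
                    p, best, improved = q, val, True
        if not improved:
            step *= shrink                   # no progress: refine the step
    return p, best
```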

SLIDE 34

Efficiently Optimizing the Bellman Errors

[Figure: Max Bellman Error vs. iteration # for three derivative-free optimizers: Brent, Amoeba, and Powell]

SLIDE 35

Basins of Convergence of the Max Bellman Error

Plotting p_i ↦ ||BE_V̂(T)||∞ for each p_i ∈ {c1, c2, c3, τ} (for j ≠ i, the p_j are held fixed at default values)

[Figure: Max Bellman Error as a function of each parameter: the LWR bandwidth τ, the Tin scaling parameter, the Tout scaling parameter, and the Time scaling parameter]

SLIDE 36

Temperature Graphs

[Figure: indoor temperature over the day under the MSNP, Default, and Turn-off strategies]

SLIDE 37

Performance of MSNP

Comparing yearly energy consumption (lower is better). Default is the strategy deployed in practice; MSNP, our model-selection algorithm, is:

1. better than LargeSample
2. close to CMA-ES

City            Default (kWh)   LargeSample (kWh)   MSNP (kWh)   CMA-ES (kWh)   % Energy-Savings
New York City   11084.8         10923.5             9859.3       9816.3         11.0%
Boston          12277.1         12480.7             11433.6      11052.8        6.9%
Chicago         15172.5         14778.2             14186        13778.4        6.5%

SLIDE 41

Related Work

- Bellman error for generalized policy iteration (Antos et al. 2008; Lagoudakis and Parr 2003)
- Bellman error for tuning basis functions in linear architectures (Keller et al. 2006; Menache et al. 2005; Parr et al. 2007)
- LWR model selection for learning a transition function (Ng et al. 2004)
- Abstract model-selection algorithm for RL (Farahmand and Szepesvári 2011)

SLIDE 42

Summary

Introduced MSNP, a practical model-selection algorithm for RL

MSNP is based on two main ideas:

[Figure: L∞ Bellman Error vs. Energy (kWh) scatter plot]
[Figure: Max Bellman Error vs. iteration # for Brent, Amoeba, and Powell]

Value-function based RL for thermostat control

Outlook:

- Theoretical analysis, the Bellman error's basin of convergence
- High-dimensional problems