A Learning Agent for Heat-Pump Thermostat Control, Daniel Urieli and Peter Stone (PowerPoint PPT Presentation)

SLIDE 1

A Learning Agent for Heat-Pump Thermostat Control

Daniel Urieli and Peter Stone

Department of Computer Science

The University of Texas at Austin

{urieli,pstone}@cs.utexas.edu

SLIDE 2

SLIDE 3

Heating, Ventilation, and Air-conditioning (HVAC) systems

SLIDE 4

Heat-Pump based HVAC System

  • Heat-pump is widely used and highly efficient

– Its heat output is up to 3x-4x the energy it consumes
– Consumes electricity (rather than gas/oil), so it can use renewable resources
– But: no longer effective in freezing outdoor temperatures

  • Backed up by an auxiliary heater

– Resistive heat coil
– Unaffected by outdoor temperatures
– But: consumes 2x the energy consumed by the heat-pump heater

  • Heat pump is also used for cooling
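The efficiency numbers above can be made concrete with a little arithmetic. A minimal sketch, using an illustrative coefficient of performance (COP) of 3 for the heat pump (the slide states 3x-4x) and COP 1 for the resistive auxiliary coil:

```python
# Back-of-the-envelope comparison of heat-pump vs. auxiliary (resistive)
# heating. COP values here are illustrative, within the range on the slide.
def electricity_needed(heat_demand_kwh, cop):
    """Electric energy required to deliver heat_demand_kwh of heat."""
    return heat_demand_kwh / cop

heat_pump_kwh = electricity_needed(30.0, cop=3.0)  # heat pump, COP 3
aux_kwh = electricity_needed(30.0, cop=1.0)        # resistive coil, COP 1
```

Delivering the same 30 kWh of heat, the resistive coil draws three times the electricity of a COP-3 heat pump, which is why the agent should avoid AUX whenever the heat pump alone can meet the comfort spec.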

SLIDE 5

Thermostat – an HVAC System's Decision Maker

  • The thermostat:

– Controls comfort
– Significantly affects energy consumption

  • Current interest evident from the appearance of startup

companies like NEST, as well as thermostats by more traditional companies like Honeywell

SLIDE 6

Goal:

Minimize energy consumption while satisfying comfort requirements

www.dot.gov

SLIDE 7

Contributions:

  • 1. A complete reinforcement learning agent that learns and applies

a new, adaptive control strategy for a heat-pump thermostat

  • 2. Our agent achieves 7.0%-14.5% yearly energy savings, while

maintaining the same comfort level, compared to a deployed strategy

www.dot.gov

Goal:

Minimize energy consumption while satisfying comfort requirements

SLIDE 8

Simulation Environment

  • GridLAB-D: A realistic smart-grid simulator; simulates power

generation, loads and markets

  • Open-source software, developed for the U.S. DOE, simulates

seconds to years

  • Realistically models a residential home

– Heat gains and losses, thermal mass, solar radiation and weather effects; uses real weather data recorded by NREL (www.nrel.gov)

[Figure: the GridLAB-D core with its Power Systems, Buildings, Control Systems, and Markets modules. The residential-building model is the equivalent-thermal-parameter (ETP) circuit relating indoor air temperature Tair and building mass temperature Tmass to outdoor temperature Tout, with heat inputs Qsolar, Qgains, and Qhvac:

dTair/dt = (1/Cair)·[Tmass·UAmass + Tout·UAenv − Tair·(UAmass + UAenv) + Qair]

dTmass/dt = (1/Cmass)·[UAmass·(Tair − Tmass) + Qmass]

The slide also shows GridLAB-D's grid-side equations (ZIP load composition with Z% + I% + P% = 100, the Newton-Raphson power-flow Jacobian, regulator tap-change and capacitor switching rules) and a wholesale-market diagram.]
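The two ETP differential equations above can be sketched as a simple forward-Euler integration; all parameter values below are illustrative placeholders, not GridLAB-D defaults:

```python
# Minimal Euler-integration sketch of the two-node ETP house model.
# Parameter values are illustrative, not GridLAB-D defaults.

def etp_step(t_air, t_mass, t_out, q_air, q_mass, dt,
             ua_env=500.0, ua_mass=2000.0, c_air=2000.0, c_mass=40000.0):
    """Advance indoor air and mass temperatures by one time step dt (hours)."""
    d_air = (t_mass * ua_mass + t_out * ua_env
             - t_air * (ua_mass + ua_env) + q_air) / c_air
    d_mass = (ua_mass * (t_air - t_mass) + q_mass) / c_mass
    return t_air + dt * d_air, t_mass + dt * d_mass

# With no heat input and a cold outdoors, indoor temperature drifts down
# toward the outdoor temperature.
t_air, t_mass = 70.0, 70.0
for _ in range(100):
    t_air, t_mass = etp_step(t_air, t_mass, t_out=30.0, q_air=0.0, q_mass=0.0, dt=0.1)
```

The slow, thermal-mass-buffered decay this produces is exactly the dynamics the agent must plan against during the shut-down period.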

SLIDE 9

Problem Setup

  • Simulating a typical residential home
  • Goal: minimize energy consumed by the heat-pump, while

satisfying the following comfort spec:

Occupants are

– 12am-7am: At home.
– 7am-6pm: Not at home (the "don't care" period).
– 6pm-12am: At home.

SLIDE 10

The Default Thermostat



SLIDE 16

Can We Just Shut Down The Thermostat During the "Don't-Care" Period?

  • Effective way to save energy

– Indoor temp. moves closer to outdoor temp., so heat dissipation slows down

  • Simulating it…
  • In this case, the result is:

– Increased energy consumption
– Failure to satisfy the comfort spec

Therefore, people frequently prefer to leave the thermostat on all day. However, a smarter shut-down should still be able to save energy while maintaining comfort.

SLIDE 17

From the US Dept. of Energy’s website

SLIDE 18

Challenges

Desired behavior:
– Maximize shut-down time while staying above the heat-pump slope
– Similarly for cooling (no AUX)

Challenges:

  • The heat-pump slope:

– Is unknown in advance
– Changes every day
– Depends on future weather
– Depends on specific house characteristics

  • Action effects are:

– Drifting rather than constant: since heat is being moved rather than generated, heat output strongly depends on the temperatures indoors, outdoors and along the heat path
– Noisy due to hidden physical conditions
– Delayed due to heat capacitors like walls and furniture

  • Also, in a realistic deployment:

– Exploration cannot be too long or too aggressive
– Customer acceptance will probably depend on worst-case behavior

  • Making decisions in a continuous, high-dimensional space
SLIDE 19

Our Problem as a Markov Decision Process (MDP)

  • States:
  • Actions:
  • Transition:
  • Reward:
  • Terminal States:
  • Action is taken every 6 minutes

– Modeling a realistic lockout of the system


SLIDE 23

How Should We Model State?

  • Choosing a state representation is an important design decision. A state variable:

– captures what we need to know about the system at a given moment
– is the variable around which we construct value function approximations [Powell 2011]

  • Definition 5.4.1 from [Powell 2011]:

– A state variable is the minimally dimensioned function of history that is necessary and sufficient to compute the decision function, the transition function, and the contribution function.

SLIDE 24

Our Problem as a Markov Decision Process (MDP)

  • States: <Tin, Time, ea>
  • Actions: {COOL, OFF, HEAT, AUX}

– 1 : 0 : 2 : 4 consumption (ea) proportion

  • Transition:
  • Reward: −ea − 100000·Δ²6pm, where:

Δ²6pm := (indoor_temp_at_6pm − required_indoor_temp_at_6pm)²

  • Terminal States:
  • Action is taken every 6 minutes

– Modeling a realistic lockout of the system


SLIDE 26

Expanding State to Compute the Transition Function

  • Can we predict action effects for each of the state variables?
  • Current state representation: <Tin, Time, ea>
  • Need to be able to predict Tin and ea
  • Method: generate simulated data, use cross-validation to test regression prediction accuracy

SLIDE 34

Predicting Tin

  • Prediction error is unacceptably high – state <Tin, Time, ea> doesn't capture enough information
  • Add Tout – it directly affects Tin. Prediction error still unacceptably high
  • Noise explained as hidden home state → add history of observable information

– Previous action
– Measured Tin history of 10 temperatures: <t0, t1, t2, …, t9>
– Resulting state: <Tin, Tout, Time, ea, prevAction, t0, …, t9>
SLIDE 35

Completing the State Definition

  • Resulting state: <Tin, Tout, Time, ea, prevAction, t0, …, t9>
  • Can we predict the newly added variables?
  • Trivially, except for Tout
  • Therefore, add weatherForecast to the state
  • weatherForecast doesn't need to be predicted in our transition function
  • This completes our state definition
  • The final resulting state is:

<Tin, Tout, Time, ea, prevAction, t0, …, t9, weatherForecast>
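The final state tuple can be sketched as a small data structure. Field names follow the slides; the types, the action encoding, and the flattening scheme are assumptions for illustration:

```python
# Illustrative encoding of the final state tuple above.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ThermostatState:
    t_in: float                   # current indoor temperature
    t_out: float                  # current outdoor temperature
    time: float                   # time of day, in hours
    ea: float                     # energy consumed so far
    prev_action: str              # one of COOL / OFF / HEAT / AUX
    t_history: Tuple[float, ...]  # last 10 measured indoor temperatures t0..t9
    weather_forecast: Tuple[float, ...]  # forecast outdoor temperatures

    def features(self) -> List[float]:
        """Flatten to a numeric vector, e.g. for a learned transition model."""
        action_code = {"COOL": 0.0, "OFF": 1.0, "HEAT": 2.0, "AUX": 3.0}[self.prev_action]
        return ([self.t_in, self.t_out, self.time, self.ea, action_code]
                + list(self.t_history) + list(self.weather_forecast))
```

The flattened vector is what makes the state "continuous and high-dimensional": 5 scalar fields plus 10 history entries plus however many forecast points are kept.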

SLIDE 36

Our Problem as a Markov Decision Process (MDP)

  • States: <Tin, Tout, Time, ea, prevAction, t0, …, t9, weatherForecast>
  • Actions: {COOL, OFF, HEAT, AUX}

– 1 : 0 : 2 : 4 consumption (ea) proportion

  • Transition: unknown in advance → learned
  • Reward: −ea − 100000·Δ²6pm, where:

Δ²6pm := (indoor_temp_at_6pm − required_indoor_temp_at_6pm)²

  • Terminal States: {s | s.time = 11:59pm}
  • Action taken every 6 minutes

– Modeling a realistic lockout of the system

  • State space is continuous and high dimensional
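The reward above can be sketched directly: an energy penalty at every step, plus one large quadratic comfort penalty at 6pm. The weight 100000 is from the slides; the function signature is an illustrative encoding:

```python
# Sketch of the MDP reward: per-step energy penalty plus a one-time
# quadratic comfort penalty at 6pm (18:00).
def reward(energy_consumed, time_hours, indoor_temp, required_temp_at_6pm):
    r = -energy_consumed
    if time_hours == 18:  # 6pm: the comfort spec is checked here
        delta = indoor_temp - required_temp_at_6pm
        r -= 100000 * delta ** 2
    return r
```

The huge weight makes any 6pm comfort miss dominate a whole day of energy savings, which is what forces the planner to end the setback early enough.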
SLIDE 37

Agent Operation

First 3 days (exploration):
Choose Random Action → Observe Resulting State → Record Action Effect <s, a, s'>

Starting day 4 (energy-saving setback policy):
Choose Best Action (TreeSearch) → Observe Resulting State → Record Action Effect <s, a, s'> → If Midnight? → Update House Model from Data (regression)
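The operation loop above can be sketched as follows; the environment, tree search, and regression fit are placeholder callables, not the paper's implementations:

```python
# Sketch of the agent loop: random exploration for the first 3 days, then
# model-based action selection, with a model refit every midnight.
import random

ACTIONS = ["COOL", "OFF", "HEAT", "AUX"]
STEPS_PER_DAY = 24 * 60 // 6  # one action every 6 minutes (lockout)

def run_agent(env, days, choose_best_action, fit_house_model):
    data, model = [], None
    s = env.reset()
    for day in range(days):
        for _ in range(STEPS_PER_DAY):
            if day < 3:
                a = random.choice(ACTIONS)        # exploration phase
            else:
                a = choose_best_action(model, s)  # tree search on learned model
            s_next = env.step(a)
            data.append((s, a, s_next))           # record action effect
            s = s_next
        model = fit_house_model(data)             # midnight: refit regression
    return model, data
```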

SLIDE 40

Exploration

  • Random actions for 3 days
  • Could use a more advanced exploration policy
  • However, this is still a realistic setup

– For instance, when occupants are traveling during the weekend


SLIDE 42

Update House Model from Data

  • Every midnight, use all the recorded data <s, a, s'> to estimate the house's transition function
  • Linear regression to estimate <s, a> → s'
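A minimal sketch of such a nightly update, fitting a linear map from (state, action) features to the next indoor temperature with ordinary least squares. The feature layout and the choice of predicting only Tin are assumptions; the slide only says linear regression estimates <s, a> → s':

```python
# Least-squares fit of a linear transition model from recorded <s, a, s'>
# data. X holds one row of <s, a> features per step; y the next-step Tin.
import numpy as np

def fit_transition_model(X, y):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # add an intercept column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_next_tin(w, x):
    """Predict the next indoor temperature for one feature row x."""
    return float(np.append(x, 1.0) @ w)
```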

SLIDE 44

Choosing the Best Action

  • Dealing with a continuous, high-dimensional state space
  • Impractical to compute a value function
  • Run a tree search at every step
  • Choose the first action of the best search path as the next action
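The per-step selection can be sketched as a plain depth-limited tree search over action sequences, scored with the learned model and the MDP reward. The model and reward interfaces are assumptions, and the paper's actual search is a more efficient specialized variant:

```python
# Depth-limited tree search: expand action sequences with the learned
# transition model, sum rewards, return the first action of the best path.
ACTIONS = ["COOL", "OFF", "HEAT", "AUX"]

def tree_search(state, model, reward, depth):
    """Return (best_total_reward, first_action) reachable from `state`."""
    if depth == 0:
        return 0.0, None
    best = (float("-inf"), None)
    for a in ACTIONS:
        s_next = model(state, a)          # learned transition <s, a> -> s'
        r = reward(state, a)
        future, _ = tree_search(s_next, model, reward, depth - 1)
        if r + future > best[0]:
            best = (r + future, a)
    return best

def choose_best_action(state, model, reward, depth=3):
    return tree_search(state, model, reward, depth)[1]
```

Note the branching factor is only 4 (the action set), so the cost is governed by the search depth rather than the state dimensionality.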
SLIDE 45

Safety Buffer in a Tree Search

[Figure: planned temperature trajectories with no safety buffer (C ≈ 0) vs. a confidence buffer of C ≈ 2σ]
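One way to read the buffer on this slide: instead of planning to hit the comfort target exactly, the search plans to a target shifted by a confidence margin C of roughly two standard deviations of the model's prediction error. The sketch below is an assumed rendering of that idea, not the paper's exact mechanism:

```python
# Confidence buffer sketch: widen the planning target by k standard
# deviations of observed model prediction error (k ~ 2 on the slide).
import statistics

def buffered_target(required_temp, prediction_errors, k=2.0):
    """Shift the planning target by k std deviations of model error."""
    sigma = statistics.pstdev(prediction_errors) if prediction_errors else 0.0
    return required_temp + k * sigma
```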

SLIDE 46

Results

  • Simulate 1 year under different weather conditions
  • 21 residential homes of sizes 1000-4000 ft²
  • Using real weather data recorded in

NYC, Boston, Chicago

  • Why cold cities? Since heating consumes 2x-4x more energy
SLIDE 47

Temperature Graphs – Learned Setback Policy

SLIDE 48

Energy Savings

SLIDE 49

Comfort Performance

  • In more than 22,000 simulated days
SLIDE 50

Related Work

  • [Rogers et al. 2011] – an adaptive thermostat that tries to minimize price & peak demand rather than the total amount of energy
  • [Hafner and Riedmiller 2011; Kretchmar 2000] – use RL to tune an HVAC system
  • [T. Peffer et al. 2011] – how people use thermostats in homes
  • Learning thermostats from commercial companies

– NEST, Honeywell…
– Technical details and actual performance are not published

SLIDE 51

Summary

  • A complete, adaptive RL agent for controlling a heat-pump thermostat
  • Techniques:

– Carefully defined the problem as an MDP
– Carefully chose a state representation
– Used an efficient, specialized tree search

  • Experiments run on a range of homes and weather conditions
  • Achieves 7%-14.5% yearly energy savings in simulation, while satisfying comfort requirements, compared to the deployed strategy

Thank you!

SLIDE 52

BACKUP

SLIDE 53

Ablation Analysis

  • Removing features and their combinations

– State features:

  • prevAct: previousAction
  • Hist: temperature history t0, …, t9

– conf: confidence buffer

  • Setting other values to the confidence bound