A Learning Agent for Heat-Pump Thermostat Control



  1. A Learning Agent for Heat-Pump Thermostat Control. Daniel Urieli and Peter Stone, Department of Computer Science, The University of Texas at Austin, {urieli,pstone}@cs.utexas.edu

  2. Heating, Ventilation, and Air-conditioning (HVAC) systems

  3. Heat-Pump based HVAC System
     • The heat pump is widely used and highly efficient
       – Its heat output is up to 3x-4x the energy it consumes
       – Consumes electricity (rather than gas/oil), so it can use renewable resources
       – But: no longer effective in freezing outdoor temperatures
     • Backed up by an auxiliary heater
       – Resistive heat coil
       – Unaffected by outdoor temperatures
       – But: consumes 2x the energy consumed by the heat-pump heater
     • The heat pump is also used for cooling

  4. Thermostat – an HVAC System’s Decision Maker
     • The thermostat:
       – Controls comfort
       – Significantly affects energy consumption
     • Current interest is evident from the appearance of startup companies like NEST, as well as thermostats by more traditional companies like Honeywell

  5. Goal: Minimize energy consumption while satisfying comfort requirements www.dot.gov

  6. Goal: Minimize energy consumption while satisfying comfort requirements
     Contributions:
     1. A complete reinforcement learning agent that learns and applies a new, adaptive control strategy for a heat-pump thermostat
     2. Our agent achieves 7.0%-14.5% yearly energy savings, while maintaining the same comfort level, compared to a deployed strategy
     www.dot.gov

  7. Simulation Environment
     • GridLAB-D: a realistic smart-grid simulator that simulates power generation, loads, and markets
     • Open-source software, developed for the U.S. DOE; simulates time spans from seconds to years
     • Realistically models a residential home
       – Heat gains and losses, thermal mass, solar radiation and weather effects
       – Uses real weather data recorded by NREL (www.nrel.gov)
     [Figure: GridLAB-D overview, with Power Systems, Control Systems, Markets, and Buildings modules; the Buildings panel shows the house thermal model relating T_air, T_mass, and T_out through UA_env, UA_mass, C_air, and C_mass, driven by Q_hvac, Q_gains, and Q_solar]
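The house model mentioned on this slide (heat gains and losses plus thermal mass) can be illustrated with a two-node sketch: one temperature for the indoor air and one for the building mass. The code below is only a rough illustration, not GridLAB-D's implementation. The symbol names follow the slide's figure, but the numeric parameter values, the forward-Euler integration, and the function names `step` and `simulate` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class House:
    # All values are hypothetical, chosen only to give a plausible response.
    ua_env: float = 500.0     # conductance between indoor air and outdoors
    ua_mass: float = 5000.0   # conductance between indoor air and thermal mass
    c_air: float = 2000.0     # heat capacity of the indoor air
    c_mass: float = 10000.0   # heat capacity of walls, furniture, etc.


def step(house, t_air, t_mass, t_out, q_air, q_mass=0.0, dt=0.01):
    """Advance both temperatures by dt hours with one forward-Euler step."""
    d_air = (house.ua_mass * (t_mass - t_air)
             + house.ua_env * (t_out - t_air)
             + q_air) / house.c_air
    d_mass = (house.ua_mass * (t_air - t_mass) + q_mass) / house.c_mass
    return t_air + dt * d_air, t_mass + dt * d_mass


def simulate(house, t_air, t_mass, t_out, q_air, hours, dt=0.01):
    """Integrate for `hours` with constant outdoor temperature and heat input."""
    for _ in range(round(hours / dt)):
        t_air, t_mass = step(house, t_air, t_mass, t_out, q_air, dt=dt)
    return t_air, t_mass


if __name__ == "__main__":
    # System off (q_air = 0) for 11 hours on a cold day.
    t_air, t_mass = simulate(House(), 70.0, 70.0, 30.0, 0.0, hours=11)
    print(f"air: {t_air:.1f}, mass: {t_mass:.1f}")
```

In this sketch the mass temperature lags behind the air temperature, which is one way to picture the delayed action effects that walls and furniture introduce.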

  8. Problem Setup
     • Simulating a typical residential home
     • Goal: minimize energy consumed by the heat pump, while satisfying the following comfort spec. Occupants are:
       – 12am-7am: At home.
       – 7am-6pm: Not at home (the “don’t care” period).
       – 6pm-12am: At home.
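The comfort spec above can be written down as a small predicate. This is a minimal sketch: the slide gives only the occupancy hours, so the comfort band (`low`, `high`), the treatment of the boundary hours (7am counted as away, 6pm as home), and both function names are assumptions for illustration.

```python
def occupants_home(hour: float) -> bool:
    """True during 12am-7am and 6pm-12am, when the comfort spec applies."""
    h = hour % 24
    return h < 7 or h >= 18


def comfort_violated(hour: float, t_in: float,
                     low: float = 69.0, high: float = 75.0) -> bool:
    """True if occupants are home and indoor temp is outside the assumed band.

    The 69-75 band is a hypothetical placeholder, not a value from the slides.
    """
    return occupants_home(hour) and not (low <= t_in <= high)


if __name__ == "__main__":
    print(comfort_violated(12, 50.0))  # midday: nobody home, so no violation
    print(comfort_violated(20, 50.0))  # evening: occupants home and too cold
```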

  9. The Default Thermostat

  10. The Default Thermostat

  11. The Default Thermostat

  12. The Default Thermostat

  13. Can We Just Shut Down The Thermostat During the “don’t-care” Period?
      • Effective way to save energy
        – With indoor temp. closer to outdoor, heat dissipation slows down
      • Simulating it…
      • In this case, the result is:
        – Increased energy consumption
        – Failure to satisfy the comfort spec

  14. Can We Just Shut Down The Thermostat During the “don’t-care” Period?
      • Effective way to save energy
        – With indoor temp. closer to outdoor, heat dissipation slows down
      • Simulating it…
      • In this case, the result is:
        – Increased energy consumption
        – Failure to satisfy the comfort spec
      Therefore, people frequently prefer to leave the thermostat on all day

  15. Can We Just Shut Down The Thermostat During the “don’t-care” Period?
      • Effective way to save energy
        – With indoor temp. closer to outdoor, heat dissipation slows down
      • Simulating it…
      • In this case, the result is:
        – Increased energy consumption
        – Failure to satisfy the comfort spec
      Therefore, people frequently prefer to leave the thermostat on all day
      However, a smarter shut-down should still be able to save energy while maintaining comfort

  16. From the US Dept. of Energy’s website

  17. Challenges
      Desired behavior:
        – Maximize shut-down time while staying above the heat-pump slope
        – Similarly for cooling (no AUX)
      Challenges:
      • The heat-pump slope:
        – Is unknown in advance
        – Changes every day
        – Depends on future weather
        – Depends on specific house characteristics
      • Action effects are:
        – Drifting rather than constant: since heat is being moved rather than generated, heat output strongly depends on the temperatures indoors, outdoors and along the heat path
        – Noisy due to hidden physical conditions
        – Delayed due to heat capacitors like walls and furniture
      • Also, in a realistic deployment:
        – Exploration cannot be too long or too aggressive
        – Customer acceptance will probably depend on worst-case behavior
      • Making decisions in a continuous, high-dimensional space
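One way to read "staying above the heat-pump slope" is as a temperature floor: the lowest indoor temperature from which the heat pump alone, without AUX, can still reach the required temperature by 6pm. The sketch below illustrates that reading under a deliberately unrealistic assumption of constant heating and cooling rates, which is exactly what the challenges above say does not hold in practice. The function names, the rates, and the 0.1-hour step (6 minutes) are all assumptions for illustration, not the agent's method.

```python
def slope_floor(hour: float, t_required: float, heat_rate: float,
                deadline: float = 18.0) -> float:
    """Lowest indoor temp at `hour` from which heating at `heat_rate`
    (degrees per hour, assumed constant) reaches t_required by `deadline`."""
    return t_required - heat_rate * max(deadline - hour, 0.0)


def should_stay_off(hour: float, t_in: float, t_required: float,
                    heat_rate: float, cool_rate: float,
                    step_h: float = 0.1) -> bool:
    """Stay off only if one more step of drifting keeps us above the floor.

    cool_rate is the assumed constant temperature loss per hour while off.
    """
    t_next = t_in - cool_rate * step_h
    return t_next >= slope_floor(hour + step_h, t_required, heat_rate)


if __name__ == "__main__":
    # Noon: plenty of time left, so the system can remain off.
    print(should_stay_off(12.0, 68.0, 70.0, heat_rate=2.0, cool_rate=0.5))
    # 5pm: too close to the deadline to keep drifting.
    print(should_stay_off(17.0, 68.0, 70.0, heat_rate=2.0, cool_rate=0.5))
```

Because the real slope is unknown in advance and changes daily, a fixed `heat_rate` like this would have to be replaced by something learned from data.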

  18. Our Problem as a Markov Decision Process (MDP)
      • States:
      • Actions:
      • Transition:
      • Reward:
      • Terminal States:
      • An action is taken every 6 minutes
        – Modeling a realistic lockout of the system

  19. Our Problem as a Markov Decision Process (MDP)
      • States:
      • Actions: {COOL, OFF, HEAT, AUX}, with consumption (e_a) in proportion 1 : 0 : 2 : 4
      • Transition:
      • Reward:
      • Terminal States:
      • An action is taken every 6 minutes
        – Modeling a realistic lockout of the system

  20. Our Problem as a Markov Decision Process (MDP)
      • States:
      • Actions: {COOL, OFF, HEAT, AUX}, with consumption (e_a) in proportion 1 : 0 : 2 : 4
      • Transition:
      • Reward: $-e_a - 100000\,\Delta_{6pm}^2$, where $\Delta_{6pm}$ := (indoor_temp_at_6pm – required_indoor_temp_at_6pm)
      • Terminal States:
      • An action is taken every 6 minutes
        – Modeling a realistic lockout of the system

  21. Our Problem as a Markov Decision Process (MDP)
      • States: ???
      • Actions: {COOL, OFF, HEAT, AUX}, with consumption (e_a) in proportion 1 : 0 : 2 : 4
      • Transition:
      • Reward: $-e_a - 100000\,\Delta_{6pm}^2$, where $\Delta_{6pm}$ := (indoor_temp_at_6pm – required_indoor_temp_at_6pm)
      • Terminal States:
      • An action is taken every 6 minutes
        – Modeling a realistic lockout of the system

  22. How Should We Model State?
      • Choosing a state representation is an important design decision. A state variable:
        – captures what we need to know about the system at a given moment
        – is the variable around which we construct value function approximations [Powell 2011]
      • Definition 5.4.1 from [Powell 2011]:
        – A state variable is the minimally dimensioned function of history that is necessary and sufficient to compute the decision function, the transition function, and the contribution function.

  23. Our Problem as a Markov Decision Process (MDP)
      • States: <T_in, Time, e_a>
      • Actions: {COOL, OFF, HEAT, AUX}, with consumption (e_a) in proportion 1 : 0 : 2 : 4
      • Transition:
      • Reward: $-e_a - 100000\,\Delta_{6pm}^2$, where $\Delta_{6pm}$ := (indoor_temp_at_6pm – required_indoor_temp_at_6pm)
      • Terminal States:
      • An action is taken every 6 minutes
        – Modeling a realistic lockout of the system

  24. Our Problem as a Markov Decision Process (MDP)
      • States: <T_in, Time, e_a>
      • Actions: {COOL, OFF, HEAT, AUX}, with consumption (e_a) in proportion 1 : 0 : 2 : 4
      • Transition:
      • Reward: $-e_a - 100000\,\Delta_{6pm}^2$, where $\Delta_{6pm}$ := (indoor_temp_at_6pm – required_indoor_temp_at_6pm)
      • Terminal States:
      • An action is taken every 6 minutes
        – Modeling a realistic lockout of the system
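The parts of the MDP that the slides spell out (state tuple, action set with consumption proportions, and reward) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the 6pm penalty term is applied only on the step that lands on 6pm and is zero otherwise, and that the `e_a` entry in the state holds the consumption of the most recent action; the slides state neither explicitly. The names `State`, `Action`, `PENALTY`, and `reward` are hypothetical.

```python
from enum import Enum
from typing import NamedTuple


class Action(Enum):
    # Each value is the action's consumption proportion e_a from the slide.
    COOL = 1
    OFF = 0
    HEAT = 2
    AUX = 4


class State(NamedTuple):
    t_in: float    # indoor temperature
    time_h: float  # time of day in hours
    e_a: float     # consumption of the most recent action (assumption)


PENALTY = 100_000  # weight on the squared 6pm temperature miss


def reward(action: Action, at_6pm: bool,
           t_in: float = 0.0, t_required: float = 0.0) -> float:
    """Energy cost of the action, plus the comfort penalty at 6pm only."""
    delta = (t_in - t_required) if at_6pm else 0.0
    return -action.value - PENALTY * delta ** 2


if __name__ == "__main__":
    print(reward(Action.AUX, at_6pm=False))
    print(reward(Action.HEAT, at_6pm=True, t_in=69.0, t_required=70.0))
```

With a penalty weight this large, even a one-degree miss at 6pm outweighs any energy that could be saved during the day, so the comfort requirement acts almost like a hard constraint.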
