TacTex’13: A Champion Adaptive Power Trading Agent



slide-1
SLIDE 1

TacTex’13: A Champion Adaptive Power Trading Agent

Daniel Urieli Peter Stone

Department of Computer Science The University of Texas at Austin {urieli,pstone}@cs.utexas.edu

AAAI 2014

Daniel Urieli, Peter Stone TacTex’13: A Champion Adaptive Power Trading Agent 1

slide-2
SLIDE 2

The Smart Grid Vision

“Grid 2030” - vision for a smart-grid

Major challenge: aligning supply-demand in the presence of renewable, intermittent generation

AI: a main building block

Smart grid: new challenges for AI

[Ramchurn et al. 2012]

[Figure: cover of “Grid 2030: A National Vision for Electricity’s Second 100 Years”, United States Department of Energy, Office of Electric Transmission and Distribution, July 2003, subtitled “Transforming the Grid to Revolutionize Electric Power in North America”]

slide-5
SLIDE 5

The Power Trading Agent Competition (Power TAC)

Grid 2030 milestone:

“Customer participation in power markets through demand-side management and distributed generation”

Power TAC (Power Trading Agent Competition)

Uses a rich smart grid simulation platform
Focuses on the structure and operation of retail power markets
Competitors: autonomous broker agents

slide-8
SLIDE 8

Approach

Application domain: autonomous energy trading

In this domain:

An agent is deployed into an unknown environment
The agent is expected to make robust, real-time decisions
The environment is realistic ⇒ complex

To perform robustly, the agent needs to:

Learn
Predict
Plan
Adapt

A natural approach: Reinforcement Learning

slide-14
SLIDE 14

Reinforcement Learning in the Smart Grid

Reinforcement Learning (RL):

[Diagram: agent-environment loop. The agent sends action a to the environment; the environment returns state s and reward r.]

Our domains require an RL agent to provide:

Sample efficiency
Computational efficiency
Handling of high-dimensional continuous state
Handling of continuous actions and/or delayed actions
Handling of possible non-stationarity

A combination that was not addressed by past RL algorithms

slide-17
SLIDE 17

Power TAC: Game Description

[Diagram labels: Balancing Market, Wholesale Market, Tariff Market; electricity generation companies, renewables production, commercial/residential consumers, national grid, competing broker agents; Electricity Grid]

slide-18
SLIDE 18

Power TAC: Broker Operation Cycle


slide-19
SLIDE 19

Power TAC Game State

[Diagram labels: cash, weather forecast, day/time]

slide-20
SLIDE 20

Power TAC 2013 Competition Results

Our agent, TacTex’13, won the Power TAC 2013 finals:

Broker            7-broker     4-broker     2-broker    Total (not normalized)
TacTex             -705248     13493825     17853189      30641766
cwiBroker           647400     12197772     13476434      26321606
MLLBroker             8533      3305131      9482400      12796064
CrocodileAgent     -361939      1592764      7105236       8336061
AstonTAC            345300      5977354      5484780      11807435
Mertacor           -621040      1279380      4919087       5577427
INAOEBroker02    -76112159   -497131383    -70255037    -643498580
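The Total column of the results table can be checked against the three per-game-size columns. The following is a minimal sketch of that arithmetic, with losses entered as negative numbers; the variable and function names are illustrative only. Each reported total agrees with the column sum to within 1, which is consistent with rounding.

```python
# Values copied from the results table; losses are entered as negative numbers.
# Columns: 7-broker, 4-broker, 2-broker, Total (not normalized).
results = {
    "TacTex": (-705248, 13493825, 17853189, 30641766),
    "cwiBroker": (647400, 12197772, 13476434, 26321606),
    "MLLBroker": (8533, 3305131, 9482400, 12796064),
    "CrocodileAgent": (-361939, 1592764, 7105236, 8336061),
    "AstonTAC": (345300, 5977354, 5484780, 11807435),
    "Mertacor": (-621040, 1279380, 4919087, 5577427),
    "INAOEBroker02": (-76112159, -497131383, -70255037, -643498580),
}


def total_mismatch(row):
    """Absolute gap between the reported total and the sum of the three columns."""
    seven, four, two, total = row
    return abs((seven + four + two) - total)


# Per-broker mismatch and a ranking by reported total, highest first.
mismatches = {name: total_mismatch(row) for name, row in results.items()}
ranking = sorted(results, key=lambda name: results[name][3], reverse=True)

for name in ranking:
    print(f"{name:15s} total={results[name][3]:>12d} mismatch={mismatches[name]}")
```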

slide-21
SLIDE 21

TacTex’13: Approach

slide-27
SLIDE 27

TacTex’13: Approach

[Diagram labels: Wholesale Market, Tariff Market; electricity generation companies, commercial/residential consumers, TacTex; Electricity Grid; customers C1, C2, ..., Ci; TacTex’s tariffs T1, T2, ..., Tn; s11, ..., s1n; future energy demand at +1, +2, +3; future energy supply costs at +1, +2, +3]

Wholesale Strategy: Buy Energy
Tariff Strategy: Sell Energy

slide-28
SLIDE 28

TacTex’13: Tariff Market Strategy

[Diagram labels: Wholesale Market, Tariff Market; electricity generation companies, commercial/residential consumers, TacTex; Electricity Grid; customers C1, C2, ..., Ci; TacTex’s tariffs T1, T2, ..., Tn; s11, ..., s1n; future energy demand at +1, +2, +3; future energy supply costs at +1, +2, +3]

Wholesale Strategy: Buy Energy
Tariff Strategy: Sell Energy

slide-30
SLIDE 30

Decision Making in the Tariff Market

Available actions: tariff publications

Tariff: contract for selling/buying energy

E.g.: [type=consumption, rates=(rate1, rate2, ...), signup-fee=none, ...]

Rate: energy prices per time and/or quantity

Rate types: fixed, time-of-use (TOU), real-time (RT), ...

Fixed: [fixed=true, price=7 cent/kWh]
TOU: [(time1=Mon-Fri 7am-6pm, price=8 cent/kWh), (time2=Sat, ...), ...]
RT: [expected/min/max-price=7/5/8 cent/kWh, rate-notice=3 hours, ...]

Customers subscribe to tariffs they find attractive

Cheap, minimizing inconvenience, ...

Challenge: what tariffs should a broker publish?
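The tariff and rate examples on this slide could be represented as plain data records. The sketch below is one possible encoding and is not the Power TAC API: the class names `Rate` and `Tariff`, their field names, and the helper `rate_type` are all hypothetical, chosen only to mirror the bracketed examples.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rate:
    """One price component of a tariff (hypothetical structure, not the Power TAC API)."""
    price_cents_per_kwh: float          # fixed price, or expected price for a real-time rate
    time_window: Optional[str] = None   # e.g. "Mon-Fri 7am-6pm" for a time-of-use rate
    min_price: Optional[float] = None   # real-time rates only
    max_price: Optional[float] = None   # real-time rates only
    notice_hours: Optional[int] = None  # real-time rates only


@dataclass(frozen=True)
class Tariff:
    """A published contract for selling or buying energy (hypothetical structure)."""
    kind: str                           # e.g. "consumption"
    rates: Tuple[Rate, ...]
    signup_fee: Optional[float] = None  # None stands for "signup-fee=none"


def rate_type(rate: Rate) -> str:
    """Classify a rate as fixed, time-of-use, or real-time from the fields that are set."""
    if rate.min_price is not None or rate.max_price is not None:
        return "real-time"
    if rate.time_window is not None:
        return "time-of-use"
    return "fixed"


# The three slide examples; rates that the slide elides with "..." are left out.
fixed = Tariff("consumption", (Rate(7.0),))
tou = Tariff("consumption", (Rate(8.0, time_window="Mon-Fri 7am-6pm"),))
rt = Tariff("consumption", (Rate(7.0, min_price=5.0, max_price=8.0, notice_hours=3),))
```

Frozen dataclasses are used here so that a published tariff cannot be mutated after construction, which matches the idea of a tariff as a contract.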

slide-41
SLIDE 41

Tariff Market Approach

TacTex uses a utility-based approach

Optimizes long-term utility (= profits)

Core computation: “if I publish tariff t, how would it affect my long-term utility?”

slide-47
SLIDE 47

Tariff Utility Estimation

Considering only fixed-rate tariffs

More attractive to customers
Optimizing one future price instead of a sequence

[Figure: seven future steps labeled D+1, ..., D+7 and C+1, ..., C+7, with axes labeled energy and unit cost]

Estimate future customer demand
Estimate future wholesale costs
Select the price that maximizes profits
Publish the tariff if it is expected to increase utility
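The four steps on this slide can be sketched as a small search over candidate fixed prices. This is an illustration under stated assumptions, not the authors' implementation: the demand model `toy_demand`, the constant unit costs, the seven-step horizon, and all function names are hypothetical, and utility is taken to be revenue minus wholesale cost summed over the forecast horizon.

```python
def estimate_utility(price, predict_demand, unit_costs):
    """Predicted profit of a fixed-rate tariff: sum of demand * (price - unit cost)."""
    demand = predict_demand(price)  # predicted energy sold at each future step
    return sum(q * (price - c) for q, c in zip(demand, unit_costs))


def select_price(candidates, predict_demand, unit_costs):
    """Pick the candidate fixed price with the highest predicted profit."""
    return max(candidates, key=lambda p: estimate_utility(p, predict_demand, unit_costs))


def should_publish(candidate_utility, current_utility):
    """Publish only if the new tariff is expected to increase utility."""
    return candidate_utility > current_utility


def toy_demand(price):
    """Hypothetical demand model: demand falls linearly as the price rises."""
    return [max(0, 100 - 10 * price)] * 7  # same prediction for 7 future steps


unit_costs = [4] * 7  # hypothetical predicted wholesale cost per unit of energy
best = select_price([5, 6, 7, 8, 9], toy_demand, unit_costs)
best_utility = estimate_utility(best, toy_demand, unit_costs)
print(best, best_utility)
```

Restricting the search to a single fixed price keeps the candidate set one-dimensional, which is the simplification the slide describes as optimizing one future price instead of a sequence.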

slide-49
SLIDE 49

TacTex’13: Wholesale Market Strategy

[Diagram labels: Wholesale Market, Tariff Market; electricity generation companies, commercial/residential consumers, TacTex; Electricity Grid; customers C1, C2, ..., Ci; TacTex’s tariffs T1, T2, ..., Tn; s11, ..., s1n; future energy demand at +1, +2, +3; future energy supply costs at +1, +2, +3]

Wholesale Strategy: Buy Energy
Tariff Strategy: Sell Energy

slide-51
SLIDE 51

Decision Making in the Wholesale Market

Available actions: bid submissions

Bid: [needed-amount=2 MWh, limit=25 $/MWh, when=5pm]

Bids are cleared in a double auction

Day-ahead market ⇒ 24 auctions for each timeslot

Need to:

Buy energy cheaply
Avoid imbalance costs ⇒ buy all needed energy

Challenge: what bidding strategy to use?
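To make the double-auction setting concrete, here is a minimal sketch of how one such auction could clear. It is a generic textbook scheme and an assumption throughout: the slide does not state Power TAC's clearing rule, and the function name `clear_double_auction`, the `(limit_price, quantity)` tuples, and the midpoint pricing are all illustrative choices.

```python
def clear_double_auction(bids, asks):
    """Match buy bids against sell asks, each given as (limit_price, quantity).

    Returns (clearing_price, traded_quantity), or (None, 0.0) if nothing trades.
    Assumption: the clearing price is the midpoint of the last matched bid and ask.
    """
    buys = sorted(bids, key=lambda b: b[0], reverse=True)  # highest limit first
    sells = sorted(asks, key=lambda a: a[0])               # lowest ask first
    i = j = 0
    buy_left = buys[0][1] if buys else 0.0
    sell_left = sells[0][1] if sells else 0.0
    traded = 0.0
    last_bid = last_ask = None

    # Keep matching while the best remaining buyer is willing to pay the best remaining ask.
    while i < len(buys) and j < len(sells) and buys[i][0] >= sells[j][0]:
        qty = min(buy_left, sell_left)
        traded += qty
        last_bid, last_ask = buys[i][0], sells[j][0]
        buy_left -= qty
        sell_left -= qty
        if buy_left == 0:   # this bid is fully served, move to the next one
            i += 1
            buy_left = buys[i][1] if i < len(buys) else 0.0
        if sell_left == 0:  # this ask is fully used, move to the next one
            j += 1
            sell_left = sells[j][1] if j < len(sells) else 0.0

    if last_bid is None:
        return None, 0.0
    return (last_bid + last_ask) / 2, traded


# A bid like the slide's example (2 MWh at a limit of 25 $/MWh) plus hypothetical others.
price, quantity = clear_double_auction(
    bids=[(25, 2), (20, 1)],
    asks=[(15, 1), (22, 2), (30, 5)],
)
print(price, quantity)
```

In this toy run the bid with limit 25 is fully served while the bid with limit 20 is not, which illustrates the tension on the slide: a low limit buys cheaply but risks leaving needed energy unbought.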

slide-60
SLIDE 60

Wholesale Market Strategy

Per timeslot: estimate future demand

Minimize the cost of satisfying this demand

Online RL bidding algorithm:

[Diagram: MDP with states 1, 2, ..., 24, a terminal state, and a success state; rewards labeled r=0, r=balancing-price, and r=limit-price; inset: unit cost distribution of past successful bids]

MDP states: {0, 1, ..., 24, success}
MDP actions: limit-price ∈ R
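One way to solve an MDP of this shape is backward induction from the terminal state. The sketch below is an illustration under several assumptions, not the authors' algorithm: state s is read as the number of auctions remaining, an uncleared bid moves from s to s-1, reaching state 0 costs the balancing price, a cleared bid costs its limit price, and the clearing probability is estimated as the fraction of past successful bid prices at or below the limit. All function and parameter names are hypothetical.

```python
from bisect import bisect_right


def clearing_probability(past_prices, limit_price):
    """Assumed proxy: fraction of past successful bid prices at or below limit_price."""
    if not past_prices:
        return 0.0
    ordered = sorted(past_prices)
    return bisect_right(ordered, limit_price) / len(ordered)


def solve_bidding_mdp(past_prices_by_state, balancing_price, candidate_limits, horizon=24):
    """Backward induction over states 0..horizon.

    value[s] is the expected unit cost when s auctions remain;
    policy[s] is the limit price that minimizes it among candidate_limits.
    """
    value = {0: balancing_price}  # out of auctions: pay the balancing price
    policy = {}
    for s in range(1, horizon + 1):
        best_limit, best_cost = None, float("inf")
        for limit in candidate_limits:
            p = clearing_probability(past_prices_by_state.get(s, []), limit)
            # Cleared: pay the limit price. Not cleared: continue from state s-1.
            cost = p * limit + (1.0 - p) * value[s - 1]
            if cost < best_cost:
                best_limit, best_cost = limit, cost
        value[s], policy[s] = best_cost, best_limit
    return value, policy


# Toy example with 3 remaining auctions and the same hypothetical price history in each state.
history = {s: [20, 30, 50] for s in (1, 2, 3)}
value, policy = solve_bidding_mdp(history, balancing_price=100,
                                  candidate_limits=[20, 30, 50], horizon=3)
print(policy)
```

In the toy run the chosen limit price rises as fewer auctions remain, since failing to clear becomes more expensive the closer the agent gets to paying the balancing price.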

slide-61
SLIDE 61

Controlled Experiments - Ablation Analysis

Round-robin 2-agent tournament between:

B: baseline agent
U1: adding the tariff-market strategy
U9_MDP: adding the wholesale-market strategy
U9_MDP_LWR: adding LWR customer prediction

Each pair played 200 games with similar conditions

              B                U1              U9_MDP
U9_MDP_LWR    1278.3 (43.2)    708.9 (35.6)    34.2 (23.2)
U9_MDP         966.4 (40.5)    592.6 (22.2)
U1             547.4 (27.7)

slide-62
SLIDE 62

Ablation Analysis Using Available Finalist Agents

4-agent games using 3 available finalist agents

Broker            Cash
cwiBroker          340.9 (8.4)
Mertacor          -276.2 (40.2)
CrocodileAgent    -287.1 (14.5)
B                 -334.6 (8.0)

Broker            Cash
cwiBroker          315.4 (9.3)
U1                 135.3 (12.3)
CrocodileAgent    -372.1 (17.0)
Mertacor          -485.5 (28.1)

Broker            Cash
U9_MDP             389.9 (13.3)
cwiBroker          138.3 (8.7)
CrocodileAgent    -333.3 (17.0)
Mertacor          -494.1 (29.6)

Broker            Cash
U9_MDP_LWR         350.8 (13.3)
cwiBroker          132.4 (9.0)
CrocodileAgent    -336.9 (17.3)
Mertacor          -566.1 (26.8)

The tariff and wholesale strategies improve performance
LWR customer prediction reduces performance

Should LWR’s extrapolation assumptions be relaxed?

slide-64
SLIDE 64

Related Work: Power Trading Agents

RL for tariff publications [Peters-2013]

Offline preference learning

Market Bidding MDP [Kuate-2013]

Uses a different MDP representation

Tariff Publication MDP [Reddy-2011]

More restrictive setup

The Power TAC Platform and Competition [Ketter-2013]


slide-65
SLIDE 65

Summary

TacTex’13: utility-optimizing broker agent

Interdependent optimization problems

Utility-maximizing tariff strategy:

[Figure: seven future steps labeled D+1, ..., D+7 and C+1, ..., C+7, with axes labeled energy and unit cost]

Online reinforcement learning bidding algorithm:

[Diagram: 24 parallel MDPs over time, each with states 1, 2, ..., 24, a terminal state, and a success state; rewards labeled r=0, r=balancing-price, and r=limit-price; inset: unit cost distribution of past successful bids]

Outlook

Investigating other tariff, wholesale and balancing strategies
Impact on the smart grid and customer behaviors