1
A Distributed Approach to Large Scale Security Constrained Unit Commitment Problem
Kaan Egilmez, Cambridge Energy Solutions
FERC Technical Conference on Increasing Real-Time and Day-Ahead Market Efficiency through Improved Software, June
2
About CES
- Cambridge Energy Solutions is a software company whose mission is to develop software tools for participants in deregulated electric power markets.
- CES-US provides information and tools to help market participants analyze the electricity markets on a locational basis, forecast and value transmission congestion, and understand the fundamental drivers of short- and long-term prices.
- CES-US staff are experts on US market structures, system operation, and related information technology.
3
Presentation overview
- The convergence of machine virtualization and the maturing of multi-core computing have had a dramatic impact on the ease with which high performance computing techniques can be brought to bear on real-world problems.
- At CES we are actively working on improving the performance of our DAYZER market modeling and simulation software by using multi-core parallel programming on individual compute nodes, combined with distribution of the workload across multiple such nodes organized into high performance computing clusters.
- This talk provides an overview of the techniques we are using to accomplish this goal, as well as simulation results showing the performance improvement on both small and large scale models, such as our combined model for PJM and MISO.
- These techniques, if applied to market operations and planning, would allow many more scenarios to be examined concurrently and/or more detailed individual models to be solved within reasonable time limits, enabling novel solutions to existing concerns about the robustness of market results to various kinds of uncertainty.
4
DAYZER
CES has developed DAYZER to assist electric power market participants in analyzing locational market clearing prices and the associated transmission congestion costs in competitive electricity markets. The tool simulates the operation of the electricity markets by mimicking the dispatch procedures used by the corresponding independent system operators (ISOs), and it replicates the calculations the ISOs make in solving for the security-constrained, least-cost unit commitment and dispatch in the Day-Ahead markets. Models are available for the CAISO, ERCOT, MISO, NEPOOL, NYISO, ONTARIO, PJM, SPP and WECC markets, as well as a combined model for the PJM-MISO region.
5
DAYZER SCUC MILP (MUC) Formulation
Minimize the total cost over 24 hours of:
- Generation
- Startup/Shutdown
- Imports/Exports
- Generation slacks
- Spin reserve slacks
- Non-spin reserve slacks
- Transmission overloads
- PAR angle overloads

Subject to the following constraints for each hour (a compact sketch of the formulation follows the list):
- System energy balance
- Spin reserves requirement
- Non-spin reserves requirement
- Unit commitment constraints (capacity, min up/down, start/stop, ramping)
- Pump storage constraints (efficiency, reservoir)
- Transmission constraints (line, contingency, interface, PAR, nomogram)
- PAR angle constraints
- DC line constraints
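As a compact sketch (the symbols here are illustrative, not DAYZER's internal notation): with commitment status $u_{i,h}$, dispatch $p_{i,h}$, startup/shutdown indicators $y_{i,h}, z_{i,h}$, import/export flows $q_{x,h}$, and penalized slacks $s_h$, the MUC MILP has the shape

$$\min \sum_{h=1}^{24}\Bigg[\sum_{i}\Big(C_i(p_{i,h}) + c^{su}_i y_{i,h} + c^{sd}_i z_{i,h}\Big) + \sum_{x} c_x q_{x,h} + M\Big(s^{g}_h + s^{sp}_h + s^{ns}_h + s^{tx}_h + s^{par}_h\Big)\Bigg]$$

subject to, for each hour $h$,

$$\sum_i p_{i,h} + \sum_x q_{x,h} + s^{g}_h = D_h, \qquad \underline{P}_i\, u_{i,h} \le p_{i,h} \le \overline{P}_i\, u_{i,h}, \qquad u_{i,h} - u_{i,h-1} = y_{i,h} - z_{i,h},$$

together with the reserve, min up/down, ramping, pump storage, transmission/contingency/interface/PAR/nomogram, and DC line constraints listed above.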
6
Examples of DAYZER Model Characteristics
NEPOOL (2014):
- 8 load zones
- 1 reserves pool
- 6 import/export interface units
- 2 pumped storage units
- 416 generation units (Nuclear, Hydro, Wind, Solar, CC, ST, GT)
- 2612 transmission constraints
- 11 PARs

PJM+MISO combined (2014):
- 54 load zones + 88 industrial load units
- 7 reserves pools
- 39 import/export interface units
- 8 pumped storage units
- 1972 generation units (Nuclear, Hydro, Wind, Solar, Battery, CC, ST, GT)
- 16161 transmission constraints
- 37 PARs
- 5 DC lines
7
MUC Performance for NEPOOL Model
Machine A: 4 cores, E3-1240 V2 CPU @ 3.4 GHz, 32 GB memory, Windows 8 Server 64-bit.
Machine B: 8 cores, i7-5960X CPU @ 3 GHz (overclocked to 3.87 GHz), 32 GB memory, Windows 8.1 Pro 64-bit.

Run time statistics over 365 days in 2014 (seconds/day):
- Min: 4 (4 cores) / 4 (8 cores)
- Mean: 15 (4 cores) / 13 (8 cores)
- 99th percentile: 73 (4 cores) / 65 (8 cores)
- Max: 253 (4 cores) / 251 (8 cores)
8
MUC Solution Quality for NEPOOL Model
Duality gap at the final solution (target = 0.05%), simulation over 365 days in 2014:
- < 0.01%: 3 days
- 0.01%: 10 days
- 0.02%: 20 days
- 0.03%: 48 days
- 0.04%: 107 days
- 0.05%: 176 days
- 0.21%: 1 day
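For reference, the duality gap reported here is the solver's relative MIP gap; assuming the usual convention, with incumbent objective $z^{*}$ and best dual bound $\underline{z}$,

$$\text{gap} = \frac{|z^{*} - \underline{z}|}{|z^{*}|}.$$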
9
MUC Performance for PJM+MISO Model
Run time statistics over 90 days in Q1 2014 (seconds/day):
- Min: 579 (4 cores) / 760 (8 cores)
- Mean: 1829 (4 cores) / 1491 (8 cores)
- 95th percentile: 2642 (4 cores) / 2231 (8 cores)
- Max: 3027 (4 cores) / 3906 (8 cores)

The difference in run time performance is due to the faster CPU speed on the 8-core machine; a single MUC process cannot take advantage of multiple cores other than in incidental ways due to I/O and the presence of other workloads. These runs were performed with no other non-system tasks running concurrently with MUC.
10
MUC Solution Quality for PJM+MISO Model
Duality gap at the final solution (target = 0.05%), simulation over 90 days in Q1 2014:
[Histogram: number of days at each final duality gap level, from 0.02% to 0.58%, for the 4-core and 8-core machines.]
On the faster machine, more of the daily MILPs were able to reach the target duality gap within the allowed maximum run time. The solver termination state (optimal vs. best found) differs for 18 days.
11
Typical MUC run time performance for a large model simulated for one year
[Line chart: MUC run time in seconds for each of the 365 simulated days; daily run times vary widely, roughly between 500 and 3500 seconds.]
Splitting the simulation into months or quarters and running each segment in parallel is the conventional approach to taking advantage of multi-core machines. It is clear from the timing pattern above that a finer-grained load balancing scheme can produce much better overall run time performance.
12
Solution Architecture for Distributed And Parallel DAYZER
[Diagram: Master workstation connected to multiple multi-core DAYZER compute nodes through an MS-MPI interconnect over a private network.]
- Simulation period is load-balanced across all cores at the compute nodes using MPI (a minimal sketch follows this list).
- Results can be sent to a central database or stored in local partial databases.
- An MPI-based query tool allows locally stored results to be aggregated at the Master.
- MUC: each day is assigned to a core at a node, using the single-threaded MILP SCUC.
- PUC: each day is assigned to a multi-core node, using the parallel SCUC.
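As a minimal sketch of the day-level load balancing, the master can hand out one simulation day at a time, so that a run of slow days does not stall whole blocks of work the way fixed monthly or quarterly splits do. The sketch below uses mpi4py for brevity; run_muc_for_day is a hypothetical stand-in for the per-day SCUC solve, not DAYZER's actual API.

```python
# Minimal master/worker sketch of day-level load balancing over MPI.
# Requires mpi4py and at least 2 ranks (one master plus workers).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

DAYS = list(range(1, 366))  # one work item per simulated day
STOP = -1                   # sentinel telling a worker to exit

def run_muc_for_day(day):
    # Hypothetical placeholder for the per-day MILP SCUC solve.
    return {"day": day, "status": "solved"}

if rank == 0:
    # Master: hand out days one at a time so fast days don't idle a core.
    next_day = 0
    results = []
    # Prime every worker with one day (or STOP if fewer days than workers).
    for worker in range(1, size):
        comm.send(DAYS[next_day] if next_day < len(DAYS) else STOP, dest=worker)
        next_day += 1
    finished = 0
    while finished < len(DAYS):
        status = MPI.Status()
        result = comm.recv(source=MPI.ANY_SOURCE, status=status)
        results.append(result)
        finished += 1
        # Immediately refill the worker that just finished.
        nxt = DAYS[next_day] if next_day < len(DAYS) else STOP
        comm.send(nxt, dest=status.Get_source())
        next_day += 1
else:
    # Worker: solve days until told to stop.
    while True:
        day = comm.recv(source=0)
        if day == STOP:
            break
        comm.send(run_muc_for_day(day), dest=0)
```

With MS-MPI this would be launched along the lines of "mpiexec -n 9 python sketch.py" (one master rank plus eight workers).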
13
DAYZER Parallel SCUC (PUC)
PUC solves the same problem as MUC but uses Lagrangian Relaxation with subgradient optimization, decomposing the problem across time (hourly dispatch) as well as space (unit commitment). Some of the more distinctive aspects of our implementation are (an illustrative sketch of the subgradient loop follows this list):
- Target duality gap estimated by solving an initial relaxation problem.
- Adaptive step size initialization and update heuristics incorporating the target gap estimate as well as a measure of the current over/under commitment.
- Early termination heuristics based on the target gap and the step size update history.
- Unit subproblems modeled and solved as MILPs (as in the global version).
- Ramping constraints imposed on the hourly dispatch using the latest UC solutions.
- A unit (partial) decommitment phase based on semi-global uplift minimization.
- Coverage of all transmission constraints by adaptively modifying the dispatch LPs.
- Pump storage optimization handled by updating the UC for a fixed PS solution, then relaxing the associated PS constraints and updating their multipliers while the UC is kept fixed; we iterate over multiple cycles of this to achieve convergence.
- Losses and contingency analysis calculations interleaved with UC iterations.
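To make the decomposition concrete, here is a toy sketch of the Lagrangian Relaxation / subgradient idea: relax the hourly energy-balance constraints, price them with multipliers lam[h], and split the problem into independent per-unit subproblems. Everything here (the simplified unit model with no min up/down, the diminishing step rule) is illustrative only; the real PUC solves MILP unit subproblems and adds the adaptive-step, early-termination, decommitment, transmission-coverage, and PS-cycle heuristics listed above.

```python
import numpy as np

# Toy LR/subgradient loop for a 24-hour commitment problem.
H = 24
rng = np.random.default_rng(0)
n_units = 20
mc = rng.uniform(15, 60, n_units)        # marginal costs ($/MWh)
pmax = rng.uniform(50, 400, n_units)     # max outputs (MW)
noload = rng.uniform(100, 800, n_units)  # no-load costs ($/h)
demand = 3000 + 1200 * np.sin(np.linspace(0, 2 * np.pi, H, endpoint=False))

lam = np.full(H, 35.0)                   # initial multipliers (energy prices)
best_lb = -np.inf
for k in range(200):
    # Per-unit subproblem: with this simplified unit model, committing and
    # running at pmax in hour h is optimal iff the profit at price lam[h] > 0.
    profit = np.outer(pmax, lam) - (mc * pmax + noload)[:, None]  # (unit, hour)
    on = profit > 0.0
    power = (on * pmax[:, None]).sum(axis=0)       # total dispatched MW per hour
    dual = float(lam @ demand) - profit[on].sum()  # dual function value L(lam)
    best_lb = max(best_lb, dual)                   # best lower bound so far
    # The subgradient of L at lam is the energy-balance violation.
    g = demand - power
    if np.linalg.norm(g) < 1e-6:
        break                                      # balanced: multipliers settled
    step = 0.01 / (1.0 + k / 50.0)                 # simple diminishing step rule
    lam = lam + step * g

print(f"best dual (lower) bound: ${best_lb:,.0f}, "
      f"max hourly imbalance: {np.abs(g).max():,.0f} MW")
```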
14
PUC performance on a small scale problem (NEPOOL) with Pump Storage Optimization
Statistics (min / mean / 99th percentile / max) from runs on two different machines over 365 days in 2014.
[Bar charts: run time (seconds/day) and fuel cost % gap wrt MUC, for the MUC MILP and for PUC with 1, 2, and 3 cycles on the 4-core and 8-core machines. Reported mean fuel cost gaps fall from roughly 0.59% (1 cycle) to 0.39% (3 cycles), with worst cases dropping from 5.71% to 2.81% and best cases slightly negative, i.e., PUC occasionally below the MUC cost.]
15
Results from the same runs without Pump Storage highlight the large impact of these resources
[Bar charts: run time (seconds/day) and fuel cost % gap wrt MUC, for MUC and for PUC with 1, 2, and 3 cycles on the 4-core and 8-core machines, without pump storage (365 days in 2014; MUC max run time 253 seconds). Reported mean fuel cost gaps fall from roughly 0.42% (1 cycle) to 0.35% (3 cycles), with best cases around -0.6%.]
- The effective parallelization estimated from these runs is between 88% and 93%, which implies a speed-up factor between 6 and 9 at 24 cores (see the Amdahl's law sketch after this list).
- Even without PS optimization, PUC solution quality improves with additional cycles.
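These estimates follow from Amdahl's law: with a parallel fraction $p$ of the work running on $N$ cores,

$$S(N) = \frac{1}{(1-p) + p/N}, \qquad S(24)\big|_{p=0.88} \approx 6.4, \qquad S(24)\big|_{p=0.93} \approx 9.2.$$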
16
LMP comparison highlights the improvement gained from additional PUC cycles
[Scatter plots: PUC vs. MUC NEPOOL daily load-weighted average LMP (no PS), for 1 cycle and 3 cycles; both axes 50-500.]
RMS error (no PS): 1 cycle = 5.33, 2 cycles = 4.41, 3 cycles = 3.82.
RMS error (with PS): 1 cycle = 6.36, 2 cycles = 5.18, 3 cycles = 5.20.
17
However, congestion pattern convergence may require even more cycles
Daily average of normalized hourly transmission rent:

$$\text{rent} = \frac{1}{24}\sum_{h=1}^{24}\frac{\sum_{c} SP(c,h)\,Flow(c,h)}{\sum_{c} Flow(c,h)}$$

where $SP(c,h)$ is the shadow price of transmission constraint $c$ in hour $h$ and $Flow(c,h)$ is the corresponding constraint flow.
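A minimal sketch of this calculation in code (array names are illustrative; sp and flow hold the $SP(c,h)$ and $Flow(c,h)$ values from the formula above):

```python
import numpy as np

def daily_normalized_rent(sp, flow):
    """sp, flow: arrays of shape (24, n_constraints) holding hourly shadow
    prices SP(c, h) and flows Flow(c, h). Returns the daily average of the
    flow-normalized hourly transmission rent per the formula above."""
    hourly = (sp * flow).sum(axis=1) / np.maximum(flow.sum(axis=1), 1e-9)
    return hourly.mean()
```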
[Scatter plot: PUC vs. MUC daily average normalized hourly transmission rent, for 1, 2, and 3 cycles; both axes 0.0-1.0.]
RMS error: 1 cycle = 0.29, 2 cycles = 0.24, 3 cycles = 0.26.
18
PUC performance on a large scale problem (PJM+MISO combined) with Pump Storage Optimization
The timing results shown below imply an effective parallelization of almost 98% and hence a speed-up factor of nearly 16 at 24 cores.
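Applying the same Amdahl's law formula with $p \approx 0.98$:

$$S(24) = \frac{1}{0.02 + 0.98/24} \approx 16.4.$$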
[Bar charts: run time (seconds/day; min / mean / 95th percentile / max over 90 days in Q1 2014) for MUC and for PUC with 1, 2, and 3 cycles on the 4-core and 8-core machines, along with the fuel cost % gap wrt MUC. PUC mean run times are well below MUC: roughly 1097 / 1575 / 2159 seconds per day for 1 / 2 / 3 cycles on 4 cores versus 1829 for MUC, and roughly 594 / 830 / 1091 on 8 cores. Reported gap statistics run from about -1.4% to 1.44%, with typical gaps around 0.3%.]
19
LMP comparison shows a much smaller impact of additional PUC cycles compared to NEPOOL
[Scatter plots: PUC vs. MUC PJM+MISO daily load-weighted average LMP, for 1 cycle and 3 cycles; both axes 20-200.]
RMS error: 1 cycle = 1.84, 2 cycles = 1.92, 3 cycles = 1.84.
RMS error for days with MUC LMP < 100: 1 cycle = 1.84, 2 cycles = 1.87, 3 cycles = 1.75.
20
Improvement in the alignment of congestion patterns beyond 2 cycles is not uniform
Daily average of normalized hourly transmission rent, PUC vs. MUC, for 1, 2, and 3 cycles (axes 0.0-1.4):
[Scatter plot with annotated regions: converging, but additional cycles are required; converging to a similar congestion pattern at a higher gap; PUC finds a better solution; lower-cost solution diverging.]
RMS error: 1 cycle = 0.0618, 2 cycles = 0.0544, 3 cycles = 0.0554.
RMS error for days with MUC rent < 0.5: 1 cycle = 0.0389, 2 cycles = 0.0337, 3 cycles = 0.0358.
21
However, additional cycles may have a benefit depending on how the results are used
Average load-weighted LMP as a function of average normalized transmission rent (exponential fit):
[Scatter plot with exponential fits for PUC 1 cycle, PUC 3 cycles, and MUC; rent axis 0.0-1.2, LMP axis 20-180.]
The fit with PUC after 3 cycles is very close to the fit with MUC.
22
Solutions close together in terms of fuel cost may still differ significantly in prices
[Bubble chart: LMP difference versus congestion difference (MUC - PUC); LMP axis roughly -6.00 to 6.00, congestion axis roughly -0.10 to 0.10. Bubble width indicates the % fuel cost gap wrt the MUC solution for PUC after 3 cycles.]
23
Uplift solutions are comparable in most cases
Uplift as a percentage of generation revenue (PS excluded):
[Scatter plot: PUC uplift vs. MUC uplift, both axes 0.00%-2.50%. Markers distinguish days with fuel cost gap < 0%, ≥ 0.5%, and ≥ 1% (some data not shown); several outlier points are labeled with their % gap values, ranging from -1.42 to 5.53.]
24
Conclusions
- PUC overall solution quality comes close to that of MUC and is probably acceptable for a range of applications where run time performance is more critical.
- In addition, a final MUC pass, with constraints and an initial solution developed via a single-cycle PUC, can be used to improve both solution quality and run time performance for larger models.
- The combination of distributed and parallel techniques, as outlined here, would allow many more scenarios to be examined concurrently and/or more detailed individual models to be solved within reasonable time limits.