L2RPN Challenge - Learning to Run a Power Network through AI


SLIDE 1

L2RPN Challenge

  • Learning to Run a Power Network through AI

Di Shi

Team: Tu Lan, Jiajun Duan, Bei Zhang, Zhiwei Wang, Xiaohu Zhang, Ruisheng Diao, Yan Zan
AI & System Analytics, GEIRI North America (GEIRINA)
PSERC Summer Workshop, July 16, 2019

SLIDE 2

Meaning of Different Terms: AI, ML, DL

Intro of Artificial Intelligence

  • AI: a process where a computer solves a task in a way that mimics human behavior (Generalized AI vs. Applied AI)
  • ML: a subset of AI; algorithms that parse data, learn from them, and then apply what they have learnt to make intelligent decisions
  • DL: a subset of ML; artificial neural networks composed of many layers

Source: Nvidia

SLIDE 3

Milestones of AI Development (2016 - present)

  • AlphaStar defeated top human players in StarCraft II
  • Self-driving cars
  • Biometrics recognition

Source: https://www.pinterest.com/pin/786792997375069862/?lp=true

SLIDE 4

AI Categories and Applications

Source: https://towardsdatascience.com/machine-learning-for-biginners-d247a9420dab

SLIDE 5

Summary of Key AI Technologies

Supervised Learning (labeled data)

Application

  • Classification
  • Predicting a target numeric value

Common Algorithms

  • k-Nearest Neighbors
  • Linear Regression
  • Decision Trees
  • Naïve Bayes
  • SVM
  • Neural Networks

Unsupervised Learning (unlabeled data)

Application

  • Clustering
  • Visualization
  • Dimensionality reduction
  • Anomaly detection

Common Algorithms

  • k-Means
  • Hierarchical Cluster Analysis
  • Principal Component Analysis

Reinforcement Learning (agent interacts with an environment via reward & state)

Application

  • DeepMind's AlphaGo
  • Fire-extinguishing robots
  • Grid Mind

Common Algorithms

  • Dynamic programming
  • Monte Carlo
  • Temporal Difference (TD)
  • Q-Learning
  • SARSA

Semi-supervised Learning (many unlabeled & few labeled data)

Application

  • Google Photos
  • Webpage classification

Common Algorithms

  • Combinations of unsupervised and supervised learning

Deep Learning is an extension of supervised, unsupervised and semi-supervised learning using many layers. DRL = DL + RL.
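Among the TD methods listed above, tabular Q-learning is the simplest to write down. A minimal illustrative sketch on a toy 2-state, 2-action problem (my own toy example, not from the deck):

```python
import numpy as np

# Toy Q-table: 2 states x 2 actions, all zeros to start.
Q = np.zeros((2, 2))
alpha, gamma = 0.5, 0.9  # learning rate and discount factor

def td_update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# One transition: from state 0, action 1, reward 1.0, landing in state 1.
td_update(0, 1, 1.0, 1)
print(Q[0, 1])  # 0.5
```

SARSA differs only in using the action actually taken in s' instead of the max over actions.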

SLIDE 6

Trend of AI in Power Grids

Applications of AI in Power Systems (across generation, transmission, distribution, and end users):

  • Power system operation and control
  • Power system asset management
  • Power system mid/long-term planning
  • Power system economics and market

Potential Applications (monitoring, diagnosis, forecasting, reasoning/planning, decision making, autonomous control):

  • Intelligent monitoring & early warning
  • Intelligent diagnosis of equipment
  • Image recognition of power lines
  • Situational awareness
  • Model validation and calibration
  • Excitation and damping control
  • Maintenance scheduling
  • Renewable forecasting
  • Knowledge map & intelligent reasoning
  • Fault detection and location
  • Intelligent analysis and self-healing control
  • Demand forecasting
  • Load clustering and parameter identification

Techniques: RNN, CNN, GNN, LSTM, GAN, SVM, (D)DQN, DDPG, A3C, PPO, SAC, TRPO...

GEIRINA's R&D Focus!

SLIDE 7

Outline

  • L2RPN Challenge
  • Early Stage Attempts
  • Proposed Methodology – the winning algorithm
  • Imitation Learning and DRL
  • Training Methods and Adaptive Adjustments
  • Results

SLIDE 8

About the Competition

  • https://l2rpn.chalearn.org/
  • https://competitions.codalab.org/competitions/22845

Timeline for the Competition

  • May 15th, 2019: Beginning of the competition, with the release of the public RL environment. Participants can start submitting agent models on the Codalab platform and obtaining immediate feedback in the leaderboard on validation scenarios.
  • May 27th, 2019: Potential release of a new baseline to foster competition if several participants are already doing better than this baseline.
  • June 15th, 2019: Start of the testing days on unseen test scenarios.
  • June 19th, 2019: End of the competition, beginning of the post-competition process. (This was later extended to June 23rd, 2019.)
  • July 1st, 2019: Announcement of the L2RPN winners.
  • July 14th, 2019: Beginning of IJCNN 2019.

SLIDE 9

Problem Statement

  • Run the power network through topology control

Why should we care?

  • Rising complexity of the power grid
  • Integration of renewables
  • AC + DC loads
  • Costly to build new lines

Q: How can we alleviate the burden through topology control?

(Figure: one-line diagram of the 14-bus test system, buses 1-14)

SLIDE 10

Problem Analysis - System

System Summary

  • 14 buses
  • 5 generators
  • 11 loads
  • 20 lines, with thermal limits (A): 996.8, 399.9, 428.4, 374.4, 221, 447.1, 301.9, 123, 100, 208.9, 390.5, 353.7, 211.8, 175.1, 161.6, 100, 155.3, 315.5, 150, 241
  • The system runs successively at 5-minute intervals
  • Training: thousands of scenarios considered; around 1 month of time-series data in each scenario
  • Official test: 10 scenarios; 1-3 days of data in each scenario

(Figure: one-line diagram with the slack bus indicated)

SLIDE 11

Problem Analysis - Objective

  • Analyze the problem in the framework of an optimization problem:
    Obj.: Min/Max (Objective)
    s.t. Constraint_1, Constraint_2, Constraint_3, ...
  • Decision variables
  • Parameters

  • Optimization problem: maximize the remaining power transfer capability over all time steps of all scenarios
  • Transfer capability at a time step; transfer capability over a scenario; transfer capability over all scenarios
  • Note: game over when a certain constraint is violated
  • Each scenario spans continuous days (1st day, 2nd day, ..., nth day)
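The objective above, remaining power transfer capability, can be sketched numerically. A minimal sketch, assuming the per-line contribution max(0, 1 - (flow/limit)^2) suggested by the formulation slide later in the deck; the function names are mine, not the competition API:

```python
import numpy as np

def step_score(flows, limits):
    """Remaining transfer capability at one time step:
    sum over lines of max(0, 1 - (flow / limit)^2)."""
    usage = np.asarray(flows, dtype=float) / np.asarray(limits, dtype=float)
    return float(np.sum(np.maximum(0.0, 1.0 - usage ** 2)))

def scenario_score(flow_series, limits):
    """A scenario's score is the sum over its time steps;
    a game-over zeroes out the rest of the scenario."""
    return sum(step_score(f, limits) for f in flow_series)
```

An unloaded line contributes 1.0, a fully loaded line contributes 0.0, so the agent is rewarded for keeping headroom on every line at every step.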

SLIDE 12

Problem Analysis - Decision Variables

  • Decision variables: what can we control to maximize the power transfer capability?
  • Line switching (20 lines)
  • Node splitting (156 configurations for the 14 nodes), e.g., splitting Bus 1 into busbars Bus 1-1 and Bus 1-2 and reassigning the elements connected to it
  • Together these define the topology of the network at all time steps in all scenarios

Note: a maximum of 1 action at a node plus 1 action at a line per timestep is allowed.

In total, 3120 (156 × 20) topologies!
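To make the node-splitting count concrete: each substation can be split into two busbars, with every connected element assigned to one of them. A minimal sketch with hypothetical element names (the deck's 156 is the total of such valid per-substation configurations over the 14 nodes):

```python
from itertools import product

def split_configurations(elements):
    """Enumerate two-busbar assignments for one substation, pinning the
    first element to busbar 0 to remove the busbar-swap symmetry."""
    return [dict(zip(elements, (0,) + rest))
            for rest in product((0, 1), repeat=len(elements) - 1)]

# Hypothetical substation with 4 connected elements -> 2**(4-1) = 8 configs
configs = split_configurations(["gen1", "load1", "lineA", "lineB"])
print(len(configs))  # 8

# Combining the deck's counts: 156 node-splitting x 20 line-switching actions
print(156 * 20)  # 3120
```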

SLIDE 13

Problem Analysis - Hard Constraints

  • Game over if any of the hard constraints is violated:
  • Load should be met over all time steps of all scenarios
  • No more than 1 power plant may be disconnected over all time steps of all scenarios
  • The grid should not get split apart into isolated sub-grids over all time steps of all scenarios
  • The AC power flow solution should converge over all time steps of all scenarios

SLIDE 14

Problem Analysis - Soft Constraints

  • Violating a soft constraint may lead to certain consequences, though not an immediate "game over":
  • Line overload should be controlled over all time steps of all scenarios
  • Cooldown should be considered: 3 steps of cooldown are required before a line or node can be reused; violating this causes 1) the step score to be 0, and 2) the action not to be taken, resulting in no action

  Scenario                | Consequence                                                                                  | Time Steps to Recover
  Line flow >= 150%       | Line immediately broken and disconnected                                                     | 10
  100% < line flow < 150% | Wait 2 more timesteps to see whether the overflow is resolved; if not, line gets disconnected | 3
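The overflow rules in the table above can be written as a small decision function; a sketch under the stated rules, with a helper name of my own:

```python
def overflow_consequence(flow_ratio, overflow_steps):
    """Soft-constraint handling per the competition rules described above.
    flow_ratio: line flow as a fraction of its thermal limit.
    overflow_steps: consecutive prior steps the line has been above 100%.
    Returns (disconnect, recovery_steps)."""
    if flow_ratio >= 1.5:
        return True, 10            # hard overflow: immediate disconnection
    if flow_ratio > 1.0:
        if overflow_steps >= 2:    # soft overflow tolerated for 2 steps
            return True, 3
        return False, 0
    return False, 0
```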

SLIDE 15

Problem Analysis - Parameters

Profiles, sampled at 5-minute intervals (2018-01-01, 2018-01-02, ...):

  • Load profile
  • Generation profile
  • Maintenance profile
  • Fault profile
  • Voltage profile

Note: some profiles were not considered in the competition and are kept for future extension.

SLIDE 16

Problem Analysis - Summary

  • Hard problem to solve within the conventional optimization framework: 1) hard to solve the mixed-integer nonlinear dynamic optimization (AC power flow); 2) many hard and soft constraints; 3) hard to mathematically model those dynamic constraints; 4) huge scale due to the consideration of long continuous timesteps
  • We may resort to DRL-based methods, but difficulties remain:
  • Selection of the action space (a huge action space leads to convergence difficulties), which will be further explained later
  • Long continuous timesteps: the desired DRL agent should be able to operate the system for hundreds or thousands of timesteps
  • Many hard and soft constraints: this complicates the problem and greatly increases the difficulty of training the agent

SLIDE 17

Problem Formulation - If Using Traditional Optimization Approaches

  • As illustrated by the figure at the bottom, a large number of binary variables should be introduced to represent the connection status of each component, e.g., generator, load, line, etc.
  • The objective is to maximize the system's available transmission capacity; an auxiliary variable μ_k is introduced:

    max Σ_{k∈L} μ_k,  with μ_k = max(0, 1 - (S_k / S̄_k)²) for each line k,

    where S_k is the apparent power flow on line k and S̄_k its thermal limit; μ_k is constrained by the generalized model for network topology change below.

SLIDE 18

Problem Formulation - If Using Traditional Optimization Approaches

  • Constraints (big-M disjunctive inequalities over the busbar-assignment binaries z), modeling:
  • Voltage magnitude and angle at the two busbars in a substation
  • A generator can be placed at either busbar of a substation
  • A load can be placed at either busbar of a substation
  • The end of a transmission line can be placed at either busbar of a substation

SLIDE 19

Problem Formulation - If Using Traditional Optimization Approaches

  • Constraints (big-M disjunctive inequalities), modeling:
  • Active and reactive power at each end of a transmission line k
  • The 'real' voltage magnitude and angle at each end of a transmission line
  • The power flow on a transmission line considering transmission switching
  • Apparent power on transmission line k: (S_k^i)² = (P_k^i)² + (Q_k^i)², (S_k^j)² = (P_k^j)² + (Q_k^j)², with S_k^i ≤ S̄_k and S_k^j ≤ S̄_k

SLIDE 20

Problem Formulation - If Using Traditional Optimization Approaches

  • Constraints (big-M disjunctive inequalities), modeling:
  • Power balance at each busbar
  • If the substation does not split, all the components within it should remain at the same busbar
  • Limits on the number of buses that can split and the number of lines that can be switched

  • The problem at a single snapshot is already a large-scale nonconvex mixed-integer nonlinear program, which is difficult to solve using commercial solvers.
  • If different time steps are considered, the computational burden increases dramatically due to the significant increase in the number of optimization variables and the additional time-coupling constraints.
  • Therefore, a traditional optimization approach is not an option for this competition.

SLIDE 21

Early Stage Attempts

  • Control problem with time-series data
  • Deep reinforcement learning
  • Discrete actions
  • Value-based or policy-based methods

1st attempt: DRL using DQN with an action space of 3120. Failed! (Curse of dimensionality.)

3 different ideas:

  • Use imitation learning to pre-train the model to obtain the initial weights (3120 actions).
  • Use a value-based method (DQN); ignore the line-cutting actions and only consider node-splitting actions (156).
  • Use policy-based methods, such as A2C, PPO.
SLIDE 22

Early Stage Attempts

  Method 1 (Imitation Learning): Pros - the pretrained Q-value distribution does reflect the action effectiveness. Cons - the action space is still too big, even for imitation learning.
  Method 2 (DDQN of substation actions): Pros - the reduced action space is enough to solve most scenarios. Cons - the score is not high enough due to the limited action space, and the training time is quite long.
  Method 3 (PPO): Pros - all feasible action combinations are properly considered. Cons - convergence is a problem due to the large action space.

Direction

  • Combine DRL and imitation learning (supervised)
  • Reduced action space (156 + 20 + ...)
  • Value-based method (DDQN)
  • Other methods (power system and AI knowledge)
SLIDE 23

Proposed Methodology

  • Imitation Learning (Supervised) and DRL with Dueling DQN
SLIDE 24

Imitation Learning

  • Use supervised learning to generate initial weights for the neural network
  • Significantly decreases the DRL training time needed
  • Good initial weights avoid falling into bad local optima
  • Similar inputs but distinct outputs
  • Input vectors of similar power-grid states are close to each other
  • The output is complex, with many spikes
  • Loss function
  • MSE
  • Weighted SE
  • Huber loss
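Huber loss suits the "many spikes" output noted above, because large label errors grow only linearly rather than quadratically. A minimal numpy sketch (my own helper, not the team's code):

```python
import numpy as np

def huber(pred, target, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails, so the
    occasional spiky Q-value label does not dominate the gradient."""
    err = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    quad = 0.5 * err ** 2
    lin = delta * (np.abs(err) - 0.5 * delta)
    return float(np.where(np.abs(err) <= delta, quad, lin).mean())

# Small error -> squared-error behavior; large error -> linear growth
print(huber([0.1], [0.0]))   # 0.005
print(huber([10.0], [0.0]))  # 9.5
```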
SLIDE 25

Imitation Learning

  • Generate 40,000 samples of training & validation data (state and score(s, a))
  • Sample prediction and label

SLIDE 26

DRL with Dueling DQN

SLIDE 27

DRL with Dueling DQN

  • Model structure - vector input
  • Model structure - 3-D matrix (28, 28, 1) input
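A dueling DQN head splits into a state-value stream and an advantage stream and then recombines them. A minimal sketch of the recombination (numpy only, not the team's model code):

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling DQN head: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)).
    Subtracting the mean advantage keeps V and A identifiable."""
    adv = np.asarray(advantages, dtype=float)
    return float(value) + adv - adv.mean()

q = dueling_q(2.0, [1.0, 0.0, -1.0])
print(q)  # [3. 2. 1.]
```

This decomposition lets the network learn how good a grid state is independently of the relative merit of each topology action, which helps when many actions have nearly identical values.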

SLIDE 28

Training Methods and Process

  • Training methods
  • Guided exploration instead of traditional random epsilon-greedy
  • Weighted memory and importance sampling (TD-error)
  • Skip-step training (update model weights)
  • Store multiple copies of game-over samples in memory
  • Consider multiple constraints
  • Action validation: cooldown, power-grid islanding, and other competition requirements
  • Actions causing line overflow
  • For the above cases: set reward to -1 (guide the agent to learn the constraints)
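The "weighted memory and importance sampling (TD-error)" bullet is the standard prioritized experience replay recipe. A minimal sketch using the PER alpha/beta values from the hyperparameter slide (the helper name is mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def per_sample(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6):
    """Prioritized experience replay: sample transitions with probability
    proportional to |TD-error|^alpha, and return importance-sampling
    weights that correct the induced bias."""
    p = (np.abs(td_errors) + eps) ** alpha
    probs = p / p.sum()
    idx = rng.choice(len(td_errors), size=batch_size, p=probs)
    w = (len(td_errors) * probs[idx]) ** (-beta)
    return idx, w / w.max()  # normalize weights to at most 1

idx, w = per_sample(np.array([0.1, 2.0, 0.5, 0.05]), batch_size=2)
```

Storing multiple copies of game-over samples, as the slide suggests, is a simpler way to achieve a similar effect: rare but important transitions get replayed more often.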

SLIDE 29

Training Hyperparameters

  • Train on multi-core CPUs
  • Imitation learning: parallel-worker data generation (1 day) and training (4-6 hours)
  • DRL training: train 100,000-500,000 steps (4-20 hours)
  • List of key hyperparameters and values

  DRL                     | Value
  Learning rate           | 1e-4 / 1e-3
  Gamma                   | 0.99
  Replace target per step | 128 / 256
  Replay memory size      | 256 / 512 / 1024
  PER alpha               | 0.6
  PER beta                | 0.4
  Batch size              | 4 / 8 / 32

  Imitation Learning      | Value
  Learning rate           | 1e-2 / 3e-2
  Batch size              | 1 / 4 / 8
  Episodes                | 1000
  Advantage FC #neurons   | 64 / 128 / 256
  Loss weight factor      | 0.5 / 0.7 / 0.9

SLIDE 30

Model Testing Adaptive Adjustments

  • Why adjustment?
  • Stability is the key: any overflow might cause game over (0 score for the whole scenario)
  • No 100% perfect model could be trained, within the short competition period, to continuously and perfectly handle 7000 timesteps (about 1 month of data) without any problem. With 3120 actions, each scenario has 3120^7000 different trajectories
  • Hard to learn all safety constraints, such as action cooldown and overflow settings
  • Combine DRL with power-grid knowledge
  • Adaptive policy: introduce an early-warning system
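The early-warning policy can be sketched as a wrapper around the trained model. A minimal sketch, assuming a danger threshold like the 92-95% values visible in the robustness-test model names; `rank_actions` and `is_valid` are hypothetical helper methods, not the team's actual API:

```python
def act(env_state, agent, danger=0.92):
    """Adaptive policy sketch: act only when an early warning fires,
    i.e., some line loading exceeds the danger threshold; otherwise
    do nothing to stay stable."""
    if max(env_state["line_loadings"]) < danger:
        return "do_nothing"
    # Warning fired: let the trained model rank actions, then keep the
    # best one that passes validation (cooldown, islanding, overflow).
    for a in agent.rank_actions(env_state):
        if agent.is_valid(env_state, a):
            return a
    return "do_nothing"
```

This mirrors the slide's point about combining DRL with power-grid knowledge: the agent intervenes only when the grid approaches its limits, so a flawed action cannot spoil an otherwise safe timestep.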

SLIDE 31

Demo on a Hard Sample Case

  • Do Nothing
  • Trained Model

(Figure: side-by-side snapshots of the 14-bus system under the two policies)

SLIDE 32

Robustness Test Results

  • Tested different trained models on 200 chronics, each with 5184 continuous steps

  Model                   | Mean Score (All) | Mean Score w/o Game Over | Game Over Count
  Action_156_danger_92    | 73665.9579       | 83238.3705               | 23
  Action_156_danger_93    | 71455.1479       | 83087.3818               | 28
  Action_156_danger_95    | 79064.1378       | 82789.6731               | 9
  Action_176_danger_92_95 | 66997.7893       | 83747.2367               | 40
  Action_251_danger_90_95 | 74863.0436       | 84591.0097               | 23
  Action_251_danger_93_95 | 77495.2039       | 84233.9173               | 16
  Action_251_danger_95_95 | 76550.0887       | 84120.9766               | 18
  Action_251_danger_90    | 74979.7389       | 83776.2445               | 21
  Action_251_danger_92    | 76334.7795       | 83425.9886               | 17
  Action_251_danger_93    | 76978.1452       | 83219.6165               | 15

SLIDE 33

Post Stage Efforts

(Figure: post-stage attempts grouped as Failed / Low Score / High Score)
SLIDE 34

Final Results

Ranked 1st in both the development phase and the final phase!

SLIDE 35

Thank you!

di.shi@geirina.net
www.geirina.net/research/2

SLIDE 36

Other Research at GEIRINA AI Group

Grid Sense: IoT+X - leveraging edge computing for enhanced system situational awareness and control

  • System architecture: edge computing
  • Edge device: smart outlet
  • Cloud platform

GEIRINA Grid Eye: SA platform that has been running in provincial/state-level systems for the past 36 months

  • Situational awareness: alarming & data visualization
  • Parameter/data calibration
  • Oscillation detection and location
  • Data exploration & stability tracking

GEIRINA Grid Mind: data-driven autonomous grid dispatch and control platform with self-learning capability

  • DRL: deep learning + reinforcement learning
  • Ability to handle faster grid dynamics
  • Sub-second autonomous dispatch & control
  • Self-learning with grid-interaction capabilities

For more information, please check: www.geirina.net/research/2