SLIDE 1
Guidelines for Action Space Definition in Reinforcement Learning-based Traffic Signal Control Systems
Maxime Tréca, Julian Garbiso, Dominique Barth
October 15, 2020

Outline
I - Reinforcement Learning Applied to Traffic Signal Control
II -
SLIDE 2
SLIDE 3
I - Basics of Reinforcement Learning
(Diagram: agent-environment interaction loop, exchanging state s_t, action a_t and reward r_t at each step.)
Reinforcement Learning methods aim at learning from the feedback of the state-action-reward loop:
◮ By testing all possible state/action combinations
◮ By storing the resulting rewards of these combinations
◮ By establishing a policy using this stored data
SLIDE 4
I - Q-Learning
Q-learning is a Reinforcement Learning algorithm developed by Watkins [2].
◮ The agent records and updates an estimation of the payoff of each state/action pair it encounters in a Q-table:

         a_1          ...  a_n
s_1      v_{s_1,a_1}  ...  v_{s_1,a_n}
...      ...          ...  ...
s_m      v_{s_m,a_1}  ...  v_{s_m,a_n}

◮ An iterative formula is used to update this estimation:

Q(s_t, a_t) ← (1 − α_t) Q(s_t, a_t) + α_t (r_t + γ max_a Q(s_{t+1}, a))
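The update rule above can be sketched as a minimal tabular Q-learning step; the state/action labels and the α, γ values are illustrative, not the talk's settings:

```python
from collections import defaultdict

def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Watkins' Q-learning update:
    Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))."""
    best_next = max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
    return q[(s, a)]

q = defaultdict(float)          # Q-table, zero-initialised
actions = ["extend", "switch"]  # hypothetical traffic-signal actions
q_update(q, "s0", "extend", 1.0, "s1", actions)  # first update from reward 1.0
```

Starting from an all-zero table, this first update stores alpha * r = 0.1 for the visited pair.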
SLIDE 5
I - Reinforcement Learning applied to Traffic Signal Control
Reinforcement Learning (RL) algorithms have been applied to Traffic Signal Control (TSC) since the early 2000s:
◮ Wiering [3]: first use of Q-learning at the intersection level to decrease vehicle waiting time on a road network.
◮ El-Tantawy [1]: MARLIN algorithm, which coordinates multiple RL-based agents using real traffic data.
SLIDE 6
I - Problem Statement
In the papers cited above, agent actions are either:
◮ Phase-based: the agent sets the entire length of the green phase.
◮ Step-based: the agent decides whether to extend the current phase every k steps.
→ No action space definition comparison in the literature.
SLIDE 7
II - Experimental Framework
(Diagram: single intersection with its two signal phases ψ1 and ψ2.)
◮ We consider a single intersection
◮ State: ψ_i, d_i, n_1, n_2
◮ Action is either:
  ◮ The length of ψ_i
  ◮ Extend ψ_i by k steps
◮ Reward: ω_{t+a} − ω_t
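The two action-space definitions can be contrasted in a short sketch; function names and the k / minimum-green values are illustrative, not the talk's settings:

```python
def phase_based_schedule(green_lengths):
    """Phase-based agent: each decision directly fixes the full
    length of one green phase psi_i."""
    return list(green_lengths)

def step_based_schedule(decisions, k=5, min_green=5):
    """Step-based agent: every k steps it decides whether to extend
    the current phase (True) or switch (False); returns the phase
    lengths that result from the sequence of decisions."""
    lengths, current = [], min_green
    for extend in decisions:
        if extend:
            current += k       # keep the current phase k more steps
        else:
            lengths.append(current)
            current = min_green  # switch: next phase starts at min green
    lengths.append(current)
    return lengths
```

For example, the decision sequence extend, extend, switch, extend with k = 5 produces phase lengths [15, 10]: the same schedule a phase-based agent would have to emit in a single decision per phase.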
SLIDE 8
II - Traffic Generation
(Diagram: intersection approaches with Poisson arrival rates λ and λ + τ.)
◮ Two Poisson processes
◮ λ + τ = 0.5
◮ τ measures the unevenness of traffic
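A minimal sketch of this traffic generation, approximating each Poisson process by one Bernoulli arrival trial per simulation step; the λ and τ values and the step count are illustrative (the deck only fixes λ + τ = 0.5):

```python
import random

def arrivals(rate, steps, rng):
    """Approximate a Poisson arrival process of the given rate by one
    potential vehicle arrival per simulation step with probability `rate`."""
    return sum(rng.random() < rate for _ in range(steps))

rng = random.Random(0)          # seeded for reproducibility
lam, tau = 0.2, 0.3             # hypothetical split with lam + tau = 0.5
minor = arrivals(lam, 10_000, rng)        # lighter approach, ~2000 vehicles
major = arrivals(lam + tau, 10_000, rng)  # heavier approach, ~5000 vehicles
```

A larger τ thus concentrates demand on one pair of approaches while keeping the total arrival rate constant.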
SLIDE 9
II - Simulation Settings
◮ SUMO microscopic traffic simulator.
◮ 100 successive iterations of 10,000 steps each.
◮ We measure total vehicular delay over each iteration.
◮ Results normalized over 50 distinct runs.
SLIDE 10
III - Guideline #1 - Step-based v. Phase-based Methods
τ     Fixed    Phase   Step (Best)   Step (Worst)
0.0   3.617    2.672   2.053         2.473
0.1   4.070    2.746   1.956         2.595
0.2   4.603    3.070   1.977         2.570
0.3   7.773    4.582   2.032         2.531
0.4   6.807    5.773   2.088         2.216
0.5   18.329   3.240   1.994         2.473
Table: Average vehicle waiting time after convergence per agent type and traffic parameter τ (in 10^3 seconds).
SLIDE 11
III - Guideline #1 - Step-based v. Phase-based Methods
(Plot: total waiting time in seconds, on the order of 10^4, over 100 simulation runs for the Fixed, Phase, Step n = 5 and Step n = 15 agents.)
Guideline #1 Step-based methods are always preferable to phase-based ones.
SLIDE 12
III - Guideline #2 - Decision Interval Length
τ     k = 1   k = 5   k = 10   k = 15   k = 20
0.0   0       4.86    10.29    22.37    27.81
0.1   0       4.17    7.20     23.96    30.15
0.2   0       0.45    5.49     22.68    31.03
0.3   0       3.04    4.38     11.02    26.39
0.4   9.53    2.74    0        11.09    7.84
0.5   22.12   0.56    0        13.60    3.33

Table: Percentage difference with respect to the optimum average vehicle waiting time (marked as 0) for step-based methods, by action interval value k and traffic scenario τ.
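The table entries can be reproduced from raw waiting times as the percentage difference from the per-scenario optimum; a minimal sketch with hypothetical delay values:

```python
def pct_diff_from_optimum(delays):
    """Map average waiting times per interval k to percentage
    difference from the best value (the optimum is marked as 0)."""
    best = min(delays.values())
    return {k: round(100 * (d - best) / best, 2) for k, d in delays.items()}

# Hypothetical average waiting times (seconds) for one traffic scenario:
delays = {1: 2000.0, 5: 2100.0, 10: 2200.0}
pct_diff_from_optimum(delays)  # {1: 0.0, 5: 5.0, 10: 10.0}
```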
Guideline #2 Very short intervals between decision points are preferable for uniform traffic, while slightly longer intervals are preferable for skewed traffic demand.
SLIDE 13
III - Optimal Interval Length
τ     k = 1   k = 5   k = 10   k = 15   k = 20
0.0   0       4.86    10.29    22.37    27.81
0.1   0       4.17    7.20     23.96    30.15
0.2   0       0.45    5.49     22.68    31.03
0.3   0       3.04    4.38     11.02    26.39
0.4   9.53    2.74    0        11.09    7.84
0.5   22.12   0.56    0        13.60    3.33

Table: Percentage difference with respect to the optimum average vehicle waiting time (marked as 0) for step-based methods, by action interval value k and traffic scenario τ.
Guideline #3 Defining longer intervals between successive decision points (from 5 to 10 seconds) yields satisfactory to optimal results for step-based agents.
SLIDE 14
V - Conclusion
Issue: No in-depth comparison of step-based and phase-based action spaces for RL-TSC.
Conclusions:
◮ Step-based is always preferable
◮ Shorter action intervals for uniform traffic demand
◮ Optimal and realistic step interval between 5 and 10 seconds.
SLIDE 15
V - Conclusion
◮ Results only on a simple four-street intersection.
◮ However, guidelines validated on a NEMA-type intersection.
SLIDE 16