Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning

Alberto Maria Metelli, Flavio Mazzolini, Lorenzo Bisi, Luca Sabbioni, Marcello Restelli

Thirty-seventh International Conference on Machine Learning (ICML 2020), July 2020


Motivations

Problem: How to select the control frequency for a system?

Trade-off:
- Higher frequencies: more control opportunities
- Lower frequencies: lower sample complexity

Research Question: Can we exploit this trade-off to find an optimal control frequency?


Control Frequency and Action Persistence

Idea: persist each action for $k$ steps.

continuous-time MDP $\mathcal{M}_0$ --(time discretization)--> discrete-time MDP $\mathcal{M}_{\Delta t}$ --(action persistence $k$)--> $k$-persistent MDP $\mathcal{M}_{k\Delta t}$

control time-step: continuous, $\Delta t$, $k\Delta t$
control frequency: $\infty$, $f = 1/\Delta t$, $f/k$

Action persistence is a form of environment configurability (Metelli et al., 2018).
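The construction above, repeating each chosen action for $k$ base steps, can be sketched as an environment wrapper. This is an illustrative sketch, not the authors' code: `ToyEnv` is a made-up deterministic environment, and the wrapper aggregates the $k$ base rewards with discount $\gamma$, so the wrapped environment behaves like $\mathcal{M}_{k\Delta t}$ (with effective discount $\gamma^k$).

```python
class ToyEnv:
    """Hypothetical deterministic chain environment: the state increases by
    the chosen action at every base step, and the reward equals the state."""
    def __init__(self):
        self.s = 0

    def reset(self):
        self.s = 0
        return self.s

    def step(self, action):
        self.s += action
        return self.s, float(self.s), self.s >= 10, {}


class PersistedEnv:
    """k-persistent view of a base environment: each decision repeats the
    chosen action for k base steps; the k base rewards are aggregated with
    discount gamma, matching the k-persistent reward, and the effective
    discount factor of the wrapped MDP becomes gamma**k."""
    def __init__(self, env, k, gamma):
        self.env, self.k, self.gamma = env, k, gamma

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total = 0.0
        for i in range(self.k):
            s, r, done, info = self.env.step(action)
            total += self.gamma ** i * r  # discounted aggregation over the k base steps
            if done:
                break
        return s, total, done, info
```

For example, with $k = 2$ and $\gamma = 0.9$, one persisted step from the initial state with action 1 visits states 1 and 2 and returns the aggregated reward $1 + 0.9 \cdot 2 = 2.8$.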


Outline

1. Action persistence formalization
2. Performance loss due to persistence
3. Persistent Fitted Q-Iteration


No Action Persistence

MDP $\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma)$ and policy $\pi$, with $\pi : \mathcal{S} \to \mathcal{P}(\mathcal{A})$ Markovian and stationary (Puterman, 2014; Sutton and Barto, 2018).

A fresh action is sampled from the policy at every step:

$S_0 \xrightarrow{A_0 \sim \pi(\cdot\mid S_0)} S_1 \xrightarrow{A_1 \sim \pi(\cdot\mid S_1)} S_2 \xrightarrow{A_2 \sim \pi(\cdot\mid S_2)} \cdots$


Action Persistence: Policy View

Change the policy into the $k$-persistent policy: $\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma)$ and $\pi_k$, where

$$\pi_{t,k}(a \mid h_t) = \begin{cases} \pi(a \mid s_t) & \text{if } t \bmod k = 0 \\ \delta_{a_{t-1}}(a) & \text{otherwise} \end{cases}$$

with history $h_t = (s_0, a_0, \ldots, s_{t-1}, a_{t-1}, s_t)$. Note that $\pi_k$ is non-Markovian and non-stationary.

A new action is sampled only every $k$ steps (here $k = 3$):

$A_0 \sim \pi(\cdot \mid S_0)$, then $A_0$, $A_0$, then $A_3 \sim \pi(\cdot \mid S_3)$, then $A_3$, $A_3$, $\ldots$
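The $k$-persistent policy can be written as a small stateful wrapper around any base policy. A sketch, assuming the base policy is a plain callable (the names are illustrative):

```python
class PersistentPolicy:
    """k-persistent policy pi_k: query the base policy only when t mod k == 0,
    otherwise replay the previously sampled action (the Dirac term
    delta_{a_{t-1}} in the definition above)."""
    def __init__(self, base_policy, k):
        self.base_policy, self.k = base_policy, k
        self.t, self.last_action = 0, None

    def reset(self):
        self.t, self.last_action = 0, None

    def act(self, state):
        if self.t % self.k == 0:          # decision step: sample a fresh action
            self.last_action = self.base_policy(state)
        self.t += 1                        # persisted step: keep the old action
        return self.last_action
```

With $k = 3$ and an identity base policy over states $0, \ldots, 5$, the produced actions are `[0, 0, 0, 3, 3, 3]`, i.e. exactly the pattern $A_0, A_0, A_0, A_3, A_3, A_3$ of the trajectory above.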


Action Persistence: Environment View

Change the MDP into the $k$-persistent MDP: $\mathcal{M}_k = (\mathcal{S}, \mathcal{A}, P_k, R_k, \gamma^k)$ and $\pi$, where

$$P_k(s' \mid s, a) = \big((P^\delta)^{k-1} P\big)(s' \mid s, a), \qquad R_k(s, a) = \sum_{i=0}^{k-1} \gamma^i \big((P^\delta)^i R\big)(s, a)$$

with persistent state-action kernel $P^\delta(s', a' \mid s, a) = \delta_a(a')\, P(s' \mid s, a)$. Note that $\mathcal{M}_k$ has the smaller discount factor $\gamma^k$.

In $\mathcal{M}_k$, each decision corresponds to $k$ base steps (here $k = 3$): $A_0 \sim \pi(\cdot \mid S_0)$ drives the system from $S_0$ to $S_3$, then $A_3 \sim \pi(\cdot \mid S_3)$ drives it to $S_6$, and so on.
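For a finite MDP, the $k$-persistent model $(P_k, R_k)$ can be computed explicitly from the definitions above. A tabular sketch (the array layout and function name are my own choices):

```python
import numpy as np

def persistent_mdp(P, r, gamma, k):
    """Build the k-persistent model from a tabular MDP.
    P: (S, A, S) transition kernel, r: (S, A) reward, k >= 1.
    Returns P_k = (P^delta)^(k-1) P  and  R_k = sum_i gamma^i (P^delta)^i r,
    where P^delta(s', a' | s, a) = delta_a(a') P(s' | s, a) keeps the action."""
    S, A, _ = P.shape
    # persistent state-action kernel on the flattened (s, a) space
    Pd = np.zeros((S * A, S * A))
    for s in range(S):
        for a in range(A):
            for s2 in range(S):
                Pd[s * A + a, s2 * A + a] = P[s, a, s2]
    # R_k: accumulate gamma^i (P^delta)^i r for i = 0, ..., k-1
    rv = r.reshape(S * A).astype(float)
    Rk, term = np.zeros(S * A), rv.copy()
    for i in range(k):
        Rk += gamma ** i * term
        term = Pd @ term
    # P_k: k-1 persisted steps followed by one ordinary transition
    M = np.linalg.matrix_power(Pd, k - 1)
    Pk = (M @ P.reshape(S * A, S)).reshape(S, A, S)
    return Pk, Rk.reshape(S, A)
```

Sanity checks: $k = 1$ returns the original model, and every row of $P_k$ remains a probability distribution.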


Persistent Bellman Operators

MDP $\mathcal{M}$: Bellman optimal operator (Bertsekas, 2005)

$$(T^* f)(s, a) = r(s, a) + \gamma \int_{\mathcal{S}} P(\mathrm{d}s' \mid s, a) \max_{a' \in \mathcal{A}} f(s', a')$$

$T^*$ is a $\gamma$-contraction in $L^\infty$-norm, and $Q^*$ is its unique fixed point: $T^* Q^* = Q^*$.

$k$-persistent MDP $\mathcal{M}_k$: persistence operator

$$(T^\delta f)(s, a) = r(s, a) + \gamma \int_{\mathcal{S}} P(\mathrm{d}s' \mid s, a)\, f(s', a)$$

and $k$-persistent Bellman operator

$$T^*_k = (T^\delta)^{k-1} T^*$$

$T^*_k$ is a $\gamma^k$-contraction in $L^\infty$-norm, and $Q^*_k$ is its unique fixed point: $T^*_k Q^*_k = Q^*_k$.
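On a finite MDP the two operators are one line each. The sketch below (tabular arrays, names are mine) also lets one check numerically that the composition $(T^\delta)^{k-1} T^*$ behaves as a $\gamma^k$-contraction:

```python
import numpy as np

def bellman_opt(Q, P, r, gamma):
    """(T* Q)(s,a) = r(s,a) + gamma * sum_s' P(s'|s,a) max_a' Q(s',a')."""
    return r + gamma * np.einsum('sak,k->sa', P, Q.max(axis=1))

def persistence_op(Q, P, r, gamma):
    """(T^delta Q)(s,a) = r(s,a) + gamma * sum_s' P(s'|s,a) Q(s',a):
    like T*, but the current action is kept instead of maximised over."""
    return r + gamma * np.einsum('sak,ka->sa', P, Q)

def persistent_bellman(Q, P, r, gamma, k):
    """k-persistent Bellman operator T*_k = (T^delta)^(k-1) T*."""
    Q = bellman_opt(Q, P, r, gamma)        # one optimal step
    for _ in range(k - 1):                 # followed by k-1 persisted steps
        Q = persistence_op(Q, P, r, gamma)
    return Q
```

Iterating `persistent_bellman` from any initial $Q$ converges to $Q^*_k$; with $k = 1$ it is exactly value iteration towards $Q^*$.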



Bounding the Performance Loss

$Q^*_k \le Q^*$ for all $k \ge 1$. How much do we lose by persisting the actions of policy $\pi$ for $k$ steps?

$$\left\| Q^\pi - Q^\pi_k \right\|_{p,\mu} \le \frac{\gamma}{1-\gamma} \cdot \frac{1-\gamma^{k-1}}{1-\gamma^k} \left\| d(P^\pi, P^\delta) \right\|_{p,\mu}$$

- The factor $\frac{\gamma}{1-\gamma}\frac{1-\gamma^{k-1}}{1-\gamma^k}$ is increasing with $k$.
- $d(P^\pi, P^\delta)$ is the discrepancy between the transition kernels $P^\pi(s', a' \mid s, a) = \pi(a' \mid s')\, P(s' \mid s, a)$ and $P^\delta(s', a' \mid s, a) = \delta_a(a')\, P(s' \mid s, a)$.
- Under Lipschitz conditions (Rachelson and Lagoudakis, 2010) it can be bounded as $\left\| d(P^\pi, P^\delta) \right\|_{p,\mu} \le L\big[(L_\pi + 1) L_T + \sigma_p\big]$.
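To see how the algorithm-independent factor in the bound above grows with $k$, a one-line helper (purely illustrative):

```python
def persistence_loss_factor(gamma, k):
    """gamma/(1-gamma) * (1-gamma**(k-1))/(1-gamma**k): the factor multiplying
    ||d(P^pi, P^delta)||_{p,mu} in the performance-loss bound."""
    return gamma / (1 - gamma) * (1 - gamma ** (k - 1)) / (1 - gamma ** k)
```

For $k = 1$ the factor is $0$ (no persistence, no loss); it increases with $k$ and approaches $\gamma/(1-\gamma)$ as $k$ grows.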


Persistent Fitted Q-Iteration (PFQI)

Fitted Q-Iteration (Ernst et al., 2005): approximation space $\mathcal{F}$, initial estimate $Q^{(0)}$, dataset $\mathcal{D} = \{(S_i, A_i, S_{i+1}, R_i)\}_{i=1}^n \sim \nu$.

$$Q^{(j+1)} = \Pi_{\mathcal{F}}\, \widehat{T}^* Q^{(j)}, \qquad Q^{(j)} \rightsquigarrow Q^*$$

so that $T^* \approx \Pi_{\mathcal{F}} \widehat{T}^*$. What about $Q^*_k$?

Empirical Bellman operators:

$$(\widehat{T}^* f)(S_i, A_i) = R_i + \gamma \max_{a \in \mathcal{A}} f(S_{i+1}, a), \qquad (\widehat{T}^\delta f)(S_i, A_i) = R_i + \gamma f(S_{i+1}, A_i)$$


Persistent Fitted Q-Iteration: approximation space $\mathcal{F}$, initial estimate $Q^{(0)}$, dataset $\mathcal{D} = \{(S_i, A_i, S_{i+1}, R_i)\}_{i=1}^n \sim \nu$.

$$Q^{(j+1)} = \begin{cases} \Pi_{\mathcal{F}}\, \widehat{T}^* Q^{(j)} & \text{if } j \bmod k = 0 \\ \Pi_{\mathcal{F}}\, \widehat{T}^\delta Q^{(j)} & \text{otherwise} \end{cases} \qquad Q^{(j)} \rightsquigarrow Q^*_k$$

Empirical Bellman operators:

$$(\widehat{T}^* f)(S_i, A_i) = R_i + \gamma \max_{a \in \mathcal{A}} f(S_{i+1}, a), \qquad (\widehat{T}^\delta f)(S_i, A_i) = R_i + \gamma f(S_{i+1}, A_i)$$

The target operator is approximated as $T^*_k = (T^\delta)^{k-1} T^* \approx (\Pi_{\mathcal{F}} \widehat{T}^\delta)^{k-1}\, \Pi_{\mathcal{F}} \widehat{T}^*$.
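A minimal PFQI loop on a tabular representation, so that the projection $\Pi_{\mathcal{F}}$ reduces to a trivial per-pair regression. This is a sketch of the scheme above, not the authors' implementation; `tabular_fit` stands in for an arbitrary regressor:

```python
import numpy as np

def tabular_fit(targets, shape):
    """Trivial regression oracle standing in for Pi_F: average the targets
    observed for each (s, a) pair."""
    Q, n = np.zeros(shape), np.zeros(shape)
    for (s, a), y in targets:
        Q[s, a] += y
        n[s, a] += 1
    return Q / np.maximum(n, 1)

def pfqi(dataset, n_states, n_actions, gamma, k, n_iters, fit=tabular_fit):
    """Persistent Fitted Q-Iteration sketch: every k-th iteration applies the
    empirical optimal operator T*, the others apply the empirical persistence
    operator T^delta; each step is followed by a projection (regression).
    dataset: iterable of (s, a, s_next, reward) transitions."""
    Q = np.zeros((n_states, n_actions))
    for j in range(n_iters):
        targets = []
        for s, a, s2, r in dataset:
            if j % k == 0:
                y = r + gamma * Q[s2].max()   # empirical T*
            else:
                y = r + gamma * Q[s2, a]      # empirical T^delta
            targets.append(((s, a), y))
        Q = fit(targets, Q.shape)
    return Q
```

On a small deterministic MDP with full data coverage, $k = 1$ recovers plain FQI (here exact value iteration), while larger $k$ converges to $Q^*_k$, which is dominated by $Q^*$ as noted earlier.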


PFQI Analysis

Computational complexity: monotonically decreasing with $k$,

$$O\left( Jn \left( 1 + \frac{|\mathcal{A}| - 1}{k} \right) \right) \text{ for } J \text{ iterations.}$$

Error propagation:

$$\left\| Q^*_k - Q^{\pi^{(J)}}_k \right\|_{p,\mu} \le \frac{2}{1-\gamma} \cdot \frac{\gamma^k}{1-\gamma^k}\, \mathcal{E}(J, \mu, \nu, p)$$

- Decreasing with $k$.
- $\mathcal{E}$ depends on the approximation errors $\epsilon^{(j)}$ and on concentrability coefficients (Farahmand, 2011), where

$$\epsilon^{(j)} = \begin{cases} T^* Q^{(j)} - Q^{(j+1)} & \text{if } j \bmod k = 0 \\ T^\delta Q^{(j)} - Q^{(j+1)} & \text{otherwise} \end{cases}$$
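The complexity figure comes from counting Q-function evaluations: a $T^*$-iteration scans all $|\mathcal{A}|$ actions per sample, while a $T^\delta$-iteration evaluates only the stored action. A small illustrative counter:

```python
def pfqi_evaluations(J, n, n_actions, k):
    """Q-evaluations over J PFQI iterations on n samples: every k-th
    iteration costs n*|A| (empirical T*), the others cost n (empirical
    T^delta), giving roughly J*n*(1 + (|A|-1)/k) when k divides J."""
    return sum(n * (n_actions if j % k == 0 else 1) for j in range(J))
```

For $J = 4$, $n = 10$, $|\mathcal{A}| = 3$: $k = 1$ costs $120$ evaluations, $k = 2$ costs $80$, and $k = 4$ costs $60$, matching $Jn(1 + (|\mathcal{A}|-1)/k)$.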

Control Frequency Trade-Off

$$\left\| Q^* - Q^{\pi^{(J)}_k} \right\|_{p,\mu} \le \left\| Q^* - Q^*_k \right\|_{p,\mu} + \left\| Q^*_k - Q^{\pi^{(J)}_k} \right\|_{p,\mu}$$

- First term (control opportunities): algorithm-independent, increasing with $k$.
- Second term (sample complexity): algorithm-dependent, decreasing with $k$.

How to identify the optimal persistence?


Persistence Selection

How to identify the optimal persistence in a batch setting? Given the estimated Q-functions $\{Q_k : k \in \mathcal{K}\}$, select

$$\widetilde{k} \in \operatorname*{arg\,max}_{k \in \mathcal{K}} B_k, \qquad B_k = \widehat{J}_k - \frac{1}{1-\gamma^k} \left\| \widetilde{Q}_k - Q_k \right\|_{\mathcal{D}}$$

where $\widehat{J}_k$ is the estimated performance derived from $Q_k$, and the second term is a Bellman-residual penalty ($\widetilde{Q}_k \approx T^*_k Q_k$) (Farahmand and Szepesvári, 2011).
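The heuristic can be sketched in a few lines. The empirical norm over $\mathcal{D}$ is taken here as a mean absolute deviation (an assumption: the slide does not fix the norm), and all names are illustrative:

```python
import numpy as np

def persistence_index(J_hat, Q_tilde, Q, gamma, k):
    """B_k = J_hat_k - ||Q_tilde_k - Q_k||_D / (1 - gamma**k): estimated
    performance penalised by the empirical Bellman residual of Q_k
    (Q_tilde_k approximates T*_k Q_k on the dataset D)."""
    residual = np.abs(np.asarray(Q_tilde) - np.asarray(Q)).mean()
    return J_hat - residual / (1 - gamma ** k)

def select_persistence(stats, gamma):
    """stats maps k -> (J_hat_k, Q_tilde_k, Q_k); return the k maximising B_k."""
    return max(stats, key=lambda k: persistence_index(*stats[k], gamma, k))
```

A persistence with a slightly lower estimated return can still be selected if its Q-function has a much smaller Bellman residual.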


Experimental Evaluation: Best Persistences

PFQI with ExtraTrees (Geurts et al., 2006):

Environment      Best persistence
Cartpole         4
Mountain Car     8, 16, 32
LunarLander      4, 8
Pendulum         1, 2, 4
Acrobot          2, 4
Swimmer          2, 4, 8
Hopper           64
Walker2D         8, 16, 32, 64

The best persistence is usually not 1, but increasing the persistence too much prevents control altogether.


Experimental Evaluation: Cartpole

[Figure: expected return $J_k$, estimated return $\widehat{J}_k$, and selection index $B_k$ as a function of the iteration, for $k = 1, 2, 4, 8, 16$.]

The Q-functions at lower persistences are overestimated; the persistence selection heuristic correctly selects $k = 4$.


Conclusions

Research Question: Can we exploit this trade-off to find an optimal control frequency? Yes!

Open Questions:
1. Can persistence improve exploration?
2. Persistence in online RL
3. Dynamic persistence selection


Thank You for Your Attention!


References

Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, 3rd edition. Athena Scientific, 2005.

Damien Ernst, Pierre Geurts, and Louis Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503-556, 2005.

Amir Massoud Farahmand. Regularization in Reinforcement Learning. PhD thesis, University of Alberta, 2011.

Amir Massoud Farahmand and Csaba Szepesvári. Model selection in reinforcement learning. Machine Learning, 85(3):299-332, 2011.

Pierre Geurts, Damien Ernst, and Louis Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3-42, 2006.

Alberto Maria Metelli, Mirco Mutti, and Marcello Restelli. Configurable Markov decision processes. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), volume 80 of Proceedings of Machine Learning Research, pages 3488-3497. PMLR, 2018.

Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014.

Emmanuel Rachelson and Michail G. Lagoudakis. On the locality of action domination in sequential decision making. In International Symposium on Artificial Intelligence and Mathematics (ISAIM 2010), Fort Lauderdale, Florida, USA, 2010.

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.