SLIDE 1


Neural Fitted Actor-Critic

Matthieu Zimmer, Alain Dutech, Yann Boniface

University of Lorraine, LORIA

8th July 2016

SLIDE 2

Outline

1. Background
2. Neural Fitted Actor-Critic
3. Future works

SLIDE 3

Reinforcement Learning

SLIDE 4

Reinforcement Learning

Optimization problem: find a policy π : S → A that maximizes the expected discounted rewards

E_π[ Σ_{t=0}^{∞} γ^t r_t ]
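To make the objective concrete, here is a minimal Python sketch (not from the talk) that estimates the discounted return Σ_t γ^t r_t from one sampled episode of rewards:

```python
# Minimal sketch (not from the talk): the discounted return
# sum_t gamma^t r_t of one episode, accumulated backwards.
def discounted_return(rewards, gamma=0.99):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # Horner-style accumulation
    return g

# Example: a reward of 1 received after two steps, gamma = 0.9 -> 0.81
print(discounted_return([0.0, 0.0, 1.0], gamma=0.9))
```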
SLIDE 5

Constraints and Motivations

Reinforcement learning + developmental robotics:

1. Continuous environments
2. No prior models of the agent or the environment
3. Use non-linear approximators (neural networks)
4. No prior goal states or trajectories

SLIDES 6-8

How to solve reinforcement learning problems?

Actor-only: learn a policy π : S → A directly. Play π_k, observe the rewards Σ_t γ^t r_t, and update to π_{k+1}.

Critic-only: learn a value function Q : S × A → R. Deduce a policy π_k from Q_k, play it, and update to Q_{k+1} from the observed rewards.

Actor-critic: learn both a policy π : S → A and a value function V : S → R. Play π_k, update V_k to V_{k+1} from the observed rewards, and use V_{k+1} to update π_k to π_{k+1}.
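As a rough illustration of the third scheme, here is a generic actor-critic loop in Python (a sketch, not the talk's exact algorithm; the env, actor and critic interfaces are hypothetical):

```python
# Generic actor-critic loop (illustrative sketch; env, actor and critic
# are hypothetical objects, not an API from the talk).
def actor_critic_episode(env, actor, critic, gamma=0.99):
    s = env.reset()
    done = False
    while not done:
        a = actor.act(s)                      # play: a ~ pi_k(s)
        s_next, r, done = env.step(a)
        v_target = r + (0.0 if done else gamma * critic.value(s_next))
        delta = v_target - critic.value(s)    # temporal-difference error
        critic.update(s, v_target)            # V_k -> V_{k+1}
        actor.update(s, a, delta)             # pi_k -> pi_{k+1} using the critic
        s = s_next
```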

SLIDE 9

State of the art

Critic-only

Fitted Q Iteration; Q-Learning, Sarsa

Actor-only

Evolutionary algorithms (CMA-ES, ...); PI2

Actor-critic

Natural Actor-Critic; CACLA

SLIDE 10

State of the art

Annotations mark which constraint a method fails to satisfy: (1) no continuous environments, (2) requires prior models of the agent or environment, (3) linear approximators only, (4) requires prior goal states or trajectories.

Critic-only

Fitted Q Iteration (1); Q-Learning, Sarsa (1)

Actor-only

Evolutionary algorithms (CMA-ES) → poor data efficiency; PI2 (3)(4)

Actor-critic

Natural Actor-Critic (3)(4); CACLA → poor data efficiency, many meta-parameters

SLIDE 11

Landscape of algorithms

[Figure: algorithms placed along two axes, decisional complexity vs. data required, with the "ideal algorithm" marked.]

SLIDE 12

Landscape of algorithms

[Same figure, now with NFQ placed on it.]

SLIDE 13

Neural Fitted Q (NFQ)

Q_{k+1} = argmin_{Q ∈ F_c} Σ_{t=1}^{N} [ Q(s_t, a_t) − (r_{t+1} + γ max_{a′ ∈ A} Q_k(s_{t+1}, a′)) ]²

π*(s) = argmax_{a ∈ A} Q(s, a)

[Diagram: feed-forward network with inputs s1, s2, s3, a1, a2, one hidden layer, and output Q(s, a).]
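A possible reading of one NFQ iteration in Python (a sketch under assumptions: a finite action set, vector-encoded states and actions, and a generic supervised regressor with a fit method; none of these names come from the slides):

```python
import numpy as np

# One NFQ iteration as batch regression (illustrative sketch).
# transitions: list of (s, a, r, s_next); actions: finite action set;
# q_k(s, a): previous Q estimate; regressor: any supervised model
# with fit(X, y), e.g. a multilayer perceptron.
def nfq_iteration(regressor, transitions, actions, q_k, gamma=0.99):
    X, y = [], []
    for s, a, r, s_next in transitions:
        best_next = max(q_k(s_next, a2) for a2 in actions)  # max_a' Q_k(s', a')
        X.append(np.concatenate([s, a]))  # (state, action) as the input
        y.append(r + gamma * best_next)   # fitted target
    regressor.fit(np.array(X), np.array(y))  # Q_{k+1} = argmin of squared error
    return regressor
```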

SLIDE 14

CACLA

Temporal-difference error: δ_t = r_t + γ V(s_{t+1}) − V(s_t)

Critic: V_{k+1}(s_t) = V_k(s_t) + α_v δ_t, implemented as θ^V_{i,k+1} = θ^V_{i,k} + α_v δ_t ∂V_k(s_t) / ∂θ^V_{i,k}

Actor: θ_{t+1} = θ_t + α_a (a_t − u_t) ∂u_t(s_t) / ∂θ_t if δ_t > 0, and θ_{t+1} = θ_t otherwise
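A per-step sketch of these updates (the critic/actor interfaces are hypothetical; step_towards stands in for the gradient steps written above, nudging the model's output at s toward a target):

```python
# One CACLA step (illustrative sketch; critic.value / step_towards and
# actor.output / step_towards are hypothetical helpers standing in for
# the gradient updates on the slide).
def cacla_step(critic, actor, s, a, r, s_next, alpha_v, alpha_a, gamma=0.99):
    u = actor.output(s)                               # deterministic output u_t
    delta = r + gamma * critic.value(s_next) - critic.value(s)  # TD error
    critic.step_towards(s, critic.value(s) + alpha_v * delta)   # critic update
    if delta > 0:                                     # learn only on improvement
        actor.step_towards(s, u + alpha_a * (a - u))  # pull u_t toward a_t
```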

SLIDE 15

Neural Fitted Actor-Critic

[Diagram: the NFAC loop. 1) Interactions: the agent plays a ∼ π in the environment, receiving s, r and collecting a dataset D_π of exploratory actions {s_t, a_t} and deterministic actor outputs {s_t, u_t}. 2a) Actor update: depending on the sign of δ, the actor π is refit with Rprop toward a_t (δ > 0) or u_t (δ ≤ 0). 2b) Critic update: the critic is refit with Rprop on targets {s_t, v_{k,t}}, producing V_{k+1} from V_k. Repeat.]

SLIDE 16

Neural Fitted Actor-Critic

V_{k+1} ← argmin_{V ∈ F_c} Σ_{s_t ∈ D_π} [ V(s_t) − (r_{t+1} + γ V_k(s_{t+1})) ]²

[Diagram: critic network with inputs s1, s2, s3, one hidden layer, and output V(s).]

π_{k+1} ← argmin_{π ∈ F_a} Σ_{s_t ∈ D_π} [ π(s_t) − y_t ]², where y_t = a_t if δ_t > 0, and y_t = u_t otherwise

[Diagram: actor network with inputs s1, s2, s3, one hidden layer, and outputs a1, a2.]
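Putting the two regressions together, one NFAC iteration could look like the following sketch (fit/predict regressor interfaces are assumed, e.g. MLPs trained with Rprop as on the previous slide; the episode format is my assumption):

```python
import numpy as np

# One NFAC iteration (illustrative sketch): after an episode, refit the
# critic on fitted targets and the actor on CACLA-style targets.
# episode: list of (s, u, a, r, s_next) with u = actor output, a = played action.
def nfac_iteration(critic, actor, episode, gamma=0.99):
    states, v_targets, pi_targets = [], [], []
    for s, u, a, r, s_next in episode:
        v_target = r + gamma * critic.predict(s_next)   # r_{t+1} + gamma V_k(s')
        delta = v_target - critic.predict(s)            # delta_t
        states.append(s)
        v_targets.append(v_target)
        pi_targets.append(a if delta > 0 else u)        # a_t if delta_t > 0 else u_t
    critic.fit(np.array(states), np.array(v_targets))   # V_{k+1}
    actor.fit(np.array(states), np.array(pi_targets))   # pi_{k+1}
```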

SLIDE 17

Experimental Results

SLIDE 18

Landscape of algorithms

[Same figure, now with NFQ, CACLA, CMA-ES and NFAC placed on it.]

SLIDE 19

Landscape of algorithms

[Same figure, with DDPG and NAF added; their exact placement is marked with a question mark.]

SLIDE 20

Methods landscape

[Same figure, with NFAC+ added.]

SLIDE 21

Toward a better data efficiency

Fitted Actor-Critic

Q^π_{k+1} = argmin_{Q ∈ F_c} Σ_{t=1}^{N} c(a_t | s_t) [ Q(s_t, a_t) − (r_{t+1} + γ Q^π_k(s_{t+1}, π(s_{t+1}))) ]²

π_{k+1} = argmax_{π ∈ F_a} Σ_{t=1}^{N} Q_{k+1}(s_t, π(s_t))

c(a_t | s_t) = min(1, π(a_t | s_t) / π_0(a_t | s_t))
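The clipped importance weight c(a_t | s_t) can be computed directly; a small sketch (density functions for the current policy π and the behaviour policy π_0 are assumed):

```python
# Clipped importance weight from the slide:
# c(a|s) = min(1, pi(a|s) / pi0(a|s)); pi_density and pi0_density are
# assumed probability densities of the current and behaviour policies.
def clipped_weight(pi_density, pi0_density, s, a):
    return min(1.0, pi_density(a, s) / pi0_density(a, s))
```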
SLIDE 22

Conclusion & Further Works

Further works on Neural Fitted Actor-Critic:

• Compare to DDPG
• Don't forget previous data
• Guided exploration of the sensorimotor space
• Increase the dimension of states/actions
• Redefine the reward function for the new sub-goal
