Recurrent Predictive State Policy (RPSP) Networks, ICML 2018


  1. Recurrent Predictive State Policy (RPSP) Networks. ICML 2018, Stockholm, Sweden, July 12, 2018. Zita Marinho (zmarinho@cmu.edu). Co-authors: Ahmed Hefny, CMU (equal contribution); Wen Sun, CMU; Siddhartha S. Srinivasa, UW/CMU; Geoffrey J. Gordon, CMU/Microsoft. Institutions: Instituto de Telecomunicações; Instituto de Sistemas e Robótica; Robotics Institute; Instituto Superior Técnico; Carnegie Mellon University.

  2. Policy learning and model learning. A policy $\pi$ maps partial observations $o_1, o_2, \ldots, o_t$ (robot joint angles) to actions $a_1, a_2, \ldots, a_t$ (robot joint torques). Recurrent Predictive State Policy nets | zmarinho@cmu.edu | ICML 2018 - poster #200

  3. Recurrent Predictive State Policy Nets. [Diagram labels: observations, predictive states, sampled actions, prediction loss $\ell_{\text{pred}}$.]

  4. Predictive State Representations. The history is $h_t = [o_{t-n:t}, a_{t-n:t}]$; the future consists of observations $o_{t:t+k}$ and actions $a_{t:t+k}$. The predictive state $q_t$ represents $\mathbb{E}[o_{t:t+k} \mid h_t; a_{t:t+k}]$, a sufficient statistic of conditional future observations. References: Boots et al. 2009; TPSRs, Rosencrantz et al. 2004; Littman et al. 2001; Jaeger et al. 1998. [Timeline: observations $o_1, \ldots, o_{t+k+1}$ and actions $a_1, \ldots, a_{t+k+1}$, split into history and future.]
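
To make the history and future windows concrete, the sketch below slices a trajectory into a history of the last `n` steps and a future of the next `k` steps around time `t`. Whether the windows include time `t` itself is not recoverable from the slide, so the half-open convention used here (history ends before `t`, future starts at `t`) is an assumption, as are the function and variable names.

```python
import numpy as np


def split_windows(obs, act, t, n, k):
    """Return (history, future_obs, future_act) around time index t.

    history    : obs[t-n:t] and act[t-n:t], flattened and concatenated
    future_obs : obs[t:t+k]
    future_act : act[t:t+k]
    """
    if t - n < 0 or t + k > len(obs):
        raise ValueError("window falls outside the trajectory")
    history = np.concatenate([obs[t - n:t].ravel(), act[t - n:t].ravel()])
    return history, obs[t:t + k], act[t:t + k]


# Toy trajectory with one-dimensional observations and actions.
obs = np.arange(10, dtype=float).reshape(10, 1)        # o_0 ... o_9
act = 100 + np.arange(10, dtype=float).reshape(10, 1)  # a_0 ... a_9
history, future_obs, future_act = split_windows(obs, act, t=4, n=2, k=3)
```

Pairs of `history` and `future_obs` collected this way are the raw material from which a predictive state can be estimated by regression.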

  5. Predictive State Representations: Prediction. Observations are predicted from the predictive state as $W_{\text{pred}}\, q_t$, a linear transformation in feature space (RKHS).

  6. Predictive State Representations: Filtering. The PSR filter applies $W_{\text{ext}}$ to $q_t$ and conditions on the current action and observation to obtain the next state: $q_{t+1} = f_{\text{cond}}(W_{\text{ext}}\, q_t, a_t, o_t)$. In an RKHS this is kernel Bayes' rule (Fukumizu et al. 2013).
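
The filtering update has a simple finite analogue. The sketch below is not the RKHS filter from the slides: it assumes a small discrete system with hypothetical transition and observation tables `T` and `O`, where conditioning on the action and observation reduces to a normalized linear update (the classic belief-state form). All names and numbers are made up for illustration.

```python
import numpy as np

# Hypothetical 2-state, 2-action, 2-observation system (illustrative numbers).
# T[a][i, j] = P(next state j | state i, action a)
T = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.6, 0.4]],   # action 1
])
# O[j, o] = P(observation o | next state j)
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])


def filter_step(q, a, o):
    """One normalized linear update: q' is proportional to diag(O[:, o]) T[a]^T q."""
    unnormalized = O[:, o] * (T[a].T @ q)
    return unnormalized / unnormalized.sum()


def run_filter(q0, actions, observations):
    """Apply filter_step along a trajectory and return every intermediate state."""
    states = [np.asarray(q0, dtype=float)]
    for a, o in zip(actions, observations):
        states.append(filter_step(states[-1], a, o))
    return np.stack(states)


states = run_filter([0.5, 0.5], actions=[0, 1, 0], observations=[1, 1, 0])
```

Each row of `states` plays the role of $q_t$: it is updated recursively from the previous state, the action taken, and the observation received, without storing the full history.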

  7. Predictive State Representations: how do we learn a PSR, that is, $W_{\text{pred}}$, $W_{\text{ext}}$, and $q_0$? Learning needs no reward signal and reduces to supervised learning (Boots et al. 2011; Hefny et al. 2015; Sun et al. 2015).
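
As a minimal illustration of the reduction to supervised learning, the sketch below fits only the prediction map $W_{\text{pred}}$ by ridge regression, assuming the states $q_t$ are already available as rows of a matrix. That assumption is a simplification: the methods cited on the slide also have to learn the states and $W_{\text{ext}}$. The function name `fit_w_pred` and the regularizer `lam` are hypothetical.

```python
import numpy as np


def fit_w_pred(Q, Obs, lam=1e-6):
    """Ridge regression for W such that Obs[t] ~= W @ Q[t].

    Q   : (T, d_q) array, one state per row (assumed given).
    Obs : (T, d_o) array, one observation per row.
    lam : ridge regularizer (hypothetical default).
    """
    d_q = Q.shape[1]
    # Solve (Q^T Q + lam I) W^T = Q^T Obs for W^T.
    W_T = np.linalg.solve(Q.T @ Q + lam * np.eye(d_q), Q.T @ Obs)
    return W_T.T


# Synthetic check: recover a known linear map from noiseless data.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(3, 5))
Q = rng.normal(size=(200, 5))
Obs = Q @ W_true.T
W_hat = fit_w_pred(Q, Obs)
```

No reward appears anywhere in this fit, which is the point of the slide: the model parameters can be estimated from observation and action data alone.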

  8. Recurrent Predictive State Policy Nets: why use PSRs as a filter?
     • Consistent initialization: predictive state + method of moments
     • Non-linear dynamics: kernel-based representation
     • Scalable learning algorithm: random projections
     • Robustness and sample efficiency: local refinement by BPTT
     [Diagram labels: observations, PSR, PSR states.]
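
The slide does not say which random projection is used. One common way to make a kernel-based representation scale is random Fourier features, sketched below as an assumption rather than as the authors' construction. The Gaussian kernel, the bandwidth `sigma`, and the feature count are illustrative choices.

```python
import numpy as np


def make_rff(dim, n_features, sigma, rng):
    """Return a feature map z with z(x) @ z(y) ~= exp(-||x - y||^2 / (2 sigma^2))."""
    W = rng.normal(scale=1.0 / sigma, size=(n_features, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def z(x):
        return np.sqrt(2.0 / n_features) * np.cos(W @ x + b)

    return z


rng = np.random.default_rng(0)
z = make_rff(dim=4, n_features=5000, sigma=1.5, rng=rng)

x = rng.normal(size=4)
y = rng.normal(size=4)
approx = z(x) @ z(y)                                    # inner product of random features
exact = np.exp(-np.sum((x - y) ** 2) / (2 * 1.5 ** 2))  # Gaussian kernel value
```

With explicit finite-dimensional features, linear maps such as $W_{\text{pred}}$ become ordinary matrices whose size depends on the number of features, not on the number of training points.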

  9. Recurrent Predictive State Policy Nets. [Diagram: the policy $\pi_\theta$ consists of a PSR with parameters $\theta_{\text{PSR}}$, which maps observations to PSR states, and a reactive policy with parameters $\theta_{\text{re}}$, from which actions are sampled.] Z. Marinho, A. Hefny, W. Sun, G. Gordon, S. Srinivasa, ICML 2018 (under review).

  10. RPSP Initialization: PSR initialization with the method of moments.
     • efficient and consistent (Boots et al. 2011; Hefny et al. 2015)
     • does not require interaction (reward signal)
     • differentiable, can be trained end-to-end (Downey et al. 2017)
     [Diagram labels: observations, PSR with $\theta_{\text{PSR}}$, PSR states, reactive policy with $\theta_{\text{re}}$, sampled actions, $\pi_\theta$.]

  11. RPSP Optimization. Two objectives are optimized: the cumulative reward (rewards $r_t$ received for actions $a_t$), to accomplish the task, and the prediction error $\ell_{\text{pred}}$ between predicted observations $\hat{o}_t$ and actual observations $o_t$, to keep the model accurate. [Diagram labels: observations, PSR states $q_t$, actions, reward.]
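
One way to write the two objectives as a single loss is a weighted sum. The weights $\alpha_1, \alpha_2$, the undiscounted return, and the squared-error form of the prediction term are assumptions made for illustration; the slide only names the two terms, and $J(\pi_\theta)$ is the symbol used on the algorithm slide.

```latex
\mathcal{L}(\theta) \;=\; -\,\alpha_1\, J(\pi_\theta) \;+\; \alpha_2\, \ell_{\text{pred}}(\theta),
\qquad
J(\pi_\theta) = \mathbb{E}\Big[\sum_{t} r_t\Big],
\qquad
\ell_{\text{pred}}(\theta) = \sum_{t} \mathbb{E}\,\big\| \hat{o}_t - o_t \big\|^2 .
```

Minimizing $\mathcal{L}$ then raises the expected return while penalizing a filter whose predicted observations drift away from the observed ones.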

  12. Algorithm. Step 1: initialize the PSR. [Diagram labels: $\theta_{\text{PSR}}$, $\theta_{\text{re}}$, initialize, sample.]

  13. Algorithm. Step 1: initialize the PSR. Step 2: optimize on a batch of trajectories, each consisting of observations $o_1, \ldots, o_{t+k}$, actions $a_1, \ldots, a_{t+k}$, and rewards $r_1, \ldots, r_{t+k}$, using the prediction loss $\ell_{\text{pred}}$ and the policy objective $J(\pi_\theta)$.

  14. RPSP Optimization: learning via policy gradient, with alternate optimization or joint optimization.
     • REINFORCE, "vanilla" policy gradient (Williams 1992): higher variance (-); faster, simpler (+)
     • Natural gradient (Schulman et al. 2015): requires Hessian-vector multiplication (-); smoother policy changes (+)
     • direct policy estimation
     • applicable to any robust gradient optimizer
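
A minimal sketch of the REINFORCE (score-function) gradient estimate, assuming a linear-Gaussian reactive policy $a \sim \mathcal{N}(\theta^\top q, \sigma^2)$ with scalar actions and a given filter state $q_t$ at each step. It differentiates only the reactive policy parameters; the slides train the PSR parameters as well, which this sketch leaves out. All function names are hypothetical.

```python
import numpy as np


def grad_log_pi(theta, q, a, sigma):
    """Gradient of log N(a; theta @ q, sigma^2) with respect to theta."""
    return (a - theta @ q) / sigma ** 2 * q


def reinforce_gradient(theta, trajectories, sigma=1.0, baseline=0.0):
    """Average over trajectories of sum_t grad log pi(a_t | q_t) * (return - baseline).

    Each trajectory is a list of (q_t, a_t, r_t) tuples.
    """
    grad = np.zeros_like(theta, dtype=float)
    for traj in trajectories:
        total_return = sum(r for _, _, r in traj)
        score = sum(grad_log_pi(theta, q, a, sigma) for q, a, _ in traj)
        grad += score * (total_return - baseline)
    return grad / len(trajectories)


# One single-step trajectory: state q, action a, reward r (illustrative values).
theta = np.zeros(2)
trajectories = [[(np.array([1.0, 2.0]), 1.5, 2.0)]]
g = reinforce_gradient(theta, trajectories)  # ascent direction for expected return
theta_new = theta + 0.1 * g                  # one plain gradient-ascent step
```

Subtracting a `baseline` from the return leaves the estimate unbiased while reducing the variance the slide lists as the drawback of this estimator.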

  15. Experiments: OpenAI Gym MuJoCo environments.
     • partial observations (joints, no velocities)
     • continuous observations
     • continuous actions
     Environments: Swimmer, CartPole, Walker2d, Hopper. Labels on the slide, in transcript order: 3 joints, 2 joints, 8 joints, 5 joints; 6 DoFs, 3 DoFs, 2 DoFs, 1 DoF.

  16. Experiments: cross-environment performance.

  17. Conclusions
     • combine a PSR filter with a reactive network for partially observable environments
     • make use of consistent initialization methods for the filter
     • make use of the prediction loss to improve the policy
     • end-to-end policy learning algorithm

  18. Thank you! Questions? zmarinho@cmu.edu. Come see us at poster #200. This research was supported by the Portuguese Foundation of Science and Technology under grant SFRH/BD/52015/2012.
