Cédric Colas, PhD student @ Flowers team, INRIA



  1. Cédric Colas, PhD student @ Flowers team, INRIA. Co-authors: Pierre Fournier, Olivier Sigaud, Mohamed Chetouani, Pierre-Yves Oudeyer

  2. Problem: Intrinsically Motivated Modular Multi-Goal RL. Which type of goal should I target: Reach, Push, Pick & Place, Stack, ...? [Figure: Modular Multi-Goal Fetch Arm environment] Curious: Intrinsically Motivated Modular Multi-Goal RL

  3. Problem: Intrinsically Motivated Modular Multi-Goal RL. Which goal exactly? Pick & Place at (x,y,z)! [Figure: Modular Multi-Goal Fetch Arm environment]

  4. Problem: Intrinsically Motivated Modular Multi-Goal RL. Controllable objects (learnable goals) vs. distracting objects (unlearnable goals). [Figure: Modular Multi-Goal Fetch Arm environment]

  5. The Curious Algorithm. Modular goal encoding for UVFA [1]; examples of modular goals: Move gripper to (x,y,z), Push cube1 at (x,y), Pick & Place cube2 at (x,y,z). Sampling of modules and goals using absolute learning progress [2] (using a bandit algorithm). Modular replay buffer with hindsight learning [3, 4] (module and goal substitutions). [Diagram label: External world] References: [1] UVFA, Schaul et al., 2015; [2] IMGEP, Forestier, 2017; [3] HER, Andrychowicz et al., 2017; [4] Unicorn, Mankowitz et al., 2018.

  6. Modular goal encoding vs Multi-Goal Module Experts. [Figure: Impact of the policy and value function architecture. Average success rates over the set of tasks (mean +/- std, 10 seeds). Curves: Curious without LP, Multi-Goal Module Experts, HER]

  7. Automatic Curriculum with Absolute Learning Progress. Using a bandit for module selection and replay. Forgetting due to interferences among modules/goals, mitigated thanks to fast LP-based refocus. [Figure panels: Competence, Absolute Learning Progress, Selection Probabilities. Curves: Reach, Push, Pick&Place, Stack]

  8. Resilience to Distracting Goals. [Figure: Resilience to distracting goals with 0, 4 or 7 distracting modules, for CURIOUS (intrinsically motivated) and Random (random module). Mean +/- sem, 10 seeds. Legend: 0: CURIOUS (LP), 0: Random, 4: CURIOUS (LP), 4: Random, 7: CURIOUS (LP), 7: Random]

  9. Resilience to Forgetting and Sensory Failures. [Figure: Resilience to sensory failure: recovery following a sensory failure. Mean +/- std, 10 seeds. Curves: CURIOUS (LP), Random] CURIOUS recovers 95% of its original performance twice as fast as Random.

  10. Curious: Intrinsically Motivated Modular Multi-Goal RL
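
Two of the ideas named on slides 5 and 7, modular goal encoding and module selection driven by absolute learning progress, can be illustrated with a small sketch. The slides do not give implementation details, so everything concrete here is an assumption made for illustration: the module table `MODULES`, the one-hot-plus-zero-padded-slots layout in `encode_goal`, the split-window estimate of learning progress, the `window` and `epsilon` parameters, and the class name `LPModuleSelector`. It is not the authors' code.

```python
import random
from collections import deque
from typing import Dict, List, Sequence

# Hypothetical module table (name -> goal dimensionality), loosely following
# the example goals on slide 5. Names and sizes are illustrative assumptions.
MODULES: Dict[str, int] = {"reach": 3, "push": 2, "pick_place": 3}


def encode_goal(module: str, goal: Sequence[float]) -> List[float]:
    """Assumed encoding: one-hot module descriptor + zero-padded goal slots."""
    if len(goal) != MODULES[module]:
        raise ValueError(f"{module} expects {MODULES[module]} goal values")
    one_hot = [1.0 if name == module else 0.0 for name in MODULES]
    slots: List[float] = []
    for name, dim in MODULES.items():
        # Only the active module's slot carries the goal; others stay at zero.
        slots.extend([float(x) for x in goal] if name == module else [0.0] * dim)
    return one_hot + slots


class LPModuleSelector:
    """Samples modules in proportion to their absolute learning progress."""

    def __init__(self, modules, window: int = 20, epsilon: float = 0.2, seed: int = 0):
        self.modules = list(modules)
        self.epsilon = epsilon  # assumed share of uniform exploration
        self.history = {m: deque(maxlen=window) for m in self.modules}
        self.rng = random.Random(seed)

    def record(self, module: str, success: bool) -> None:
        """Store the outcome (success or failure) of one rollout on a module."""
        self.history[module].append(1.0 if success else 0.0)

    def absolute_lp(self, module: str) -> float:
        """|success rate of recent half - success rate of older half|."""
        outcomes = list(self.history[module])
        if len(outcomes) < 2:
            return 0.0
        half = len(outcomes) // 2
        older, recent = outcomes[:half], outcomes[half:]
        return abs(sum(recent) / len(recent) - sum(older) / len(older))

    def probabilities(self) -> Dict[str, float]:
        """Mix a uniform distribution with one proportional to absolute LP."""
        lps = {m: self.absolute_lp(m) for m in self.modules}
        total = sum(lps.values())
        uniform = 1.0 / len(self.modules)
        if total == 0.0:
            return {m: uniform for m in self.modules}
        return {
            m: self.epsilon * uniform + (1.0 - self.epsilon) * lps[m] / total
            for m in self.modules
        }

    def sample(self) -> str:
        probs = self.probabilities()
        weights = [probs[m] for m in self.modules]
        return self.rng.choices(self.modules, weights=weights)[0]


if __name__ == "__main__":
    selector = LPModuleSelector(MODULES)
    for success in [False] * 10 + [True] * 10:   # "push" is improving
        selector.record("push", success)
    for _ in range(20):                          # "pick_place" never succeeds
        selector.record("pick_place", False)
    print(selector.probabilities())
    print(encode_goal(selector.sample(), [0.1, 0.2]) if selector.sample() == "push" else "other module sampled")
```

Because the progress measure is an absolute value, a module whose success rate drops also receives a higher selection probability, which is one way to read the "fast LP-based refocus" on slide 7. A module with no measurable progress, such as one tied to a distracting object, keeps only the small uniform share controlled by `epsilon`.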
