Subgoals in Hierarchical Reinforcement Learning Tianren Tang Tian - PowerPoint PPT Presentation



  1. Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning • Tianren Tang, Tian Tan, Shangqi Guo, Xiaolin Hu, Feng Chen

  2. Background • Goal-conditioned HRL • The high-level policy suffers from non-stationarity • From a MARL perspective, each agent's policy is influenced by the other agents • Another perspective • The high-level policy's action space is usually too large, so its actions, which serve as subgoals for the low-level policy, are often unreachable • Intuitive fix: action space reduction or action elimination • Drawbacks: • No comparable literature shows how to perform the space reduction • Reduction or elimination may lead to a suboptimal solution

  3. Intuition • Restrict the subgoal space to the k-step adjacent region of the current state

  4. Theoretical Analysis • Shortest transition time • For the optimal policy $\pi^*$ • where $\varphi^{-1}: \mathcal{G} \to \mathcal{S}$ is a mapping from a goal to a state $s$

  5. Theoretical Analysis • The k-step adjacent region of $s$ is defined: • Theorem 1: • there is always a surrogate goal $g' \in \mathcal{G}_A$ such that $\pi^*(a^* \mid s, g') = \pi^*(a^* \mid s, g)$ • Theorem 2: • for $g' \in \mathcal{G}_A$, $Q^*(s, g') = Q^*(s, g)$
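
The two analysis slides can be made concrete on a toy problem. The sketch below is an illustration only, not the paper's code: it assumes a deterministic 4-connected grid, where the shortest transition time reduces to a breadth-first-search step count, and the names `WALLS`, `shortest_transition_steps`, `k_step_adjacent_region`, and `surrogate_goal` are hypothetical. It shows a distant goal being replaced by a goal inside the k-step adjacent region that lies on a shortest path to the original goal, which is the intuition behind Theorem 1.

```python
from collections import deque

# Hypothetical toy maze: 0 = free cell, 1 = wall.
WALLS = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

def shortest_transition_steps(walls, start):
    """BFS step counts from `start` to every reachable free cell."""
    rows, cols = len(walls), len(walls[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not walls[nr][nc] and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def k_step_adjacent_region(walls, state, k):
    """All cells whose shortest transition time from `state` is at most k."""
    dist = shortest_transition_steps(walls, state)
    return {s for s, d in dist.items() if d <= k}

def surrogate_goal(walls, state, goal, k):
    """Return a goal inside the k-step region that lies on a shortest path to `goal`."""
    from_state = shortest_transition_steps(walls, state)
    if goal not in from_state:
        raise ValueError("goal is unreachable from state")
    if from_state[goal] <= k:
        return goal  # already adjacent, no surrogate needed
    from_goal = shortest_transition_steps(walls, goal)  # moves are reversible here
    candidates = [
        s for s, d in from_state.items()
        if d == k and d + from_goal[s] == from_state[goal]
    ]
    return min(candidates)  # deterministic tie-break for the example

region = k_step_adjacent_region(WALLS, (0, 0), k=2)
proxy = surrogate_goal(WALLS, (0, 0), (2, 3), k=2)
print(sorted(region), proxy)
```

Because the surrogate sits on a shortest path to the original goal, the first move toward it is also a shortest-path move toward the original goal in this toy setting.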

  6. Theoretical Optimizations • Original optimization objective, where $\tau^* = (s_0, \ldots, s_{Tk})$ and $\rho^* = (g_0, \ldots, g_{(T-1)k})$ • Relax the above equations:

  7. HRL with Adjacency Constraint • Adjacency matrix approximation • Contrastive loss
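
The slide gives no formulas for these two steps, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes that adjacency is approximated by marking states visited within k steps of each other along a sampled trajectory, and that the contrastive loss is a hinge on embedding distance with a threshold `eps_k` and a margin `delta`. All function and parameter names are hypothetical.

```python
import math

def adjacency_labels(trajectory, k):
    """Mark state pairs visited within k steps of each other as adjacent.

    Being k steps apart in a trajectory proves the shortest transition
    time is at most k; pairs never marked are only provisionally
    non-adjacent until more trajectories are seen.
    """
    adjacent = set()
    last = len(trajectory) - 1
    for i, s in enumerate(trajectory):
        for j in range(i, min(i + k, last) + 1):
            adjacent.add((s, trajectory[j]))
            adjacent.add((trajectory[j], s))  # assume symmetric adjacency
    return adjacent

def contrastive_loss(emb_a, emb_b, is_adjacent, eps_k, delta):
    """Hinge-style contrastive loss on the Euclidean embedding distance."""
    d = math.dist(emb_a, emb_b)
    if is_adjacent:
        # Adjacent pairs are pulled to within eps_k of each other.
        return max(d - eps_k, 0.0)
    # Non-adjacent pairs are pushed at least eps_k + delta apart.
    return max(eps_k + delta - d, 0.0)

labels = adjacency_labels(["s0", "s1", "s2", "s3"], k=2)
print(("s0", "s2") in labels, ("s0", "s3") in labels)
print(contrastive_loss((0.0, 0.0), (3.0, 4.0), True, eps_k=1.0, delta=0.5))
```

A network trained with such a loss can stand in for the full adjacency matrix, since adjacency is then read off by thresholding the embedding distance at `eps_k`.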

  8. Final Optimization Objective • With a learned adjacency network
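
One way to read this slide is that the hard adjacency constraint becomes a penalty term computed from the learned embedding. The sketch below assumes that form. The hinge penalty, the weight `eta`, and the names `adjacency_penalty` and `high_level_loss` are assumptions made for illustration and are not taken from the slides.

```python
import math

def adjacency_penalty(state_emb, subgoal_emb, eps_k):
    """Zero while the subgoal embedding is within eps_k of the state embedding."""
    return max(math.dist(state_emb, subgoal_emb) - eps_k, 0.0)

def high_level_loss(base_loss, state_emb, subgoal_emb, eps_k, eta):
    """Base high-level loss plus a weighted adjacency penalty (assumed form)."""
    return base_loss + eta * adjacency_penalty(state_emb, subgoal_emb, eps_k)

# A subgoal inside the adjacent region leaves the base loss unchanged.
near = high_level_loss(1.0, (0.0, 0.0), (0.0, 0.5), eps_k=1.0, eta=10.0)
# A distant subgoal is penalized in proportion to how far it overshoots.
far = high_level_loss(1.0, (0.0, 0.0), (3.0, 4.0), eps_k=1.0, eta=10.0)
print(near, far)
```

With a soft penalty like this, the high-level policy can still be trained by ordinary gradient descent while being discouraged from proposing subgoals outside the adjacent region.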

  9. Algorithm

  10. Experiment Environment • Discrete & Continuous • Result

  11. Ablation Study • Variants compared: • HRAC-O: HRAC with a perfect adjacency matrix from the environment • NegReward: relabel the reward as negative and bound the critic function

  12. Visualization

  13. Summary • Although the intuition is simple, the paper is good overall.
