SLIDE 1

Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning

Tianren Tang, Shangqi Guo, Tian Tan, Xiaolin Hu, Feng Chen

SLIDE 2

Background

  • Goal-conditioned HRL
  • The high-level policy suffers from a non-stationarity problem
  • From the MARL perspective, each agent's policy is influenced by the other agents
  • Another perspective:
  • The action space of the high-level policy is usually too large, so its action, which serves as the subgoal for the low-level policy, is often unreachable
  • Intuitively, this suggests action-space reduction or action elimination
  • Drawbacks:
  • no similar literature shows how to perform such space reduction
  • reduction or elimination may cause sub-optimality
SLIDE 3

Intuition

  • Restrict the subgoal space to the k-step adjacent region of the current state
SLIDE 4

Theoretical Analysis

  • Shortest transition time: the minimum number of environment steps needed to reach one state from another
  • For the optimal policy 𝜋∗
  • where 𝜑⁻¹: 𝐺 → 𝑆 is a mapping from a goal to a state s
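The shortest transition time above can be approximated, in a small deterministic environment, by a breadth-first search over the transition graph. A minimal sketch, assuming a discrete state space and a `neighbors` function (the gridworld and function names are illustrative, not from the paper):

```python
from collections import deque

def shortest_transition_time(start, goal, neighbors):
    """BFS estimate of the shortest transition time d_st(start, goal):
    the fewest environment steps any policy needs to reach `goal`
    from `start` in a deterministic, discrete-state environment."""
    if start == goal:
        return 0
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        state, dist = frontier.popleft()
        for nxt in neighbors(state):
            if nxt == goal:
                return dist + 1  # BFS expands level by level, so this is minimal
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return float("inf")  # goal unreachable from start

# Toy 4-connected 5x5 gridworld as the transition graph.
def grid_neighbors(s):
    x, y = s
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]
```

In an obstacle-free grid this reduces to Manhattan distance, e.g. `shortest_transition_time((0, 0), (2, 3), grid_neighbors)` returns 5.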
SLIDE 5

Theoretical Analysis

  • The k-step adjacent region of s is defined as the set of goals whose corresponding states are reachable from s within k steps
  • Theorem 1:
  • there is always a surrogate goal 𝑔′ ∈ 𝐺_𝐴 such that 𝜋∗(𝑎∗|𝑠, 𝑔′) = 𝜋∗(𝑎∗|𝑠, 𝑔)
  • Theorem 2:
  • for 𝑔′ ∈ 𝐺_𝐴, 𝑄∗(𝑠, 𝑔′) = 𝑄∗(𝑠, 𝑔)
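In standard goal-conditioned notation (the symbol names are my own reading of the slide), the k-step adjacent region is the set of goals whose mapped states lie within shortest transition time k of s. A minimal sketch with hypothetical `phi_inv` (goal-to-state mapping) and `d_st` (shortest transition time) arguments:

```python
def k_step_adjacent_region(s, k, goals, phi_inv, d_st):
    """G_A(s, k): the subgoals whose corresponding states are reachable
    from state s within k environment steps, where d_st is the shortest
    transition time and phi_inv maps a goal back to a state."""
    return {g for g in goals if d_st(s, phi_inv(g)) <= k}

# Toy 1-D chain: goal i corresponds to state i, and moving between
# states i and j takes |i - j| steps.
region = k_step_adjacent_region(3, 2, range(10),
                                lambda g: g,
                                lambda a, b: abs(a - b))  # {1, 2, 3, 4, 5}
```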
SLIDE 6

Theoretical Optimizations

  • Original optimization objective

where 𝜏∗ = (𝑠₀, …, 𝑠_{𝑇𝐾}) is the optimal state trajectory and 𝜎∗ = (𝑔₀, …, 𝑔_{(𝑇−1)𝐾}) is the subgoal sequence

  • Relax the equations above:
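Reconstructed in standard notation (my own symbols, following the de-garbled slide; the exact form in the paper may differ), the relaxed problem keeps the hierarchical return objective and adds the k-step adjacency requirement on each generated subgoal as a constraint:

```latex
\max_{\pi^{hi},\,\pi^{lo}} \; \mathbb{E}\Big[\sum_{t=0}^{TK-1} r_t\Big]
\quad \text{s.t.} \quad
d_{st}\big(s_{iK},\, \varphi^{-1}(g_{iK})\big) \le k,
\qquad i = 0, \dots, T-1,
```

where $K$ is the high-level action interval, $T$ the number of high-level steps, and $d_{st}$ the shortest transition time.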
SLIDE 7

HRL with Adjacency Constraint

  • Adjacency matrix approximation
  • Contrastive loss
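The contrastive loss mentioned above can be sketched as a hinge-style objective over a learned state embedding ψ: pull embeddings of adjacent state pairs within some radius, push non-adjacent pairs beyond it. A minimal NumPy sketch; the function name, `eps_k`, and `delta` margin are my own illustrative choices, not the paper's exact hyperparameters:

```python
import numpy as np

def adjacency_contrastive_loss(emb1, emb2, adjacent, eps_k=1.0, delta=0.2):
    """Hinge-style contrastive loss on embedded state pairs.
    `adjacent` is a 0/1 array of adjacency labels (from the approximate
    adjacency matrix): adjacent pairs are pulled within radius eps_k,
    non-adjacent pairs are pushed beyond eps_k + delta."""
    dist = np.linalg.norm(emb1 - emb2, axis=-1)
    pos = adjacent * np.maximum(dist - eps_k, 0.0)                # adjacent but too far
    neg = (1 - adjacent) * np.maximum(eps_k + delta - dist, 0.0)  # non-adjacent but too close
    return float(np.mean(pos + neg))
```

For example, an adjacent pair at distance 3 and a non-adjacent pair at distance 0 both incur a penalty, while correctly placed pairs contribute zero loss.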
SLIDE 8

Final Optimization Objective

  • With a learned adjacency network
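With the adjacency network in hand, the adjacency constraint can be folded into the high-level policy's training signal as a penalty term. A minimal sketch of one such penalty; the weighting `eta` and all names are illustrative, and this is only the extra term added on top of the usual RL loss, not the full objective:

```python
import numpy as np

def adjacency_penalty(psi_s, psi_goal, eps_k=1.0, eta=0.1):
    """Penalty added to the high-level policy loss: a generated subgoal
    whose embedding (under the learned adjacency network psi) falls
    outside the eps_k ball around the current state's embedding is
    treated as outside the k-step adjacent region and penalized.
    eta balances the penalty against the RL objective."""
    dist = np.linalg.norm(psi_s - psi_goal)
    return eta * max(dist - eps_k, 0.0)
```

Subgoals inside the learned k-step region incur zero penalty, so the constraint only bites when the high-level policy proposes distant, likely unreachable subgoals.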
SLIDE 9

Algorithm

SLIDE 10

Experiment Environment

  • Discrete & Continuous
  • Result
SLIDE 11

Ablation Study

  • Differences:
  • HRAC-O: HRAC with the perfect adjacency matrix obtained from the environment
  • NegReward: relabel rewards as negative and bound the critic function
SLIDE 12

Visualization

SLIDE 13

Summary

  • Although the intuition is simple, the paper is strong overall.