SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning


SLIDE 1

SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning

Bo Liu

Auburn University, Auburn, AL, USA

SLIDE 2

Collaborators

Daoming Lyu Auburn University Auburn, AL, USA Fangkai Yang NVIDIA Corporation Redmond, WA, USA Steven Gustafson Maana Inc. Bellevue, WA, USA

SLIDE 3

Sequential Decision-Making

Sequential decision-making (SDM) concerns an agent making a sequence of decisions through its interaction with the environment. Deep reinforcement learning (DRL) has achieved tremendous success on sequential decision-making problems using deep neural networks (Mnih et al., 2015).

SLIDE 4

Challenge: Montezuma’s Revenge

The avatar climbs down the ladder, jumps over a rotating skull, picks up the key (+100), then goes back and uses the key to open the right door (+300). Vanilla DQN achieves a score of 0 on this game (Mnih et al., 2015).

SLIDE 5

Challenge: Montezuma’s Revenge

Problem: long-horizon sequential actions with sparse and delayed rewards, leading to:

poor data efficiency,
lack of interpretability.

SLIDE 6

Our Solution

Solution: task decomposition.

Symbolic planning: subtask scheduling (high-level plan).
DRL: subtask learning (low-level control).
Meta-learner: subtask evaluation.

Goal:

Symbolic planning drives learning, improving task-level interpretability.
DRL learns feasible subtasks, improving data efficiency.

SLIDE 7

Background: Symbolic Planning with Action Language

Action language (Gelfond & Lifschitz, 1998): a formal, declarative, logic-based language for describing dynamic domains. A dynamic domain can be represented as a transition system.

SLIDE 8

Action Language BC

Action Language BC (Lee et al., 2013) describes a transition system using a set of causal laws.

Dynamic laws describe transitions between states:
    move(x, y1, y2) causes on(x, y2) if on(x, y1).

Static laws describe the values of fluents inside a state:
    intower(x, y2) if intower(x, y1), on(y1, y2).
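To make the reading of these two laws concrete, here is a minimal Python sketch (not the authors' planner, which solves the BC program symbolically) that encodes them over states represented as sets of fluent tuples; all function names are hypothetical.

    def apply_move(state, x, y1, y2):
        """Dynamic law: move(x, y1, y2) causes on(x, y2) if on(x, y1)."""
        if ("on", x, y1) not in state:      # precondition not met: no transition
            return None
        nxt = set(state)
        nxt.discard(("on", x, y1))
        nxt.add(("on", x, y2))
        return close_static(nxt)

    def close_static(state):
        """Static law: intower(x, y2) if intower(x, y1), on(y1, y2).
        Applied to a fixpoint so fluents are consistent within one state."""
        state, changed = set(state), True
        while changed:
            changed = False
            for (_, x, y1) in [f for f in state if f[0] == "intower"]:
                for (_, _, y2) in [f for f in state if f[0] == "on" and f[1] == y1]:
                    if ("intower", x, y2) not in state:
                        state.add(("intower", x, y2))
                        changed = True
        return state

For example, apply_move({("on", "a", "b")}, "a", "b", "c") returns a state containing ("on", "a", "c").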

SLIDE 9

Background: Reinforcement Learning

Reinforcement learning is defined on a Markov Decision Process (S, A, P^a_{ss′}, r, γ). To achieve optimal behavior, a policy π : S × A → [0, 1] is learned.

An option is defined as a tuple (I, π, β), which gives decision-making a hierarchical structure:

the initiation set I ⊆ S,
the policy π : S × A → [0, 1],
the probabilistic termination condition β : S → [0, 1].
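As a concrete reading of the tuple, the following is a minimal Python sketch of an option, assuming discrete states and actions; the class and method names are illustrative, not an API from the paper.

    import random
    from dataclasses import dataclass
    from typing import Callable, Set

    @dataclass
    class Option:
        initiation_set: Set[int]             # I ⊆ S: states where the option may start
        policy: Callable[[int], int]         # π: maps a state to an action
        termination: Callable[[int], float]  # β: termination probability per state

        def can_start(self, state: int) -> bool:
            return state in self.initiation_set

        def should_stop(self, state: int) -> bool:
            return random.random() < self.termination(state)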

SLIDE 10

SDRL: Symbolic Deep Reinforcement Learning

Symbolic Planner: orchestrates the sequence of subtasks using a high-level symbolic plan.

Controller: uses DRL to learn a subpolicy for each subtask from intrinsic rewards.

Meta-Controller: measures the learning performance of subtasks and updates the intrinsic goal to enable reward-driven plan improvement.
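One way to picture how the three components could interact in a single episode is the minimal sketch below; the interfaces (plan, execute, record, update_intrinsic_goal) are assumptions made for illustration, not the authors' code.

    def sdrl_episode(planner, controllers, meta_controller, env):
        plan = planner.plan(meta_controller.intrinsic_goal())  # high-level symbolic plan
        state = env.reset()
        for subtask in plan:                                   # each symbolic transition = one subtask
            controller = controllers[subtask]                  # one DRL learner per subtask
            state, success = controller.execute(env, state)    # low-level control, intrinsic reward
            meta_controller.record(subtask, success)           # subtask evaluation (extrinsic reward)
            if not success:
                break                                          # replan from the failed subtask
        meta_controller.update_intrinsic_goal()                # drives reward-based plan improvement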

SLIDE 11

Symbolic Planner

SLIDE 12

Symbolic Planner: Planning with Intrinsic Goal

Intrinsic goal: a linear constraint on plan quality,

    quality(Π) ≥ quality(Π_t),

where Π_t is the plan at episode t.

Plan quality: a utility function

    quality(Π_t) = Σ_{⟨s_{i−1}, g_{i−1}, s_i⟩ ∈ Π_t} ρ_{g_{i−1}}(s_{i−1}),

where ρ_{g_i} is the gained reward for subtask g_i.

The symbolic planner generates a new plan that explores new subtasks and exploits more rewarding subtasks.
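A minimal sketch of the quality computation, assuming a plan is a list of symbolic transitions (s_prev, g, s) and rho maps (s_prev, g) to the gained reward learned for that subtask; both names are hypothetical.

    def plan_quality(plan, rho):
        """quality(Pi_t) = sum of rho_g(s) over transitions (s, g, s') in Pi_t."""
        return sum(rho[(s_prev, g)] for (s_prev, g, _s) in plan)

    def meets_intrinsic_goal(candidate, current, rho):
        """Intrinsic goal: accept a plan only if at least as good as the current one."""
        return plan_quality(candidate, rho) >= plan_quality(current, rho)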

SLIDE 13

From Symbolic Transition to Subtask

Assumption: given the set S of symbolic states and the set S̃ of sensory inputs, we assume there is an oracle for symbol grounding, F : S × S̃ → {t, f}.

Given F and a pair of symbolic states s, s′ ∈ S, the corresponding subtask is an option with:

the initiation set I = {s̃ ∈ S̃ : F(s, s̃) = t},
the subpolicy π : S̃ → A for the corresponding subtask,
the termination condition β such that, for s̃′ ∈ S̃, β(s̃′) = 1 if F(s′, s̃′) = t, and 0 otherwise.
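Reusing the Option sketch from the background slide, the construction could look as follows; the oracle is assumed to be available as a Boolean function F(symbolic_state, observation), and all names are illustrative.

    def make_subtask_option(s, s_next, F, observations, subpolicy):
        """Turn the symbolic transition (s, s') into an option."""
        initiation_set = {o for o in observations if F(s, o)}  # I = {o : F(s, o) = t}

        def termination(o):                                    # β(o') = 1 iff F(s', o') = t
            return 1.0 if F(s_next, o) else 0.0

        return Option(initiation_set, subpolicy, termination)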

SLIDE 14

Controller

SLIDE 15

Controllers: DRL with Intrinsic Reward

Intrinsic reward: a pseudo-reward crafted by a human. Given a subtask defined on (I, π, β), the intrinsic reward is

    r_i(s̃′) = φ if β(s̃′) = 1, and r otherwise,

where φ is a positive constant encouraging the achievement of subtasks and r is the reward from the environment at state s̃′.
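In code, the intrinsic reward for one subtask could be sketched as follows; the default value of phi is an arbitrary placeholder, not a value from the paper.

    def intrinsic_reward(option, obs_next, env_reward, phi=1.0):
        """r_i(o') = φ if the subtask terminates successfully at o', else r."""
        return phi if option.termination(obs_next) == 1.0 else env_reward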

SLIDE 16

Meta-Controller

SLIDE 17

Meta-Controller: Evaluation with Extrinsic Reward

Extrinsic reward: r_e(s, g) = f(ε), where ε measures the competence of the learned subpolicy for each subtask. For example, let ε be the success ratio; then f can be defined as

    f(ε) = −ψ if ε < threshold, and r(s, g) if ε ≥ threshold,

where ψ is a positive constant that punishes selecting unlearnable subtasks and r(s, g) is the cumulative environmental reward obtained by following subtask g.
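A minimal sketch of f, with ε taken to be the subtask's empirical success ratio; the threshold and ψ values are illustrative placeholders, not values from the paper.

    def extrinsic_reward(success_ratio, cumulative_env_reward,
                         threshold=0.9, psi=1.0):
        """f(ε) = -ψ if ε < threshold, else r(s, g)."""
        if success_ratio < threshold:
            return -psi                   # punish selecting unlearnable subtasks
        return cumulative_env_reward      # otherwise pass through r(s, g)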

SLIDE 18

Experimental Results I.

SLIDE 19

Experimental Results II.

Baseline: Kulkarni et al., Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, NIPS 2016.

SLIDE 20

Conclusion

We present an SDRL framework featuring:

high-level symbolic planning based on an intrinsic goal,
low-level policy control with DRL,
subtask learning evaluation by a meta-learner.

This is the first work integrating symbolic planning with DRL that achieves both task-level interpretability and data efficiency for decision-making. Future work.
