Differentiable Tree Planning for Deep RL
Greg Farquhar
In collaboration with Tim Rocktäschel, Maximilian Igl, & Shimon Whiteson

Overview
○ Reinforcement learning
○ Model-based RL and online planning
○ Rewards are sparse
○ Credit assignment
○ Exploration and exploitation
○ Large state/action spaces
○ Approximation and generalisation
○ Target networks
○ Replay memory
○ Parallel environment threads
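The first two stabilisation tricks can be sketched in a few lines. This is a minimal illustration, not the talk's code; all names and shapes here are assumptions:

```python
import random
from collections import deque

import numpy as np

class ReplayMemory:
    """Fixed-capacity buffer of past transitions, sampled uniformly."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        # transition = (state, action, reward, next_state, done)
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def q_learning_target(q_target_net, batch, gamma=0.99):
    """Bootstrapped Q-learning targets computed with the *frozen* target network."""
    targets = []
    for state, action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * np.max(q_target_net(next_state))
        targets.append(reward + bootstrap)
    return np.array(targets)

# Every C steps, the online parameters are copied into the target network,
# so the bootstrap targets move slowly rather than chasing every update.
```

Sampling uniformly from the replay memory breaks the temporal correlation of consecutive transitions, and the frozen target network keeps the regression targets stable between syncs.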
○ Complex
○ Generalise poorly to new parts of the state space
Action-conditional video prediction using deep networks in Atari games (Oh et al., 2015)
○ Value prediction
○ Performance on the real task

Everything is trained end-to-end.
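One standard way to train a model end-to-end on value prediction is an n-step Q-learning loss; the following numpy sketch is illustrative, with all function names assumed rather than taken from the talk:

```python
import numpy as np

def n_step_targets(rewards, final_value, gamma=0.99):
    """Discounted n-step returns, bootstrapped from the value at the final state."""
    targets = np.zeros(len(rewards))
    running = final_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        targets[t] = running
    return targets

def q_loss(q_values, actions, targets):
    """Mean squared error between the chosen-action Q-values and n-step targets."""
    chosen = q_values[np.arange(len(actions)), actions]
    return np.mean((chosen - targets) ** 2)
```

Because the targets depend only on observed rewards and a bootstrapped value, gradients flow through the whole planner: the model is shaped by what improves value prediction and task performance, not by how well it reconstructs observations.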
[Diagram: tree expansion over actions a1, a2, a3 — a shared, action-conditional transition function with normalised state embeddings]
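The expansion in the diagram — one shared, action-conditional transition function applied once per action, with each successor embedding normalised — might be sketched as follows. All weights, shapes, and names here are illustrative assumptions:

```python
import numpy as np

def transition(state, action_weights):
    """Action-conditional residual transition, followed by L2 normalisation."""
    next_state = state + np.tanh(action_weights @ state)
    return next_state / np.linalg.norm(next_state)

def expand_tree(state, action_weight_list, depth):
    """Recursively expand every action to the given depth.

    The same per-action weights are reused at every depth (parameter sharing).
    """
    if depth == 0:
        return {"state": state, "children": []}
    children = [
        expand_tree(transition(state, w), action_weight_list, depth - 1)
        for w in action_weight_list
    ]
    return {"state": state, "children": children}
```

Normalising each successor state keeps the embeddings on a fixed scale, so repeated applications of the shared transition cannot blow up or collapse the representation as the tree gets deeper.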
○ Inside true targets
Depth-1 ablations (vs DQN-Deep):
○ Reward + value
○ Auxiliary loss
○ Parameter sharing
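A "reward + value" backup through the expanded tree can be sketched as below. The mixing weight `lam` between a node's own value estimate and its best child backup is an assumption of this sketch, not a detail taken from the slides:

```python
def backup(node, value_fn, reward_fn, gamma=0.99, lam=0.8):
    """Value estimate for `node` by recursive tree backup.

    node:      dict with "state" and a list of "children" (as built by expansion)
    value_fn:  maps a state embedding to a scalar value estimate
    reward_fn: maps (state, action index) to a predicted immediate reward
    """
    v = value_fn(node["state"])
    if not node["children"]:
        return v  # leaf: fall back on the value estimate alone
    q_values = [
        reward_fn(node["state"], a)
        + gamma * backup(child, value_fn, reward_fn, gamma, lam)
        for a, child in enumerate(node["children"])
    ]
    # Mix the node's own value with the best "reward + value" child backup.
    return (1.0 - lam) * v + lam * max(q_values)
```

Because every operation here (reward prediction, value prediction, the mixed max-backup) is a differentiable function of the model parameters, gradients from the task loss flow through the whole planning computation.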
TreeQN is a drop-in replacement for a value network — could it also serve as the critic in an actor-critic method?
○ Better auxiliary tasks?
○ Pre-training?
○ Different environments?
○ Depth
○ Structure
○ Auxiliary tasks
○ Need more grounded models to use more refined planning algorithms
end-to-end