

SLIDE 1

Differentiable Tree Planning for Deep RL

Greg Farquhar

1

SLIDE 2

Greg Farquhar

In Collaboration With

Tim Rocktäschel, Maximilian Igl, & Shimon Whiteson

2 / 35

SLIDE 3

Greg Farquhar

Overview

  • Reinforcement learning
  • Model-based RL and online planning
  • TreeQN and ATreeC (ICLR 2018)
  • Results
  • Future work

3 / 35

SLIDE 4

Greg Farquhar

Planning and Learning for Control

4 / 35

SLIDE 5

Greg Farquhar

Reinforcement Learning

5 / 35

SLIDE 6

Greg Farquhar

Reinforcement Learning

  • Specify the reward, learn the solution
  • Very general framework
  • Problem is hard:

○ Rewards are sparse
○ Credit assignment
○ Exploration and exploitation
○ Large state/action spaces
○ Approximation and generalisation

6 / 35

SLIDE 7

Greg Farquhar

RL Key Concepts

  • State (Observation)
  • Action
  • Transition
  • Reward
  • Policy: states → actions
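
To make these concepts concrete, here is a minimal sketch (not from the slides) of the agent-environment loop they describe, using a toy hand-written environment; the environment and the random placeholder policy are illustrative assumptions only.

```python
import random

# Toy environment: a 1-D chain of states 0..4; reaching state 4 ends the
# episode with reward +1. This stands in for the state/action/transition/reward
# interface described above.
class ChainEnv:
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):                       # action: 0 = left, 1 = right
        self.state = max(0, min(4, self.state + (1 if action == 1 else -1)))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done           # transition + reward

def policy(state):
    # Placeholder policy: states -> actions (here, uniformly random).
    return random.choice([0, 1])

env = ChainEnv()
state, done, episode_return = env.reset(), False, 0.0
while not done:
    action = policy(state)
    state, reward, done = env.step(action)
    episode_return += reward
print("episode return:", episode_return)
```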

7 / 35

SLIDE 8

Greg Farquhar

Model-free RL: Value Functions

  • Learn without a model of the environment
  • Value function
  • Optimal value function
  • Policy evaluation + improvement
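
For reference, the standard definitions behind these bullets (not reproduced from the slides) can be written as:

```latex
% Discounted return, policy value, action value, and the optimal value functions.
\begin{align}
  G_t &= \sum_{k=0}^{\infty} \gamma^k \, r_{t+k}, \qquad 0 \le \gamma < 1 \\
  V^{\pi}(s) &= \mathbb{E}_{\pi}\!\left[\, G_t \mid s_t = s \,\right] \\
  Q^{\pi}(s,a) &= \mathbb{E}_{\pi}\!\left[\, G_t \mid s_t = s,\; a_t = a \,\right] \\
  V^{*}(s) &= \max_{\pi} V^{\pi}(s), \qquad Q^{*}(s,a) = \max_{\pi} Q^{\pi}(s,a)
\end{align}
```

Policy evaluation estimates V^π or Q^π for the current policy; policy improvement then acts greedily with respect to those estimates.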

8 / 35

SLIDE 9

Greg Farquhar

The Bellman Equation

  • Temporal (Markov) structure
  • Bellman optimality equation
  • Q-learning
  • Backups
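
A minimal sketch of the tabular Q-learning backup, which applies the Bellman optimality equation as a sample-based update (standard textbook form, not code from the talk):

```python
import random
from collections import defaultdict

# Q-learning backup: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
# This is a sampled version of the Bellman optimality equation.
def q_learning_update(Q, s, a, r, s_next, done, actions, alpha=0.1, gamma=0.99):
    bootstrap = 0.0 if done else max(Q[(s_next, a2)] for a2 in actions)
    target = r + gamma * bootstrap
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Epsilon-greedy action selection from the current estimates.
def epsilon_greedy(Q, s, actions, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

Q = defaultdict(float)   # tabular action-value estimates, default 0
```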

9 / 35

SLIDE 10

Greg Farquhar

Deep RL

  • Q → deep neural network
  • Q-learning as regression
  • Stability is hard

○ Target networks
○ Replay memory
○ Parallel environment threads
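
A sketch of Q-learning as regression with a deep network, including the target-network and replay-memory stabilisers listed above; this is a generic DQN-style loss in PyTorch with illustrative sizes and names, not the talk's code.

```python
import random
from collections import deque
import torch
import torch.nn as nn
import torch.nn.functional as F

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))        # Q -> deep network
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(q_net.state_dict())        # target network: periodically refreshed copy
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=100_000)                        # replay memory of (s, a, r, s', done) tuples

def td_loss(batch, gamma=0.99):
    s, a, r, s2, done = map(torch.as_tensor, zip(*batch))
    q = q_net(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                             # no gradient through the target network
        target = r.float() + gamma * (1 - done.float()) * target_net(s2.float()).max(1).values
    return F.mse_loss(q, target)                      # Q-learning as regression to a moving target

# One training step, once the replay memory holds enough transitions:
# loss = td_loss(random.sample(replay, 32)); optimizer.zero_grad(); loss.backward(); optimizer.step()
```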

10 / 35

SLIDE 11

Greg Farquhar

Deep RL - Encode and Evaluate?

11 / 35

SLIDE 12

Greg Farquhar

Model-based RL

12 / 35

SLIDE 13

Greg Farquhar

Online Planning with Tree Search

13 / 35

SLIDE 14

Greg Farquhar

Environment Models

  • State transition + reward
  • Can be hard to learn

○ Complex
○ Generalise poorly to new parts of the state space

  • Need very good fidelity for planning
  • Standard approach: predictive error on observations
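
To make the standard approach concrete, here is a sketch of learning an environment model by regressing predicted next observations (and rewards) onto the true ones, in the spirit of observation-prediction models; the architecture and names are illustrative assumptions, not any particular paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObservationModel(nn.Module):
    """Predicts the next observation and reward from (observation, one-hot action)."""
    def __init__(self, obs_dim, n_actions, hidden=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim + n_actions, hidden), nn.ReLU())
        self.next_obs = nn.Linear(hidden, obs_dim)
        self.reward = nn.Linear(hidden, 1)

    def forward(self, obs, action_onehot):
        h = self.body(torch.cat([obs, action_onehot], dim=-1))
        return self.next_obs(h), self.reward(h).squeeze(-1)

def model_loss(model, obs, action_onehot, next_obs, reward):
    # Trained purely on predictive error in observation space: small pixel-level
    # errors can still translate into large errors when planning many steps ahead.
    pred_obs, pred_r = model(obs, action_onehot)
    return F.mse_loss(pred_obs, next_obs) + F.mse_loss(pred_r, reward)
```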

14 / 35

SLIDE 15

Greg Farquhar

Model fidelity in complex visual spaces is too low for effective planning

Action-conditional video prediction using deep networks in Atari games (Oh et al., 2015)

15 / 35

SLIDE 16

Greg Farquhar

Model fidelity in complex visual spaces is too low for effective planning

Action-conditional video prediction using deep networks in Atari games (Oh et al., 2015)

16 / 35

SLIDE 17

Greg Farquhar

Another Way to Learn Models

  • Optimise the true objective downstream of the model

○ Value prediction
○ Performance on the real task

  • Our approach: integrate a differentiable model into a differentiable planner and learn end-to-end (sketched below)
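
As a contrast with the observation-prediction sketch above, a minimal illustration of optimising the model for the true objective downstream of it: the model is only ever used to predict values, and only that error is back-propagated through the (differentiable) model. The encoder/transition/value_head callables are illustrative assumptions.

```python
import torch.nn.functional as F

def downstream_value_loss(encoder, transition, value_head, obs, action_onehot, n_step_return):
    """Train the model so that values computed THROUGH it match real returns,
    rather than making it match observations pixel-by-pixel."""
    z = encoder(obs)                          # latent state
    z_next = transition(z, action_onehot)     # latent transition (never decoded to pixels)
    v = value_head(z_next).squeeze(-1)        # value predicted downstream of the model
    return F.mse_loss(v, n_step_return)       # gradients flow end-to-end into the model
```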

17 / 35

SLIDE 18

Greg Farquhar

TreeQN: Encode

18 / 35

SLIDE 19

Greg Farquhar

TreeQN: Tree Expansion

19 / 35

SLIDE 20

Greg Farquhar

TreeQN: Evaluation

20 / 35

SLIDE 21

Greg Farquhar

TreeQN: Tree Backup

21 / 35

SLIDE 22

Greg Farquhar

TreeQN
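
A compact sketch of the full TreeQN forward pass shown on the preceding slides (encode, tree expansion, per-node evaluation, tree backup), following the paper's description; the module sizes, names, and the hard-max backup are my own simplifications, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TreeQNSketch(nn.Module):
    def __init__(self, obs_dim, n_actions, latent=128, depth=2, gamma=0.99):
        super().__init__()
        self.n_actions, self.depth, self.gamma = n_actions, depth, gamma
        self.encoder = nn.Sequential(nn.Linear(obs_dim, latent), nn.ReLU())   # encode
        # One transition per action (action-conditional), plus reward and value heads.
        self.transitions = nn.ModuleList([nn.Linear(latent, latent) for _ in range(n_actions)])
        self.reward_head = nn.Linear(latent, n_actions)
        self.value_head = nn.Linear(latent, 1)

    def q_values(self, z, d):
        """Q(z, a) for every action, backed up through a tree of remaining depth d."""
        rewards = self.reward_head(z)                       # r(z, a), shape (batch, n_actions)
        q = []
        for a, trans in enumerate(self.transitions):        # tree expansion: try every action
            z_next = F.relu(trans(z))                       # simplified transition function
            if d == 1:                                      # leaf: evaluate with the value head
                backed_up = self.value_head(z_next).squeeze(-1)
            else:                                           # internal node: recurse and back up
                backed_up = self.q_values(z_next, d - 1).max(dim=-1).values
            q.append(rewards[:, a] + self.gamma * backed_up)
        return torch.stack(q, dim=-1)                       # (batch, n_actions)

    def forward(self, obs):
        z = self.encoder(obs)                 # encode the observation into a latent state
        return self.q_values(z, self.depth)   # the whole tree is differentiable end-to-end
```

The hard max over children is used here only for brevity; the architecture-details slide below replaces it with a soft backup.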

22 / 35

SLIDE 23

Greg Farquhar

Architecture Details

  • Two-step transition function
  • Residual connections
  • State normalisation
  • Soft backups
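
A sketch of the choices listed above, based on my reading of the TreeQN paper: a transition with a shared first step and an action-conditional second step, a residual connection, unit-norm state normalisation, and a softmax-weighted ("soft") backup instead of a hard max. Names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStepTransition(nn.Module):
    """Two-step transition: a shared layer, then one layer per action,
    with a residual connection and L2 normalisation of the latent state."""
    def __init__(self, latent, n_actions):
        super().__init__()
        self.shared = nn.Linear(latent, latent)
        self.per_action = nn.ModuleList([nn.Linear(latent, latent) for _ in range(n_actions)])

    def forward(self, z, action):
        h = F.relu(self.shared(z))               # step 1: shared across all actions
        dz = self.per_action[action](h)          # step 2: action-conditional
        z_next = z + dz                          # residual connection
        return z_next / z_next.norm(dim=-1, keepdim=True).clamp(min=1e-8)   # state normalisation

def soft_backup(q_children, temperature=1.0):
    """Soft backup: a softmax-weighted average of the children's Q-values rather
    than a hard max, which keeps gradients flowing into every branch of the tree."""
    weights = F.softmax(q_children / temperature, dim=-1)
    return (weights * q_children).sum(dim=-1)
```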

23 / 35

[Figure: transition diagram showing a shared layer, state normalisation, and action-conditional branches for actions a1, a2, a3]

SLIDE 24

Greg Farquhar

Training

  • Optimise end-to-end with primary RL objective
  • Parameter sharing
  • N-step Q-learning with parallel environment threads
  • Batch thread data together for GPU
  • Increase virtual batch size during tree expansion for efficient computation
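
A sketch of the n-step Q-learning objective used to train everything end-to-end; the batch layout (rollouts from parallel environment threads stacked together for the GPU) and the omission of mid-rollout episode terminations are simplifications.

```python
import torch
import torch.nn.functional as F

def n_step_q_loss(q_net, obs, actions, rewards, last_obs, last_done, gamma=0.99):
    """obs, actions, rewards have shape (n_steps, n_threads, ...), collected from
    parallel environment threads; q_net maps a batch of observations to Q-values."""
    with torch.no_grad():                                   # bootstrap from the final state
        bootstrap = q_net(last_obs).max(dim=-1).values * (1.0 - last_done.float())
    n_steps = rewards.shape[0]
    returns, loss = bootstrap, 0.0
    for t in reversed(range(n_steps)):                      # build n-step returns backwards
        returns = rewards[t] + gamma * returns
        q_taken = q_net(obs[t]).gather(1, actions[t].unsqueeze(1)).squeeze(1)
        loss = loss + F.mse_loss(q_taken, returns)          # regress Q(s_t, a_t) onto the return
    return loss / n_steps
```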

24 / 35

SLIDE 25

Greg Farquhar

Grounding the Transition Model

  • Observations
  • Latent states
  • Rewards

○ These already appear inside the true training targets
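
As one concrete grounding option (an illustrative sketch, not the talk's code), the tree's predicted immediate rewards can be regressed onto the rewards actually observed for the actions taken; the results slide later suggests this weak, reward-only grounding works best.

```python
import torch.nn.functional as F

def reward_grounding_loss(predicted_rewards, taken_actions, true_rewards):
    """predicted_rewards: (batch, n_actions) rewards predicted at the tree's first level.
    The auxiliary loss only grounds the entries for the actions actually taken."""
    pred_taken = predicted_rewards.gather(1, taken_actions.unsqueeze(1)).squeeze(1)
    return F.mse_loss(pred_taken, true_rewards)
```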

25 / 35

SLIDE 26

Greg Farquhar

ATreeC

  • Use tree architecture for policy
  • Linear critic
  • Train with policy gradient
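
A sketch of how the tree becomes an actor-critic (ATreeC): the tree's backed-up per-action scores are used as policy logits, a linear critic estimates the state value from the encoded state, and both are trained with a standard policy-gradient loss. This reuses the TreeQNSketch class from the earlier sketch; all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ATreeCSketch(nn.Module):
    def __init__(self, tree_q, latent_dim):
        super().__init__()
        self.tree_q = tree_q                        # tree-structured network, as in TreeQNSketch
        self.critic = nn.Linear(latent_dim, 1)      # linear critic on the encoded state

    def forward(self, obs):
        z = self.tree_q.encoder(obs)
        logits = self.tree_q.q_values(z, self.tree_q.depth)   # per-action scores from the tree
        return F.softmax(logits, dim=-1), self.critic(z).squeeze(-1)

def actor_critic_loss(probs, value, action, n_step_return, value_coef=0.5, entropy_coef=0.01):
    log_probs = probs.clamp(min=1e-8).log()
    advantage = (n_step_return - value).detach()              # advantage drives the actor only
    policy_loss = -(log_probs.gather(1, action.unsqueeze(1)).squeeze(1) * advantage).mean()
    value_loss = F.mse_loss(value, n_step_return)             # critic regression
    entropy = -(probs * log_probs).sum(dim=-1).mean()         # entropy bonus for exploration
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```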

26 / 35

SLIDE 27

Greg Farquhar

Results: Grounding

  • Grounding weakly (just reward function) works best
  • Perhaps jointly training with auxiliary objectives is the wrong approach

27 / 35

SLIDE 28

Greg Farquhar

Results: Box Pushing

  • TreeQN helps!
  • Extra depth can help in some situations

28 / 35

SLIDE 29

Greg Farquhar

Results: Atari

  • Good performance
  • Makes use of depth (vs. DQN-Deep)
  • Main benefit comes from depth-1:

○ Reward + value
○ Auxiliary loss
○ Parameter sharing

29 / 35

SLIDE 30

Greg Farquhar

Results: ATreeC

  • Works: easy to use as a drop-in replacement
  • Smaller benefits than TreeQN
  • Limited by the quality of the critic?

30 / 35

SLIDE 31

Greg Farquhar

Just for fun

31 / 35

SLIDE 32

Greg Farquhar

Interpretability

  • Sometimes (?)
  • Firmly on model-free end of spectrum
  • Grounding is an open question

○ Better auxiliary tasks?
○ Pre-training?
○ Different environments?

32 / 35

SLIDE 33

Greg Farquhar

Future Work

  • Lessons learnt for model-free RL:

○ Depth
○ Structure
○ Auxiliary tasks

  • Online planning:

○ Need more grounded models to use more refined planning algorithms

33 / 35

SLIDE 34

Greg Farquhar

Summary

  • Combining online planning with deep RL is a key challenge
  • We can use a differentiable model inside a differentiable planner and train end-to-end

  • Tree-structured models can encode a valuable inductive bias
  • More work is needed to effectively learn and use grounded models

34 / 35

SLIDE 35

Greg Farquhar

Thank you!

35 / 35