Scale-free adaptive planning for deterministic dynamics & discounted rewards

  1. Scale-free adaptive planning for deterministic dynamics & discounted rewards. Peter Bartlett, Victor Gabillon, Jennifer Healey, Michal Valko. ICML, June 13th, 2019.

  2. An MCTS setting. MDP with starting state x_0 ∈ X, action space A, and a budget of n interactions. At time t, playing a_t in x_t leads to: deterministic dynamics g, with x_{t+1} = g(x_t, a_t); a noisy reward observation r(x_t, a_t) + ε_t, where ε_t is the noise. Objective: recommend an action a(n) that minimizes the simple regret r_n ≜ max_{a ∈ A} Q*(x_0, a) − Q*(x_0, a(n)), where Q*(x, a) ≜ r(x, a) + sup_π Σ_{t ≥ 1} γ^t r(x_t, π(x_t)). Assumption: r_t ∈ [0, R_max] and |ε_t| ≤ b. Approach: explore without knowing the parameters R_max and b.
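
A minimal sketch of this setting on a toy problem, to make the quantities concrete. The dynamics g, the reward r, the discount GAMMA, the noise level NOISE, and the state space are illustrative stand-ins, not taken from the paper.

```python
import random
from functools import lru_cache

GAMMA, NOISE = 0.9, 0.1            # discount gamma and noise range b (illustrative)
ACTIONS = (0, 1)                   # finite action space A

def g(x, a):                       # deterministic dynamics: x_{t+1} = g(x_t, a_t)
    return (x + a + 1) % 5

def r(x, a):                       # mean reward r(x, a), here in [0, 1]
    return ((x + a) % 3) / 2.0

def observe(x, a):                 # noisy observation r(x_t, a_t) + eps_t, |eps_t| <= b
    return r(x, a) + random.uniform(-NOISE, NOISE)

@lru_cache(maxsize=None)
def q_star(x, a, horizon=50):      # Q*(x, a) = r(x, a) + sup_pi sum_{t>=1} gamma^t r(x_t, pi(x_t))
    if horizon == 0:
        return 0.0
    nxt = g(x, a)
    return r(x, a) + GAMMA * max(q_star(nxt, b, horizon - 1) for b in ACTIONS)

def simple_regret(x0, recommended):  # r_n = max_a Q*(x0, a) - Q*(x0, a(n))
    return max(q_star(x0, a) for a in ACTIONS) - q_star(x0, recommended)

print(simple_regret(x0=0, recommended=0))
```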

  3. OLOP (Bubeck and Munos, 2010). OLOP implements optimistic planning using an Upper Confidence Bound (UCB) on the Q value of a sequence of q actions a_1, ..., a_q: Q_UCB(a_{1:q}) ≜ Σ_{h=1}^{q} γ^h ( r̂_h(t) + b √(1 / T_{a_h}(t)) ) + R_max γ^{q+1} / (1 − γ), where the sum is an optimistic estimate of the observed rewards along the sequence and the last term bounds the unseen reward beyond depth q. In optimization under a fixed budget n, excellent strategies allocate samples to actions without knowing R_max or b.
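
A minimal sketch of computing such an optimistic value for one action sequence, assuming the formula above with a 1/√T confidence width; GAMMA, R_MAX, B and the example inputs are illustrative, and this is not the full OLOP algorithm.

```python
import math

GAMMA, R_MAX, B = 0.9, 1.0, 0.1    # discount, reward range, noise range (illustrative)

def q_ucb(reward_means, pull_counts, gamma=GAMMA, r_max=R_MAX, b=B):
    """Optimistic value of a sequence a_1..a_q.

    reward_means[h] and pull_counts[h] are the empirical mean reward and the
    number of pulls of the h-th action of the sequence (h = 1..q)."""
    q = len(reward_means)
    observed = sum(
        gamma ** (h + 1) * (reward_means[h] + b * math.sqrt(1.0 / pull_counts[h]))
        for h in range(q)
    )
    unseen = r_max * gamma ** (q + 1) / (1.0 - gamma)  # optimistic bound beyond depth q
    return observed + unseen

print(q_ucb(reward_means=[0.6, 0.4, 0.7], pull_counts=[10, 5, 2]))
```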

  4. Tree search. [Figure: a search tree rooted at x_0 (depth h = 0), with children x_2, x_3, x_4 at h = 1 and deeper nodes x_5, x_6, x_7; following one path, with edge rewards r_03, r_35 and r_56, gives the discounted value Q(x_6) = r_03 + γ r_35 + γ² r_56.] This is a zero-order optimization!
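
A quick numeric check of the identity on this slide, with made-up edge rewards; GAMMA and the reward values are illustrative.

```python
GAMMA = 0.9                                   # illustrative discount

# Rewards collected along the path x_0 -> ... -> x_6, in order of depth.
path_rewards = [0.8, 0.5, 0.3]                # r_03, r_35, r_56 (made-up values)

# Q(x_6) = r_03 + GAMMA * r_35 + GAMMA**2 * r_56
q_x6 = sum(GAMMA ** t * r for t, r in enumerate(path_rewards))
print(q_x6)                                   # 0.8 + 0.9*0.5 + 0.81*0.3 = 1.493
```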

  5. Black-box optimization: use the partitioning to explore f (uniformly). [Figure: a function f on its domain, with the hierarchical partition refined uniformly over depths h = 0, 1, 2.]
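
A minimal sketch of this uniform partition-based exploration on [0, 1]; the objective f, the depth, and the binary splitting are illustrative choices.

```python
def f(x):                                     # illustrative black-box objective on [0, 1]
    return 1.0 - abs(x - 0.37)

def uniform_exploration(f, max_depth):
    """Evaluate the centre of every cell at every depth (uniform exploration)."""
    best_x, best_val = None, float("-inf")
    for h in range(max_depth + 1):
        n_cells = 2 ** h                      # depth h splits [0, 1] into 2**h cells
        for i in range(n_cells):
            centre = (i + 0.5) / n_cells
            value = f(centre)
            if value > best_val:
                best_x, best_val = centre, value
    return best_x, best_val

print(uniform_exploration(f, max_depth=5))
```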

  6. Zipf exploration: open the best n_h cells at depth h. [Figure: the partition tree, with the number of opened cells n_h shrinking as the depth h grows.]
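
A minimal sketch of this idea on [0, 1]: only the most promising cells are opened at each depth, and fewer of them as the depth grows. The objective f, the schedule n_h = n // (h + 1), and the binary splitting are illustrative assumptions, not the paper's exact choices.

```python
def f(x):                                     # illustrative black-box objective on [0, 1]
    return 1.0 - abs(x - 0.37)

def zipf_exploration(f, n=32, max_depth=6):
    """Open only the best n_h cells at each depth, with n_h shrinking in h."""
    opened = [(0.0, 1.0)]                     # cells opened at the previous depth
    best_val, best_x = f(0.5), 0.5
    for h in range(1, max_depth + 1):
        n_h = max(1, n // (h + 1))            # fewer cells opened as h grows (illustrative)
        children = []
        for left, right in opened:            # split every previously opened cell
            mid = (left + right) / 2.0
            children += [(left, mid), (mid, right)]
        # rank children by the value at their centre and keep the best n_h
        children.sort(key=lambda c: f((c[0] + c[1]) / 2.0), reverse=True)
        opened = children[:n_h]
        top_centre = (opened[0][0] + opened[0][1]) / 2.0
        if f(top_centre) > best_val:
            best_val, best_x = f(top_centre), top_centre
    return best_x, best_val

print(zipf_exploration(f))
```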

  7. Noisy case. We need to pull each point x more times to limit the uncertainty. Tradeoff: the more you pull each x, the shallower you can explore.
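
A small illustration of that tradeoff under an assumed binary partition: with m noisy pulls per cell and a total budget n, uniform exploration can only afford depths whose cumulative cell count, times m, fits in n.

```python
def max_uniform_depth(n, m):
    """Deepest depth h reachable when every cell up to depth h gets m pulls
    and the partition is binary (2**h cells at depth h)."""
    h = 0
    while m * (2 ** (h + 2) - 1) <= n:        # cells up to depth h+1 = 2**(h+2) - 1
        h += 1
    return h

for m in (1, 4, 16):                          # more pulls per cell => shallower exploration
    print(m, max_uniform_depth(n=1024, m=m))  # prints 9, 7, 5
```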

  8. Noisy case: StroquOOL (Bartlett et al., 2019). At depth h: order the cells by decreasing value, and open the i-th best cell with m = n_{h,i} estimations, a budget that shrinks with both the depth h and the rank i.
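
A minimal sketch of such a rank- and depth-dependent allocation; the exact schedule used here, m = n // (h * i), is an illustrative Zipf-style choice and not necessarily StroquOOL's precise constants.

```python
def zipf_allocation(n, depth, n_cells):
    """Pulls given to the i-th best cell (i = 1..n_cells) at a given depth:
    an illustrative m = n // (depth * i), decreasing in both depth and rank."""
    return [max(1, n // (depth * i)) for i in range(1, n_cells + 1)]

for h in (1, 2, 4):
    print(h, zipf_allocation(n=128, depth=h, n_cells=5))
# depth 1: [128, 64, 42, 32, 25]; deeper cells and lower ranks get fewer pulls
```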

  9. Black-box optimization vs. planning: reuse of samples and the discount γ. [Figure: the same tree viewed as a pure optimization problem (each sequence evaluated on its own, e.g. f_105, f_134) and as a planning problem (the per-step rewards r_1, ..., r_4 observed along a trajectory are reused by every sequence sharing the prefix).] K^H samples near the root: how many samples near the root are actually needed? Lower regret for planning! (Bubeck & Munos, 2010)
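
A minimal sketch of the sample-reuse idea: a single noisy rollout updates the reward estimate of every edge on its path, so edges near the root accumulate observations from every rollout passing through them. Edge names, reward values and the noise level are made up for illustration.

```python
import random
from collections import defaultdict

NOISE = 0.1                                    # illustrative noise range b
true_reward = defaultdict(lambda: 0.5)         # hypothetical per-edge mean rewards
sums, counts = defaultdict(float), defaultdict(int)

def rollout(path):
    """One trajectory: its noisy per-step rewards are credited to every edge
    on the path, hence to every sequence sharing a prefix with it."""
    for edge in path:
        observation = true_reward[edge] + random.uniform(-NOISE, NOISE)
        sums[edge] += observation
        counts[edge] += 1

for _ in range(100):                           # 100 rollouts through the same root edge
    rollout(["r03", random.choice(["r35", "r34"]), random.choice(["r56", "r57"])])

# The root edge r03 was observed by all 100 rollouts; deeper edges split the rest.
print({edge: counts[edge] for edge in sorted(counts)})
```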

  10. Black-box optimization vs. planning: reuse samples and take advantage of γ. [Figure: uniform exploration (not sharing information), where the near-root reward r_04 is re-estimated over and over, vs. Zipf exploration (sharing information), where only the best n_h cells are opened at each depth h.] Bubeck & Munos handled this only for uniform strategies... We figured out the amount of samples needed!

  11. PlaTγPOOS. The power of PlaTγPOOS:
  • implements Zipf exploration (as in StroquOOL) for MCTS,
  • explicitly pulls an action at depth h + 1 γ times less than an action at depth h, since Q*(x, a) = r(x, a) + sup_π Σ_t γ^t r(x_t, π(x_t)) discounts deeper rewards (see the sketch after this list),
  • does not use UCB, and makes no use of R_max and b,
  • improves over OLOP, with adaptation to low noise and to additional unknown smoothness,
  • gets exponential speedups when no noise is present!
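
A minimal sketch of that γ-scaled pull schedule: the per-action budget shrinks geometrically with depth. The base budget n0 and the rounding are illustrative; this is not the full PlaTγPOOS algorithm.

```python
GAMMA = 0.9                                    # illustrative discount

def pulls_per_depth(n0, max_depth, gamma=GAMMA):
    """Pulls given to an action at each depth: gamma times fewer per extra depth."""
    return [max(1, round(n0 * gamma ** h)) for h in range(max_depth + 1)]

print(pulls_per_depth(n0=64, max_depth=8))     # [64, 58, 52, 47, 42, 38, 34, 31, 28]
```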
