
Integrating control, inference and learning. Is it what robots should be doing?

  1. Integrating control, inference and learning. Is it what robots should be doing? Bert Kappen, SNN Donders Institute, Radboud University, Nijmegen, and Gatsby Unit, UCL, London. July 18, 2016.

  2. Optimal control theory. Given a current state and a future desired state, what is the best/cheapest/fastest way to get there?

  3. Why stochastic optimal control?

  4. Why stochastic optimal control?
  - Exploration
  - Learning

  5. Optimal control theory. Hard problems:
  - a learning and exploration problem
  - a stochastic optimal control computation
  - a representation problem for the control function $u(x, t)$

  6-8. The idea: Control, Inference and Learning (built up over three slides).
  - Linear Bellman equation and path integral solution: express a control computation as an inference computation; compute the optimal control using MC sampling.
  - Importance sampling: accelerate with importance sampling (= a state-feedback controller); the optimal importance sampler is the optimal control.
  - Learning: learn the controller from self-generated data; use the cross-entropy method for a parametrized controller.

  9. Outline
  • Review of path integral control theory, with some results
  • Importance sampling: the relation between optimal sampling and optimal control
  • Cross-entropy method for adaptive importance sampling (PICE): a criterion for parametrized control optimization; learning by gradient descent
  • Some examples

  10. Discrete time optimal control. Consider the control of a discrete time deterministic dynamical system:
  $x_{t+1} = x_t + f(x_t, u_t), \quad t = 0, 1, \ldots, T-1$
  where $x_t$ describes the state and $u_t$ specifies the control or action at time $t$. Given $x_0$ and $u_{0:T-1}$, we can compute $x_{1:T}$. Define a cost for each sequence of controls:
  $C(x_0, u_{0:T-1}) = \sum_{t=0}^{T-1} V(x_t, u_t)$
  Find the sequence $u_{0:T-1}$ that minimizes $C(x_0, u_{0:T-1})$.
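
To make the rollout-and-cost computation concrete, here is a minimal Python sketch; the integrator dynamics `f`, quadratic cost `V`, horizon, and control sequence are hypothetical placeholders, not from the talk.

```python
import numpy as np

def rollout_cost(x0, u_seq, f, V):
    """Simulate x_{t+1} = x_t + f(x_t, u_t) and accumulate C = sum_t V(x_t, u_t)."""
    x, cost = x0, 0.0
    for u in u_seq:
        cost += V(x, u)
        x = x + f(x, u)
    return x, cost

# Hypothetical example: scalar integrator with a quadratic state/control cost.
f = lambda x, u: u                   # x_{t+1} = x_t + u_t
V = lambda x, u: x**2 + 0.1 * u**2
x_T, C = rollout_cost(x0=1.0, u_seq=np.full(10, -0.1), f=f, V=V)
print(x_T, C)
```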

  11. Dynamic programming. Find the minimal cost path from A to J.
  $J(J) = 0$
  $J(H) = 3, \qquad J(I) = 4$
  $J(F) = \min(6 + J(H),\, 3 + J(I)) = 7$
  $J(B) = \min(7 + J(E),\, 4 + J(F),\, 2 + J(G)) = \ldots$
  The minimal cost at time $t$ is easily expressible in terms of the minimal cost at time $t+1$.
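
The cost-to-go recursion on the slide runs directly as a memoized recursion over a graph. Only the edge costs that appear on the slide (H-J = 3, I-J = 4, F-H = 6, F-I = 3, B-E = 7, B-F = 4, B-G = 2) are taken from it; the remaining edges are hypothetical fill-ins so the sketch runs end to end.

```python
from functools import lru_cache

edges = {
    'H': {'J': 3}, 'I': {'J': 4},
    'F': {'H': 6, 'I': 3},
    'E': {'H': 1},                 # hypothetical edge cost
    'G': {'I': 3},                 # hypothetical edge cost
    'B': {'E': 7, 'F': 4, 'G': 2},
    'A': {'B': 2},                 # hypothetical edge cost
}

@lru_cache(maxsize=None)
def J(node):
    """Minimal cost-to-go: J('J') = 0, else min over successors of edge cost + J."""
    if node == 'J':
        return 0
    return min(c + J(y) for y, c in edges[node].items())

print(J('F'))   # 7, matching the slide
print(J('A'))   # minimal cost from A to J under the fill-in costs
```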

  12. Discrete time optimal control. Dynamic programming uses the concept of the optimal cost-to-go $J(t, x)$. One can recursively compute $J(t, x)$ from $J(t+1, x)$ for all $x$ in the following way:
  $J(t, x_t) = \min_{u_t} \left( V(x_t, u_t) + J(t+1,\, x_t + f(x_t, u_t)) \right)$
  $J(T, x) = 0$
  $J(0, x) = \min_{u_{0:T-1}} C(x, u_{0:T-1})$
  This is called the Bellman equation. It computes $u_t(x)$ for all intermediate $t, x$.
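
A tabular version of this backward recursion, with the continuous state handled by interpolation on a grid; the dynamics, cost, and grids are hypothetical stand-ins.

```python
import numpy as np

xs = np.linspace(-2.0, 2.0, 81)      # state grid
us = np.linspace(-0.5, 0.5, 21)      # control grid
T = 20
f = lambda x, u: u                   # hypothetical dynamics increment
V = lambda x, u: x**2 + 0.1 * u**2   # hypothetical stage cost

J = np.zeros((T + 1, len(xs)))       # boundary condition J(T, x) = 0
policy = np.zeros((T, len(xs)))
for t in range(T - 1, -1, -1):       # backward in time
    for i, x in enumerate(xs):
        x_next = x + f(x, us)                              # one successor per control
        q_vals = V(x, us) + np.interp(x_next, xs, J[t + 1])
        J[t, i], policy[t, i] = q_vals.min(), us[q_vals.argmin()]
```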

  13. Stochastic optimal control. Consider a stochastic dynamical system
  $dX_t = f(X_t, u)\,dt + dW_t, \qquad E(dW_{t,i}\, dW_{t,j}) = \nu_{ij}\, dt$
  Given $X_0$, find the control function $u(x, t)$ that minimizes the expected future cost
  $C = E\left[ \phi(X_T) + \int_0^T dt\, V(X_t, u(X_t, t)) \right]$
  The expectation is over all trajectories given the control function $u(x, t)$. The optimal cost-to-go satisfies
  $-\partial_t J(x, t) = \min_u \left( V(x, u) + f(x, u)^T \nabla_x J(x, t) + \tfrac{1}{2} \nu \nabla_x^2 J(x, t) \right)$
  with $u = u(x, t)$ and boundary condition $J(x, T) = \phi(x)$. This is the HJB equation.

  14. Computing the optimal control solution is hard:
  - solve a Bellman equation, a PDE
  - scales badly with dimension
  Efficient solutions exist for (see the worked scalar example below):
  - linear dynamical systems with quadratic costs (Gaussians)
  - deterministic systems (no noise)
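
As a reminder of why the linear-quadratic case is easy, a standard scalar worked example (not from the slides): the HJB equation closes on a quadratic ansatz, reducing the PDE to ordinary differential equations.

```latex
\text{Scalar LQ example: } dX_t = u\,dt + dW_t, \quad
V(x,u) = \tfrac12\left(q x^2 + r u^2\right), \quad \phi(x) = \tfrac12 p_T x^2. \\
\text{The ansatz } J(x,t) = \tfrac12 P(t)\,x^2 + c(t) \text{ gives }
u^*(x,t) = -\frac{P(t)}{r}\,x, \\
-\dot P = q - \frac{P^2}{r}, \quad P(T) = p_T, \qquad
-\dot c = \frac{\nu}{2}\,P, \quad c(T) = 0,
```

so the optimal control is linear state feedback, computable without any sampling.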

  15. The idea. The uncontrolled dynamics specifies a distribution $q(\tau | x, t)$ over trajectories $\tau$ starting at $x, t$. The cost for trajectory $\tau$ is
  $S(\tau | x, t) = \phi(x_T) + \int_t^T ds\, V(x_s, s)$
  Find the optimal distribution $p(\tau | x, t)$ that minimizes $E_p S$ and is 'close' to $q(\tau | x, t)$.

  16. KL control. Find $p^*$ that minimizes
  $C(p) = KL(p|q) + E_p S, \qquad KL(p|q) = \int d\tau\, p(\tau|x,t) \log \frac{p(\tau|x,t)}{q(\tau|x,t)}$
  The optimal solution is given by
  $p^*(\tau|x,t) = \frac{1}{\psi(x,t)}\, q(\tau|x,t) \exp(-S(\tau|x,t))$
  $\psi(x,t) = \int d\tau\, q(\tau|x,t) \exp(-S(\tau|x,t)) = E_q\, e^{-S}$
  The optimal cost is $C(p^*) = -\log \psi(x,t)$.
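
The optimality of $p^*$ and the value $-\log \psi$ can be checked numerically on a toy problem where trajectories are just a handful of discrete outcomes; the prior $q$ and costs $S$ below are made-up numbers.

```python
import numpy as np

q = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical prior over 4 "trajectories"
S = np.array([1.0, 0.5, 2.0, 0.1])   # hypothetical path costs

psi = np.sum(q * np.exp(-S))         # psi = E_q[exp(-S)]
p_star = q * np.exp(-S) / psi        # optimal distribution

# C(p) = KL(p|q) + E_p[S]; at the optimum this equals -log(psi).
C = np.sum(p_star * np.log(p_star / q)) + np.sum(p_star * S)
print(C, -np.log(psi))               # the two numbers agree
```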

  17. Controlled diffusions. $p(\tau|x,t)$ is parametrised by control functions $u(x,t)$:
  $dX_t = f(X_t, t)\,dt + g(X_t, t)\left( u(X_t, t)\,dt + dW_t \right), \qquad E(dW_t^2) = dt$
  $C(u|x,t) = E_u\left[ S(\tau|x,t) + \int_t^T ds\, \tfrac{1}{2} u(X_s, s)^2 \right]$
  $q(\tau|x,t)$ corresponds to $u = 0$. The Bellman equation becomes a 'Schrödinger' equation with $J(x,t) = -\log \psi(x,t)$:
  $\partial_t \psi = \left( V - f^T \partial_x - \tfrac{1}{2} \partial_x^2 \right) \psi, \qquad \psi(x, T) = e^{-\phi(x)}$

  18. Controlled diffusions. The 'Schrödinger' equation can be solved formally as a Feynman-Kac path integral:
  $\psi(x,t) = \int d\tau\, q(\tau|x,t)\, e^{-S(\tau|x,t)} = E_q\left[ e^{-S} \right]$
  Optimal control:
  $u^*(x,t)\,dt = E_{p^*}(dW_t) = \frac{E_q\left[ dW\, e^{-S} \right]}{E_q\left[ e^{-S} \right]}$
  $\psi$ and $u^*$ can be computed by forward sampling from $q$.
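
A minimal sketch of that forward-sampling computation, assuming (for illustration only) zero uncontrolled drift, unit noise, no running cost, and a quadratic terminal cost; $u^*(x_0, 0)$ is read off from the first noise increment of each path.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, dt = 100_000, 1.0, 0.01
steps = int(T / dt)
phi = lambda x: x**2                 # hypothetical terminal cost
x0 = 1.0

X = np.full(N, x0)
dW0 = np.zeros(N)
for k in range(steps):               # Euler simulation of the uncontrolled paths
    dW = rng.normal(0.0, np.sqrt(dt), N)
    if k == 0:
        dW0 = dW                     # first increment, used for u*(x0, 0)
    X = X + dW                       # f = 0, g = 1 in this toy setting

w = np.exp(-phi(X))                  # V = 0, so S = phi(X_T)
print(-np.log(w.mean()))             # J(x0, 0) = -log psi(x0, 0)
print((dW0 * w).mean() / (w.mean() * dt))   # u*(x0, 0)
```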

  19. Delayed choice. Time-to-go $T = 2 - t$.
  [Figure: $J(x,t)$ as a function of $x$ for time-to-go $T = 2$, $T = 1$, and $T = 0.5$]
  $J(x,t) = -\nu \log E_q \exp(-\phi(X_2)/\nu)$
  The decision is made at $T = 1$.

  20. Acrobot.
  [Figure: per-iteration learning statistics (mean, std, cost increment, effective sample size, $J_u$, $J_\phi$) and the resulting trajectory in $(x_1, x_2)$ and control $u(t)$]
  100 iterations. At each iteration 50 trajectories were generated. Noise was lowered at each iteration. Top left: final height for each trajectory.

  21. Acrobot (movie92.mp4). Result after 100 iterations, 50 samples per iteration.

  22. Robotics. ≈ 100,000 trajectories per iteration, 3 iterations per second. Video at: http://www.snn.ru.nl/~bertk/control_theory/PI_quadrotors.mp4
  Theodorou et al. 2011 ICRA; Gomez et al. 2016 ICAPS.

  23. Importance sampling and control.
  [Figure: two panels of sampled trajectories over $t \in [0, 2]$]
  $\psi(x,t) = E_q\, e^{-S}, \qquad S(\tau|x,t) = \phi(x_T) + \int_t^T ds\, V(x_s, s)$
  Sampling is 'correct' but inefficient.

  24. Importance sampling.
  [Figure: the sampling density $q(x)$]
  Consider a simple 1-d sampling problem. Given $q(x)$, compute
  $a = \mathrm{Prob}(x < 0) = \int_{-\infty}^{\infty} I(x)\, q(x)\, dx$
  with $I(x) = 0, 1$ if $x > 0$, $x < 0$, respectively. Naive method: generate $N$ samples $X_i \sim q$ and estimate
  $\hat{a} = \frac{1}{N} \sum_{i=1}^N I(X_i)$

  25. Importance sampling.
  [Figure: densities $q(x)$ and $p(x)$]
  Consider another distribution $p(x)$. Then
  $a = \mathrm{Prob}(x < 0) = \int_{-\infty}^{\infty} I(x)\, \frac{q(x)}{p(x)}\, p(x)\, dx$
  Importance sampling: generate $N$ samples $X_i \sim p$ and estimate
  $\hat{a} = \frac{1}{N} \sum_{i=1}^N I(X_i)\, \frac{q(X_i)}{p(X_i)}$
  Unbiased (= correct) for any $p$ whose support covers that of $I \cdot q$!
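
A quick numerical illustration of the difference; the target $q = N(2, 1)$ and proposal $p = N(-1, 1)$ are hypothetical choices, picked so that $p$ puts mass on the rare event $x < 0$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N = 10_000
q, p = stats.norm(2, 1), stats.norm(-1, 1)

# Naive estimate: samples from q rarely land in x < 0.
a_naive = np.mean(q.rvs(N, random_state=rng) < 0)

# Importance sampling: sample from p, weight each sample by q(x)/p(x).
X = p.rvs(N, random_state=rng)
a_is = np.mean((X < 0) * q.pdf(X) / p.pdf(X))

print(a_naive, a_is, q.cdf(0))   # exact value ~0.0228 for comparison
```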

  26. Optimal importance sampling.
  [Figure: the optimal importance sampler $p^*(x)$]
  The distribution
  $p^*(x) = \frac{q(x)\, I(x)}{a}$
  is the optimal importance sampler. One sample $X \sim p^*$ is sufficient to estimate $a$:
  $\hat{a} = \frac{I(X)\, q(X)}{p^*(X)} = a$

  27. Importance sampling and control. In the case of control we must compute
  $u^*(x,t)\,dt = \frac{E_q\left[ dW\, e^{-S} \right]}{E_q\left[ e^{-S} \right]}, \qquad J(x,t) = -\log E_q\, e^{-S}$
  Instead of samples from the uncontrolled dynamics $q$ ($u = 0$), we sample with $p$ ($u \neq 0$):
  $E_q\, e^{-S} = E_p\, e^{-S_u}, \qquad e^{-S_u} = e^{-S}\, \frac{dq}{dp}$
  $S_u = S + \int_t^T \tfrac{1}{2} u(x,t)^2\, dt + \int_t^T u(x,t)\, dW_t$
  We can choose any $p$, i.e. any sampling control $u$.
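
A sketch of the reweighted estimator, continuing the toy setting used above ($f = 0$, $V = 0$, quadratic terminal cost); the linear feedback gain is a hypothetical choice of sampling control.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, dt = 100_000, 1.0, 0.01
steps = int(T / dt)
phi = lambda x: x**2
u_fb = lambda x: -1.0 * x            # hypothetical state-feedback sampler

X = np.full(N, 1.0)
girsanov = np.zeros(N)               # accumulates int (1/2) u^2 dt + int u dW
for _ in range(steps):
    u = u_fb(X)
    dW = rng.normal(0.0, np.sqrt(dt), N)
    girsanov += 0.5 * u**2 * dt + u * dW
    X = X + u * dt + dW              # controlled dynamics

S_u = phi(X) + girsanov              # S_u = S + Girsanov correction
print(-np.log(np.mean(np.exp(-S_u))))   # same -log psi as uncontrolled sampling
```

With a good sampling control the weights $e^{-S_u}$ concentrate less, so the same number of samples gives a lower-variance estimate.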

  28. Relation between optimal sampling and optimal control. Draw $N$ trajectories $\tau_i$, $i = 1, \ldots, N$ from $p(\tau|x,t)$ using control function $u$ and define
  $\alpha_i = \frac{e^{-S_u(\tau_i|x,t)}}{\sum_{j=1}^N e^{-S_u(\tau_j|x,t)}}, \qquad ESS = \frac{1}{\sum_{j=1}^N \alpha_j^2} \quad (1 \leq ESS \leq N)$
  Theorem:
  1. A better $u$ (in the sense of optimal control) provides a better sampler (in the sense of effective sample size).
  2. The optimal $u = u^*$ (in the sense of optimal control) requires only one sample: $\alpha_i = 1/N$ and $S_u(\tau|x,t)$ is deterministic!
  $S_u(\tau|x,t) = S(\tau|x,t) + \int_t^T ds\, \tfrac{1}{2} u(x_s,s)^T \nu^{-1} u(x_s,s) + \int_t^T u(x_s,s)^T \nu^{-1}\, dW_s$
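
The normalized weights and effective sample size are a one-liner to compute; the max-shift is a standard numerical stabilization, and the example costs are made up.

```python
import numpy as np

def effective_sample_size(S_u):
    """ESS = 1 / sum_i alpha_i^2 with alpha_i the normalized weights exp(-S_u_i)."""
    log_w = -np.asarray(S_u, dtype=float)
    log_w -= log_w.max()             # shift for numerical stability
    alpha = np.exp(log_w) / np.exp(log_w).sum()
    return 1.0 / np.sum(alpha**2)

print(effective_sample_size([1.0, 1.0, 1.0, 1.0]))     # 4.0: equal weights, ESS = N
print(effective_sample_size([0.0, 10.0, 10.0, 10.0]))  # ~1.0: one path dominates
```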
