SLIDE 1

A near model-free method for solving the Hamilton-Jacobi-Bellman equation in high dimensions

Mathias Oster, Leon Sallandt, Reinhold Schneider

Technische Universität Berlin, ICODE Workshop on numerical solutions of HJB equations

10.01.2020

SLIDE 2

Motivation and Ingredients

Aim: Calculate optimal feedback laws (via HJB) for controlled PDEs.

Ingredients:

1. Reformulate the HJB equation as an operator equation.
2. Use Monte Carlo integration for least squares approximation.
3. Use a nonlinear, smooth ansatz space: HT/TT, i.e. tree-based tensors.

Mathias Oster (TU Berlin) Solve HJB in high-dimensions ICODE 2 / 21

SLIDE 3

Classical optimal control problem

Optimal control problem: find $u \in L^2(0, \infty)$ such that

$$\min_u J(x, u) = \min_u \int_0^\infty \frac{1}{2}\|x(s)\|_{\mathbb{R}^n}^2 + \frac{\lambda}{2}|u(s)|^2 \, ds,$$

subject to

$$\dot{x} = f(x, u), \qquad x \in \Omega \subset \mathbb{R}^n, \qquad x(0) = x_0.$$

1. Note that the differential equation can be high-dimensional.
2. Linear ODE and quadratic cost → Riccati equation.
3. Nonlinear ODE and nonlinear cost → Hamilton-Jacobi-Bellman (HJB) equation.
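The linear-quadratic case can be made concrete in one dimension: with $\dot{x} = ax + bu$ and the cost above, the quadratic ansatz $v(x) = p x^2$ reduces the stationary HJB equation to the scalar algebraic Riccati equation $\frac{1}{2} + 2ap - 2(b^2/\lambda)p^2 = 0$. A minimal sketch; the parameter values are illustrative, not from the talk:

```python
import math

# Scalar LQR: dynamics x' = a*x + b*u, cost ∫ 1/2 x^2 + lam/2 u^2 ds.
# With the ansatz v(x) = p*x^2, the HJB equation
#   min_u [ 1/2 x^2 + lam/2 u^2 + v'(x)(a x + b u) ] = 0
# reduces to the algebraic Riccati equation 1/2 + 2*a*p - 2*(b^2/lam)*p^2 = 0.
a, b, lam = 1.0, 1.0, 0.1

# Positive root of the Riccati equation (stabilizing solution):
p = lam * (a + math.sqrt(a**2 + b**2 / lam)) / (2 * b**2)

# Feedback u = -(1/lam) * b * v'(x) = gain * x:
gain = -2 * p * b / lam

closed_loop = a + b * gain                       # must be negative (stable)
residual = 0.5 + 2 * a * p - 2 * (b**2 / lam) * p**2
```

The positive root is the one that yields a stable closed loop, which is why it is selected.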

SLIDE 4

Feedback control problem

Define a feedback law $\alpha(x(t)) = u(t)$ and rephrase:

$$\min_\alpha J_\alpha(x) = \min_\alpha \int_0^\infty \underbrace{\frac{1}{2}\|x(s, \alpha)\|_{\mathbb{R}^n}^2 + \frac{\lambda}{2}|(\alpha(x))(s)|^2}_{=: r_\alpha(x)} \, ds.$$

Our goal: find an optimal feedback law $\alpha^*(x) = u$. Define the value function

$$v(x) := \inf_\alpha J_\alpha(x) \in \mathbb{R}.$$

Idea: if $v$ is differentiable, the feedback law is given by

$$\alpha(x) = -\frac{1}{\lambda} D_x v(x) \circ D_u f(x, u)$$

(easy to calculate!).

SLIDE 5

The HJB equation

The value function obeys

$$\inf_\alpha \left[ f(x, \alpha(x)) \cdot \nabla v(x) + r_\alpha(x) \right] = 0.$$

The HJB equation is highly nonlinear and potentially high-dimensional! But for a fixed policy $\alpha(x)$ it reduces to a linear equation: defining $\mathcal{L}_\alpha := -f(x, \alpha) \cdot \nabla$, we get

$$\mathcal{L}_\alpha v_\alpha(x) - r_\alpha(x) = 0.$$

SLIDE 6

Method of characteristics

Linearized HJB: $\mathcal{L}_\alpha v_\alpha(x) - r_\alpha(x) = 0$. Using the method of characteristics we obtain

$$\dot{x}(t) = f(x, \alpha), \qquad v_\alpha(x(0)) = \int_0^\tau r_\alpha(x(t)) \, dt + v_\alpha(x(\tau)),$$

which we call the Bellman-like equation.
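On a toy system where everything is explicit, the Bellman-like identity can be checked numerically. A sketch assuming the dynamics $\dot{x} = -x$ with the policy already inserted, $r(x) = \frac{1}{2}x^2$, and the known value function $v(x) = x^2/4$; all concrete choices are illustrative:

```python
import numpy as np

# Check  v(x(0)) = ∫_0^tau r(x(t)) dt + v(x(tau))  on the toy dynamics
# x' = -x (policy already inserted), r(x) = 1/2 x^2, v(x) = x^2/4.
def flow(x0, t):
    return x0 * np.exp(-t)        # exact solution of x' = -x

def r(x):
    return 0.5 * x**2

def v(x):
    return 0.25 * x**2

x0, tau = 1.3, 0.7
ts = np.linspace(0.0, tau, 2001)
vals = r(flow(x0, ts))

# Trapezoidal rule for the running-cost integral:
integral = float(np.sum(0.5 * (vals[1:] + vals[:-1])) * (ts[1] - ts[0]))

rhs = integral + v(flow(x0, tau))   # right-hand side of the identity
lhs = v(x0)                         # left-hand side
```

Since $v$ is the exact value function of this toy problem, both sides agree up to quadrature error.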

SLIDE 7

Reformulation as Operator Equation

Consider the Koopman operator

$$K_\tau^\alpha : L_{\mathrm{loc},\infty}(\Omega) \to L_{\mathrm{loc},\infty}(\Omega), \qquad K_\tau^\alpha[g](x) = g(x(\tau)).$$

Rewrite the Bellman-like equation: for all $x \in \Omega$,

$$v_\alpha(x(0)) = \int_0^\tau r_\alpha(x(t)) \, dt + v_\alpha(x(\tau)),$$

as

$$(\mathrm{Id} - K_\tau^\alpha)[v](x) = \underbrace{\int_0^\tau K_t^\alpha r(x) \, dt}_{=: R_\tau^\alpha(x)}.$$

SLIDE 8

Policy iteration

Policy iteration uses a sequence of linearized HJB equations.

Algorithm (Policy iteration)

Initialize with a stabilizing feedback $\alpha_0$. Iterate until convergence:

1. Find $v_{i+1}$ such that $(\mathrm{Id} - K_\tau^{\alpha_i}) v_{i+1}(\cdot) - R_\tau^{\alpha_i}(\cdot) = 0$.
2. Update the policy according to $\alpha_{i+1}(x) = -\frac{1}{\lambda} D_x v_{i+1}(x) \circ D_u f(x, u)$.
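On the scalar LQR problem both steps can be carried out in closed form, which gives a compact sanity check: each policy-evaluation step is a scalar Lyapunov equation. This is an illustrative stand-in for the operator equation, with made-up numbers:

```python
import math

# Policy iteration for the scalar LQR problem x' = a*x + b*u with cost
# ∫ 1/2 x^2 + lam/2 u^2 ds.  For a linear policy u = k*x, the value
# function is v_k(x) = p*x^2 where p solves the Lyapunov equation
#   2*p*(a + b*k) + (1/2 + lam/2 * k^2) = 0.
a, b, lam = 1.0, 1.0, 0.1
k = -2.0                      # stabilizing initial feedback u = k*x

for _ in range(30):
    # Policy evaluation (scalar Lyapunov equation):
    p = -(0.5 + 0.5 * lam * k**2) / (2.0 * (a + b * k))
    # Policy improvement: u = -(1/lam) * b * v'(x)  =>  k = -2*p*b/lam
    k = -2.0 * p * b / lam

# Compare with the positive root of the algebraic Riccati equation:
p_riccati = lam * (a + math.sqrt(a**2 + b**2 / lam)) / (2 * b**2)
```

The iterates converge rapidly (policy iteration is a Newton-type method) to the Riccati solution, and each intermediate policy stays stabilizing.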

SLIDE 9

Least squares ansatz

Problem: we need to solve $(\mathrm{Id} - K_\tau^{\alpha_i}) v_{\alpha_{i+1}}(\cdot) - R_\tau^{\alpha_i}(\cdot) = 0$.

Idea: solve on a suitable ansatz space $S$:

$$v_{\alpha_{i+1}} = \operatorname*{arg\,min}_{v \in S} \underbrace{\left\|(\mathrm{Id} - K_\tau^{\alpha_i}) v(\cdot) - R_\tau^{\alpha_i}(\cdot)\right\|_{L^2(\Omega)}^2}_{= \int_\Omega |(\mathrm{Id} - K_\tau^{\alpha_i}) v(x) - R_\tau^{\alpha_i}(x)|^2 \, dx}.$$

SLIDE 10

Projected Policy iteration

Algorithm (Projected policy iteration)

Initialize with a stabilizing feedback $\alpha_0$. Iterate until convergence:

1. Find
$$v_{i+1} = \operatorname*{arg\,min}_{v \in S} \left\|(\mathrm{Id} - K_\tau^{\alpha_i}) v(\cdot) - R_\tau^{\alpha_i}(\cdot)\right\|_{L^2(\Omega)}^2.$$
2. Update the policy according to $\alpha_{i+1}(x) = -\frac{1}{\lambda} D_x v_{i+1}(x) \circ D_u f(x, u)$.
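Once $S$ is spanned by finitely many basis functions, step 1 becomes a linear least-squares problem. A sketch of one projected policy-evaluation step on a 1D toy problem (dynamics $\dot{x} = -x$ with the policy already inserted, exact value function $x^2/4$; the basis, sample count, and all constants are illustrative):

```python
import numpy as np

# One projected policy-evaluation step as a linear least-squares fit.
# Ansatz space S: polynomials v(x) = sum_m c_m x^m on a 1D toy problem
# with dynamics x' = -x and running cost r(x) = 1/2 x^2.
rng = np.random.default_rng(2)
tau = 0.5

def transport(x):             # exact flow of x' = -x over time tau
    return x * np.exp(-tau)

def reward(x):                # R_tau(x) = ∫_0^tau 1/2 (x e^{-t})^2 dt
    return 0.25 * x**2 * (1.0 - np.exp(-2 * tau))

xs = rng.uniform(-1.0, 1.0, 500)
degrees = np.arange(5)        # basis 1, x, x^2, x^3, x^4

# Design matrix of (Id - K_tau) applied to each basis function:
Phi = xs[:, None] ** degrees - transport(xs)[:, None] ** degrees
coeffs, *_ = np.linalg.lstsq(Phi, reward(xs), rcond=None)
```

Since the exact value function $x^2/4$ lies in the ansatz space, the fit recovers the coefficient 1/4 on $x^2$ (the constant basis function is annihilated by $\mathrm{Id} - K_\tau$, so `lstsq` returns its minimum-norm coefficient, zero).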

SLIDE 11

Variational Monte-Carlo

Approximate by Monte Carlo quadrature:

$$\left\|(\mathrm{Id} - K_\tau^{\alpha_i}) v(\cdot) - R_\tau^{\alpha_i}(\cdot)\right\|_{L^2(\Omega)}^2 \approx \frac{1}{n} \sum_{j=1}^n \left|(\mathrm{Id} - K_\tau^{\alpha_i}) v(x_j) - R_\tau^{\alpha_i}(x_j)\right|^2,$$

$$v_{n,s}^* = \operatorname*{arg\,min}_{v \in S} \frac{1}{n} \sum_{j=1}^n \left|(\mathrm{Id} - K_\tau^{\alpha_i}) v(x_j) - R_\tau^{\alpha_i}(x_j)\right|^2.$$

Proposition ([Eigel, Schneider et al., 19])

Let $\varepsilon > 0$ be such that $\inf_{v_s \in S} \|v^* - v_s\|_{L^2(\Omega)}^2 \le \varepsilon$. Then

$$\mathbb{P}\left[\|v^* - v_{n,s}^*\|_{L^2(\Omega)}^2 > \varepsilon\right] \le c_1(\varepsilon)\, e^{-c_2(\varepsilon)\, n}$$

with $c_1, c_2 > 0$: exponential decay in the number of samples.
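The Monte Carlo step itself is just an empirical mean over uniform samples. A sketch estimating $\|g\|_{L^2}^2$ for a known test function $g$, a stand-in for the Bellman residual; everything here is illustrative:

```python
import numpy as np

# Monte Carlo estimate of ||g||^2_{L2(Omega)} for Omega = [0, 1] and a
# known test function g (stand-in for the Bellman residual).
rng = np.random.default_rng(0)

def g(x):
    return x**2 - x   # exact ||g||^2 = ∫_0^1 (x^2 - x)^2 dx = 1/30

n = 200_000
xs = rng.uniform(0.0, 1.0, n)
mc_estimate = float(np.mean(g(xs)**2))
exact = 1.0 / 30.0
```

With $n = 2 \cdot 10^5$ samples the relative error is already well below a percent, consistent with the $O(n^{-1/2})$ Monte Carlo rate.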

SLIDE 12

Solving the VMC equation

$$\operatorname*{arg\,min}_{v \in S} \sum_{j=1}^n \left|(\mathrm{Id} - K_\tau^{\alpha_i}) v(x_j) - R_\tau^{\alpha_i}(x_j)\right|^2.$$

1. $v(x_j)$: evaluate $v$ at the samples $x_j$.
2. $K_\tau^{\alpha_i} v(x_j)$: evaluate $v$ at the transported samples (transported with policy $\alpha_i$).
3. $R_\tau^{\alpha_i}(x_j)$: approximate the reward by the trapezoidal rule.

What do we need for solving the equation?

A model-free solution is possible; only a black-box solver for the ODE is needed.

What do we need for updating the policy?

We need $D_u f(x, u)$, i.e. the derivative of the right-hand side with respect to the control.
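All three evaluations go through a black-box time stepper. A sketch with a hand-rolled RK4 integrator on the toy dynamics $\dot{x} = -x$ (policy already inserted), $r(x) = \frac{1}{2}x^2$, and candidate $v(x) = x^2/4$; all choices are illustrative, and since this $v$ happens to be the exact value function, the empirical Bellman residual should vanish:

```python
import numpy as np

# Evaluating the three ingredients with a black-box ODE solver.
def f(x):
    return -x                    # closed-loop dynamics (policy inserted)

def rk4_step(x, h):              # one step of the black-box time stepper
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def r(x):
    return 0.5 * x**2

def v(x):                        # candidate from the ansatz space
    return 0.25 * x**2

tau, steps = 0.5, 500
h = tau / steps

x = 2.0                          # a sample x_j
traj = [x]
for _ in range(steps):
    traj.append(rk4_step(traj[-1], h))
traj = np.array(traj)

Kv = v(traj[-1])                 # K_tau v(x_j): v at the transported sample
# R_tau(x_j): trapezoidal rule along the trajectory
R = h * (0.5 * r(traj[0]) + r(traj[1:-1]).sum() + 0.5 * r(traj[-1]))
bellman_residual = v(x) - Kv - R   # (Id - K_tau) v(x_j) - R_tau(x_j)
```

Nothing here uses the right-hand side analytically: the stepper could be replaced by any simulator, which is the "near model-free" point.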

SLIDE 13

Possible ansatz spaces

- Full linear space of polynomials
- Low-rank tensor manifolds
- Deep neural networks

Here used: the low-rank tensor train (TT-tensor) manifold, which offers

- a Riemannian manifold structure,
- an explicit representation of the tangent space,
- convergence theory for optimization algorithms.

SLIDE 14

Tensor Trains

Consider one-dimensional polynomial bases $\Pi_i = (1, x_i, x_i^2, x_i^3, \dots, x_i^k)$ and the tensor product $\Pi = \bigotimes_{i=1}^n \Pi_i$.

$\dim(\Pi) = (k+1)^n$, which is huge if $n \gg 0$. Reduce the size of the ansatz space by considering a nonlinear subset $M \subset \Pi$.

[Tensor network diagram: $v(x)$ is given by a chain of cores $A_1, A_2, A_3, A_4$, each contracted with the polynomial features of $x_1, x_2, x_3, x_4$ and connected by ranks $r_1, r_2, r_3$.]
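Evaluating a TT-parametrized $v$ at a point needs only a sweep of small matrix-vector products, at cost $O(n\,k\,r^2)$ rather than $(k+1)^n$. A sketch with random cores; shapes, ranks, and helper names are made up:

```python
import numpy as np

# Evaluate v(x) in tensor-train format: cores A_i of shape
# (r_{i-1}, k+1, r_i), and v(x) = A_1[x_1] A_2[x_2] ... A_n[x_n],
# where A_i[x_i] contracts the middle mode with (1, x_i, ..., x_i^k).
rng = np.random.default_rng(1)

n, k = 4, 3
ranks = [1, 2, 3, 2, 1]            # boundary ranks are 1
cores = [rng.standard_normal((ranks[i], k + 1, ranks[i + 1]))
         for i in range(n)]

def tt_eval(cores, x):
    m = np.ones((1, 1))
    for i, core in enumerate(cores):
        feats = x[i] ** np.arange(core.shape[1])   # (1, x_i, ..., x_i^k)
        m = m @ np.tensordot(core, feats, axes=([1], [0]))
    return float(m[0, 0])

# Reference: contract the train into the full (k+1)^n coefficient
# tensor (feasible only because n is tiny here).
full = cores[0]
for core in cores[1:]:
    full = np.tensordot(full, core, axes=([full.ndim - 1], [0]))
full = np.squeeze(full, axis=(0, full.ndim - 1))

x = np.array([0.3, -1.2, 0.5, 2.0])
feats = [xi ** np.arange(k + 1) for xi in x]
ref = float(np.einsum('abcd,a,b,c,d->', full, *feats))
val = tt_eval(cores, x)
```

The sweep and the full contraction agree, while only the sweep scales to large $n$.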

SLIDE 15

Cost functional

Modify the cost functional:

$$R_N(v) = \frac{1}{n} \sum_{j=1}^n \left|(\mathrm{Id} - K_\tau^{\alpha_i}) v(x_j) - R(x_j)\right|^2 + \underbrace{|v(0)|^2 + |\nabla v(0)|^2}_{\text{vanishes in the exact case}} + \underbrace{\mu \|v\|_{H^1(\Omega)}^2}_{\text{regularizer}}.$$

SLIDE 16

Example: Schloegl-like equation

Consider a Schlögl-like system with Neumann boundary conditions, cf. [Dolgov, Kalise, Kunisch, 19]. Solve for $x \in \Omega = L^2(-1, 1)$:

$$\min_u J(x, u) = \min_u \int_0^\infty \frac{1}{2}\|x(s)\|^2 + \frac{\lambda}{2}|u(s)|^2 \, ds,$$

subject to

$$\dot{x}(t) = \sigma \Delta x(t) + x(t)^3 + \chi_\omega u(t), \qquad x(0) = x_0,$$

where $\chi_\omega$ is the characteristic function of $\omega = [-0.4, 0.4]$. After discretization in space (finite differences):

$$\begin{pmatrix} \dot{x}_1 \\ \vdots \\ \dot{x}_n \end{pmatrix} = A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} x_1^3 \\ \vdots \\ x_n^3 \end{pmatrix} + u \begin{pmatrix} \vdots \\ 1 \\ \vdots \end{pmatrix},$$

the last vector being the discretized indicator $\chi_\omega$.
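The discretized system is straightforward to assemble; a sketch with a mirrored-ghost-point Neumann stencil, where the grid size, $\sigma$, and the variable names are illustrative placeholders:

```python
import numpy as np

# Finite-difference discretization of x' = sigma*Lap(x) + x^3 + chi_w*u
# on (-1, 1) with homogeneous Neumann boundary conditions.
n = 32
grid = np.linspace(-1.0, 1.0, n)
h = grid[1] - grid[0]
sigma = 0.2

# Second-difference Laplacian, mirrored at the ends (ghost points):
A = np.zeros((n, n))
for i in range(n):
    A[i, i] = -2.0
    if i > 0:
        A[i, i - 1] = 1.0
    if i < n - 1:
        A[i, i + 1] = 1.0
A[0, 1] = 2.0      # ghost-point mirror at the left boundary
A[-1, -2] = 2.0    # ghost-point mirror at the right boundary
A *= sigma / h**2

chi = ((grid >= -0.4) & (grid <= 0.4)).astype(float)  # indicator of omega

def rhs(x, u):
    # A @ x: diffusion; x**3: reaction; chi * u: distributed control
    return A @ x + x**3 + chi * u
```

The mirrored stencil makes constant states lie in the kernel of the discrete Laplacian, which is the discrete analogue of the Neumann condition.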

SLIDE 17

Example: Schloegl-like equation

TT Degrees of Freedom

Full space: $5^{32}$ degrees of freedom. Reduced to ≈ 5000 in the TT representation.

SLIDE 18

Example: Schloegl-like equation

(a) Initial values $x_0$ and $x_1$ (profiles on $[-1, 1]$).

(b) Generated cost and squared Bellman (least squares) error; blue is Riccati, orange is $V_{L^2}$, green is $V_{H^1}$. Generated costs:

          x0     x1
Riccati   2.85   5.74
V_L2      2.15   2.88
V_H1      2.12   2.83

Figure: Generated cost and least squares error for the different initial values.

SLIDE 19

Example: Schloegl-like equation

(a) Generated controls over time, initial value $x_0$ (Riccati, $V_{L^2}$, $V_{H^1}$).

(b) Generated controls over time, initial value $x_1$ (Riccati, $V_{L^2}$, $V_{H^1}$).

Figure: The generated controls for different initial values.

SLIDE 20

What do we need for optimization

We only need

- a discretization of the flow $\Phi$ (black box),
- the derivative of the right-hand side $f(x, u)$ with respect to the control (easy if linear),
- the cost functional

to solve the equation and generate a feedback law.

Thank you for your attention

SLIDE 21

References and related work

Sergey Dolgov, Dante Kalise, and Karl Kunisch. A tensor decomposition approach for high-dimensional Hamilton-Jacobi-Bellman equations. arXiv:1908.01533, Aug 2019.

Martin Eigel, Reinhold Schneider, Philipp Trunschke, and Sebastian Wolf. Variational Monte Carlo: bridging concepts of machine learning and high-dimensional partial differential equations. Advances in Computational Mathematics, Oct 2019.

Mathias Oster, Leon Sallandt, and Reinhold Schneider. Approximating the stationary Hamilton-Jacobi-Bellman equation by hierarchical tensor products, 2019.
