

  1. A near model-free method for solving the Hamilton-Jacobi-Bellman equation in high dimensions
  Mathias Oster, Leon Sallandt, Reinhold Schneider, Technische Universität Berlin
  ICODE Workshop on numerical solutions of HJB equations, 10.01.2020

  2. Motivation and Ingredients
  Aim: calculate optimal feedback laws (via HJB) for controlled PDEs.
  Ingredients:
  1. Reformulate the HJB equation as an operator equation.
  2. Use Monte Carlo integration for least-squares approximation.
  3. Use a nonlinear, smooth ansatz space: HT/TT tree-based tensors.
  Mathias Oster (TU Berlin) Solve HJB in high-dimensions ICODE 2 / 21

  3. Classical optimal control problem
  Optimal control problem: find $u \in L^2(0,\infty)$ such that
  $$\min_u J(x,u) = \min_u \int_0^\infty \frac{1}{2}\|x(s)\|_{\mathbb{R}^n}^2 + \frac{\lambda}{2}|u(s)|^2 \, ds,$$
  subject to $\dot{x} = f(x,u)$, $x \in \Omega \subset \mathbb{R}^n$, $x(0) = x_0$.
  1. Note that the differential equation can be high-dimensional.
  2. Linear ODE and quadratic cost → Riccati equation.
  3. Nonlinear ODE and nonlinear cost → Hamilton-Jacobi-Bellman (HJB) equation.
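Point 2 above (linear ODE plus quadratic cost gives a Riccati equation) can be checked on a scalar toy problem. A minimal sketch, assuming hypothetical scalar dynamics $\dot{x} = ax + bu$ with $a = -1$, $b = 1$ (illustrative values, not from the talk); the weights $q = 1/2$ and $r = \lambda/2$ match the cost functional above:

```python
import numpy as np

# Scalar LQR sketch: dynamics x' = a*x + b*u (hypothetical coefficients),
# cost J = ∫ q*x^2 + r*u^2 dt with q = 1/2 (state) and r = λ/2 (control).
a, b = -1.0, 1.0
q, lam = 0.5, 1.0
r = lam / 2.0

# Continuous-time algebraic Riccati equation for a scalar system:
#   2*a*p - (b**2 / r) * p**2 + q = 0   (quadratic in p)
coeffs = [-(b**2) / r, 2 * a, q]
p = max(np.roots(coeffs).real)        # stabilizing (positive) root

# Value function v(x) = p*x^2, optimal feedback u = -(b/r)*p*x
k = (b / r) * p
residual = 2 * a * p - (b**2 / r) * p**2 + q
print(p, k, abs(residual))
```

For these coefficients the stabilizing root is $p = (\sqrt{2}-1)/2$, and the closed loop $a - bk$ is stable.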

  4. Feedback control problem
  Define a feedback law $\alpha(x(t)) = u(t)$. Rephrase:
  $$\min_\alpha J_\alpha(x) = \min_\alpha \int_0^\infty \underbrace{\frac{1}{2}\|x(s,\alpha)\|_{\mathbb{R}^n}^2 + \frac{\lambda}{2}|(\alpha(x))(s)|^2}_{=:\, r_\alpha(x)} \, ds.$$
  Our goal: find an optimal feedback law $\alpha^*(x) = u$.
  Define the value function
  $$v(x) := \inf_\alpha J_\alpha(x) \in \mathbb{R}.$$
  Idea: if $v$ is differentiable, the feedback law is given by
  $$\alpha(x) = -\frac{1}{\lambda} D_x v(x) \circ D_u f(x,u) \quad \text{(easy to calculate!)}$$

  5. The HJB equation
  The value function obeys
  $$\inf_\alpha \left\{ f(x,\alpha(x)) \cdot \nabla v(x) + r_\alpha(x) \right\} = 0.$$
  The HJB equation is highly nonlinear and potentially high-dimensional!
  But: for a fixed policy $\alpha(x)$ it reduces to a linear equation. Defining $\mathcal{L}_\alpha := -f(x,\alpha)\cdot\nabla$ we get
  $$\mathcal{L}_\alpha v_\alpha(x) - r_\alpha(x) = 0.$$

  6. Method of characteristics
  Linearized HJB: $\mathcal{L}_\alpha v_\alpha(x) - r_\alpha(x) = 0$.
  Using the method of characteristics we obtain
  $$\dot{x}(t) = f(x,\alpha), \qquad v_\alpha(x(0)) = \int_0^\tau r_\alpha(x(t)) \, dt + v_\alpha(x(\tau)),$$
  which we call the Bellman-like equation.
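The Bellman-like identity can be verified numerically along a closed-loop trajectory. A minimal sketch for a hypothetical scalar linear system with known quadratic value function $v(x) = p x^2$ (all coefficients chosen for illustration, not taken from the talk); the running reward is $r_\alpha(x) = q x^2 + r (kx)^2$ under the linear feedback $u = -kx$:

```python
import numpy as np

# Check v(x(0)) = ∫_0^τ r_α(x(t)) dt + v(x(τ)) for a scalar closed loop.
a, b, q, lam = -1.0, 1.0, 0.5, 1.0   # hypothetical coefficients
r_ctrl = lam / 2.0
p = (np.sqrt(2.0) - 1.0) / 2.0       # Riccati solution for these values
k = (b / r_ctrl) * p                 # feedback gain; v(x) = p*x^2

tau, x0 = 1.0, 2.0
t = np.linspace(0.0, tau, 2001)
c = a - b * k                        # closed-loop rate, x' = c*x
x = x0 * np.exp(c * t)               # exact trajectory
running = (q + r_ctrl * k**2) * x**2 # r_α along the trajectory

lhs = p * x0**2
integral = float(np.sum((running[1:] + running[:-1]) * np.diff(t)) / 2.0)
rhs = integral + p * x[-1]**2        # trapezoidal rule + terminal value
print(abs(lhs - rhs))
```

Up to quadrature error, both sides agree, which is exactly the Bellman-like equation on this trajectory.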

  7. Reformulation as operator equation
  Consider the Koopman operator:
  $$\mathcal{K}^\alpha_\tau : L_{loc,\infty}(\Omega) \to L_{loc,\infty}(\Omega), \qquad \mathcal{K}^\alpha_\tau[g](x) = g(x(\tau)).$$
  Rewrite the Bellman-like equation: for all $x \in \Omega$,
  $$v_\alpha(x(0)) = \int_0^\tau r_\alpha(x(t)) \, dt + v_\alpha(x(\tau))$$
  becomes
  $$(\mathrm{Id} - \mathcal{K}^\alpha_\tau)[v](x) = \underbrace{\int_0^\tau r_\alpha(x(t)) \, dt}_{=:\, R^\alpha_\tau(x)}.$$
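The Koopman operator is just composition with the flow, so it can be applied with nothing but a black-box ODE solver. A minimal sketch for the hypothetical dynamics $\dot{x} = -x$ (so the exact flow is $\Phi_\tau(x) = x e^{-\tau}$), using a crude explicit Euler integrator as the "black box":

```python
import numpy as np

def flow(x, tau, n_steps=1000):
    """Black-box flow of x' = -x via explicit Euler (illustrative dynamics)."""
    h = tau / n_steps
    for _ in range(n_steps):
        x = x + h * (-x)
    return x

def koopman(g, tau):
    """Lift g to K_τ[g] = g ∘ Φ_τ: only flow evaluations are needed."""
    return lambda x: g(flow(x, tau))

g = lambda x: x**2
Kg = koopman(g, tau=0.5)
exact = (2.0 * np.exp(-0.5))**2      # g(Φ_0.5(2)) in closed form
print(abs(Kg(2.0) - exact))
```

The point mirrored on the slide: applying $\mathcal{K}^\alpha_\tau$ never requires knowing $f$ explicitly, only the ability to transport samples forward in time.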

  8. Policy iteration
  Policy iteration uses a sequence of linearized HJB equations.
  Algorithm (policy iteration). Initialize with a stabilizing feedback $\alpha_0$. Solve until convergence:
  1. Find $v_{i+1}$ such that $(\mathrm{Id} - \mathcal{K}^{\alpha_i}_\tau) v_{i+1}(\cdot) - R^{\alpha_i}_\tau(\cdot) = 0$.
  2. Update the policy according to $\alpha_{i+1}(x) = -\frac{1}{\lambda} D_x v_{i+1}(x) \circ D_u f(x,u)$.
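The structure of the algorithm (linear solve for the value, then a policy update from its gradient) is easiest to see on a scalar LQR problem, where the policy-evaluation step is a scalar Lyapunov equation. A minimal sketch with hypothetical coefficients, not the PDE example from the talk:

```python
# Policy iteration on a scalar LQR problem: x' = a*x + b*u, cost q*x^2 + r*u^2.
a, b, q, r = -1.0, 1.0, 0.5, 0.5     # illustrative values

k = 0.0                              # initial stabilizing feedback (a - b*k < 0)
for i in range(20):
    # Policy evaluation: 2*(a - b*k)*p + q + r*k**2 = 0 is LINEAR in p,
    # the scalar analogue of solving the linearized HJB for v_{i+1}.
    p = (q + r * k**2) / (-2.0 * (a - b * k))
    # Policy improvement: u = -k*x with k = (b/r)*p, i.e. k from D_x v.
    k = (b / r) * p

print(p, k)   # p converges to the Riccati solution (sqrt(2)-1)/2
```

Each iterate solves only a linear equation, yet the sequence converges to the solution of the nonlinear Riccati (here) or HJB (in general) equation.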

  9. Least-squares ansatz
  Problem: we need to solve $(\mathrm{Id} - \mathcal{K}^{\alpha_i}_\tau) v_{\alpha_{i+1}}(\cdot) - R^{\alpha_i}_\tau(\cdot) = 0$.
  Idea: solve on a suitable ansatz space $S$:
  $$v_{\alpha_{i+1}} = \operatorname*{arg\,min}_{v \in S} \underbrace{\big\| (\mathrm{Id} - \mathcal{K}^{\alpha_i}_\tau) v(\cdot) - R^{\alpha_i}_\tau(\cdot) \big\|^2_{L^2(\Omega)}}_{= \int_\Omega |(\mathrm{Id} - \mathcal{K}^{\alpha_i}_\tau) v(x) - R^{\alpha_i}_\tau(x)|^2 \, dx}.$$

  10. Projected policy iteration
  Algorithm (projected policy iteration). Initialize with a stabilizing feedback $\alpha_0$. Solve until convergence:
  1. Find $v_{i+1} = \operatorname*{arg\,min}_{v \in S} \big\| (\mathrm{Id} - \mathcal{K}^{\alpha_i}_\tau) v(\cdot) - R^{\alpha_i}_\tau(\cdot) \big\|^2_{L^2(\Omega)}$.
  2. Update the policy according to $\alpha_{i+1}(x) = -\frac{1}{\lambda} D_x v_{i+1}(x) \circ D_u f(x,u)$.

  11. Variational Monte Carlo
  Approximate by Monte Carlo quadrature:
  $$\big\| (\mathrm{Id} - \mathcal{K}^{\alpha_i}_\tau) v(\cdot) - R^{\alpha_i}_\tau(\cdot) \big\|^2_{L^2(\Omega)} \approx \frac{1}{n} \sum_{j=1}^n \big| (\mathrm{Id} - \mathcal{K}^{\alpha_i}_\tau) v(x_j) - R^{\alpha_i}_\tau(x_j) \big|^2,$$
  $$v^*_{n,s} = \operatorname*{arg\,min}_{v \in S} \frac{1}{n} \sum_{j=1}^n \big| (\mathrm{Id} - \mathcal{K}^{\alpha_i}_\tau) v(x_j) - R^{\alpha_i}_\tau(x_j) \big|^2.$$
  Proposition ([Eigel, Schneider et al., 2019]). Let $\epsilon > 0$ be such that $\inf_{v_s \in S} \|v^* - v_s\|^2_{L^2(\Omega)} \le \epsilon$. Then
  $$\mathbb{P}\big[ \|v^* - v^*_{n,s}\|^2_{L^2(\Omega)} > \epsilon \big] \le c_1(\epsilon) \, e^{-c_2(\epsilon) n}$$
  with $c_1, c_2 > 0$.
  The failure probability decays exponentially in the number of samples.
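The quadrature step itself is plain Monte Carlo: an $L^2(\Omega)$ norm becomes an empirical mean over samples. A minimal sketch on $\Omega = [-1,1]$ with an ad-hoc residual function $g$ (chosen for illustration because its exact norm is computable by hand):

```python
import numpy as np

# Monte Carlo estimate of ||g||^2_{L^2(Ω)} on Ω = [-1, 1].
rng = np.random.default_rng(0)
g = lambda x: x**2 - 1.0 / 3.0           # stand-in for the Bellman residual

n = 200_000
x = rng.uniform(-1.0, 1.0, n)            # uniform samples x_j in Ω
mc = 2.0 * np.mean(g(x)**2)              # |Ω| times the empirical mean

# Exact value: ∫_{-1}^{1} (x^2 - 1/3)^2 dx = 8/45
exact = 8.0 / 45.0
print(mc, exact)
```

The same pattern, with $g$ replaced by the Bellman residual $(\mathrm{Id} - \mathcal{K}^{\alpha_i}_\tau) v - R^{\alpha_i}_\tau$, gives the objective minimized over the ansatz space.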

  12. Solving the VMC equation
  $$\operatorname*{arg\,min}_{v \in S} \sum_{j=1}^n \big| (\mathrm{Id} - \mathcal{K}^{\alpha_i}_\tau) v(x_j) - R^{\alpha_i}_\tau(x_j) \big|^2.$$
  1. $v(x_j)$: evaluate $v$ at the samples $x_j$.
  2. $\mathcal{K}^{\alpha_i}_\tau v(x_j)$: evaluate $v$ at the transported samples (with policy $\alpha_i$).
  3. $R^{\alpha_i}_\tau(x_j)$: approximate the reward by the trapezoidal rule.
  What do we need for solving the equation? A model-free solution is possible; only a black-box solver for the ODE is needed.
  What do we need for updating the policy? We need $D_u f(x,u)$, i.e. the derivative of the right-hand side w.r.t. the control.

  13. Possible ansatz spaces
  Full linear space of polynomials, low-rank tensor manifolds, deep neural networks.
  Here we use the low-rank tensor-train (TT) manifold:
  1. Riemannian manifold structure.
  2. Explicit representation of the tangent space.
  3. Convergence theory for optimization algorithms.

  14. Tensor trains
  Consider $\Pi_i = (1, x_i, x_i^2, x_i^3, \dots, x_i^k)$, one-dimensional polynomials, and the tensor product $\Pi = \bigotimes_{i=1}^n \Pi_i$. Then $\dim(\Pi) = (k+1)^n$, huge if $n \gg 0$.
  Reduce the size of the ansatz space by considering a nonlinear manifold $\mathcal{M} \subset \Pi$.
  [Diagram: tensor-train network representing $v(x)$ with cores $A_1, A_2, A_3, A_4$, connected by ranks $r_1, r_2, r_3$, and physical legs $x_1, x_2, x_3, x_4$.]
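The diagram above corresponds to a chain of matrix products: $v(x)$ is obtained by contracting each core $A_i$ with the polynomial features of $x_i$ and multiplying the resulting matrices. A minimal sketch with random cores (illustrating only the contraction, not a trained value function); shapes and ranks are illustrative choices:

```python
import numpy as np

# Tensor-train evaluation: v(x) = A1[φ(x1)] · A2[φ(x2)] · ... · Ad[φ(xd)],
# where core A_i has shape (r_{i-1}, k+1, r_i) and φ is the monomial basis.
rng = np.random.default_rng(1)
d, k = 4, 3                          # 4 variables, polynomial degree 3
ranks = [1, 2, 3, 2, 1]              # boundary ranks r_0 = r_d = 1
cores = [rng.standard_normal((ranks[i], k + 1, ranks[i + 1]))
         for i in range(d)]

def tt_eval(cores, x):
    """Contract the TT cores with the monomial features of each coordinate."""
    result = np.ones((1, 1))
    for A, xi in zip(cores, x):
        phi = np.array([xi**j for j in range(A.shape[1])])  # (1, xi, xi^2, ...)
        result = result @ np.einsum('rkq,k->rq', A, phi)    # matrix chain step
    return result[0, 0]

x = np.array([0.1, -0.2, 0.3, 0.5])
print(tt_eval(cores, x))
```

Evaluation cost is linear in the number of variables, versus exponential for the full tensor-product basis.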

  15. Cost functional
  Modify the cost functional:
  $$R_N(v) = \frac{1}{n} \sum_{j=1}^n \big| (\mathrm{Id} - \mathcal{K}^{\alpha_i}_\tau) v(x_j) - R(x_j) \big|^2 + \underbrace{|v(0)|^2 + |\nabla v(0)|^2}_{\text{vanishes in the exact case}} + \underbrace{\mu \|v\|^2_{H^1(\Omega)}}_{\text{regularizer}}.$$

  16. Example: Schlögl-like equation
  Consider a Schlögl-like system with Neumann boundary conditions, cf. [Dolgov, Kalise, Kunisch, 2019]. Solve for $x \in \Omega = L^2(-1,1)$:
  $$\min_u J(x,u) = \min_u \int_0^\infty \frac{1}{2}\|x(s)\|^2 + \frac{\lambda}{2}|u(s)|^2 \, ds,$$
  subject to
  $$\dot{x}(t) = \sigma \Delta x(t) + x(t)^3 + \chi_\omega u(t), \qquad x(0) = x_0,$$
  where $\chi_\omega$ is the characteristic function of $\omega = [-0.4, 0.4]$.
  After discretization in space (finite differences):
  $$\begin{pmatrix} \dot{x}_1 \\ \vdots \\ \dot{x}_n \end{pmatrix} = A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} x_1^3 \\ \vdots \\ x_n^3 \end{pmatrix} + \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} u,$$
  where the last vector is the discretized indicator of $\omega$.
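The discretized right-hand side above can be sketched directly. Grid size and $\sigma$ below are illustrative choices, not the values used in the talk; the Neumann boundary condition is imposed via mirrored ghost nodes in the first and last rows of the Laplacian:

```python
import numpy as np

# Finite-difference sketch of f(x, u) = σ Δx + x^3 + χ_ω u on (-1, 1)
# with homogeneous Neumann boundary conditions.
n, sigma = 32, 0.2                    # illustrative grid size and diffusion
s = np.linspace(-1.0, 1.0, n)
h = s[1] - s[0]

# Tridiagonal Laplacian; boundary rows use mirrored ghost nodes (Neumann).
A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1))
A[0, 1] = A[-1, -2] = 2.0
A = sigma * A / h**2

chi = ((s >= -0.4) & (s <= 0.4)).astype(float)   # indicator of ω = [-0.4, 0.4]

def rhs(x, u):
    """Discretized f(x, u) = A x + x^3 + χ_ω u (scalar control u)."""
    return A @ x + x**3 + chi * u

x0 = np.cos(np.pi * s)                # some initial profile
print(rhs(x0, 0.0)[:3])
```

A sanity check on the Neumann discretization: the Laplacian annihilates constant vectors, as it should when no flux crosses the boundary.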

  17. Example: Schlögl-like equation, TT degrees of freedom
  Full space: $5^{32}$ degrees of freedom. Reduced to $\approx 5000$.
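The reduction quoted above comes from counting parameters: a full tensor-product basis needs $(k+1)^d$ coefficients, while a TT representation needs only the entries of its cores. A minimal counting sketch; the uniform rank $r = 6$ below is an assumption for illustration (the talk states only the $\approx 5000$ total, not the ranks used):

```python
# Degrees of freedom: full tensor vs tensor-train parameterization,
# for d variables and k+1 polynomial basis functions per variable.
def full_dofs(d, k):
    return (k + 1) ** d

def tt_dofs(d, k, r):
    # core i has shape (r_{i-1}, k+1, r_i), with boundary ranks 1
    ranks = [1] + [r] * (d - 1) + [1]
    return sum(ranks[i] * (k + 1) * ranks[i + 1] for i in range(d))

# The talk's setting: 32 spatial dimensions, 5 basis functions each.
print(full_dofs(32, 4))          # 5**32, astronomically large
print(tt_dofs(32, 4, r=6))       # a few thousand, the order of the ≈5000 quoted
```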

  18. Example: Schlögl-like equation
  [Figure: (a) the two initial values $x_0$ and $x_1$; (b) generated cost and squared Bellman errors $|v(x_0) - J(x_0, \alpha(x_0))|^2$ and $|v(x_1) - J(x_1, \alpha(x_1))|^2$. Blue is Riccati, orange is $V_{L^2}$, green is $V_{H^1}$.]

  19. Example: Schlögl-like equation
  [Figure: generated controls over time $t \in [0,5]$ for initial value $x_0$ (a) and $x_1$ (b), comparing Riccati, $V_{L^2}$, and $V_{H^1}$.]
  Figure: The generated controls for different initial values.

  20. What do we need for optimization?
  To solve the equation and generate a feedback law, we only need:
  1. a discretization of the flow $\Phi$ (black box),
  2. the derivative of the right-hand side $f(x,u)$ w.r.t. the control (easy if linear),
  3. the cost functional.
  Thank you for your attention.

  21. References and related work
  Sergey Dolgov, Dante Kalise, and Karl Kunisch. A tensor decomposition approach for high-dimensional Hamilton-Jacobi-Bellman equations. arXiv e-prints, arXiv:1908.01533, Aug 2019.
  Martin Eigel, Reinhold Schneider, Philipp Trunschke, and Sebastian Wolf. Variational Monte Carlo: bridging concepts of machine learning and high-dimensional partial differential equations. Advances in Computational Mathematics, Oct 2019.
  Mathias Oster, Leon Sallandt, and Reinhold Schneider. Approximating the stationary Hamilton-Jacobi-Bellman equation by hierarchical tensor products, 2019.
