 
Concepts for Breaking the Curse of Dimensionality for the Optimal Control HJB Equation
Karl Kunisch and Daniel Walter
University of Graz, and RICAM Linz, Austria
RICAM, October 2019
Closed loop optimal control
\[
\min_{u(\cdot) \in U} J(u(\cdot), x) := \int_0^\infty \ell(y(t)) + \frac{\gamma}{2} |u(t)|^2 \, dt
\quad \text{subject to} \quad \dot y(t) = f(y(t)) + g(y(t)) u(t), \quad y(0) = x,
\]
with $\ell(0) = f(0) = 0$.
Optimal value function:
\[ V(x) := \min_{u(\cdot) \in U} J(u(\cdot), x) \]
Hamilton-Jacobi-Bellman equation:
\[ \min_{u \in U} \Big\{ DV(x)\big(f(x) + g(x) u\big) + \ell(x) + \frac{\gamma}{2} |u|^2 \Big\} = 0, \quad V(0) = 0. \]
If $U$ is a linear space, then $u^*(x) = -\frac{1}{\gamma} g(x)^* DV(x)^*$ and
\[ DV(x) f(x) - \frac{1}{2\gamma} DV(x) g(x) g(x)^* DV(x)^* + \ell(x) = 0. \]
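As a worked step, the reduced HJB equation on this slide follows from the first-order optimality condition of the minimization over $u$. The sketch below assumes that $U$ is the whole control space, so that the minimizer is unconstrained, and that $V$ is differentiable.

```latex
\begin{align*}
0 &= \partial_u \Big[ DV(x)\big(f(x) + g(x) u\big) + \ell(x) + \tfrac{\gamma}{2} |u|^2 \Big]
   = g(x)^* DV(x)^* + \gamma u
   \;\Longrightarrow\; u^*(x) = -\tfrac{1}{\gamma}\, g(x)^* DV(x)^*, \\
0 &= DV(x) f(x) - \tfrac{1}{\gamma} DV(x) g(x) g(x)^* DV(x)^* + \ell(x)
     + \tfrac{\gamma}{2} \cdot \tfrac{1}{\gamma^2} \big| g(x)^* DV(x)^* \big|^2 \\
  &= DV(x) f(x) - \tfrac{1}{2\gamma} DV(x) g(x) g(x)^* DV(x)^* + \ell(x).
\end{align*}
```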
Close the loop
\[ \dot y(t) = f(y(t)) - \frac{1}{\gamma} g(y(t)) g(y(t))^* DV(y(t))^*, \quad y(0) = x \]
good properties, but ...
\[
\min_u J(u(\cdot), x) := \frac{1}{2} \int_0^\infty |D y(t)|^2 + \gamma |R^{1/2} u(t)|^2 \, dt
\quad \text{subject to} \quad \dot y(t) = A y(t) + B u(t), \quad y(0) = x.
\]
Under stabilizability and detectability assumption:
\[ \Pi A + A^* \Pi - \Pi B R^{-1} B^* \Pi + D^* D = 0, \qquad u^* = -R^{-1} B^* \Pi y \]
Closed loop system:
\[ \dot y(t) = A_\Pi y(t) = (A - B R^{-1} B^* \Pi) y(t), \quad y(0) = x \]
$A \sim A(t)$ as linearization of $f$
M. Badra, T. Breiten, S. Ervedoza, J.-P. Raymond, ...
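For intuition, the Riccati equation on this slide can be solved by hand in the scalar case. The sketch below is a minimal illustration, not the authors' code: the numbers `a`, `b`, `d`, `r` are hypothetical, and the control weight `r` is assumed to absorb any scaling factor in front of the control cost.

```python
import math

def scalar_riccati(a, b, d, r):
    """Stabilizing root p > 0 of the scalar Riccati equation
    2*a*p - (b**2 / r) * p**2 + d**2 = 0."""
    s = b * b / r
    return (a + math.sqrt(a * a + s * d * d)) / s

# Hypothetical data: a > 0 means the uncontrolled system is unstable.
a, b, d, r = 1.0, 1.0, 1.0, 1.0
p = scalar_riccati(a, b, d, r)
gain = -b * p / r              # feedback law u* = gain * y
a_closed = a + b * gain        # scalar analogue of A - B R^{-1} B^* Pi
residual = 2 * a * p - (b * b / r) * p * p + d * d
print(p, gain, a_closed, residual)
```

In this scalar setting the closed-loop coefficient equals `-sqrt(a**2 + b**2 * d**2 / r)`, which is negative whenever `b` and `d` are nonzero.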
Before we start the analysis
IS HJB WORTH THE EFFORT?
And if yes, how to get it?
◮ solve it directly
◮ solve using tensor calculus (TT-rank)
◮ interpolate it from open loop data
◮ Taylor expansion
◮ Hopf formulas
◮ ...
Optimal HJB-based feedback stabilization of the Newell-Whitehead equation
\[
\begin{aligned}
y_t &= \nu \Delta y + y (1 - y^2) + \chi_\omega(x) u(t) && \text{in } (-1, 1) \times (0, \infty), \\
y_x(-1, t) &= y_x(1, t) = 0 && \text{for } t \ge 0, \\
y(x, 0) &= y_0(x) && \text{in } (-1, 1).
\end{aligned}
\]
Note: 0 is an unstable equilibrium, $\pm 1$ are stable equilibria.
The equation describes excitable systems such as neurons or axons, relates to the Schlögl model, and describes Rayleigh-Bénard convection.
Tensor train computations, jointly with S. Dolgov and D. Kalise, up to dimension 100.
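The remark on equilibria can be checked numerically for the uncontrolled equation ($u = 0$). The sketch below swaps in a plain finite-difference grid with explicit Euler time stepping instead of the tensor-train or pseudo-spectral machinery of the talk; the diffusion coefficient `nu = 0.1`, the grid size and the initial profile are assumptions chosen only for illustration.

```python
import math

def simulate_uncontrolled(y0, nu, dt, steps):
    """Explicit Euler for y_t = nu*y_xx + y*(1 - y^2) on (-1, 1)
    with homogeneous Neumann boundary conditions and u = 0."""
    n = len(y0)
    h = 2.0 / (n - 1)
    y = list(y0)
    for _ in range(steps):
        new = []
        for i in range(n):
            left = y[i - 1] if i > 0 else y[1]            # mirrored ghost point at x = -1
            right = y[i + 1] if i < n - 1 else y[n - 2]   # mirrored ghost point at x = +1
            lap = (left - 2.0 * y[i] + right) / (h * h)
            new.append(y[i] + dt * (nu * lap + y[i] * (1.0 - y[i] ** 2)))
        y = new
    return y

n, nu = 20, 0.1                       # assumed values, not taken from the slides
xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
# Small positive perturbation of the unstable equilibrium y = 0.
y0 = [0.05 * (1.0 + 0.5 * math.cos(math.pi * x)) for x in xs]
y_end = simulate_uncontrolled(y0, nu, dt=0.01, steps=1000)   # final time T = 10
max_dev = max(abs(v - 1.0) for v in y_end)
print(max_dev)
```

The time step respects the explicit stability bound `dt <= h*h / (2*nu)` for this grid. The state leaves the neighbourhood of 0 and settles near the stable equilibrium +1, which is the behaviour the feedback has to counteract.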
Newell-Whitehead Equation, d = 40
[Figure: two panels, "states" and "controls", plotted over t from 0 to 3. Curve labels: UNC, LQR, HJB 14, HJB 40. Cost values shown in the figure: J = ∞, J = 1.73, J = 40.2, J = 1.55.]
2-D Newell-Whitehead Equation, 121 DoFs
[Figure: one panel shows CPU time (min.) and max. TT rank versus d (60 to 120) with an O(d) reference line; the other shows curves labelled d = 81 and d = 121 over t from 0 to 3.]
Structure exploiting policy iteration
\[ DV(x) f(x) - \frac{1}{2\gamma} DV(x) g(x) g(x)^* DV(x)^* + \ell(x) = 0, \qquad u^*(x) = -\frac{1}{\gamma} g(x)^* DV(x)^* \]
◮ Solving nonlinear HJB: policy iteration (Howard's algorithm), Newton method, Newton-Kleinman iteration for Riccati equations.
Successive Approximation Algorithm
Data: tolerance tol, stabilizing control $u_0(x)$
while $\| V_n - V_{n+1} \| \ge \mathrm{tol}$ do
  1. Solve for $V_{n+1}(x)$:
     \[ (f(x) + g u_n)^T \nabla V_{n+1}(x) + \ell(x) + \frac{1}{2} \gamma \| u_n(x) \|^2 = 0. \]
  2. Update $u_{n+1}(x) = -\frac{1}{2} g^T \nabla V_{n+1}(x)$.
  3. $n = n + 1$.
end
Result: $V_\infty(x)$, $u_\infty(x)$
◮ $u_0(x)$ must be asymptotically stabilizing, or
◮ discounting
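The loop above can be traced by hand on a scalar linear-quadratic problem, where every iterate is a single number. The sketch below is an illustration under stated assumptions: dynamics $\dot x = a x + b u$, running cost $q x^2 + \frac{\gamma}{2} u^2$, and the normalization $u = -\frac{1}{\gamma} g^T \nabla V$ from the closed-loop slide (the constants on the algorithm slide may follow a different scaling). The function name and the data are hypothetical.

```python
def policy_iteration(a, b, q, gamma, k0, tol=1e-12, max_iter=100):
    """Successive approximation for dx/dt = a*x + b*u with running cost
    q*x^2 + (gamma/2)*u^2. Policies are u_n(x) = k_n*x, values V_n(x) = p_n*x^2."""
    if a + b * k0 >= 0:
        raise ValueError("initial feedback must be stabilizing")
    k, p_old = k0, float("inf")
    for n in range(max_iter):
        # Step 1: linear GHJB (a + b*k)*x*V'(x) + q*x^2 + (gamma/2)*k^2*x^2 = 0
        # with V(x) = p*x^2, solved for p.
        p = -(q + 0.5 * gamma * k * k) / (2.0 * (a + b * k))
        # Step 2: policy update u = -(1/gamma)*b*V'(x) = -(2*b*p/gamma)*x.
        k = -2.0 * b * p / gamma
        if abs(p - p_old) < tol:
            return p, k, n + 1
        p_old = p
    return p, k, max_iter

a, b, q, gamma = 1.0, 1.0, 1.0, 1.0
p, k, iterations = policy_iteration(a, b, q, gamma, k0=-2.0)
# Residual of the nonlinear HJB equation for V(x) = p*x^2.
hjb_residual = 2 * a * p - 2 * b * b * p * p / gamma + q
print(p, k, iterations, hjb_residual)
```

In this scalar case the iteration coincides with a Newton-Kleinman iteration for the Riccati equation; starting from a non-stabilizing `k0` makes Step 1 meaningless, which is why the stabilizing initialization is required.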
Two 'infinities': the dynamical system
Meshfree discretization of the dynamical system, e.g. pseudo-spectral collocation based on Chebysheff polynomials
◮ The state $x(t) = (x_1(t), \dots, x_d(t))^T \in \mathbb{R}^d$.
◮ The free dynamics $f(x) : \mathbb{R}^d \to \mathbb{R}^d$ are $C^1$ and separable in every coordinate $f_i(x)$:
\[ f_i(x) = \sum_{j=1}^{N_f} \prod_{k=1}^{d} F_{(i,j,k)}(x_k), \]
where $F(x) : \mathbb{R}^d \to \mathbb{R}^{d \times N_f \times d}$ is a tensor-valued function.
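To make the separable representation concrete, the sketch below stores $F_{(i,j,k)}$ as a nested list of univariate functions and evaluates $f_i(x)$ as a sum of products. The Van der Pol-type vector field is a hypothetical example chosen for illustration, not a system from the talk.

```python
one = lambda s: 1.0
zero = lambda s: 0.0
ident = lambda s: s

# F[i][j][k] is the univariate factor F_(i,j,k); here d = 2 and N_f = 3.
F = [
    # f_1(x) = x_2 (padded with zero terms up to N_f)
    [[one, ident], [zero, zero], [zero, zero]],
    # f_2(x) = -x_1 + x_2 - x_1^2 * x_2
    [[lambda s: -s, one], [one, ident], [lambda s: -s * s, ident]],
]

def eval_separable(F, x):
    """Evaluate f_i(x) = sum_j prod_k F[i][j][k](x_k) for every coordinate i."""
    out = []
    for terms in F:
        total = 0.0
        for factors in terms:
            prod = 1.0
            for fk, xk in zip(factors, x):
                prod *= fk(xk)
            total += prod
        out.append(total)
    return out

def eval_direct(x):
    """The same vector field written out explicitly, for comparison."""
    return [x[1], -x[0] + x[1] - x[0] ** 2 * x[1]]

x = [0.5, -2.0]
print(eval_separable(F, x), eval_direct(x))
```

Only univariate evaluations are needed, which is what later allows high-dimensional integrals against separable basis functions to be split into one-dimensional ones.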
Galerkin Approximation of the GHJB Equation
◮ Given $u_n(x)$, we solve the linear Generalized HJB equation
\[ (f(x) + B u_n)^T \nabla V(x) + \ell(x) + \| u_n \|^2 = 0. \]
◮ With $\{ \phi_j(x) \}_{j=1}^{\infty}$ a complete set of $d$-dimensional polynomial basis functions, we approximate
\[ V(x) \approx \sum_{j=1}^{N} c_j \phi_j(x). \]
◮ $u_n$ ($n > 0$) is expressed in the form
\[ u_n(x) = -\frac{1}{2\gamma} g^T \nabla V_n(x) = -\frac{1}{2\gamma} g^T \sum_{j=1}^{N} c_j^n \nabla \phi_j(x). \]
◮ Every term expanded leads to a dense linear system for $V_{n+1}(x)$:
\[ A(c^n) \, c^{n+1} = b(c^n). \]
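One Galerkin step can be sketched in one space dimension. The example below assumes scalar dynamics $\dot x = a x + b u$, a fixed linear policy $u_n(x) = k x$, $\ell(x) = q x^2$, the two monomials $x^2, x^4$ as basis on $[-1, 1]$, and a midpoint rule for the integrals; all names and numbers are hypothetical. Because the exact solution $V(x) = p x^2$ lies in the span of the basis, the computed coefficients should be close to $(p, 0)$.

```python
def integrate(fn, n=2000):
    """Midpoint rule on [-1, 1]."""
    h = 2.0 / n
    return h * sum(fn(-1.0 + (i + 0.5) * h) for i in range(n))

def galerkin_ghjb(a, b, q, k):
    """Galerkin solve of (a*x + b*u_n)*V'(x) + q*x^2 + u_n(x)^2 = 0 with
    u_n(x) = k*x and the ansatz V(x) = c1*x^2 + c2*x^4 on [-1, 1]."""
    phi = [lambda x: x ** 2, lambda x: x ** 4]
    dphi = [lambda x: 2.0 * x, lambda x: 4.0 * x ** 3]
    drift = lambda x: (a + b * k) * x
    source = lambda x: q * x * x + (k * x) ** 2
    # A[i][j] = <phi_i, drift * phi_j'>,  rhs[i] = -<phi_i, source>
    A = [[integrate(lambda x, i=i, j=j: phi[i](x) * drift(x) * dphi[j](x))
          for j in range(2)] for i in range(2)]
    rhs = [-integrate(lambda x, i=i: phi[i](x) * source(x)) for i in range(2)]
    # Solve the dense 2x2 system by Cramer's rule.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    c1 = (rhs[0] * A[1][1] - A[0][1] * rhs[1]) / det
    c2 = (A[0][0] * rhs[1] - A[1][0] * rhs[0]) / det
    return c1, c2

a, b, q, k = 1.0, 1.0, 1.0, -2.0      # k stabilizes: a + b*k < 0
c1, c2 = galerkin_ghjb(a, b, q, k)
p_exact = -(q + k * k) / (2.0 * (a + b * k))
print(c1, c2, p_exact)
```

In $d$ dimensions the same assembly yields the dense system $A(c^n) c^{n+1} = b(c^n)$, with entries that are $d$-dimensional integrals.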
The Ingredients of Policy Iteration
◮ Meshfree! E.g. pseudo-spectral collocation based on Chebysheff polynomials
◮ Separability of $f$
◮ Galerkin approximation of GHJB using globally supported polynomials (monomials, Legendre, ...)
◮ High-dimensional integrals (... $M^d$ !): introduce separable structure
\[ \phi_j(x) = \prod_{i=1}^{d} \phi_j^i(x_i) \]
◮ Tensorize
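The benefit of the separable structure can be seen on a single integral: for a product of univariate factors, the integral over $[-1, 1]^d$ splits into $d$ one-dimensional integrals. The sketch below compares this with a full tensor-grid quadrature; the factors, the midpoint rule and the grid sizes are assumptions for illustration.

```python
from itertools import product

def midpoints(n):
    """Midpoint nodes on [-1, 1] and the common weight."""
    h = 2.0 / n
    return [-1.0 + (i + 0.5) * h for i in range(n)], h

def integrate_separable(factors, n):
    """Product of d one-dimensional integrals: d*n function evaluations."""
    nodes, h = midpoints(n)
    result = 1.0
    for fn in factors:
        result *= h * sum(fn(s) for s in nodes)
    return result

def integrate_full_grid(factors, n):
    """The same quadrature on the full tensor grid: n**d grid points."""
    nodes, h = midpoints(n)
    total = 0.0
    for point in product(nodes, repeat=len(factors)):
        value = 1.0
        for fn, s in zip(factors, point):
            value *= fn(s)
        total += value
    return total * h ** len(factors)

# phi(x) = x_1^2 * x_2^4 * (1 + x_3^2), exact integral 2/3 * 2/5 * 8/3 = 32/45
factors = [lambda s: s ** 2, lambda s: s ** 4, lambda s: 1.0 + s ** 2]
sep = integrate_separable(factors, n=20)
full = integrate_full_grid(factors, n=20)
fine = integrate_separable(factors, n=200)
print(sep, full, fine)
```

Both routines return the same number up to rounding, but the full grid needs `n**d` points while the separable version needs only `d*n` evaluations.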
Towards neural network based optimal feedback control
\[
(P_\beta^{y_0}) \qquad
\inf_{y \in W_\infty, \, u \in L^2(I; \mathbb{R}^m)} \frac{1}{2} \int_I \big( |D y(t)|^2 + \beta |u(t)|^2 \big) \, dt
\quad \text{s.t.} \quad \dot y = f(y) + B u, \quad y(0) = y_0,
\]
\[ W_\infty = \{ y \in L^2(I; \mathbb{R}^n) \mid \dot y \in L^2(I; \mathbb{R}^n) \}, \quad I = (0, \infty), \quad B \in \mathbb{R}^{n \times m}. \]
Our interest: optimal feedback stabilization
\[ u^*(t) = F^*(y^*(t)) = -\frac{1}{\beta} B^\top \nabla V(y^*(t)) \]
for all $y_0$ in a compact set $Y_0 \subset \mathbb{R}^n$ containing 0.
A.1 $Df : \mathbb{R}^n \to \mathbb{R}^{n \times n}$ is Lipschitz continuous on compacts, $f(0) = 0$.
A.2 There exists $F^* : \mathbb{R}^n \to \mathbb{R}^m$ such that the induced Nemitsky operator satisfies $\mathcal{F}^* : W_\infty \to L^2(I; \mathbb{R}^m)$. Further,
\[ \dot y = f(y) + B \mathcal{F}^*(y), \quad y(0) = y_0 \]
admits a unique solution $y^*(y_0) \in W_\infty$ for all $y_0 \in \mathbb{R}^n$, and
\[ (y^*(y_0), \mathcal{F}^*(y^*(y_0))) \in \arg\min (P_\beta^{y_0}) \quad \forall y_0 \in \mathbb{R}^n. \]
A.3 $DF^* : \mathbb{R}^n \to \mathbb{R}^{m \times n}$ is Lipschitz continuous on compacts, $F^*(0) = 0$.
A.4 There exist $M_0$ and a bounded neighborhood $N(Y_0) \subset \mathbb{R}^n$ such that
\[ y^* : N(Y_0) \to W_\infty, \quad y_0 \mapsto y^*(y_0) \]
is continuously differentiable and $\| y^*(y_0) \|_{W_\infty} \le M_0$ for all $y_0 \in N(Y_0)$.
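The learning problem
\[
(P^{y_0}) \qquad
\min_{\mathcal{F} \in \mathcal{H}, \, y \in W_\infty} J(y, \mathcal{F}) = \frac{1}{2} \int_I \big( |D y(t)|^2 + \beta |\mathcal{F}(y)(t)|^2 \big) \, dt
\quad \text{s.t.} \quad \dot y = f(y) + B \mathcal{F}(y), \quad y(0) = y_0,
\]
where
\[ \mathcal{H} = \big\{ \mathcal{F}(y)(t) = F(y(t)) : F \in \mathrm{Lip}\big( \bar B_{2 M_0}(0); \mathbb{R}^m \big), \; F(0) = 0 \big\}. \]
Yes, but ... the Bellman principle implies that we learn along
\[ S = \{ y(\mathcal{F}^*(t)) : t \in I \}, \]
or better
\[
\min_{\mathcal{F} \in \mathcal{H}, \, y(y_0^i) \in W_\infty} \sum_{i=1}^{m} J(y(y_0^i), \mathcal{F})
\quad \text{s.t.} \quad \dot y(y_0^i) = f(y(y_0^i)) + B \mathcal{F}(y(y_0^i)), \quad y(0) = y_0^i.
\]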
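The ensemble formulation can be illustrated on a toy problem. In the sketch below a one-parameter linear feedback $F(y) = \theta y$ stands in for the Lipschitz class $\mathcal{H}$, a grid search stands in for training, and the infinite horizon is truncated and discretized by explicit Euler; the scalar system, the sample initial states and all numerical parameters are assumptions for illustration only.

```python
import math

def trajectory_cost(theta, y0, a, b, beta, dt=0.005, horizon=10.0):
    """Explicit-Euler approximation of 0.5 * int (y^2 + beta*F(y)^2) dt
    along dy/dt = a*y + b*F(y) with F(y) = theta*y and y(0) = y0."""
    y, cost = y0, 0.0
    for _ in range(int(round(horizon / dt))):
        u = theta * y
        cost += 0.5 * dt * (y * y + beta * u * u)
        y += dt * (a * y + b * u)
    return cost

def ensemble_cost(theta, initial_states, a, b, beta):
    """Sum of the trajectory costs over all sampled initial states."""
    return sum(trajectory_cost(theta, y0, a, b, beta) for y0 in initial_states)

a, b, beta = 1.0, 1.0, 1.0               # unstable scalar system (hypothetical)
initial_states = [-1.0, 0.5, 1.0]        # samples y_0^i from Y_0
thetas = [-4.0 + 0.05 * i for i in range(57)]   # candidate gains from -4.0 to -1.2
best_theta = min(thetas, key=lambda th: ensemble_cost(th, initial_states, a, b, beta))
# Reference: optimal gain from the scalar Riccati equation 2*a*p - b^2*p^2/beta + 1 = 0.
riccati_gain = -(a + math.sqrt(a * a + b * b / beta)) / b
print(best_theta, riccati_gain)
```

The gain found from finitely many trajectories agrees with the Riccati gain up to the grid resolution and the discretization error, which is the behaviour one hopes for when the feedback is learned from an ensemble of initial conditions.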
The learning problem
\[
(P) \qquad
\begin{cases}
\displaystyle \min_{\mathcal{F} \in \mathcal{H}, \; y \in L^\infty_\mu(Y_0; W_\infty)} j(y, \mathcal{F}) = \int_{Y_0} J\big( y(y_0), \mathcal{F}(y(y_0)) \big) \, d\mu(y_0), \\[1ex]
\dot y(y_0) = f(y(y_0)) + B \mathcal{F}(y(y_0)) \quad \text{for } \mu\text{-a.e. } y_0 \in Y_0, \\[1ex]
\| y \|_{L^\infty_\mu(Y_0; W_\infty)} \le 2 M_0,
\end{cases}
\]
where $(Y_0, \mathcal{A}, \mu)$ is a complete probability space.
Proposition
$(P)$ admits a solution and we have equivalence to $\mu$-a.e. solutions of $(P^{y_0})$ on $Y_0$.
Corollary
Let $(\bar{\mathcal{F}}, \bar y)$ be an optimal solution to $(P)$ and assume that $\mathcal{A}$ contains the Borel $\sigma$-algebra on $Y_0$. If $\bar y \in C_b(\operatorname{supp} \mu; W_\infty)$, then
\[ \widehat{Y}_0 := \{ \bar y(y_0)(t) \mid y_0 \in \operatorname{supp} \mu, \; t \in [0, +\infty) \} \subset \bar B_{2 M_0}(0) \]
is compact and the previous proposition can be extended to $\widehat{Y}_0$.
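Recap on neural networks
\[ f_{i,\theta}(x) = \sigma(W_i x + b_i) \quad \forall x \in \mathbb{R}^{N_{i-1}}, \; i = 1, \dots, L-1, \]
\[ f_{L,\theta}(x) = W_L x + b_L \quad \forall x \in \mathbb{R}^{N_{L-1}}, \]
with activation function $\sigma \in C^1(\mathbb{R}, \mathbb{R})$ and parameters
\[ \theta = (W_1, b_1, \dots, W_L, b_L) \in \mathcal{R} = \bigtimes_{i=1}^{L} \big( \mathbb{R}^{N_i \times N_{i-1}} \times \mathbb{R}^{N_i} \big), \]
which is uniquely determined by its architecture
\[ \operatorname{arch}(\mathcal{R}) = (N_0, N_1, \dots, N_L) \in \mathbb{N}^{L+1}. \]
\[ F_\theta(x) = f_{L,\theta} \circ f_{L-1,\theta} \circ \dots \circ f_{1,\theta}(x) - f_{L,\theta} \circ f_{L-1,\theta} \circ \dots \circ f_{1,\theta}(0) \quad \forall x \in \mathbb{R}^n \]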
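Recap on neural networks
\[ f_{i,\theta}(x) = \sigma(W_i x + b_i) + x \quad \forall x \in \mathbb{R}^{N_{i-1}}, \; i = 1, \dots, L-1, \]
\[ f_{L,\theta}(x) = W_L x + b_L \quad \forall x \in \mathbb{R}^{N_{L-1}}, \]
with activation function $\sigma \in C^1(\mathbb{R}, \mathbb{R})$ and parameters
\[ \theta = (W_1, b_1, \dots, W_L, b_L) \in \mathcal{R} = \bigtimes_{i=1}^{L} \big( \mathbb{R}^{N_i \times N_{i-1}} \times \mathbb{R}^{N_i} \big), \]
which is uniquely determined by its architecture
\[ \operatorname{arch}(\mathcal{R}) = (N_0, N_1, \dots, N_L) \in \mathbb{N}^{L+1}. \]
\[ F_\theta(x) = f_{L,\theta} \circ f_{L-1,\theta} \circ \dots \circ f_{1,\theta}(x) - f_{L,\theta} \circ f_{L-1,\theta} \circ \dots \circ f_{1,\theta}(0) \quad \forall x \in \mathbb{R}^n \]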
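The network recap can be turned into a few lines of plain Python. The sketch below is a minimal illustration with hypothetical weights: it uses `tanh` as one admissible $C^1$ activation, groups the parameters as a list of `(W, b)` pairs, and offers a `residual` flag for the variant with the skip connection `+ x` (which requires equal input and output widths in the hidden layers).

```python
import math

def affine(W, b, x):
    """Compute W x + b for a matrix W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def raw_network(theta, x, residual=False):
    """f_L o ... o f_1 with tanh hidden layers and an affine output layer.
    residual=True uses hidden layers of the form sigma(W x + b) + x."""
    *hidden, (W_L, b_L) = theta
    for W, b in hidden:
        z = [math.tanh(v) for v in affine(W, b, x)]
        x = [zi + xi for zi, xi in zip(z, x)] if residual else z
    return affine(W_L, b_L, x)

def feedback(theta, x, residual=False):
    """F_theta(x) = network(x) - network(0), which enforces F_theta(0) = 0."""
    zero = [0.0] * len(x)
    out_x = raw_network(theta, x, residual)
    out_0 = raw_network(theta, zero, residual)
    return [p - q for p, q in zip(out_x, out_0)]

# Hypothetical parameters for the architecture (N_0, N_1, N_2) = (2, 2, 1).
theta = [
    ([[0.5, -0.3], [0.1, 0.8]], [0.2, -0.1]),   # (W_1, b_1)
    ([[1.0, -2.0]], [0.3]),                      # (W_2, b_2)
]
print(feedback(theta, [1.0, -1.0]), feedback(theta, [1.0, -1.0], residual=True))
```

Subtracting the network value at the origin is what guarantees that the learned feedback vanishes at the equilibrium, matching the condition $F(0) = 0$ in the admissible class.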
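Approximation by neural networks
Theorem
Let $\eta_1 > 0$, $\eta_2 > 0$, and assume that the activation function $\sigma$ is not a polynomial. Then for each $\varepsilon > 0$ there exist $L_\varepsilon \in \mathbb{N}$, $\operatorname{arch}(\mathcal{R}_\varepsilon) \in \mathbb{N}^{L_\varepsilon + 1}$ and a neural network
\[ \theta_\varepsilon = (W_1^\varepsilon, b_1^\varepsilon, \dots, W_{L_\varepsilon}^\varepsilon, b_{L_\varepsilon}^\varepsilon) \in \mathcal{R}_\varepsilon \]
such that
\[ \| W_1^\varepsilon \|_\infty \le \eta_1, \quad | b_i^\varepsilon |_\infty \le \eta_2, \quad i = 1, \dots, L_\varepsilon, \]
as well as
\[ | F^*(x) - F_{\theta_\varepsilon}(x) | + \| DF^*(x) - DF_{\theta_\varepsilon}(x) \| \le \varepsilon \quad \text{for all } |x| \le 2 M_0. \]
Thus, approximate $F$ by $F_{\theta_\varepsilon}$!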