

  1. Solving Hamilton-Jacobi-Bellman equations by combining a max-plus linear approximation and a probabilistic numerical method. Marianne Akian, INRIA Saclay – Île-de-France and CMAP, École polytechnique, CNRS. RICAM Workshop: Numerical methods for Hamilton-Jacobi equations in optimal control and related fields, Linz, November 21-25, 2016. Joint work with Eric Fodjo, see arXiv:1605.02816.

  2. A finite horizon diffusion control problem involving "discrete" and "continuum" controls. The state ξ_s ∈ R^d satisfies the stochastic differential equation
$$ d\xi_s = f^{\mu_s}(\xi_s, u_s)\,ds + \sigma^{\mu_s}(\xi_s, u_s)\,dW_s, $$
where (W_s)_{s≥0} is a d-dimensional Brownian motion, μ := (μ_s)_{0≤s≤T} and u := (u_s)_{0≤s≤T} are admissible control processes, μ_s ∈ M a finite set, and u_s ∈ U ⊂ R^p. The problem consists in maximizing the finite horizon discounted payoff (δ^m ≥ 0):
$$ J(t,x,\mu,u) := \mathbb{E}\Big[ \int_t^T e^{-\int_t^s \delta^{\mu_\tau}(\xi_\tau, u_\tau)\,d\tau}\, \ell^{\mu_s}(\xi_s, u_s)\,ds + e^{-\int_t^T \delta^{\mu_\tau}(\xi_\tau, u_\tau)\,d\tau}\, \psi(\xi_T) \,\Big|\, \xi_t = x \Big]. $$
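The controlled dynamics above can be simulated by an Euler–Maruyama scheme. The following is a minimal illustrative sketch, not the talk's implementation: the drift, diffusion, and feedback policies below are placeholder choices of ours.

```python
import numpy as np

def simulate(x0, f, sigma, mu_policy, u_policy, T=1.0, h=0.01, rng=None):
    """Euler-Maruyama simulation of d xi = f^mu(xi,u) ds + sigma^mu(xi,u) dW."""
    rng = np.random.default_rng(rng)
    d = len(x0)
    n_steps = int(round(T / h))
    xi = np.array(x0, dtype=float)
    path = [xi.copy()]
    for k in range(n_steps):
        t = k * h
        m, u = mu_policy(t, xi), u_policy(t, xi)
        dW = rng.normal(scale=np.sqrt(h), size=d)        # W_{t+h} - W_t
        xi = xi + f(m, xi, u) * h + sigma(m, xi, u) @ dW  # Euler step
        path.append(xi.copy())
    return np.array(path)

# Toy example: two discrete modes with different drifts, constant diffusion.
f = lambda m, x, u: (-x if m == 0 else -2.0 * x) + u
sigma = lambda m, x, u: 0.1 * np.eye(len(x))
path = simulate([1.0, 0.0], f, sigma,
                mu_policy=lambda t, x: 0, u_policy=lambda t, x: 0.0)
```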

  3. The Hamilton-Jacobi-Bellman (HJB) equation. Define the value function v : [0,T] × R^d → R as:
$$ v(t,x) = \sup_{\mu, u} J(t,x,\mu,u). $$
Under suitable assumptions, it is the unique (continuous) viscosity solution of the HJB equation
$$ -\frac{\partial v}{\partial t} - H(x, v(t,x), Dv(t,x), D^2 v(t,x)) = 0, \quad x \in \mathbb{R}^d,\ t \in [0,T), $$
$$ v(T,x) = \psi(x), \quad x \in \mathbb{R}^d, $$
satisfying also some growth condition at infinity (in space), where the Hamiltonian H : R^d × R × R^d × S_d → R is given by:
$$ H(x,r,p,\Gamma) := \max_{m \in \mathcal{M}} H^m(x,r,p,\Gamma), $$
with
$$ H^m(x,r,p,\Gamma) := \max_{u \in \mathcal{U}} \Big\{ \tfrac{1}{2} \operatorname{tr}\big(\sigma^m(x,u)\sigma^m(x,u)^T \Gamma\big) + f^m(x,u) \cdot p - \delta^m(x,u)\, r + \ell^m(x,u) \Big\}. $$

  4. Standard grid-based discretizations solving HJB equations suffer from the curse of dimensionality: for an error of ε, the computing time of finite difference or finite element methods is at least of the order of (1/ε)^{d/2}. Some possible curse-of-dimensionality-free methods:
◮ Idempotent methods, introduced by McEneaney (2007) in the deterministic case, and by McEneaney, Kaise and Han (2011) in the stochastic case.
◮ Probabilistic numerical methods based on a backward stochastic differential equation interpretation of the HJB equation, simulations and regressions:
  ◮ Quantization: Bally, Pagès (2003) for stopping time problems.
  ◮ Introduction of a new process without control: Bouchard, Touzi (2004) when σ does not depend on the control; Cheridito, Soner, Touzi and Victoir (2007) and Fahim, Touzi and Warin (2011) in the fully nonlinear case.
  ◮ Control randomization: Kharroubi, Langrené, Pham (2013).
  ◮ Fixed point iterations: Bender, Zhang (2008) for semilinear PDEs (which are not HJB equations).

  5. The idempotent method of McEneaney, Kaise and Han. Given m and u, denote by ξ̂^{m,u} the Euler discretization of the process ξ with time step h:
$$ \hat{\xi}^{m,u}(t+h) = \hat{\xi}^{m,u}(t) + f^m(\hat{\xi}^{m,u}(t), u)\, h + \sigma^m(\hat{\xi}^{m,u}(t), u)\,(W_{t+h} - W_t). $$
Define the dynamic programming operators:
$$ T^m_{t,h}(\phi)(x) = \sup_{u \in \mathcal{U}} \Big\{ h\, \ell^m(x,u) + e^{-h\, \delta^m(x,u)}\, \mathbb{E}\big[ \phi(\hat{\xi}^{m,u}(t+h)) \,\big|\, \hat{\xi}^{m,u}(t) = x \big] \Big\}, $$
and
$$ T_{t,h}(\phi)(x) = \max_{m \in \mathcal{M}} T^m_{t,h}(\phi)(x). $$
The HJB equation can be discretized in time by:
$$ v^h(t,x) = T_{t,h}(v^h(t+h, \cdot))(x), \quad t \in \mathcal{T}_h := \{0, h, 2h, \ldots, T-h\}. $$
Under appropriate assumptions, this scheme converges to the solution of the HJB equation as h goes to zero.
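A single backward step of this time discretization can be sketched as follows. This is an illustrative approximation of ours, not the talk's method: the conditional expectation is estimated by Monte Carlo and the supremum over U is taken over a finite grid `U_grid`.

```python
import numpy as np

def dp_step(x, phi, f, sigma, ell, delta, modes, U_grid, h, n_mc=2000, rng=0):
    """One step v^h(t, x) = T_{t,h}(v^h(t+h, .))(x), Monte Carlo + control grid."""
    rng = np.random.default_rng(rng)
    d = len(x)
    dW = rng.normal(scale=np.sqrt(h), size=(n_mc, d))  # samples of W_{t+h}-W_t
    best = -np.inf
    for m in modes:                                     # max over discrete modes
        for u in U_grid:                                # sup over controls (grid)
            nxt = x + f(m, x, u) * h + dW @ sigma(m, x, u).T  # Euler step
            val = h * ell(m, x, u) + np.exp(-h * delta(m, x, u)) \
                  * np.mean(phi(nxt))
            best = max(best, val)
    return best

# Toy data: phi(x) = -|x|^2, one mode, scalar controls on [-1, 1].
f = lambda m, x, u: -x + u
sigma = lambda m, x, u: 0.1 * np.eye(len(x))
ell = lambda m, x, u: -u**2
delta = lambda m, x, u: 0.0
phi = lambda X: -np.sum(X**2, axis=1)
v = dp_step(np.array([0.5, -0.5]), phi, f, sigma, ell, delta,
            modes=[0], U_grid=np.linspace(-1, 1, 5), h=0.1)
```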

  6. ◮ In the deterministic case (σ^m = 0), T^m_{t,h} and T_{t,h} are max-plus linear:
$$ v^h(t+h, x) = \max_{i=1,\ldots,N} (\lambda_i + q^{t+h}_i(x)) \ \forall x \ \Rightarrow\ v^h(t,x) = \max_{i=1,\ldots,N} (\lambda_i + q^t_i(x)) \ \forall x, \quad \text{with } q^t_i = T_{t,h}(q^{t+h}_i). $$
◮ We only need to compute the effect of the dynamic programming operator on the finite basis q^T_i, i = 1, …, N, for instance by computing their projection on a fixed basis (see Fleming and McEneaney (2000) and A., Gaubert, Lakhoua (2008)).
◮ However, the q^T_i are difficult to compute in general, or the size of the basis needs to be exponential in d.
◮ If T^m_{t,h}(q) is a quadratic form when q is a quadratic form, and if it is easy to compute (for instance when the H^m correspond to linear quadratic problems), and if the q^T_i are quadratic forms, then the q^t_i are finite suprema of quadratic forms that are easy to compute (see McEneaney (2006)). The number of quadratic forms for v^h(0, ·) is exponential in the number of time steps only.
◮ This idea was extended to the stochastic case by McEneaney, Kaise and Han (2011).
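The max-plus basis representation above can be made concrete in a few lines. This is a minimal sketch with data of our choosing: v(x) = max_i (λ_i + q_i(x)) with quadratic basis functions q_i(x) = ½ x^T Q_i x + b_i·x + c_i.

```python
import numpy as np

def quad(Q, b, c, x):
    """Quadratic form q(x) = 1/2 x^T Q x + b.x + c."""
    return 0.5 * x @ Q @ x + b @ x + c

def maxplus_eval(basis, lambdas, x):
    """Evaluate v(x) = max_i (lambda_i + q_i(x))."""
    return max(lam + quad(Q, b, c, x)
               for lam, (Q, b, c) in zip(lambdas, basis))

# Two concave quadratics on R^2 (illustrative).
basis = [(-np.eye(2), np.zeros(2), 0.0),
         (-2.0 * np.eye(2), np.array([1.0, 0.0]), 0.5)]
v0 = maxplus_eval(basis, [0.0, 0.0], np.zeros(2))
```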

  7. Theorem (McEneaney, Kaise and Han (2011)). Assume δ^m = 0, σ^m is constant, f^m is affine, ℓ^m is concave quadratic (with respect to (x,u)), and ψ is the supremum of a finite number of concave quadratic forms. Then, for all t ∈ T_h, there exists a set Z_t and a map g^t : R^d × Z_t → R such that for all z ∈ Z_t, g^t(·, z) is a concave quadratic form and
$$ v^h(t,x) = \sup_{z \in Z_t} g^t(x,z). $$
Moreover, the sets Z_t satisfy
$$ Z_t = \mathcal{M} \times \{ \bar{z}^{t+h} : \mathcal{W} \to Z_{t+h} \mid \text{Borel measurable} \}, $$
where W = R^d is the space of values of the Brownian process.
◮ Here a concave quadratic form is any map R^d → R of the form x ↦ q(x,z) := ½ x^T Q x + b·x + c, with z = (Q, b, c) ∈ Q_d = S^-_d × R^d × R.
◮ The proof uses the max-plus (infinite) distributivity property.

  8. ◮ In the deterministic case, the sets Z_t are finite, and their cardinality is exponential in time:
$$ \# Z_t = M \times \# Z_{t+h} = \cdots = M^{N_t} \times \# Z_T, $$
with M = #M and N_t = (T − t)/h.
◮ In the stochastic case, the sets Z_t are infinite as soon as t < T.
◮ If the Brownian process is discretized in space, then W can be replaced by a finite subset with fixed cardinality p, and the sets Z_t become finite.
◮ Nevertheless, their cardinality increases doubly exponentially in time:
$$ \# Z_t = M \times (\# Z_{t+h})^p = \cdots = M^{\frac{p^{N_t}-1}{p-1}} \times (\# Z_T)^{p^{N_t}}, $$
where p ≥ 2 (p = 2 for the Bernoulli discretization).
◮ Then, McEneaney, Kaise and Han proposed to apply a pruning method to reduce, at each time step t ∈ T_h, the cardinality of Z_t.
◮ In this talk, we shall replace pruning by random sampling.
◮ The idea is to use only quadratic forms that are optimal at the points of a sample of the process.
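The two cardinality recursions above are easy to check numerically. A quick illustrative sketch (parameter values are ours):

```python
# Deterministic: #Z_t = M * #Z_{t+h},  unrolled to M^{N_t} * #Z_T.
def card_deterministic(M, ZT, n_steps):
    z = ZT
    for _ in range(n_steps):
        z = M * z
    return z

# Space-discretized stochastic: #Z_t = M * (#Z_{t+h})^p,
# unrolled to M^{(p^{N_t}-1)/(p-1)} * (#Z_T)^{p^{N_t}}.
def card_stochastic(M, ZT, p, n_steps):
    z = ZT
    for _ in range(n_steps):
        z = M * z**p
    return z

# With M = 2, #Z_T = 2, p = 2 (Bernoulli), five steps already explode:
det = card_deterministic(2, 2, 5)   # stays modest
sto = card_stochastic(2, 2, 2, 5)   # doubly exponential
```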

  9. Consider the case with no continuous control u and no discount factor.
◮ Then ξ̂^m(t+h) = S^m_{t,h}(ξ̂^m(t), W_{t+h} − W_t) with
$$ S^m_{t,h}(x, w) = x + f^m(x)\, h + \sigma^m(x)\, w, $$
and
$$ T^m_{t,h}(\phi)(x) = h\, \ell^m(x) + \mathbb{E}\big[ \phi(S^m_{t,h}(x, W_{t+h} - W_t)) \big]. $$
◮ Assume that φ(x) = max_{z ∈ Z_{t+h}} q(x,z), with Z_{t+h} ⊂ Q_d = S^-_d × R^d × R and q(x,z) := ½ x^T Q x + b·x + c, z = (Q,b,c) ∈ Q_d.
◮ Then, for each x ∈ R^d, there exists z̄^m_x : W → Z_{t+h} measurable such that
$$ \phi(S^m_{t,h}(x, W_{t+h} - W_t)) = q\big( S^m_{t,h}(x, W_{t+h} - W_t),\, \bar{z}^m_x(W_{t+h} - W_t) \big). $$
◮ Moreover, under the previous assumptions on ℓ^m, f^m and σ^m, we have, for all x' ∈ R^d,
$$ h\, \ell^m(x') + \mathbb{E}\big[ q\big( S^m_{t,h}(x', W_{t+h} - W_t),\, \bar{z}^m_x(W_{t+h} - W_t) \big) \big] = q(x', z^m_x) $$
for some z^m_x ∈ Q_d, and so
$$ T^m_{t,h}(\phi)(x) = q(x, z^m_x) = \sup_{x' \in \mathbb{R}^d} q(x, z^m_{x'}). $$
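The pointwise selection z̄^m_x can be sketched as follows: for each sampled increment w, pick the quadratic form in Z_{t+h} attaining the max of φ at S^m_{t,h}(x, w), then average. This is an illustrative Monte Carlo sketch with helper names and toy data of ours, not the talk's exact computation (which evaluates the expectation in closed form).

```python
import numpy as np

def quad(z, x):
    Q, b, c = z
    return 0.5 * x @ Q @ x + b @ x + c

def select_and_apply(x, Z_next, f, sigma, ell, h, n_mc=1000, rng=0):
    """Estimate T^m_{t,h}(phi)(x) with phi = max over quadratics in Z_next."""
    rng = np.random.default_rng(rng)
    W = rng.normal(scale=np.sqrt(h), size=(n_mc, len(x)))
    total = 0.0
    for w in W:
        s = x + f(x) * h + sigma(x) @ w           # S^m_{t,h}(x, w)
        total += max(quad(z, s) for z in Z_next)  # q(s, zbar^m_x(w))
    return h * ell(x) + total / n_mc

# Toy problem: two concave quadratics, affine drift, constant diffusion.
Z_next = [(-np.eye(2), np.zeros(2), 0.0),
          (-np.eye(2), np.array([1.0, 0.0]), -0.2)]
val = select_and_apply(np.array([0.2, 0.0]), Z_next,
                       f=lambda x: -x, sigma=lambda x: 0.1 * np.eye(2),
                       ell=lambda x: -0.5 * x @ x, h=0.1)
```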

  10. The sampling algorithm.
◮ Let M = #M and choose N = (N_in, N_rg) giving the size of the samples.
◮ Choose Z_T ⊂ Q_d such that |ψ(x) − max_{z ∈ Z_T} q(x,z)| ≤ ε. Define v^{h,N}(T,x) = max_{z ∈ Z_T} q(x,z), for x ∈ R^d.
◮ Construct a sample of ((ξ̂^m(0))_{m ∈ M}, (W_{t+h} − W_t)_{t ∈ T_h}) of size N_in, indexed by ω ∈ Ω_{N_in} := {1, …, N_in}, and deduce ξ̂^m(t, ω), m ∈ M.
◮ For t = T − h, T − 2h, …, 0 do:
  1. For each ω ∈ Ω_{N_in} and m ∈ M, denote x = ξ̂^m(t, ω), and construct a subsample of size N_rg of elements (ω_i, ω'_i) of (Ω_{N_in})², i ∈ Ω_{N_rg}. Let z̄^m_x : W → Z_{t+h} (as above) be computed at the points (W_{t+h} − W_t)(ω'_i) only. Consider
$$ \tilde{q}(x', w) = h\, \ell^m(x') + q\big( S^m_{t,h}(x', w),\, \bar{z}^m_x(w) \big). $$
Approximate z^m_x such that q(x', z^m_x) = E[q̃(x', W_{t+h} − W_t)] by doing a regression of q̃(ξ̂^m(t), W_{t+h} − W_t) using a (usual) basis of quadratic forms of ξ̂^m(t), and the sample (ξ̂^m(t, ω_i), (W_{t+h} − W_t)(ω'_i)), i ∈ Ω_{N_rg}.
  2. Let Z_t be the set of the parameters z^m_x ∈ Q_d of all the quadratic forms obtained in step 1. Define v^{h,N}(t,x) = max_{z ∈ Z_t} q(x,z).
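The regression step above fits a quadratic form to sampled values. A minimal sketch of that ingredient, with helper names of ours: recover (Q, b, c) of q(x) = ½ x^T Q x + b·x + c from samples (x_i, y_i) by least squares on the monomial basis {x_j x_k (j ≤ k), x_j, 1}.

```python
import numpy as np

def fit_quadratic(X, y):
    """Least-squares fit of q(x) = 1/2 x^T Q x + b.x + c from samples (X, y)."""
    n, d = X.shape
    pairs = [(j, k) for j in range(d) for k in range(j, d)]
    cols = [X[:, j] * X[:, k] for j, k in pairs]        # second-order terms
    cols += [X[:, j] for j in range(d)] + [np.ones(n)]  # linear + constant
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    Q = np.zeros((d, d))
    for (j, k), a in zip(pairs, coef[: len(pairs)]):
        if j == k:
            Q[j, j] = 2.0 * a        # 1/2 Q_jj x_j^2 matches a x_j^2
        else:
            Q[j, k] = Q[k, j] = a    # 1/2 (2 Q_jk) x_j x_k matches a x_j x_k
    b = coef[len(pairs): len(pairs) + d]
    c = coef[-1]
    return Q, b, c

# Exact recovery on noiseless samples of a known concave quadratic.
rng = np.random.default_rng(0)
Q0 = np.array([[-2.0, 0.5], [0.5, -1.0]])
b0, c0 = np.array([1.0, -1.0]), 0.3
X = rng.normal(size=(50, 2))
y = 0.5 * np.einsum("ni,ij,nj->n", X, Q0, X) + X @ b0 + c0
Q, b, c = fit_quadratic(X, y)
```

In the algorithm, the regressed values would be q̃(ξ̂^m(t, ω_i), (W_{t+h} − W_t)(ω'_i)) instead of this synthetic y.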
