 
Solving Hamilton-Jacobi-Bellman equations by combining a max-plus linear approximation and a probabilistic numerical method
Marianne Akian
INRIA Saclay - Île-de-France and CMAP, École polytechnique CNRS
RICAM Workshop: Numerical methods for Hamilton-Jacobi equations in optimal control and related fields, Linz, November 21-25, 2016
Joint work with Eric Fodjo, see arXiv:1605.02816
A finite horizon diffusion control problem involving “discrete” and “continuum” controls

The state $\xi_s \in \mathbb{R}^d$ satisfies the stochastic differential equation
$$d\xi_s = f^{\mu_s}(\xi_s, u_s)\,ds + \sigma^{\mu_s}(\xi_s, u_s)\,dW_s,$$
where $(W_s)_{s \ge 0}$ is a $d$-dimensional Brownian motion, and $\mu := (\mu_s)_{0 \le s \le T}$ and $u := (u_s)_{0 \le s \le T}$ are admissible control processes, with $\mu_s \in \mathcal{M}$, a finite set, and $u_s \in \mathcal{U} \subset \mathbb{R}^p$.
The problem consists in maximizing the finite horizon discounted payoff ($\delta^m \ge 0$):
$$J(t,x,\mu,u) := \mathbb{E}\Big[\int_t^T e^{-\int_t^s \delta^{\mu_\tau}(\xi_\tau, u_\tau)\,d\tau}\,\ell^{\mu_s}(\xi_s, u_s)\,ds + e^{-\int_t^T \delta^{\mu_\tau}(\xi_\tau, u_\tau)\,d\tau}\,\psi(\xi_T) \;\Big|\; \xi_t = x\Big].$$
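To make the payoff concrete, here is a minimal Monte Carlo sketch of $J(t,x,\mu,u)$. It assumes a scalar state ($d = 1$), controls frozen to constants (so the functions `f`, `sigma`, `ell`, `delta` below are already evaluated at the fixed $(m,u)$ and depend on the state only), an Euler time step `h`, and left-point Riemann sums for the time integrals. The name `estimate_payoff` and its parameters are hypothetical, chosen for illustration.

```python
import math
import random


def estimate_payoff(t, x, T, f, sigma, ell, delta, psi, h=0.01, n_paths=10000, seed=0):
    """Monte Carlo estimate of J(t, x, mu, u) for constant controls, d = 1.

    f, sigma, ell, delta are functions of the state only (controls are frozen);
    psi is the terminal reward.
    """
    rng = random.Random(seed)
    n_steps = round((T - t) / h)
    total = 0.0
    for _ in range(n_paths):
        xi = x
        disc = 0.0     # running integral of delta along the path
        reward = 0.0   # running integral of exp(-disc) * ell along the path
        for _ in range(n_steps):
            reward += math.exp(-disc) * ell(xi) * h
            disc += delta(xi) * h
            dw = rng.gauss(0.0, math.sqrt(h))          # Brownian increment
            xi = xi + f(xi) * h + sigma(xi) * dw       # Euler step
        total += reward + math.exp(-disc) * psi(xi)
    return total / n_paths


# Pure diffusion with terminal reward x^2: the exact value is x^2 + (T - t) = 2.
print(estimate_payoff(0.0, 1.0, 1.0, f=lambda y: 0.0, sigma=lambda y: 1.0,
                      ell=lambda y: 0.0, delta=lambda y: 0.0, psi=lambda y: y * y, h=0.05))
```

The printed estimate should be close to 2 up to Monte Carlo error; for genuinely controlled problems $J$ must of course be maximized over $(\mu, u)$, which is what the HJB equation characterizes.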
The Hamilton-Jacobi-Bellman (HJB) equation

Define the value function $v : [0,T] \times \mathbb{R}^d \to \mathbb{R}$ as
$$v(t,x) = \sup_{\mu,u} J(t,x,\mu,u).$$
Under suitable assumptions, it is the unique (continuous) viscosity solution of the HJB equation
$$-\frac{\partial v}{\partial t} - H\big(x, v(t,x), Dv(t,x), D^2 v(t,x)\big) = 0, \quad x \in \mathbb{R}^d,\ t \in [0,T),$$
$$v(T,x) = \psi(x), \quad x \in \mathbb{R}^d,$$
that also satisfies some growth condition at infinity (in space), where the Hamiltonian $H : \mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d \times \mathcal{S}_d \to \mathbb{R}$ is given by
$$H(x,r,p,\Gamma) := \max_{m \in \mathcal{M}} H^m(x,r,p,\Gamma),$$
with
$$H^m(x,r,p,\Gamma) := \max_{u \in \mathcal{U}} \Big\{ \frac12 \operatorname{tr}\big(\sigma^m(x,u)\sigma^m(x,u)^T \Gamma\big) + f^m(x,u) \cdot p - \delta^m(x,u)\, r + \ell^m(x,u) \Big\}.$$
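The nested maximization in $H$ can be sketched as follows for $d = 1$, where the trace term reduces to $\frac12 (\sigma^m)^2 \Gamma$. The continuum $\mathcal{U}$ is replaced by a finite list `controls` purely for illustration (an assumption: in general the inner maximization is a continuous optimization problem), and `hamiltonian`, `mode_a`, `mode_b` are hypothetical names.

```python
def hamiltonian(x, r, p, gamma, modes, controls):
    """H(x, r, p, Gamma) = max over m and u of the H^m integrand, for d = 1.

    modes: list of tuples (f, sigma, ell, delta), each a function of (x, u).
    controls: finite list standing in for the control set U (an assumption).
    """
    return max(
        0.5 * sigma(x, u) ** 2 * gamma   # (1/2) tr(sigma sigma^T Gamma) in d = 1
        + f(x, u) * p                    # drift term f^m . p
        - delta(x, u) * r                # discount term
        + ell(x, u)                      # running reward
        for (f, sigma, ell, delta) in modes
        for u in controls
    )


# Mode A: controlled drift with quadratic cost; mode B: pure diffusion with discount.
mode_a = (lambda x, u: u, lambda x, u: 0.0, lambda x, u: -0.5 * u * u, lambda x, u: 0.0)
mode_b = (lambda x, u: 0.0, lambda x, u: 1.0, lambda x, u: 0.0, lambda x, u: 0.5)
controls = [0.5 * i for i in range(-6, 7)]
```

With $r = 1$, $p = 1$ and $\Gamma = 0$, mode A gives $\max_u (u - u^2/2) = 1/2$ (attained at $u = 1$, which lies on the grid) and mode B gives $-1/2$, so $H = 1/2$; with $\Gamma = 4$ the diffusion mode wins with $2 - 1/2 = 3/2$.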
Standard grid-based discretizations of HJB equations suffer from the curse of dimensionality: for an error of $\epsilon$, the computing time of finite difference or finite element methods is at least of the order of $(1/\epsilon)^{d/2}$.
Some possible curse-of-dimensionality-free methods:
◮ Idempotent methods, introduced by McEneaney (2007) in the deterministic case, and by McEneaney, Kaise and Han (2011) in the stochastic case.
◮ Probabilistic numerical methods based on a backward stochastic differential equation interpretation of the HJB equation, simulations and regressions:
  ◮ Quantization: Bally, Pagès (2003) for stopping time problems.
  ◮ Introduction of a new process without control: Bouchard, Touzi (2004) when σ does not depend on the control; Cheridito, Soner, Touzi and Victoir (2007) and Fahim, Touzi and Warin (2011) in the fully nonlinear case.
  ◮ Control randomization: Kharroubi, Langrené, Pham (2013).
  ◮ Fixed point iterations: Bender, Zhang (2008) for semilinear PDEs (which are not HJB equations).
The idempotent method of McEneaney, Kaise and Han

Given $m$ and $u$, denote by $\hat\xi^{m,u}$ the Euler discretization of the process $\xi$ with time step $h$:
$$\hat\xi^{m,u}(t+h) = \hat\xi^{m,u}(t) + f^m\big(\hat\xi^{m,u}(t), u\big)\,h + \sigma^m\big(\hat\xi^{m,u}(t), u\big)\,(W_{t+h} - W_t).$$
Define the dynamic programming operators:
$$T^m_{t,h}(\phi)(x) = \sup_{u \in \mathcal{U}} \Big\{ h\,\ell^m(x,u) + e^{-h\delta^m(x,u)}\,\mathbb{E}\big[\phi(\hat\xi^{m,u}(t+h)) \mid \hat\xi^{m,u}(t) = x\big] \Big\},$$
$$T_{t,h}(\phi)(x) = \max_{m \in \mathcal{M}} T^m_{t,h}(\phi)(x).$$
The HJB equation can be discretized in time by:
$$v^h(t,x) = T_{t,h}\big(v^h(t+h,\cdot)\big)(x), \quad t \in \mathcal{T}_h := \{0, h, 2h, \dots, T-h\},$$
with $v^h(T,\cdot) = \psi$. Under appropriate assumptions, this scheme converges to the solution of the HJB equation as $h$ goes to zero.
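The time-discretized scheme can be sketched by direct recursion in $d = 1$. Two simplifications are assumptions of this sketch, not of the slides: the increment $W_{t+h} - W_t$ is replaced by a Bernoulli variable $\pm\sqrt h$ (the Bernoulli discretization, which matches the variance $h$), and $\mathcal{U}$ by a finite list. Since the recursion enumerates the whole tree, its cost grows like $(2\,\#\mathcal{M}\,\#\mathcal{U})^{N}$ for $N$ time steps, so it only serves as a reference on tiny problems. `value_h` is a hypothetical name.

```python
import math


def value_h(k, x, n_steps, h, modes, controls, psi):
    """v^h(k*h, x) from v^h(t, x) = T_{t,h}(v^h(t + h, .))(x) and v^h(T, .) = psi.

    modes: list of (f, sigma, ell, delta), each a function of (x, u), d = 1.
    The increment W_{t+h} - W_t is replaced by +/- sqrt(h) with probability 1/2.
    """
    if k == n_steps:
        return psi(x)
    best = -math.inf
    for f, sigma, ell, delta in modes:            # max over m in M
        for u in controls:                        # sup over u, restricted to a finite grid
            mean = x + f(x, u) * h                # Euler step without the noise term
            spread = sigma(x, u) * math.sqrt(h)
            expectation = 0.5 * (
                value_h(k + 1, mean + spread, n_steps, h, modes, controls, psi)
                + value_h(k + 1, mean - spread, n_steps, h, modes, controls, psi)
            )
            candidate = h * ell(x, u) + math.exp(-h * delta(x, u)) * expectation
            best = max(best, candidate)
    return best
```

For instance, with $f = 0$, $\sigma = 1$, $\ell = 0$, $\delta = 0$ and $\psi(x) = x^2$, each step adds exactly $h$ (since $\mathbb{E}(x \pm \sqrt h)^2 = x^2 + h$), so `value_h(0, 1.0, 4, 0.25, ...)` returns $1 + 4 \times 0.25 = 2$.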
◮ In the deterministic case ($\sigma^m = 0$), $T^m_{t,h}$ and $T_{t,h}$ are max-plus linear:
$$v^h(t+h,x) = \max_{i=1,\dots,N}\big(\lambda_i + q_i^{t+h}(x)\big)\ \forall x \;\Rightarrow\; v^h(t,x) = \max_{i=1,\dots,N}\big(\lambda_i + q_i^{t}(x)\big)\ \forall x, \quad \text{with } q_i^t = T_{t,h}\big(q_i^{t+h}\big).$$
◮ We only need to compute the effect of the dynamic programming operator on the finite basis $q_i^T$, $i = 1, \dots, N$, for instance by computing their projection on a fixed basis (see Fleming and McEneaney (2000) and A., Gaubert, Lakoua (2008)).
◮ However, the $q_i^T$ are difficult to compute in general, or the size of the basis needs to be exponential in $d$.
◮ If $T^m_{t,h}(q)$ is a quadratic form whenever $q$ is a quadratic form and is easy to compute (for instance when the $H^m$ correspond to linear quadratic problems), and if the $q_i^T$ are quadratic forms, then the $q_i^t$ are finite suprema of quadratic forms that are easy to compute (see McEneaney (2006)). The number of quadratic forms for $v^h(0,\cdot)$ is exponential only in the number of time steps.
◮ This idea was extended to the stochastic case by McEneaney, Kaise and Han (2011).
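The deterministic case can be checked numerically in $d = 1$. This sketch assumes, for illustration only, that $f^m(x) = a_m x + c_m$ is affine, $\ell^m$ is a quadratic form, $\delta^m = 0$ and there is no continuous control; then $T^m_{t,h}(q)(x) = h\ell^m(x) + q(x + hf^m(x))$ is again a quadratic form, stored as its coefficients $(Q, b, c)$ for $x \mapsto \frac12 Qx^2 + bx + c$. All helper names are hypothetical. The code compares $T_{t,h}$ applied pointwise with the propagation of the quadratic pieces.

```python
def q(x, z):
    """Evaluate the quadratic form 0.5*Q*x^2 + b*x + c, z = (Q, b, c), d = 1."""
    Q, b, c = z
    return 0.5 * Q * x * x + b * x + c


def compose_affine(z, alpha, beta):
    """Coefficients of x -> q(alpha*x + beta, z)."""
    Q, b, c = z
    return (Q * alpha ** 2, alpha * (Q * beta + b), 0.5 * Q * beta ** 2 + b * beta + c)


def apply_mode(z, mode, h):
    """T^m_{t,h}(q(., z)) for sigma^m = 0, f^m(x) = a*x + c0 and ell^m = q(., ell)."""
    a, c0, ell = mode
    composed = compose_affine(z, 1.0 + a * h, c0 * h)   # x + h*f^m(x) is affine
    return tuple(h * e + v for e, v in zip(ell, composed))


def propagate(Z_next, modes, h):
    """One backward step on the pieces: every (mode, piece) pair gives a new piece."""
    return [apply_mode(z, mode, h) for mode in modes for z in Z_next]


def T_direct(phi, modes, h):
    """T_{t,h}(phi)(x) = max_m [h*ell^m(x) + phi(x + h*f^m(x))], evaluated pointwise."""
    return lambda x: max(h * q(x, ell) + phi(x + (a * x + c0) * h) for a, c0, ell in modes)


modes = [(-1.0, 0.0, (-1.0, 0.0, 0.0)), (0.0, 1.0, (-1.0, 0.5, -0.2))]
h = 0.1
basis = [(-1.0, 0.0, 0.0), (-2.0, 1.0, 0.0)]   # the q_i^{t+h}
lam = [0.0, -0.3]                               # max-plus coefficients lambda_i
```

Max-plus linearity means that `T_direct` applied to $x \mapsto \max_i(\lambda_i + q_i(x))$ coincides with $\max_i(\lambda_i + T_{t,h}(q_i)(x))$, and after three calls to `propagate` the list of pieces has $2^3 \times 2 = 16$ elements, i.e. $M^{N_t}$ times the initial number.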
Theorem (McEneaney, Kaise and Han (2011)). Assume $\delta^m = 0$, $\sigma^m$ is constant, $f^m$ is affine, $\ell^m$ is concave quadratic (with respect to $(x,u)$), and $\psi$ is the supremum of a finite number of concave quadratic forms. Then, for all $t \in \mathcal{T}_h$, there exist a set $Z_t$ and a map $g_t : \mathbb{R}^d \times Z_t \to \mathbb{R}$ such that for all $z \in Z_t$, $g_t(\cdot, z)$ is a concave quadratic form and
$$v^h(t,x) = \sup_{z \in Z_t} g_t(x,z).$$
Moreover, the sets $Z_t$ satisfy
$$Z_t = \mathcal{M} \times \{\bar z_{t+h} : \mathcal{W} \to Z_{t+h} \mid \text{Borel measurable}\},$$
where $\mathcal{W} = \mathbb{R}^d$ is the space of values of the Brownian process.
◮ Here a concave quadratic form is any map $\mathbb{R}^d \to \mathbb{R}$ of the form $x \mapsto q(x,z) := \frac12 x^T Q x + b \cdot x + c$, with $z = (Q, b, c) \in \mathcal{Q}_d := \mathcal{S}_d^- \times \mathbb{R}^d \times \mathbb{R}$.
◮ The proof uses the max-plus (infinite) distributivity property.
◮ In the deterministic case, the sets $Z_t$ are finite, and their cardinality is exponential in time: $\#Z_t = M \times \#Z_{t+h} = \cdots = M^{N_t} \times \#Z_T$, with $M = \#\mathcal{M}$ and $N_t = (T-t)/h$.
◮ In the stochastic case, the sets $Z_t$ are infinite as soon as $t < T$.
◮ If the Brownian process is discretized in space, then $\mathcal{W}$ can be replaced by a finite subset with fixed cardinality $p$, and the sets $Z_t$ become finite.
◮ Nevertheless, their cardinality increases doubly exponentially in time:
$$\#Z_t = M \times (\#Z_{t+h})^p = \cdots = M^{\frac{p^{N_t}-1}{p-1}} \times (\#Z_T)^{p^{N_t}}, \quad \text{where } p \ge 2$$
($p = 2$ for the Bernoulli discretization).
◮ McEneaney, Kaise and Han therefore proposed to apply a pruning method to reduce, at each time step $t \in \mathcal{T}_h$, the cardinality of $Z_t$.
◮ In this talk, we replace pruning by random sampling.
◮ The idea is to use only the quadratic forms that are optimal at the points of a sample of the process.
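The cardinality recursion is easy to tabulate. This small sketch (hypothetical function names) iterates $\#Z_t = M \times (\#Z_{t+h})^p$ and compares it with the closed form; with $p = 1$ the same recursion reduces to the deterministic count $M^{N_t} \times \#Z_T$.

```python
def card_recursive(M, p, card_T, n_steps):
    """#Z_t after n_steps = N_t backward steps of #Z_t = M * (#Z_{t+h})**p."""
    card = card_T
    for _ in range(n_steps):
        card = M * card ** p
    return card


def card_closed_form(M, p, card_T, n_steps):
    """M**((p**N - 1)/(p - 1)) * card_T**(p**N), the closed form for p >= 2."""
    exponent = (p ** n_steps - 1) // (p - 1)   # exact: p - 1 divides p**N - 1
    return M ** exponent * card_T ** (p ** n_steps)


# Bernoulli discretization (p = 2), two modes, one terminal quadratic form.
for n in range(1, 6):
    print(n, card_recursive(2, 2, 1, n))   # 2, 8, 128, 32768, 2147483648
```

Already after five time steps the set would contain $2^{31}$ quadratic forms, which is why the exact representation must be reduced (by pruning, or here by sampling).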
Consider the case with no continuous control $u$ and no discount factor.
◮ Then $\hat\xi^m(t+h) = S^m_{t,h}\big(\hat\xi^m(t), W_{t+h} - W_t\big)$ with $S^m_{t,h}(x,w) = x + f^m(x)h + \sigma^m(x)w$, and
$$T^m_{t,h}(\phi)(x) = h\ell^m(x) + \mathbb{E}\big[\phi\big(S^m_{t,h}(x, W_{t+h} - W_t)\big)\big].$$
◮ Assume that $\phi(x) = \max_{z \in Z_{t+h}} q(x,z)$, with $Z_{t+h} \subset \mathcal{Q}_d = \mathcal{S}_d^- \times \mathbb{R}^d \times \mathbb{R}$ and $q(x,z) := \frac12 x^T Q x + b \cdot x + c$, $z = (Q, b, c) \in \mathcal{Q}_d$.
◮ Then, for each $x \in \mathbb{R}^d$, there exists a measurable $\bar z^m_x : \mathcal{W} \to Z_{t+h}$ such that
$$\phi\big(S^m_{t,h}(x, W_{t+h} - W_t)\big) = q\big(S^m_{t,h}(x, W_{t+h} - W_t), \bar z^m_x(W_{t+h} - W_t)\big).$$
◮ Moreover, under the previous assumptions on $\ell^m$, $f^m$ and $\sigma^m$, we have, for all $x' \in \mathbb{R}^d$,
$$\mathbb{E}\Big[h\ell^m(x') + q\big(S^m_{t,h}(x', W_{t+h} - W_t), \bar z^m_x(W_{t+h} - W_t)\big)\Big] = q(x', z^m_x)$$
for some $z^m_x \in \mathcal{Q}_d$, and so
$$T^m_{t,h}(\phi)(x) = q(x, z^m_x) = \sup_{x' \in \mathbb{R}^d} q(x, z^m_{x'}).$$
The last equality holds because $q(x, z^m_{x'}) \le T^m_{t,h}(\phi)(x)$ for every $x'$ (the selection $\bar z^m_{x'}$ need not be optimal at $S^m_{t,h}(x, \cdot)$), with equality for $x' = x$.
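A minimal sketch of this construction in $d = 1$, under assumptions added for illustration: $f^m(x) = a x + c_0$, $\sigma^m$ constant, $\ell^m$ a quadratic form, and $W_{t+h} - W_t$ replaced by the Bernoulli variable $\pm\sqrt h$, so that the expectation is the average of two quadratic forms. The code builds $z^m_x$ from the selections $\bar z^m_x(\pm\sqrt h)$; the names `z_m_x` and `T_m` are hypothetical.

```python
import math


def q(x, z):
    """Quadratic form 0.5*Q*x^2 + b*x + c with z = (Q, b, c), d = 1."""
    Q, b, c = z
    return 0.5 * Q * x * x + b * x + c


def compose_affine(z, alpha, beta):
    """Coefficients of x -> q(alpha*x + beta, z)."""
    Q, b, c = z
    return (Q * alpha ** 2, alpha * (Q * beta + b), 0.5 * Q * beta ** 2 + b * beta + c)


def z_m_x(x, Z_next, mode, h):
    """Parameters z^m_x with q(x', z^m_x) = E[h*ell(x') + q(S(x', W), zbar_x(W))].

    mode = (a, c0, sigma, ell): f^m(x) = a*x + c0, constant sigma^m,
    ell^m = q(., ell); W = +/- sqrt(h) with probability 1/2 each.
    """
    a, c0, sigma, ell = mode
    alpha = 1.0 + a * h                       # S(x, w) = alpha*x + (c0*h + sigma*w)
    result = [h * e for e in ell]
    for w in (math.sqrt(h), -math.sqrt(h)):
        beta = c0 * h + sigma * w
        zbar = max(Z_next, key=lambda z: q(alpha * x + beta, z))   # optimal at S(x, w)
        for n, coeff in enumerate(compose_affine(zbar, alpha, beta)):
            result[n] += 0.5 * coeff
    return tuple(result)


def T_m(x, Z_next, mode, h):
    """Direct evaluation of T^m_{t,h}(phi)(x) with phi = max over Z_next of q(., z)."""
    a, c0, sigma, ell = mode
    value = h * q(x, ell)
    for w in (math.sqrt(h), -math.sqrt(h)):
        y = x + (a * x + c0) * h + sigma * w
        value += 0.5 * max(q(y, z) for z in Z_next)
    return value
```

On any grid of points, `q(x, z_m_x(x, ...))` equals `T_m(x, ...)`, and the maximum of `q(x, z_m_x(xp, ...))` over grid points `xp` containing `x` gives the same value, which are the two equalities of the last bullet.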
The sampling algorithm
◮ Let $M = \#\mathcal{M}$ and choose $N = (N_{\mathrm{in}}, N_{\mathrm{rg}})$ giving the sizes of the samples.
◮ Choose $Z_T \subset \mathcal{Q}_d$ such that $|\psi(x) - \max_{z \in Z_T} q(x,z)| \le \epsilon$. Define $v^{h,N}(T,x) = \max_{z \in Z_T} q(x,z)$ for $x \in \mathbb{R}^d$.
◮ Construct a sample of $\big((\hat\xi^m(0))_{m \in \mathcal{M}}, (W_{t+h} - W_t)_{t \in \mathcal{T}_h}\big)$ of size $N_{\mathrm{in}}$, indexed by $\omega \in \Omega_{N_{\mathrm{in}}} := \{1, \dots, N_{\mathrm{in}}\}$, and deduce $\hat\xi^m(t,\omega)$, $m \in \mathcal{M}$.
◮ For $t = T-h, T-2h, \dots, 0$ do:
1. For each $\omega \in \Omega_{N_{\mathrm{in}}}$ and $m \in \mathcal{M}$, denote $x = \hat\xi^m(t,\omega)$, and construct a subsample of size $N_{\mathrm{rg}}$ of elements $(\omega_i, \omega'_i)$ of $(\Omega_{N_{\mathrm{in}}})^2$, $i \in \Omega_{N_{\mathrm{rg}}}$. Let $\bar z^m_x : \mathcal{W} \to Z_{t+h}$ (as above) be computed at the points $(W_{t+h} - W_t)(\omega'_i)$ only. Consider
$$\tilde q(x', w) = h\ell^m(x') + q\big(S^m_{t,h}(x', w), \bar z^m_x(w)\big).$$
Approximate $z^m_x$ such that $q(x', z^m_x) = \mathbb{E}[\tilde q(x', W_{t+h} - W_t)]$ by a regression of $\tilde q(\hat\xi^m(t), W_{t+h} - W_t)$ on a (usual) basis of quadratic forms of $\hat\xi^m(t)$, using the sample $\big(\hat\xi^m(t,\omega_i), (W_{t+h} - W_t)(\omega'_i)\big)$, $i \in \Omega_{N_{\mathrm{rg}}}$.
2. Let $Z_t$ be the set of the parameters $z^m_x \in \mathcal{Q}_d$ of all the quadratic forms obtained in Step 1. Define $v^{h,N}(t,x) = \max_{z \in Z_t} q(x,z)$.
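The steps above can be sketched compactly in $d = 1$. The following choices are assumptions of this sketch rather than of the slides: $\hat\xi^m(0,\omega)$ is drawn from a standard normal distribution, the pairs $(\omega_i, \omega'_i)$ are drawn uniformly with replacement, the regression basis is $\{x^2/2, x, 1\}$ fitted by ordinary least squares through the normal equations, and no concavity constraint is imposed on the fitted $Q$. The names `sampling_scheme`, `fit_quadratic`, `solve3` and `v_hN` are hypothetical.

```python
import math
import random


def q(x, z):
    """Quadratic form 0.5*Q*x^2 + b*x + c with z = (Q, b, c), d = 1."""
    Q, b, c = z
    return 0.5 * Q * x * x + b * x + c


def solve3(A, y):
    """Solve the 3x3 system A s = y by Gaussian elimination with partial pivoting."""
    M = [list(row) + [v] for row, v in zip(A, y)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= factor * M[col][k]
    s = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s[r] = (M[r][3] - sum(M[r][k] * s[k] for k in range(r + 1, 3))) / M[r][r]
    return tuple(s)


def fit_quadratic(xs, ys):
    """Least-squares coefficients (Q, b, c) of ys ~ 0.5*Q*x^2 + b*x + c."""
    basis = [(0.5 * x * x, x, 1.0) for x in xs]
    A = [[sum(phi[i] * phi[j] for phi in basis) for j in range(3)] for i in range(3)]
    rhs = [sum(phi[i] * y for phi, y in zip(basis, ys)) for i in range(3)]
    return solve3(A, rhs)


def sampling_scheme(modes, Z_T, T, h, n_in, n_rg, seed=0):
    """Return [Z_0, Z_h, ..., Z_T]; modes maps m -> (f, sigma, ell), functions of x."""
    rng = random.Random(seed)
    n_steps = round(T / h)
    # Sample of size n_in of the increments and initial states; then
    # xi[m][k][omega] = hat-xi^m(k*h, omega) by the Euler scheme.
    dW = [[rng.gauss(0.0, math.sqrt(h)) for _ in range(n_in)] for _ in range(n_steps)]
    xi = {}
    for m, (f, sigma, _) in modes.items():
        path = [[rng.gauss(0.0, 1.0) for _ in range(n_in)]]
        for k in range(n_steps):
            path.append([y + f(y) * h + sigma(y) * w for y, w in zip(path[k], dW[k])])
        xi[m] = path
    Z = [None] * (n_steps + 1)
    Z[n_steps] = list(Z_T)
    for k in range(n_steps - 1, -1, -1):            # t = T - h, ..., 0
        Z_next, Z_k = Z[k + 1], []
        for m, (f, sigma, ell) in modes.items():
            def S(y, w):                            # Euler map S^m_{t,h}
                return y + f(y) * h + sigma(y) * w
            for omega in range(n_in):
                x = xi[m][k][omega]
                xs, ys = [], []
                for _ in range(n_rg):               # subsample of pairs (omega_i, omega'_i)
                    i, j = rng.randrange(n_in), rng.randrange(n_in)
                    w = dW[k][j]
                    # bar z^m_x(w): the form of Z_{t+h} that is optimal at S(x, w)
                    zbar = max(Z_next, key=lambda z: q(S(x, w), z))
                    xp = xi[m][k][i]
                    xs.append(xp)
                    ys.append(h * ell(xp) + q(S(xp, w), zbar))
                Z_k.append(fit_quadratic(xs, ys))  # regression estimate of z^m_x
        Z[k] = Z_k
    return Z


def v_hN(Z_t, x):
    """v^{h,N}(t, x) = max over z in Z_t of q(x, z)."""
    return max(q(x, z) for z in Z_t)
```

As a sanity check of the sketch: when $\sigma^m = 0$ with affine $f^m$ and quadratic $\ell^m$, $\tilde q$ does not depend on $w$ and is exactly quadratic in $x'$, so the regression is exact and every form in $Z_t$ is a lower bound of $v^h(t,\cdot)$; with a single mode and a single terminal form, `v_hN` then reproduces $v^h$ up to rounding. Each $Z_t$ contains $M \times N_{\mathrm{in}}$ forms, independently of $t$.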