Solving Hamilton-Jacobi-Bellman equations by combining a max-plus linear approximation and a probabilistic numerical method
Marianne Akian
INRIA Saclay - Île-de-France and CMAP, École polytechnique, CNRS
RICAM Workshop: Numerical methods
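$$J(t, x, \mu, u) = E\Big[\int_t^T e^{-\int_t^s \delta^{\mu_\tau}(\xi_\tau, u_\tau)\,d\tau}\, \ell^{\mu_s}(\xi_s, u_s)\,ds + e^{-\int_t^T \delta^{\mu_\tau}(\xi_\tau, u_\tau)\,d\tau}\, \psi(\xi_T) \,\Big|\, \xi_t = x\Big],$$
$$v(t, x) = \sup_{\mu, u} J(t, x, \mu, u).$$
$$-\frac{\partial v}{\partial t} - H(x, v(t, x), Dv(t, x), D^2 v(t, x)) = 0,$$
with
$$H(x, r, p, \Gamma) = \max_{m \in \mathcal{M}} H^m(x, r, p, \Gamma),$$
$$H^m(x, r, p, \Gamma) = \sup_{u \in U} \Big\{ \tfrac{1}{2} \operatorname{tr}\big(\sigma^m(x, u)\, \sigma^m(x, u)^T\, \Gamma\big) + f^m(x, u) \cdot p - \delta^m(x, u)\, r + \ell^m(x, u) \Big\}.$$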
◮ Idempotent methods introduced by McEneaney (2007) in the
◮ Probabilistic numerical methods based on a backward stochastic
◮ Quantization: Bally, Pagès
◮ Introduction of a new process without control: Bouchard, Touzi (2004)
◮ Control randomization: Kharroubi, Langrené
◮ Fixed point iterations: Bender, Zhang (2008) for semilinear PDE (which
t,h(φ)(x) =sup u∈U
$$T_{t,h}(\varphi)(x) = \max_{m \in \mathcal{M}} T^m_{t,h}(\varphi)(x).$$
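◮ In the deterministic case ($\sigma^m = 0$), $T^m_{t,h}$ and $T_{t,h}$ are max-plus linear:
$$v(t+h, x) = \max_{i = 1, \ldots, N} \big(\lambda_i + q_i^{t+h}(x)\big) \;\; \forall x \implies v(t, x) = \max_{i = 1, \ldots, N} \big(\lambda_i + q_i^t(x)\big) \;\; \forall x, \quad \text{with } q_i^t = T_{t,h}(q_i^{t+h}).$$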
◮ We only need to compute the effect of the dynamic programming
i , i = 1, . . . , N, for instance by
◮ However, the qT i
◮ If $T^m_{t,h}(q)$ is a quadratic form when $q$ is a quadratic form, and if it is easy
i
i are finite
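The max-plus linearity used in the deterministic case can be checked numerically. The following is a minimal sketch, assuming a one-dimensional state, a finite control set, and a one-step operator of the form T(φ)(x) = max_u { h ℓ(x,u) + φ(x + f(x,u) h) }; the dynamics, reward, basis functions and weights are all hypothetical choices made for illustration, not data from the talk.

```python
H = 0.1                       # time step h (hypothetical value)
CONTROLS = [-1.0, 0.0, 1.0]   # hypothetical finite control set

def T(phi):
    """Deterministic one-step operator:
    T(phi)(x) = max_u { h*ell(x,u) + phi(x + f(x,u)*h) },
    with the assumed choices f(x,u) = u and ell(x,u) = -(x^2 + u^2)/2."""
    def t_phi(x):
        return max(H * (-0.5 * x * x - 0.5 * u * u) + phi(x + u * H)
                   for u in CONTROLS)
    return t_phi

def maxplus_comb(funcs, lams):
    """Max-plus linear combination: x -> max_i (lam_i + funcs_i(x))."""
    return lambda x: max(l + f(x) for l, f in zip(lams, funcs))

# Two concave quadratic basis functions and max-plus coefficients.
basis = [lambda x: -(x - 1.0) ** 2, lambda x: -0.5 * (x + 1.0) ** 2]
lams = [0.3, -0.2]

lhs = T(maxplus_comb(basis, lams))                  # T(max_i(lam_i + q_i))
rhs = maxplus_comb([T(f) for f in basis], lams)     # max_i(lam_i + T(q_i))

grid = [i / 10 for i in range(-20, 21)]
gap = max(abs(lhs(x) - rhs(x)) for x in grid)
print("largest discrepancy on the grid:", gap)
```

Both sides agree up to rounding because the two maximizations (over controls and over basis indices) commute and adding a constant commutes with the maximum over controls.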
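◮ This idea was extended to the stochastic case by McEneaney, Kaise and Han, with value functions of the form $v(t, x) = \sup_{z \in Z_t} q(x, z)$, where each $q(\cdot, z)$ is a concave quadratic form.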
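◮ Here a concave quadratic form is any map $\mathbb{R}^d \to \mathbb{R}$ of the form $q(x, z) = \frac{1}{2} x^T Q x + b \cdot x + c$, with $z = (Q, b, c) \in \mathcal{Q}_d = S_d^- \times \mathbb{R}^d \times \mathbb{R}$.
◮ The proof uses the max-plus (infinite) distributivity property.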
◮ In the deterministic case, the sets Zt are finite, and their cardinality is
◮ In the stochastic case, the sets $Z_t$ are infinite as soon as $t < T$.
◮ If the Brownian process is discretized in space, then W can be replaced
◮ Nevertheless, their cardinality increases doubly exponentially in time:
$$\#Z_t = M^{\frac{p^{N_t} - 1}{p - 1}} \times (\#Z_T)^{p^{N_t}}$$
where $p \ge 2$
◮ Then, McEneaney, Kaise and Han proposed to apply a pruning method
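To get a feeling for the doubly exponential growth, here is a small sketch. It assumes that the count above reads #Z_t = M^((p^N_t - 1)/(p - 1)) × (#Z_T)^(p^N_t), and that it comes from the one-step recursion #Z_t = M × (#Z_{t+h})^p (one new quadratic form per discrete control and per map from the p noise values into Z_{t+h}). That recursion and the function names are assumptions made for illustration.

```python
def cardinality_recursive(M, p, n_steps, z_T):
    """Iterate the assumed recursion #Z_t = M * (#Z_{t+h})**p backwards in time."""
    size = z_T
    for _ in range(n_steps):
        size = M * size ** p
    return size

def cardinality_closed_form(M, p, n_steps, z_T):
    """Closed form M**((p**N - 1)/(p - 1)) * z_T**(p**N) with N = n_steps."""
    exponent = (p ** n_steps - 1) // (p - 1)   # 1 + p + ... + p**(N-1), an integer
    return M ** exponent * z_T ** (p ** n_steps)

# Hypothetical sizes: 2 discrete controls, 3 noise values, 2 terminal forms.
for n in range(1, 5):
    print(n, cardinality_recursive(2, 3, n, 2))
```

Even with these tiny inputs the count exceeds 10^36 after four time steps, which is why some form of pruning or sampling is needed.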
◮ In this talk, we shall replace pruning by random sampling.
◮ The idea is to use only quadratic forms that are optimal at the points
◮ Then ˆ
t,h(ˆ
$$S^m_{t,h}(x, w) = x + f^m(x)\, h + \sigma^m(x)\, w.$$
$$T^m_{t,h}(\varphi)(x) = h\, \ell^m(x) + E\big[\varphi\big(S^m_{t,h}(x, W_{t+h} - W_t)\big)\big].$$
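The one-step operator above can be evaluated directly once the Brownian increment is discretized. The sketch below assumes a one-dimensional state, an operator of the form T(φ)(x) = h ℓ(x) + E[φ(x + f(x) h + σ(x) W)] with W ~ N(0, h), and replaces W by a three-point rule (values 0 and ±sqrt(3h), probabilities 2/3, 1/6, 1/6) that matches Gaussian moments up to order 5. The model data and the names `euler_step` and `T_m` are hypothetical.

```python
import math

H = 0.05  # time step h (hypothetical value)

# Three-point discretization of an N(0, h) increment.
NODES = [(-math.sqrt(3 * H), 1 / 6), (0.0, 2 / 3), (math.sqrt(3 * H), 1 / 6)]

def euler_step(x, w, f, sigma):
    """Euler map S(x, w) = x + f(x) h + sigma(x) w."""
    return x + f(x) * H + sigma(x) * w

def T_m(phi, x, ell, f, sigma):
    """T(phi)(x) = h ell(x) + E[phi(S(x, W))], with W replaced by the rule above."""
    return H * ell(x) + sum(p * phi(euler_step(x, w, f, sigma)) for w, p in NODES)

# Hypothetical data for a single discrete control m.
f = lambda x: -0.5 * x            # affine drift
sigma = lambda x: 0.3             # constant volatility
ell = lambda x: -x * x            # concave quadratic running reward
phi = lambda y: -0.5 * y * y + y  # concave quadratic form

x0 = 0.8
value = T_m(phi, x0, ell, f, sigma)

# For this quadratic phi, E[phi(a + s W)] = phi(a) - 0.5 s^2 h with a = x + f(x) h.
a = x0 + f(x0) * H
exact = H * ell(x0) + phi(a) - 0.5 * 0.3 ** 2 * H
print(value, exact)
```

Because φ is quadratic, the three-point rule integrates it exactly, so the two printed numbers coincide up to rounding.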
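◮ Assume that $\varphi(x) = \max_{z \in Z_{t+h}} q(x, z)$, $Z_{t+h} \subset \mathcal{Q}_d = S_d^- \times \mathbb{R}^d \times \mathbb{R}$, with $q(x, z) = \frac{1}{2} x^T Q x + b \cdot x + c$, $z = (Q, b, c) \in \mathcal{Q}_d$.
◮ Then, for each $x \in \mathbb{R}^d$, there exists $\bar{z}^m_x : \mathcal{W} \to Z_{t+h}$ measurable s.t.
$$\varphi\big(S^m_{t,h}(x, W_{t+h} - W_t)\big) = q\big(S^m_{t,h}(x, W_{t+h} - W_t),\, \bar{z}^m_x(W_{t+h} - W_t)\big).$$
◮ Moreover, under the previous assumptions on $\ell^m$, $f^m$ and $\sigma^m$, we have, for all $x' \in \mathbb{R}^d$,
$$h\, \ell^m(x') + E\big[q\big(S^m_{t,h}(x', W_{t+h} - W_t),\, \bar{z}^m_x(W_{t+h} - W_t)\big)\big] = q(x', z^m_x)$$
for some $z^m_x \in \mathcal{Q}_d$, and so
$$T^m_{t,h}(\varphi)(x) = q(x, z^m_x) = \sup_{x' \in \mathbb{R}^d} q(x, z^m_{x'}).$$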
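The statement above can be illustrated on a toy problem. The sketch below is one-dimensional and assumes an affine drift f(x) = A x + G, a constant σ, a concave quadratic running reward, and a Brownian increment replaced by a three-point rule; these choices, the set `Z_NEXT` and the helper names are hypothetical. For a base point x, the selection z̄_x(w) is the maximizing form at S(x, w), and `backward_form(x)` returns the coefficients of the quadratic form x' ↦ h ℓ(x') + E[q(S(x', W), z̄_x(W))].

```python
import math

H = 0.1                        # time step h (hypothetical)
A, G, SIGMA = -1.0, 0.5, 0.4   # f(x) = A*x + G, constant sigma (assumed)
NODES = [(-math.sqrt(3 * H), 1 / 6), (0.0, 2 / 3), (math.sqrt(3 * H), 1 / 6)]
Z_NEXT = [(-1.0, 1.0, 0.0), (-2.0, -1.0, 0.5)]   # forms z = (Q, b, c) at time t+h

def ell(x):
    return -0.5 * x * x

def step(x, w):
    """Euler map S(x, w) = x + f(x) h + sigma w."""
    return x + (A * x + G) * H + SIGMA * w

def q(x, z):
    Q, b, c = z
    return 0.5 * Q * x * x + b * x + c

def phi(y):
    return max(q(y, z) for z in Z_NEXT)

def T(x):
    """T(phi)(x) = h ell(x) + E[phi(S(x, W))] with the discretized W."""
    return H * ell(x) + sum(p * phi(step(x, w)) for w, p in NODES)

def backward_form(x):
    """Coefficients z_x of x' -> h ell(x') + E[q(S(x', W), zbar_x(W))]."""
    alpha = 1.0 + A * H             # S(x', w) = alpha*x' + beta(w)
    Qn, bn, cn = -H, 0.0, 0.0       # start from h*ell(x') = -0.5*h*x'^2
    for w, p in NODES:
        # zbar_x(w): the form that attains the max at the point S(x, w)
        Q, b, c = max(Z_NEXT, key=lambda z: q(step(x, w), z))
        beta = G * H + SIGMA * w
        Qn += p * Q * alpha * alpha
        bn += p * alpha * (Q * beta + b)
        cn += p * (0.5 * Q * beta * beta + b * beta + c)
    return (Qn, bn, cn)

grid = [i / 4 for i in range(-8, 9)]
# q(., z_x) touches T(phi) at x ...
tight = max(abs(T(x) - q(x, backward_form(x))) for x in grid)
# ... and stays below T(phi) everywhere else.
dominated = all(q(xp, backward_form(x)) <= T(xp) + 1e-12
                for x in grid for xp in grid)
print(tight, dominated)
```

Each form q(·, z_x) is a lower bound of T(φ) that is exact at its own base point x, which is what makes a supremum over sampled base points a natural approximation.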
◮ Let $M = \#\mathcal{M}$ and choose $N = (N_{in}, N_{rg})$ giving the sizes of the samples.
◮ Choose $Z_T \subset \mathcal{Q}_d$ such that $|\psi(x) - \max_{z \in Z_T} q(x, z)| \le \epsilon$. Define
◮ Construct a sample of ((ˆ
◮ For t = T − h, T − 2h, . . . , 0 do:
i) of (ΩNin)2, i ∈ ΩNrg. Let
x : W → Zt+h (as above), be computed at the points
i) only.
t,h(x′, w), ¯
x (w)
x such that q(x′, zm x ) = E[˜
i)), i ∈ ΩNrg.
x ∈ Qd of all the quadratic forms
◮ Several subsamplings of elements $(\omega_i, \omega'_i)$, $i \in \Omega_{N_{rg}}$, of $(\Omega_{N_{in}})^2$ have
i = i.
i are
in.
◮ For each time step $t \in \mathcal{T}_h$, we have $\#Z_t \le M \times N_{in}$, and the number
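The surviving text indicates that the expectation defining each new quadratic form is estimated from a sample indexed by Ω_{N_rg}, and the talk later mentions a regression scheme. A minimal sketch of one way this could be done, assuming a one-dimensional state and ordinary least squares on the basis (x²/2, x, 1): the function `fit_quadratic`, the sampling law of the points and all numerical values are hypothetical, not taken from the talk.

```python
import random

def q(x, z):
    Q, b, c = z
    return 0.5 * Q * x * x + b * x + c

def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ 0.5*Q*x^2 + b*x + c; returns (Q, b, c)."""
    rows = [(0.5 * x * x, x, 1.0) for x in xs]
    # Normal equations (A^T A) theta = A^T y, stored as an augmented 3x4 matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    theta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = m[i][3] - sum(m[i][k] * theta[k] for k in range(i + 1, 3))
        theta[i] = s / m[i][i]
    return tuple(theta)

# Hypothetical regression sample: points x_i and Brownian increments w_i.
random.seed(0)
H, SIGMA = 0.1, 0.4
z_next = (-1.0, 1.0, 0.0)
xs = [random.uniform(-2.0, 2.0) for _ in range(2000)]
ws = [random.gauss(0.0, H ** 0.5) for _ in xs]
ys = [q(x + SIGMA * w, z_next) for x, w in zip(xs, ws)]

z_hat = fit_quadratic(xs, ys)
# Exact conditional expectation: (Q, b, c + 0.5*Q*sigma^2*h).
z_exact = (z_next[0], z_next[1], z_next[2] + 0.5 * z_next[0] * SIGMA ** 2 * H)
print(z_hat, z_exact)
```

With noiseless data the fit recovers the coefficients exactly; with sampled increments the estimate fluctuates around the exact form, with an error that shrinks as the regression sample grows.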
◮ $d = 2$, $\mathcal{M} = \{\rho_{\min}, \rho_{\max}\}$ with $-1 \le \rho_{\min} \le \rho_{\max} \le 1$.
◮ The dynamics of the processes are given, for all $m \in \mathcal{M}$, by $f^m = 0$
◮ The final reward is given by
◮ We restrict the state space to the set of $\xi \in \mathbb{R}_+^2$ s.t.
◮ We take σ1 = 0.4, σ2 = 0.3, K1 = −5, K2 = 5, T = 0.25, and fix the
◮ $\mathcal{M} = \{\rho\}$ with $\rho = -0.8$ or $\rho = 0.8$, or $\mathcal{M} = \{-0.8, 0.8\}$.
◮ When $\mathcal{M} = \{\rho\}$, the value function is known, so we can compute the
◮ We tested in that case each subsampling method: Method 1 (initial
◮ Discretize the continuous control u and apply the previous method
◮ So we need to consider simulations associated to a process
◮ The probabilistic method of Fahim, Touzi and Warin (2011) uses the
◮ We shall use the simulation of one process of each discrete control m.
t
t ) is a
t , Yt, Zt, Γt).
t :
$$v^{m,h}(t, x) = T^m_{t,h}\big(v^{m,h}(t + h, \cdot)\big)(x),$$
$$T^m_{t,h}(\varphi)(x) = D^0_{m,t,h}(\varphi)(x) + h\, G^m\big(x, D^0_{m,t,h}(\varphi)(x), D^1_{m,t,h}(\varphi)(x), D^2_{m,t,h}(\varphi)(x)\big),$$
with $D^i_{m,t,h}(\varphi)$ being the approximation of the $i$th derivative of $\varphi$ given by
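$$D^i_{m,t,h}(\varphi)(x) = E\big(\varphi(\hat{X}^m(t+h))\, P^i_{m,t,x,h}(W_{t+h} - W_t) \mid \hat{X}^m(t) = x\big),$$
where $P^i_{m,t,x,h}$ is a polynomial of degree $i$:
$$P^0_{m,t,x,h} = 1, \quad P^1_{m,t,x,h}(w) = (\sigma^m(x)^T)^{-1} h^{-1}\, w, \quad P^2_{m,t,x,h}(w) = (\sigma^m(x)^T)^{-1} h^{-2} (w w^T - hI)\, \sigma^m(x)^{-1}.$$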
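The polynomial weights turn derivative estimates into plain expectations of φ. Here is a minimal one-dimensional sketch, assuming the usual normalization of the Fahim, Touzi and Warin scheme, that is weights 1, w/(σh) and (w² - h)/(σ²h²) applied to φ(x + drift·h + σw) with w ~ N(0, h); the normalization factors, the three-point discretization of w and the function name are assumptions made for illustration.

```python
import math

H = 0.01  # time step h (hypothetical value)

# Three-point discretization of an N(0, h) increment (moments exact up to order 5).
NODES = [(-math.sqrt(3 * H), 1 / 6), (0.0, 2 / 3), (math.sqrt(3 * H), 1 / 6)]

def derivative_estimates(phi, x, drift, sigma):
    """Return (D0, D1, D2): weighted expectations of phi(x + drift*h + sigma*w)."""
    d0 = d1 = d2 = 0.0
    for w, p in NODES:
        val = phi(x + drift * H + sigma * w)
        d0 += p * val                                        # weight P0 = 1
        d1 += p * val * w / (sigma * H)                      # weight P1 = w / (sigma h)
        d2 += p * val * (w * w - H) / (sigma * sigma * H * H)  # weight P2
    return d0, d1, d2

# Test function with known derivatives: phi'(y) = 3y - 2, phi''(y) = 3.
phi = lambda y: 1.5 * y * y - 2.0 * y + 1.0
x, drift, sigma = 0.7, 0.3, 0.4
d0, d1, d2 = derivative_estimates(phi, x, drift, sigma)

a = x + drift * H
print(d0, phi(a) + 1.5 * sigma ** 2 * H)  # E[phi] for a quadratic phi
print(d1, 3.0 * a - 2.0)                  # E[phi'] = phi'(a) since phi' is affine
print(d2, 3.0)                            # E[phi''] is constant
```

For a quadratic φ the weighted expectations reproduce the expected first and second derivatives exactly, which follows from Gaussian integration by parts.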
◮ The time approximation of (Fahim, Touzi and Warin) works when
$T^m_{t,h}$ is a monotone operator over the set of Lipschitz continuous functions: $\varphi \le \psi \implies T^m_{t,h}(\varphi) \le T^m_{t,h}(\psi)$.
◮ $T^m_{t,h}$ is further approximated by using a regression scheme.
◮ Under some technical assumptions, the above algorithm converges.
◮ The above assumptions do not in general allow one to handle the case of
◮ Note also that theoretically, the sample size to obtain the convergence
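◮ When the above assumptions are satisfied for the $H^m$, one considers
$$T_{t,h}(\varphi)(x) = \max_{m \in \mathcal{M}} T^m_{t,h}(\varphi)(x),$$
with $T^m_{t,h}$ as above, for each $m \in \mathcal{M}$.
◮ The mixed scheme also consists in the approximation of the above
$D^i_{m,t,h}(\varphi)$, when $\varphi$ is the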
◮ To do this, we need a result comparable to the one of McEneaney,
◮ We know that $T_{t,h}$ is monotone.
¯ z∈Z
z)
z : W → R, W → φ(W , ¯
z ∈ D}.
t,h preserve random quadratic forms, the conclusion
◮ However, the assumptions necessary for the scheme of (Fahim, Touzi
◮ Recently, we managed to get rid of these assumptions by using
◮ We hope to use this new scheme to obtain more appealing numerical
◮ We proposed an algorithm to solve HJB equations, combining ideas
◮ The advantages with respect to the pure probabilistic scheme are that
◮ The advantage with respect to the pure idempotent scheme is that
◮ The theoretical results suggest that it can also be applied to Isaacs
◮ We only recently found an improvement of the probabilistic scheme of