Solving Hamilton-Jacobi-Bellman equations by combining a max-plus linear approximation and a probabilistic numerical method (PowerPoint PPT Presentation)



SLIDE 1

Solving Hamilton-Jacobi-Bellman equations by combining a max-plus linear approximation and a probabilistic numerical method

Marianne Akian

INRIA Saclay - Île-de-France and CMAP, École polytechnique, CNRS

RICAM Workshop: Numerical methods for Hamilton-Jacobi equations in optimal control and related fields, Linz, November 21-25, 2016. Joint work with Eric Fodjo; see arXiv:1605.02816.

SLIDE 2

A finite horizon diffusion control problem involving “discrete” and “continuum” controls

The state ξ_s ∈ R^d satisfies the stochastic differential equation

dξ_s = f^{μ_s}(ξ_s, u_s) ds + σ^{μ_s}(ξ_s, u_s) dW_s ,

where (W_s)_{s≥0} is a d-dimensional Brownian motion, and μ := (μ_s)_{0≤s≤T} and u := (u_s)_{0≤s≤T} are admissible control processes, with μ_s ∈ M a finite set and u_s ∈ U ⊂ R^p. The problem consists in maximizing the finite horizon discounted payoff (δ^m ≥ 0):

J(t, x, μ, u) := E[ ∫_t^T e^{−∫_t^s δ^{μ_τ}(ξ_τ, u_τ) dτ} ℓ^{μ_s}(ξ_s, u_s) ds + e^{−∫_t^T δ^{μ_τ}(ξ_τ, u_τ) dτ} ψ(ξ_T) | ξ_t = x ] .
SLIDE 3

The Hamilton-Jacobi-Bellman (HJB) equation

Define the value function v : [0, T] × R^d → R as:

v(t, x) = sup_{μ,u} J(t, x, μ, u) .

Under suitable assumptions, it is the unique (continuous) viscosity solution of the HJB equation

−∂v/∂t − H(x, v(t, x), Dv(t, x), D²v(t, x)) = 0 ,   x ∈ R^d, t ∈ [0, T) ,
v(T, x) = ψ(x) ,   x ∈ R^d ,

satisfying also some growth condition at infinity (in space), where the Hamiltonian H : R^d × R × R^d × S_d → R is given by:

H(x, r, p, Γ) := max_{m∈M} H^m(x, r, p, Γ) ,

with

H^m(x, r, p, Γ) := max_{u∈U} { (1/2) tr( σ^m(x, u) σ^m(x, u)^T Γ ) + f^m(x, u) · p − δ^m(x, u) r + ℓ^m(x, u) } .
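As a sanity check, the double maximization defining H can be evaluated pointwise by brute force. The sketch below uses made-up one-dimensional coefficients f^m, σ^m, δ^m, ℓ^m (all illustrative assumptions, not data from the talk) and a grid for the continuous control u:

```python
import numpy as np

# Brute-force evaluation of H(x, r, p, Gamma) = max_m H^m(x, r, p, Gamma),
# where H^m is itself a maximum over the continuous control u (1-d toy data).
f = {0: lambda x, u: u,       1: lambda x, u: -x}    # drift f^m
sig = {0: lambda x, u: 0.2,   1: lambda x, u: 0.4}   # diffusion sigma^m
delta = {0: lambda x, u: 0.1, 1: lambda x, u: 0.1}   # discount delta^m
ell = {0: lambda x, u: -u**2, 1: lambda x, u: 1.0}   # running payoff l^m
u_grid = np.linspace(-1.0, 1.0, 201)                 # discretized U

def Hm(m, x, r, p, G):
    vals = (0.5 * sig[m](x, u_grid) ** 2 * G + f[m](x, u_grid) * p
            - delta[m](x, u_grid) * r + ell[m](x, u_grid))
    return float(np.max(vals))

def H(x, r, p, G):
    return max(Hm(m, x, r, p, G) for m in (0, 1))
```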
SLIDE 4

Standard grid-based discretizations for solving HJB equations suffer the curse of dimensionality: for an error of ε, the computing time of finite difference or finite element methods is at least of the order of (1/ε)^{d/2}. Some possible curse-of-dimensionality-free methods:

◮ Idempotent methods, introduced by McEneaney (2007) in the deterministic case, and by McEneaney, Kaise and Han (2011) in the stochastic case.

◮ Probabilistic numerical methods based on a backward stochastic differential equation interpretation of the HJB equation, simulations and regressions:

  ◮ Quantization: Bally, Pagès (2003) for stopping time problems.

  ◮ Introduction of a new process without control: Bouchard, Touzi (2004) when σ does not depend on the control; Cheridito, Soner, Touzi and Victoir (2007) and Fahim, Touzi and Warin (2011) in the fully nonlinear case.

  ◮ Control randomization: Kharroubi, Langrené, Pham (2013).

  ◮ Fixed point iterations: Bender, Zhang (2008) for semilinear PDEs (which are not HJB equations).

SLIDE 5

The idempotent method of McEneaney, Kaise and Han

Given m and u, denote by ξ̂^{m,u} the Euler discretization of the process ξ with time step h:

ξ̂^{m,u}(t+h) = ξ̂^{m,u}(t) + f^m(ξ̂^{m,u}(t), u) h + σ^m(ξ̂^{m,u}(t), u) (W_{t+h} − W_t) .

Define the dynamic programming operators:

T^m_{t,h}(φ)(x) = sup_{u∈U} { h ℓ^m(x, u) + e^{−h δ^m(x,u)} E[ φ(ξ̂^{m,u}(t+h)) | ξ̂^{m,u}(t) = x ] } ,

and

T_{t,h}(φ)(x) = max_{m∈M} T^m_{t,h}(φ)(x) .

The HJB equation can be discretized in time by:

v^h(t, x) = T_{t,h}(v^h(t+h, ·))(x) ,   t ∈ T_h := {0, h, 2h, . . . , T − h} .

Under appropriate assumptions, this scheme converges to the solution of the HJB equation when h goes to zero.
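One backward step v^h(t, ·) = T_{t,h}(v^h(t+h, ·)) can be sketched with a plain Monte Carlo approximation of the conditional expectation. The instance below is a made-up scalar problem with two modes, a small control grid, and δ^m = 0 for brevity (all coefficients are illustrative assumptions, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-d data: two discrete modes m with drift f^m, diffusion sigma^m,
# running payoff l^m, and terminal payoff psi (all invented for illustration).
f = {0: lambda x, u: -x + u, 1: lambda x, u: 0.5 * x}
sig = {0: lambda x, u: 0.3,  1: lambda x, u: 0.5}
ell = {0: lambda x, u: -u**2, 1: lambda x, u: 0.0}
psi = lambda x: -x**2
h, n_mc = 0.01, 2000
u_grid = np.linspace(-1.0, 1.0, 5)          # crude discretization of U

def T_mh(phi, x, m):
    """sup_u { h l^m(x,u) + E[ phi(x + f^m h + sigma^m dW) ] }, by Monte Carlo."""
    dW = rng.normal(0.0, np.sqrt(h), n_mc)  # samples of W_{t+h} - W_t
    return max(h * ell[m](x, u)
               + phi(x + f[m](x, u) * h + sig[m](x, u) * dW).mean()
               for u in u_grid)

def T_h(phi, x):
    return max(T_mh(phi, x, m) for m in (0, 1))

v = T_h(psi, 1.0)    # one backward step starting from v^h(T, .) = psi
```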

SLIDE 6

◮ In the deterministic case (σ^m = 0), T^m_{t,h} and T_{t,h} are max-plus linear:

v^h(t+h, x) = max_{i=1,...,N} (λ_i + q^{t+h}_i(x)) ∀x  ⇒  v^h(t, x) = max_{i=1,...,N} (λ_i + q^t_i(x)) ∀x, with q^t_i = T_{t,h}(q^{t+h}_i) .

◮ We only need to compute the effect of the dynamic programming operator on the finite basis q^T_i, i = 1, . . . , N, for instance by computing their projection on a fixed basis (see Fleming and McEneaney (2000) and A., Gaubert, Lakhoua (2008)).

◮ However, the q^T_i are difficult to compute in general, or the size of the basis needs to be exponential in d.

◮ If T^m_{t,h}(q) is a quadratic form when q is a quadratic form, and if it is easy to compute (for instance when the H^m correspond to linear-quadratic problems), and if the q^T_i are quadratic forms, then the q^t_i are finite suprema of quadratic forms that are easy to compute (see McEneaney (2006)). The number of quadratic forms for v^h(0, ·) is exponential in the number of time steps only.

◮ This idea was extended to the stochastic case by McEneaney, Kaise and Han (2011).
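The closure property in the last bullets, that T^m_{t,h} maps a quadratic form to a quadratic form in the linear-quadratic case, can be checked numerically. The deterministic scalar sketch below (coefficients invented for illustration) applies one step to q(x) = −(Q/2) x² and verifies that the result is again a homogeneous quadratic, i.e. T(q)(x)/x² does not depend on x:

```python
import numpy as np

# One deterministic LQ step: dynamics x' = x + (a x + b u) h, running payoff
# h (-cx x^2/2 - cu u^2/2), applied to the quadratic q(x) = -Q x^2 / 2.
# All scalar coefficients are illustrative assumptions.
a, b, cx, cu, h, Q = 0.5, 1.0, 1.0, 1.0, 0.1, 2.0

def T(x):
    u = np.linspace(-5.0, 5.0, 20001)               # fine grid for sup over u
    xn = x + (a * x + b * u) * h                    # deterministic dynamics
    val = h * (-0.5 * cx * x**2 - 0.5 * cu * u**2) - 0.5 * Q * xn**2
    return float(val.max())

# If T(q) is again quadratic (with no linear or constant part), this ratio
# is the same for every x.
ratios = [T(x) / x**2 for x in (0.5, 1.0, 2.0)]
```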

SLIDE 7

Theorem (McEneaney, Kaise and Han (2011))

Assume δ^m = 0, σ^m is constant, f^m is affine, ℓ^m is concave quadratic (with respect to (x, u)), and ψ is the supremum of a finite number of concave quadratic forms. Then, for all t ∈ T_h, there exists a set Z_t and a map g_t : R^d × Z_t → R such that for all z ∈ Z_t, g_t(·, z) is a concave quadratic form and

v^h(t, x) = sup_{z∈Z_t} g_t(x, z) .

Moreover, the sets Z_t satisfy

Z_t = M × { z̄^{t+h} : W → Z_{t+h} | Borel measurable } ,

where W = R^d is the space of values of the Brownian process.

◮ Here a concave quadratic form is any map R^d → R of the form x ↦ q(x, z) := (1/2) x^T Q x + b · x + c, with z = (Q, b, c) ∈ Q_d = S⁻_d × R^d × R.

◮ The proof uses the max-plus (infinite) distributivity property.

SLIDE 8

◮ In the deterministic case, the sets Z_t are finite, and their cardinality is exponential in time: #Z_t = M × #Z_{t+h} = · · · = M^{N_t} × #Z_T, with M = #M and N_t = (T − t)/h.

◮ In the stochastic case, the sets Z_t are infinite as soon as t < T.

◮ If the Brownian process is discretized in space, then W can be replaced by a finite subset with fixed cardinality p, and the sets Z_t become finite.

◮ Nevertheless, their cardinality increases doubly exponentially in time: #Z_t = M × (#Z_{t+h})^p = · · · = M^{(p^{N_t} − 1)/(p − 1)} × (#Z_T)^{p^{N_t}}, where p ≥ 2 (p = 2 for the Bernoulli discretization).

◮ McEneaney, Kaise and Han therefore proposed to apply a pruning method to reduce, at each time step t ∈ T_h, the cardinality of Z_t.

◮ In this talk, we shall replace pruning by random sampling.

◮ The idea is to use only quadratic forms that are optimal at the points of a sample of the process.
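The growth formulas above are easy to tabulate. A small sketch, with M = 2 modes, the Bernoulli discretization p = 2, and #Z_T = 1, contrasting exponential and doubly exponential growth in the number N_t of remaining time steps:

```python
# Cardinality of Z_t after Nt backward steps, per the formulas on this slide.
M, p, ZT = 2, 2, 1      # number of modes, Brownian discretization size, #Z_T

def card_det(Nt):       # deterministic case: #Z_t = M^Nt * #Z_T
    return M**Nt * ZT

def card_sto(Nt):       # stochastic: #Z_t = M^((p^Nt - 1)/(p - 1)) * (#Z_T)^(p^Nt)
    return M ** ((p**Nt - 1) // (p - 1)) * ZT ** (p**Nt)

growth = [(n, card_det(n), card_sto(n)) for n in range(1, 6)]
```

The one-step recurrence #Z_t = M × (#Z_{t+h})^p is recovered, e.g. card_sto(3) = M * card_sto(2)**p.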
SLIDE 9

Consider the case with no continuous control u and no discount factor.

◮ Then ξ̂^m(t+h) = S^m_{t,h}(ξ̂^m(t), W_{t+h} − W_t) with

S^m_{t,h}(x, w) = x + f^m(x) h + σ^m(x) w ,

and

T^m_{t,h}(φ)(x) = h ℓ^m(x) + E[ φ(S^m_{t,h}(x, W_{t+h} − W_t)) ] .

◮ Assume that φ(x) = max_{z∈Z_{t+h}} q(x, z), with Z_{t+h} ⊂ Q_d = S⁻_d × R^d × R and q(x, z) := (1/2) x^T Q x + b · x + c, z = (Q, b, c) ∈ Q_d.

◮ Then, for each x ∈ R^d, there exists z̄^m_x : W → Z_{t+h} measurable such that

φ(S^m_{t,h}(x, W_{t+h} − W_t)) = q( S^m_{t,h}(x, W_{t+h} − W_t), z̄^m_x(W_{t+h} − W_t) ) .

◮ Moreover, under the previous assumptions on ℓ^m, f^m and σ^m, we have, for all x′ ∈ R^d,

E[ h ℓ^m(x′) + q( S^m_{t,h}(x′, W_{t+h} − W_t), z̄^m_x(W_{t+h} − W_t) ) ] = q(x′, z^m_x)

for some z^m_x ∈ Q_d, and so

T^m_{t,h}(φ)(x) = q(x, z^m_x) = sup_{x′∈R^d} q(x, z^m_{x′}) .

SLIDE 10

The sampling algorithm

◮ Let M = #M and choose N = (N_in, N_rg) giving the sizes of the samples.

◮ Choose Z_T ⊂ Q_d such that |ψ(x) − max_{z∈Z_T} q(x, z)| ≤ ε. Define v^{h,N}(T, x) = max_{z∈Z_T} q(x, z), for x ∈ R^d.

◮ Construct a sample of ((ξ̂^m(0))_{m∈M}, (W_{t+h} − W_t)_{t∈T_h}) of size N_in, indexed by ω ∈ Ω_{N_in} := {1, . . . , N_in}, and deduce ξ̂^m(t, ω), m ∈ M.

◮ For t = T − h, T − 2h, . . . , 0 do:

  1. For each ω ∈ Ω_{N_in} and m ∈ M, denote x = ξ̂^m(t, ω), and construct a subsample of size N_rg of elements (ω_i, ω′_i) of (Ω_{N_in})², i ∈ Ω_{N_rg}. Let z̄^m_x : W → Z_{t+h} (as above) be computed at the points (W_{t+h} − W_t)(ω′_i) only. Consider q̃(x′, w) = h ℓ^m(x′) + q( S^m_{t,h}(x′, w), z̄^m_x(w) ). Approximate z^m_x such that q(x′, z^m_x) = E[ q̃(x′, W_{t+h} − W_t) ] by doing a regression of q̃(ξ̂^m(t), W_{t+h} − W_t) using a (usual) basis of quadratic forms of ξ̂^m(t), and the sample (ξ̂^m(t, ω_i), (W_{t+h} − W_t)(ω′_i)), i ∈ Ω_{N_rg}.

  2. Let Z_t be the set of the parameters z^m_x ∈ Q_d of all the quadratic forms obtained in Step 1. Define v^{h,N}(t, x) = max_{z∈Z_t} q(x, z).
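The regression in Step 1 is an ordinary least-squares fit against a basis of quadratic forms of the state. A one-dimensional sketch, where the true parameters z = (Q, b, c) and the noise model standing in for the W-dependence of q̃ are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Recover z = (Q, b, c) of q(x) = Q x^2 / 2 + b x + c by least squares from
# noisy evaluations, using the monomial basis (1, x, x^2 / 2).
Q_true, b_true, c_true = -1.0, 0.5, 2.0
x = rng.normal(0.0, 1.0, 500)              # sampled states xi^m(t, omega_i)
noise = rng.normal(0.0, 0.1, 500)          # stands in for the W-dependence of q~
y = 0.5 * Q_true * x**2 + b_true * x + c_true + noise

A = np.stack([np.ones_like(x), x, 0.5 * x**2], axis=1)
(c_hat, b_hat, Q_hat), *_ = np.linalg.lstsq(A, y, rcond=None)
```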
SLIDE 11

The sampling algorithm

◮ Several subsamplings of elements (ω_i, ω′_i), i ∈ Ω_{N_rg}, of (Ω_{N_in})² have been tested:

  1. the initial sampling: N_rg = N_in and ω_i = ω′_i = i;

  2. N_rg = N_x × N_w: choose, once and for all ω ∈ Ω_{N_in} and m ∈ M, a random sampling of size N_x of the elements of Ω_{N_in} and an independent random sampling of size N_w, and take their product, so that ω_i and ω′_i are independent;

  3. as in Method 2, but choose different samplings for each ω ∈ Ω_{N_in} and m ∈ M, independently;

  4. as in Method 2, but replace the second random sampling of Ω_{N_in} by Ω_{N_in} itself, so N_w = N_in;

  5. as in Method 2, but replace both random samplings of Ω_{N_in} by Ω_{N_in} itself, so N_rg = N_in².

◮ For each time step t ∈ T_h, we have #Z_t ≤ M × N_in, and the number of computations is in O((M × N_in)² × N_rg), and in O((M × N_in)² × N_w) for subsamplings 2, 3 and 4.
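Subsampling Method 2 amounts to taking the product of two independent index draws from Ω_{N_in}. A sketch of the index construction (sizes N_in = 1000, N_x = 10, N_w = 100 as in one of the test configurations):

```python
import numpy as np

rng = np.random.default_rng(2)

# Method 2: draw once Nx state indices and, independently, Nw noise indices
# from Omega_Nin, and take their product, so omega_i and omega'_i are independent.
Nin, Nx, Nw = 1000, 10, 100
idx_x = rng.choice(Nin, size=Nx, replace=False)
idx_w = rng.choice(Nin, size=Nw, replace=False)
pairs = [(int(i), int(j)) for i in idx_x for j in idx_w]   # N_rg = Nx * Nw pairs
```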

SLIDE 12

We consider a simple problem of pricing and hedging an option with uncertain volatility and two underlying processes, tested in Kharroubi, Langrené, Pham (2013).

◮ d = 2, M = {ρ_min, ρ_max} with −1 ≤ ρ_min ≤ ρ_max ≤ 1.

◮ The dynamics of the processes are given, for all m ∈ M, by f^m = 0 and

σ^m(ξ) = [ σ1 ξ1            0
           σ2 m ξ2   σ2 √(1 − m²) ξ2 ] ,   ξ = (ξ1, ξ2) ∈ R² ,

with σ1, σ2 > 0, that is dξ_i = σ_i ξ_i dB_i, d⟨B1, B2⟩ = μ_s ds.

◮ The final reward is given by

ψ(ξ) = (ξ1 − ξ2 − K1)⁺ − (ξ1 − ξ2 − K2)⁺ ,   ξ = (ξ1, ξ2) ∈ R² ,

with x⁺ = max(x, 0) and K1 < K2.
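The data of this example are straightforward to code. A sketch of the diffusion matrix σ^m (with m playing the role of the correlation ρ) and of the call-spread reward ψ, using the parameter values σ1 = 0.4, σ2 = 0.3, K1 = −5, K2 = 5 from the numerical tests:

```python
import numpy as np

sigma1, sigma2, K1, K2 = 0.4, 0.3, -5.0, 5.0

def sigma_m(xi, m):
    """Diffusion matrix for mode m = rho: d xi_i = sigma_i xi_i dB_i, d<B1,B2> = m dt."""
    x1, x2 = xi
    return np.array([[sigma1 * x1,     0.0],
                     [sigma2 * m * x2, sigma2 * np.sqrt(1.0 - m**2) * x2]])

def psi(xi):
    """Call spread on s = xi1 - xi2: (s - K1)^+ - (s - K2)^+."""
    s = xi[0] - xi[1]
    return max(s - K1, 0.0) - max(s - K2, 0.0)
```

One can check that sigma_m(xi, m) @ sigma_m(xi, m).T reproduces the variances σ_i² ξ_i² and the covariance σ1 σ2 m ξ1 ξ2.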

SLIDE 13

◮ We restrict the state space to the set of ξ ∈ R²₊ such that ξ1 − ξ2 ∈ [−100, 100].

◮ We take σ1 = 0.4, σ2 = 0.3, K1 = −5, K2 = 5, T = 0.25, and fix the time discretization step to h = 0.01.

◮ M = {ρ} with ρ = −0.8 or ρ = 0.8, or M = {−0.8, 0.8}.

◮ When M = {ρ}, the value function is known, so we can compute the error.

◮ We tested each subsampling method in that case: Method 1 (the initial sampling) gives very bad results even at time T − h; Method 5 needs too much space and time even for N_in = 1000; Method 2 seems to be the best one.

SLIDE 14

ρ      N_in   N_rg    N_x   N_w    N_m   e∞      e1
−0.8   1000   10000   10    1000   2     0.521   0.173
 0.8   1000   10000   10    1000   2     0.157   0.074
−0.8   1000   1000    10    100    2     0.75    0.41
 0.8   1000   1000    10    100    2     0.36    0.11
−0.8   1000   1000    10    100    3     3.48    1.92
 0.8   1000   1000    10    100    3     3.05    0.81
−0.8   100    1000    10    100    2     1.95    0.46
 0.8   100    1000    10    100    2     1.81    0.33
−0.8   100    10000   10    1000   2     2.09    0.53
 0.8   100    10000   10    1000   2     1.79    0.36
−0.8   100    1000    10    100    4     2.15    0.55
 0.8   100    1000    10    100    4     1.80    0.39

Table: Sup-norm and normalized ℓ1-norm of the error on the value function at time t = 0, at states ξ2 = 50 and ξ1 ∈ [20, 80], denoted e∞ and e1 respectively, when M = {ρ} and subsampling method N_m is used.

SLIDE 15

Figure: Value function obtained at t = 0 and ξ2 = 50, as a function of ξ1 ∈ [20, 80]. Here N_in = 1000, N_rg = N_x × N_w, N_x = 10, N_w = 1000, and subsampling method 2 is used. In blue, M = {−0.8}; in green, M = {0.8}; in black, M = {−0.8, 0.8}.

SLIDE 16

In the general case:

◮ Discretizing the continuous control u and applying the previous method would be too expensive, since one needs to simulate a process for each m and u.

◮ So we need to consider simulations associated with a process independent of the control u, at least.

◮ The probabilistic method of Fahim, Touzi and Warin (2011) uses the simulation of one process only.

◮ We shall use the simulation of one process for each discrete control m.

SLIDE 17

The algorithm of Fahim, Touzi and Warin (2011)

For a fixed m ∈ M, consider the equation:

−∂v/∂t − H^m(x, v(t, x), Dv(t, x), D²v(t, x)) = 0 ,   x ∈ R^d, t ∈ [0, T) .

Decompose the Hamiltonian H^m as H^m = L^m + G^m with

L^m(x, r, p, Γ) := (1/2) tr(a^m(x) Γ) + f^m(x) · p ,   a^m(x) = σ^m(x) σ^m(x)^T ,

and G^m such that ∂_Γ G^m is positive semidefinite (that is, G^m is elliptic). The idea of the algorithm: if X^m_t is the diffusion with generator L^m, and v^m the viscosity solution of the above equation, then Y_t = v^m(t, X^m_t) is a backward process with drift −G^m(X^m_t, Y_t, Z_t, Γ_t).

SLIDE 18

Denote by X̂^m the Euler discretization of X^m_t:

X̂^m(t+h) = X̂^m(t) + f^m(X̂^m(t)) h + σ^m(X̂^m(t)) (W_{t+h} − W_t) .

Then v^m is approximated (Fahim, Touzi and Warin) by v^{m,h} satisfying:

v^{m,h}(t, x) = T^m_{t,h}(v^{m,h}(t+h, ·))(x) ,   t ∈ T_h ,

and

T^m_{t,h}(φ)(x) = D⁰_{m,t,h}(φ)(x) + h G^m(x, D⁰_{m,t,h}(φ)(x), D¹_{m,t,h}(φ)(x), D²_{m,t,h}(φ)(x)) ,

with D^i_{m,t,h}(φ) being the approximation of the i-th derivative of φ given by

D^i_{m,t,h}(φ)(x) = E( φ(X̂^m(t+h)) P^i_{m,t,x,h}(W_{t+h} − W_t) | X̂^m(t) = x ) ,

where, for all m, t, x, h, i, P^i_{m,t,x,h} is a polynomial of degree i with values in an appropriate finite dimensional space. For instance, when σ^m is the identity:

P⁰_{m,t,x,h} = 1 ,   P¹_{m,t,x,h}(w) = w / h ,   P²_{m,t,x,h}(w) = (w w^T − h I) / h² .
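These derivative estimators are single Monte Carlo averages with polynomial weights. A sketch in dimension 1 with σ^m the identity and f^m = 0, checking the weights P⁰, P¹, P² on the test function φ(x) = x³ (an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)

# D^i(phi)(x) = E[ phi(x + dW) P^i(dW) ], with P^0 = 1, P^1(w) = w/h,
# P^2(w) = (w^2 - h)/h^2  (sigma = 1, f = 0, d = 1).
h, n = 0.01, 400_000
x0 = 1.0
phi = lambda x: x**3
dW = rng.normal(0.0, np.sqrt(h), n)       # Brownian increments W_{t+h} - W_t
ph = phi(x0 + dW)

D0 = ph.mean()                            # ~ E[phi(X(t+h))] ~ phi(x0) = 1
D1 = (ph * dW / h).mean()                 # ~ phi'(x0)  = 3, up to O(h) bias
D2 = (ph * (dW**2 - h) / h**2).mean()     # ~ phi''(x0) = 6, up to Monte Carlo noise
```

Note the familiar trade-off: the variance of the D² weight grows like 1/h², so the sample size must grow as h shrinks.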

SLIDE 19

◮ The time approximation of Fahim, Touzi and Warin works when tr(a^m(x)⁻¹ ∂_Γ G^m) ≤ 1 and G^m is Lipschitz continuous. Indeed, this implies that T^m_{t,h} is a monotone operator over the set of Lipschitz continuous functions R^d → R: φ ≤ ψ ⇒ T^m_{t,h}(φ) ≤ T^m_{t,h}(ψ).

◮ T^m_{t,h} is further approximated by using a regression scheme.

◮ Under some technical assumptions, the above algorithm converges.

◮ The above assumptions do not in general allow one to handle the Hamiltonian H directly, since it may be nonsmooth and have noncomparable diffusion coefficients.

◮ Note also that, theoretically, the sample size needed for convergence of the estimator is at least of the order of 1/h^{d/2}, and the dimension of the linear regression space should be of this order too. The curse of dimensionality persists, but in practice a much smaller sample size is sufficient.

SLIDE 20

Combining idempotent and probabilistic algorithms

◮ When the above assumptions are satisfied for the H^m, one considers the following mixed scheme:

v^h(t, x) = T_{t,h}(v^h(t+h, ·))(x) ,   t ∈ T_h ,

with T_{t,h}(φ)(x) = max_{m∈M} T^m_{t,h}(φ)(x), and T^m_{t,h} as above, for each m ∈ M.

◮ The mixed scheme also consists in approximating v^h(t, ·) as a finite supremum of quadratic forms, using a sampling algorithm similar to the previous one, where we apply a regression over the set of quadratic forms to compute D^i_{m,t,h}(φ) when φ is a supremum of quadratic forms.

◮ To do this, we need a result comparable to that of McEneaney, Kaise and Han (2011) for the new operator T_{t,h}.

◮ We know that T_{t,h} is monotone.

SLIDE 21

The following result generalizes max-plus distributivity: it says that any monotone continuous map distributes over sup.

Theorem

Let W = R^d and let D be the set of measurable functions W → R with at most exponential growth rate, and let G be a monotone, additively α-subhomogeneous operator from D to R, for some constant α > 0. Let (Z, A) be a measurable space, and let W be endowed with its Borel σ-algebra. Let φ : W × Z → R be a measurable map such that for all z ∈ Z, φ(·, z) is continuous and belongs to D. Let v ∈ D be such that v(W) = sup_{z∈Z} φ(W, z). Assume that v is continuous. Then,

G(v) = sup_{z̄∈Z̄} G(φ̄_{z̄}) ,

where φ̄_{z̄} : W → R, W ↦ φ(W, z̄(W)), and Z̄ = { z̄ : W → Z measurable and such that φ̄_{z̄} ∈ D }.

SLIDE 22

Then, if the operators T^m_{t,h} preserve random quadratic forms, the conclusion of the theorem of McEneaney, Kaise and Han (2011) holds for the mixed scheme, and a sampling algorithm can be applied to compute the sets Z_t such that v^h(t, x) = max_{z∈Z_t} q(x, z). Moreover, the previous result and method can also be applied to HJB-Isaacs equations of zero-sum games, as soon as the above property holds.

SLIDE 23

◮ However, the assumptions necessary for the scheme of Fahim, Touzi and Warin to work may also be difficult to obtain for the Hamiltonians H^m: linear-quadratic problems have unbounded coefficients, and in financial applications the diffusion coefficients are highly noncomparable.

◮ Recently, we managed to get rid of these assumptions by approximating the first and second derivatives of the function v^h using different polynomials in (W_{t+h} − W_t)⁺, (W_{t+h} − W_t)⁻ and W_{t+h} − W_t.

◮ We hope to use this new scheme to obtain more appealing numerical results in the future.

SLIDE 24

Conclusion

◮ We proposed an algorithm to solve HJB equations, combining ideas from the idempotent algorithm of McEneaney, Kaise and Han (2011) and from the probabilistic numerical scheme of Fahim, Touzi and Warin (2011).

◮ The advantages with respect to the pure probabilistic scheme are that the regression estimation is over a linear space of small dimension, and that one may bypass, in some cases, the restrictive condition for applying the method.

◮ The advantage with respect to the pure idempotent scheme is that one may avoid the pruning step: the number of quadratic forms generated by the algorithm is linear in the sampling size times the number of discrete controls.

◮ The theoretical results suggest that it can also be applied to Isaacs equations of zero-sum games.

◮ We found only recently an improvement of the probabilistic scheme of Fahim, Touzi and Warin (2011), which will allow us to do numerical tests in high dimension and with fully nonlinear Hamiltonians.