
Time Inconsistent Optimal Control and Mean Variance Optimization

Tomas Björk, Stockholm School of Economics
Agatha Murgoci, Copenhagen Business School
Xunyu Zhou, Oxford University

Conference in honour of Walter Schachermayer, Wien 2010

– Typeset by FoilTEX – 1


Contents

  • Recap of DynP.
  • Problem formulation.
  • Discrete time.
  • Continuous time.
  • Example: Dynamic mean-variance optimization.


Standard problem

We are standing at time t = 0 in state X_0 = x_0.

\[
\max_{u} \; E\left[ \int_0^T h(s, X_s, u_s)\,ds + F(X_T) \right]
\]
\[
dX_t = \mu(t, X_t, u_t)\,dt + \sigma(t, X_t, u_t)\,dW_t
\]

For simplicity we assume that

  • X is scalar.
  • The adapted control u_t is scalar with no restrictions.

We denote this problem by P. We restrict ourselves to feedback controls of the form u_t = u(t, X_t).


Dynamic Programming

We embed the problem P in a family of problems P_{t,x}:

\[
P_{t,x}: \quad \max_{u} \; E_{t,x}\left[ \int_t^T h(s, X_s, u_s)\,ds + F(X_T) \right]
\]
\[
dX_s = \mu(s, X_s, u_s)\,ds + \sigma(s, X_s, u_s)\,dW_s, \qquad X_t = x
\]

The original problem corresponds to P_{0,x_0}.


Bellman

We now have the Bellman optimality principle, which says that the family {P_{t,x}; t ≥ 0, x ∈ R} is time consistent. More precisely: If û is optimal on the time interval [t, T], then it is also optimal on the sub-interval [s, T] for every s with t ≤ s ≤ T.

We can easily derive the Hamilton-Jacobi-Bellman equation HJB:

\[
V_t(t,x) + \sup_u \left\{ h(t,x,u) + \mu(t,x,u) V_x(t,x) + \tfrac{1}{2}\sigma^2(t,x,u) V_{xx}(t,x) \right\} = 0, \qquad V(T,x) = F(x)
\]


Three Disturbing Examples

Hyperbolic discounting (Ekeland-Lazrak-Pirvu):

\[
\max_u \; E_{t,x}\left[ \int_t^T \varphi(s-t) h(c_s)\,ds + \varphi(T-t) F(X_T) \right]
\]

Mean variance utility (Basak-Chabakauri):

\[
\max_u \; E_{t,x}\left[X_T\right] - \frac{\gamma}{2}\,\mathrm{Var}_{t,x}\left(X_T\right)
\]

Endogenous habit formation:

\[
\max_u \; E_{t,x}\left[ \ln \frac{X_T}{x - \beta} \right]
\]
\[
dX_t = [rX_t + (\alpha - r)u_t]\,dt + \sigma u_t\,dW_t
\]


Moral

  • These types of problems are not time consistent.
  • We cannot use DynP.
  • In fact, in these cases it is unclear what we mean by “optimality”.

Possible ways out:

  • Easy way: Dismiss the problem as being silly.
  • Pre-commitment: Solve (somehow) the problem P_{0,x_0} and ignore the fact that later on, your “optimal” control will no longer be viewed as optimal.
  • Game theory: Take the time inconsistency seriously. View the problems as a game and look for a Nash equilibrium point.

We use the game theoretic approach.


Our Basic Problem

\[
\max_u \; E_{t,x}\left[F(x, X_T)\right] + G\left(x, E_{t,x}\left[X_T\right]\right)
\]
\[
dX_s = \mu(X_s, u_s)\,ds + \sigma(X_s, u_s)\,dW_s, \qquad X_t = x
\]

This can be extended considerably. For simplicity we will consider the easier problem

\[
\max_u \; E_{t,x}\left[F(X_T)\right] + G\left(E_{t,x}\left[X_T\right]\right)
\]


The Game Theoretic Approach

  • This is a bit delicate to formalize in continuous time.
  • Thus we turn to discrete time, and then go to the limit.


Discrete Time

Given: A controlled Markov process {X_n : n = 0, 1, …, T}. At any time n we can change the transition probabilities for X_n → X_{n+1} by choosing a control value u ∈ R.

Players:

  • For each point in time n there is a player – “player No n” or “P_n”.
  • P_n chooses the feedback control law u_n(X_n).
  • A sequence of control laws u_0, …, u_{T−1} is denoted by u.
  • Given a sequence u of control laws, the value function for P_n is defined by

\[
J_n(x, u) = E_{n,x}\left[F\left(X^u_T\right)\right] + G\left(E_{n,x}\left[X^u_T\right]\right)
\]


Subgame Perfect Nash Equilibrium

The value function for P_n was defined by

\[
J_n(x, u) = E_{n,x}\left[F\left(X^u_T\right)\right] + G\left(E_{n,x}\left[X^u_T\right]\right)
\]

We see that J_n(x, u) depends on (n, x) and u_n, u_{n+1}, …, u_{T−1}.

Definition:

  • The control law û is an equilibrium strategy if the following hold for each fixed n:
    – Assume that P_k uses û_k(·) for k = n+1, …, T−1.
    – Then it is optimal for player No n to use û_n(·).
  • The equilibrium value function is defined by

\[
V_n(x) = J_n(x, \hat u)
\]
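To make the backward-induction content of this definition concrete, here is a small numerical sketch in a toy two-period model of my own construction (a controlled walk with biased ±1 shocks and the mean-variance objective); the model and all parameter values are illustrative, not from the talk.

```python
import numpy as np

# Toy two-period model (illustrative, not from the talk):
# X_{n+1} = X_n + u_n * Z_n,  Z_n = +1 w.p. p and -1 w.p. 1-p,  T = 2,
# J_n(x, u) = E_{n,x}[X_2] - (gamma / 2) * Var_{n,x}(X_2).
p, gamma = 0.6, 1.0
mu, s2 = 2 * p - 1, 1 - (2 * p - 1) ** 2   # E[Z_n], Var(Z_n)
grid = np.linspace(-2.0, 2.0, 401)         # candidate control values

def u1_hat(x):
    # Player 1 moves last, so she faces a standard one-period problem:
    # J_1(x, u) = x + u * mu - (gamma / 2) * u**2 * s2, maximized over the grid.
    return grid[np.argmax(x + grid * mu - gamma / 2 * grid**2 * s2)]

def u0_hat(x):
    # Player 0 takes player 1's law as given (subgame perfectness) and
    # maximizes J_0 by enumerating the four scenarios (z0, z1) exactly.
    best_u, best_J = None, -np.inf
    for u0 in grid:
        outcomes = []
        for z0, pr0 in ((1, p), (-1, 1 - p)):
            x1 = x + u0 * z0
            u1 = u1_hat(x1)                 # player 1's equilibrium response
            for z1, pr1 in ((1, p), (-1, 1 - p)):
                outcomes.append((pr0 * pr1, x1 + u1 * z1))
        m = sum(pr * y for pr, y in outcomes)
        v = sum(pr * y * y for pr, y in outcomes) - m * m
        J0 = m - gamma / 2 * v
        if J0 > best_J:
            best_u, best_J = u0, J0
    return best_u
```

In this toy model both equilibrium laws come out constant and (up to grid resolution) equal to mu / (gamma * s2), which one can also verify by hand from the one-period first-order condition.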


The infinitesimal operator

Let {f_n(·)}_{n=0}^{T} be a sequence of real valued functions.

Def: For a fixed control value u ∈ R, the infinitesimal operator A^u is defined by

\[
(A^u f)_n(x) = E\left[\, f_{n+1}(X_{n+1}) - f_n(x) \,\middle|\, X_n = x,\; u_n = u \right]
\]

Def: For a fixed control law u, the operator A^{u} is defined by

\[
(A^{u} f)_n(x) = E\left[\, f_{n+1}(X_{n+1}) - f_n(x) \,\middle|\, X_n = x,\; u_n = u_n(x) \right]
\]
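As a quick sanity check of the definition, a tiny computation for a hypothetical controlled walk (my own toy example, not from the talk): for X_{n+1} = X_n + u + Z with Z = ±1 equally likely and f_n(x) = x for all n, the operator returns exactly the drift u.

```python
def discrete_generator(f_next, f_curr, x, u, transition):
    """(A^u f)_n(x) = E[f_{n+1}(X_{n+1}) | X_n = x, u_n = u] - f_n(x),
    for a finite transition law given as a list of (prob, next_state)."""
    return sum(p * f_next(y) for p, y in transition(x, u)) - f_curr(x)

# Controlled walk: X_{n+1} = x + u + Z, Z = +1 or -1 with probability 1/2.
walk = lambda x, u: [(0.5, x + u + 1.0), (0.5, x + u - 1.0)]

drift = discrete_generator(lambda y: y, lambda y: y, 3.0, 0.7, walk)
# drift equals the control u (here 0.7): the generator of the identity is the drift.
curvature = discrete_generator(lambda y: y**2, lambda y: y**2, 3.0, 0.7, walk)
# For f(y) = y^2 this is E[(x+u+Z)^2] - x^2 = (x+u)^2 + Var(Z) - x^2.
```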


Important Idea

It turns out that a fundamental role is played by the function sequence f_n defined by

\[
f_n(x) = E_{n,x}\left[ X^{\hat u}_T \right],
\]

where û is the equilibrium strategy. The process f_n(X_n) is of course a martingale under the equilibrium control û, so we have

\[
(A^{\hat u} f)_n(x) = 0, \qquad f_T(x) = x.
\]


Extending HJB

Proposition: The equilibrium value function V_n(x) and the function f_n(x) satisfy the system

\[
\sup_u \left\{ (A^u V)_n(x) - \left(A^u (G \circ f)\right)_n(x) + (H^u f)_n(x) \right\} = 0, \qquad V_T(x) = F(x) + G(x),
\]
\[
(A^{\hat u} f)_n(x) = 0, \qquad f_T(x) = x,
\]

where

\[
(H^u f)_n(x) = G\left( E_{n,x}\left[ f_{n+1}\left(X^u_{n+1}\right) \right] \right) - G\left( f_n(x) \right), \qquad
f_n(x) = E_{n,x}\left[ X^{\hat u}_T \right].
\]


Continuous Time

The discrete time results extend immediately to continuous time.

  • Now X is a controlled continuous time Markov process with controlled infinitesimal generator

\[
A^u g(t,x) = \lim_{h \to 0} \frac{1}{h} \left\{ E_{t,x}\left[ g\left(t+h, X^u_{t+h}\right) \right] - g(t,x) \right\}
\]

  • The extended HJB is now an equation with time step [t, t + h].
  • Divide the discrete time HJB equations by h and let h → 0.


Extended HJB Continuous Time

Conjecture: The equilibrium value function satisfies the system

\[
\sup_u \left\{ A^u V(t,x) - A^u (G \circ f)(t,x) + G'\left(f(t,x)\right) \cdot A^u f(t,x) \right\} = 0, \qquad V(T,x) = F(x) + G(x),
\]
\[
A^{\hat u} f(t,x) = 0, \qquad f(T,x) = x.
\]

Note the fixed point character of the extended HJB.


General Problem

\[
\max_u \; E_{t,x}\left[ \int_t^T C(t, x, X_s, u_s)\,ds + F(t, x, X_T) \right] + G\left(t, x, E_{t,x}\left[X_T\right]\right)
\]


The general case

\[
\sup_{u \in U} \Big\{ (A^u V)(t,x) + C(x,x,u) - \int_t^T (A^u c_s)_t(x,x)\,ds + \int_t^T (A^u c_{s,x})_t(x)\,ds
\]
\[
\qquad - (A^u f)(t,x,x) + (A^u f_x)(t,x) - A^u (G \diamond g)(t,x) + (H^u g)(t,x) \Big\} = 0,
\]
\[
(A^{\hat u} f_y)(t,x) = 0, \qquad (A^{\hat u} g)(t,x) = 0, \qquad (A^{\hat u} c_{s,y})_t(x) = 0, \quad 0 \le t \le s,
\]
\[
V(T,x) = F(x,x) + G(x,x), \qquad c^{s,y}_s(x) = C(x, y, \hat u_s(x)), \qquad f(T,x,y) = F(y,x), \qquad g(T,x) = x.
\]


Optimal for what?

  • In continuous time, it is not immediately clear how to define an equilibrium strategy.
  • We follow Ekeland et al and define the equilibrium using spike variations.


HJB as a Necessary Condition

Conjecture: Assume that there exists an equilibrium control û, and define V and f as above. Then V and f satisfy the extended HJB system.

Note: It is probably very hard to prove this, due to technical problems. We do however have a converse result.


Verification Theorem

Theorem: Assume that V, f and û satisfy the extended HJB system. Then V is the equilibrium value function and û is the equilibrium control.

Proof: Not very hard, but a bit harder than for standard DynP.


A useful Lemma

Consider a functional

\[
J(t,x,u) = E_{t,x}\left[F\left(x, X^u_T\right)\right] + G\left(x, E_{t,x}\left[X^u_T\right]\right),
\]

and denote the equilibrium control and value function by û and V respectively. Let ϕ(x) be a given deterministic real valued function and consider the functional

\[
J^{\varphi}(t,x,u) = \varphi(x)\left\{ E_{t,x}\left[F\left(x, X^u_T\right)\right] + G\left(x, E_{t,x}\left[X^u_T\right]\right) \right\}.
\]

Denoting the equilibrium control and value function by û^ϕ and V^ϕ respectively, we have

\[
\hat u^{\varphi}(t,x) = \hat u(t,x), \qquad V^{\varphi}(t,x) = \varphi(x) V(t,x).
\]


Practical handling of the theory

  • Make a parameterized Ansatz for V.
  • Make a parameterized Ansatz for f.
  • Plug everything into the extended HJB system and hope to obtain a system of ODEs for the parameters in the Ansatz.
  • Alternatively, compute Lie symmetry groups.


Basak’s Example (in a simple version)

\[
dS_t = \alpha S_t\,dt + \sigma S_t\,dW_t, \qquad dB_t = r B_t\,dt
\]

X_t = portfolio value process
u = amount of money invested in the risky asset

Problem:

\[
\max_u \; E_{t,x}\left[X_T\right] - \frac{\gamma}{2}\,\mathrm{Var}_{t,x}\left(X_T\right)
\]
\[
dX_t = [rX_t + (\alpha - r)u_t]\,dt + \sigma u_t\,dW_t
\]

This corresponds to our standard problem with

\[
F(x) = x - \frac{\gamma}{2}x^2, \qquad G(x) = \frac{\gamma}{2}x^2
\]


Extended HJB

\[
V_t + \sup_u \left\{ [rx + (\alpha - r)u] V_x + \tfrac{1}{2}\sigma^2 u^2 V_{xx} - \frac{\gamma}{2}\sigma^2 u^2 f_x^2 \right\} = 0, \qquad V(T,x) = x,
\]
\[
A^{\hat u} f = 0, \qquad f(T,x) = x.
\]

Ansatz:

\[
V(t,x) = g(t)x + h(t), \qquad f(t,x) = A(t)x + B(t)
\]


Result

The equilibrium value function and strategy are given by

\[
V(t,x) = e^{r(T-t)}x + \frac{1}{2\gamma}\frac{(\alpha - r)^2}{\sigma^2}(T-t),
\]
\[
\hat u(t,x) = \frac{1}{\gamma}\frac{\alpha - r}{\sigma^2}\, e^{-r(T-t)},
\]
\[
f(t,x) = e^{r(T-t)}x + \frac{1}{\gamma}\frac{(\alpha - r)^2}{\sigma^2}(T-t).
\]
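As a sanity check, one can verify symbolically that these closed forms solve the extended HJB system of the Basak problem (V_t + sup_u{[rx + (α−r)u]V_x + ½σ²u²V_xx − (γ/2)σ²u²f_x²} = 0, A^û f = 0, with V(T,x) = f(T,x) = x); a sketch of the check using sympy:

```python
import sympy as sp

t, x, T, r, alpha, sigma, gamma, u = sp.symbols('t x T r alpha sigma gamma u', real=True)
beta = alpha - r

V    = sp.exp(r*(T - t))*x + beta**2/(2*gamma*sigma**2)*(T - t)
f    = sp.exp(r*(T - t))*x + beta**2/(gamma*sigma**2)*(T - t)
uhat = beta/(gamma*sigma**2)*sp.exp(-r*(T - t))

# Inner expression of the extended HJB for a generic control value u
inner = (sp.diff(V, t) + (r*x + beta*u)*sp.diff(V, x)
         + sp.Rational(1, 2)*sigma**2*u**2*sp.diff(V, x, 2)
         - gamma/2*sigma**2*u**2*sp.diff(f, x)**2)

assert sp.simplify(inner.subs(u, uhat)) == 0                        # HJB holds at uhat
assert sp.simplify(sp.solve(sp.diff(inner, u), u)[0] - uhat) == 0   # uhat attains the sup

# f is a martingale under uhat: A^uhat f = 0
Af = (sp.diff(f, t) + (r*x + beta*uhat)*sp.diff(f, x)
      + sp.Rational(1, 2)*sigma**2*uhat**2*sp.diff(f, x, 2))
assert sp.simplify(Af) == 0

assert V.subs(t, T) == x and f.subs(t, T) == x                      # terminal conditions
```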


A Closer Look at Naive Mean Variance

We recall that for the Basak problem we have

\[
\hat u(t,x) = \frac{1}{\gamma}\frac{\alpha - r}{\sigma^2}\, e^{-r(T-t)},
\]

where û is the dollar amount invested in the risky asset. Is this economically meaningful?


NO!


A Closer Look at Naive Mean Variance

We recall that for the Basak problem we have

\[
\hat u(t,x) = \frac{1}{\gamma}\frac{\alpha - r}{\sigma^2}\, e^{-r(T-t)}.
\]

  • The control u is the number of dollars invested in the risky asset. In the Basak case this is independent of the level of wealth.
  • You thus invest the same number of dollars in the risky asset regardless of whether your wealth is 100 dollars or 10 billion dollars.
  • Not so realistic.


Realistic Mean Variance

Idea: We let the risk aversion coefficient γ depend on current wealth:

\[
\max_u \; E_{t,x}\left[X_T\right] - \frac{\gamma(x)}{2}\,\mathrm{Var}_{t,x}\left(X_T\right)
\]
\[
dX_t = [rX_t + (\alpha - r)u_t]\,dt + \sigma u_t\,dW_t
\]

This case is covered by our general theory.


General Solution

The equilibrium control is given by

\[
\hat u(t,x) = -\frac{\beta}{\sigma^2} \cdot \frac{f_x(t,x,x) + \gamma(x)\, g(t,x)\, g_x(t,x)}{f_{xx}(t,x,x) + \gamma(x)\, g(t,x)\, g_{xx}(t,x)}.
\]

The functions f and g are determined by the system

\[
f_t(t,x,y) + (rx + \beta \hat u) f_x(t,x,y) + \tfrac{1}{2}\sigma^2 \hat u^2 f_{xx}(t,x,y) = 0,
\]
\[
g_t(t,x) + (rx + \beta \hat u) g_x(t,x) + \tfrac{1}{2}\sigma^2 \hat u^2 g_{xx}(t,x) = 0,
\]

with boundary conditions

\[
f(T,x,y) = x - \frac{\gamma(y)}{2}x^2, \qquad g(T,x) = x.
\]

The equilibrium value function V is given by

\[
V(t,x) = f(t,x,x) + \frac{\gamma(x)}{2} g^2(t,x).
\]


A natural choice of γ(x)

1. Dimension analysis:

\[
J(t,x,u) = E_{t,x}\left[X^u_T\right] - \frac{\gamma(x)}{2}\,\mathrm{Var}_{t,x}\left[X^u_T\right]
\]

  • The term E_{t,x}[X^u_T] has the dimension (dollar).
  • The term Var_{t,x}[X^u_T] has the dimension (dollar)².
  • We thus have to choose γ in such a way that γ(x) has the dimension (dollar)⁻¹.
  • The most obvious way to accomplish this is of course to specify γ as γ(x) = γ/x.


A natural choice of γ(x)

2. Back to basics.

  • The original mean variance problem is posed in terms of returns:

\[
J(t,x,u) = E_{t,x}\left[\frac{X^u_T}{x}\right] - \frac{\gamma}{2}\,\mathrm{Var}_{t,x}\left[\frac{X^u_T}{x}\right]
\]

  • We can write this as

\[
J(t,x,u) = \frac{1}{x}\left\{ E_{t,x}\left[X^u_T\right] - \frac{\gamma}{2x}\,\mathrm{Var}_{t,x}\left[X^u_T\right] \right\}
\]

  • It now follows from the previous Lemma that this will lead to the same equilibrium control as

\[
J(t,x,u) = E_{t,x}\left[X^u_T\right] - \frac{\gamma(x)}{2}\,\mathrm{Var}_{t,x}\left[X^u_T\right]
\]

with γ(x) = γ/x.


The Case γ(x) = γ/x

The equilibrium control is given by û(t,x) = c(t)x, where c solves the integral equation

\[
c(t) = \frac{\beta}{\gamma\sigma^2}\left[ e^{-\int_t^T \left[r + \beta c(s) + \sigma^2 c^2(s)\right] ds} + \gamma\left( e^{-\int_t^T \sigma^2 c^2(s)\,ds} - 1 \right) \right]
\]

This equation has a unique solution and we provide a convergent numerical algorithm to solve it.
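One such scheme can be sketched as Picard (fixed point) iteration on the integral equation; this is my own minimal implementation, assuming the form of the equation as reconstructed above (it reproduces the equation in the docstring), and the market parameters are illustrative only.

```python
import numpy as np

def solve_c(r=0.03, alpha=0.08, sigma=0.2, gamma=2.0, T=1.0,
            n=2000, tol=1e-12, max_iter=500):
    """Picard iteration for
    c(t) = beta/(gamma*sigma^2) * ( exp(-I1(t)) + gamma*(exp(-I2(t)) - 1) ),
    I1(t) = int_t^T [r + beta*c(s) + sigma^2*c(s)^2] ds,
    I2(t) = int_t^T sigma^2*c(s)^2 ds,   with beta = alpha - r."""
    beta = alpha - r
    t = np.linspace(0.0, T, n + 1)
    dt = np.diff(t)

    def tail_integral(h):
        # I[i] = int_{t_i}^{T} h(s) ds via the trapezoidal rule
        inc = 0.5 * (h[1:] + h[:-1]) * dt
        I = np.zeros_like(h)
        I[:-1] = inc[::-1].cumsum()[::-1]
        return I

    c = np.full(n + 1, beta / (gamma * sigma**2))   # start from the terminal value
    for _ in range(max_iter):
        I1 = tail_integral(r + beta * c + sigma**2 * c**2)
        I2 = tail_integral(sigma**2 * c**2)
        c_new = beta / (gamma * sigma**2) * (np.exp(-I1) + gamma * (np.exp(-I2) - 1.0))
        err = np.max(np.abs(c_new - c))
        c = c_new
        if err < tol:
            break
    return t, c
```

On such a parameter set the iteration contracts quickly. Note that c is deterministic, so the equilibrium investment û(t,x) = c(t)x scales linearly with wealth, in contrast to the constant-γ case.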


Open Questions

  • Martingale approach to equilibrium control?
  • Convex duality theory?
  • More examples.
  • Existence and uniqueness for the extended HJB system.
  • Extension to optimal stopping.


Optimal for what?

In continuous time, it is not immediately clear how to define an equilibrium strategy. We follow Ekeland et al.

  • Consider a fixed control law û.
  • Fix (t, x) and a “small” time increment h.
  • Choose an arbitrary real number u.
  • Consider the control law u_h defined by

\[
u_h(s,y) =
\begin{cases}
\hat u(s,y), & t + h \le s \le T, \\
u, & t \le s < t + h.
\end{cases}
\]

Def: The control law û is an equilibrium control if

\[
\lim_{h \to 0} \frac{J(t,x,\hat u) - J(t,x,u_h)}{h} \ge 0
\]

for all choices of t, x and u.


Connection to Standard Problems

\[
\sup_u \left\{ A^u V(t,x) - A^u (G \circ f)(t,x) + G'\left(f(t,x)\right) \cdot A^u f(t,x) \right\} = 0
\]

  • Assume that we know the equilibrium strategy û.
  • Since f(t,x) = E_{t,x}[X^û_T], we can now compute f.
  • Now define the function h(t,x,u) by

\[
h(t,x,u) = (H^u f)(t,x) - A^u (G \circ f)(t,x)
\]

The extended HJB takes the form

\[
\sup_u \left\{ A^u V(t,x) + h(t,x,u) \right\} = 0, \qquad V(T,x) = F(x) + G(x)
\]


Equivalent Standard Problems

We obtained

\[
\sup_u \left\{ A^u V(t,x) + h(t,x,u) \right\} = 0, \qquad V(T,x) = F(x) + G(x).
\]

This is the HJB for the time consistent problem

\[
\max_u \; E_{t,x}\left[ \int_t^T h(s, X_s, u_s)\,ds + F(X_T) + G(X_T) \right]
\]


Equivalent Standard Problem

The Basak problem has the same optimal control as the time consistent problem

\[
\max_u \; E_{t,x}\left[ X_T - \frac{\gamma\sigma^2}{2} \int_t^T e^{2r(T-s)} u_s^2\,ds \right]
\]
\[
dX_t = [rX_t + (\alpha - r)u_t]\,dt + \sigma u_t\,dW_t
\]

We note in passing that σ²u_t²\,dt = d⟨X⟩_t.
