On the relation between certain stochastic control problems and - - PowerPoint PPT Presentation




SLIDE 1

On the relation between certain stochastic control problems and probabilistic inference

Manfred Opper Computer Science

SLIDE 2
  • Control problem → inference problem, via the Bellman equation (Bert Kappen & Emanuel Todorov)

  • Inference problem → control problems, via a variational method

  • Things that could be learnt from this and possible extensions

SLIDE 3

Discrete time: MDPs

  • Assume a Markov process with transition probabilities $q(x'|x, u)$ tuned by a 'control' variable $u$.

  • Try to minimise total expected costs

$$V_0(x) = \sum_{t=0}^{T} E_u\left[L_t(X_t, u_t) \mid X_0 = x\right]$$

  • Define 'value' of state $x$

$$V_t(x) = \sum_{\tau \ge t} E_u\left[L_\tau(X_\tau, u_\tau) \mid X_t = x\right]$$

  • Solution via Bellman equation

$$V_t(x) = \min_u \left\{ L_t(x, u) + \sum_{x'} q_t(x'|x, u)\, V_{t+\Delta t}(x') \right\}$$
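The Bellman recursion can be sketched as backward induction over a finite state and control set. This is a minimal illustration, not code from the talk: the name `bellman_backward`, the list-of-lists layout of `q`, the time-homogeneous transitions, and the zero value after the final step are all assumptions made for the example.

```python
def bellman_backward(q, cost, T):
    """Finite-horizon backward induction (illustrative sketch).

    q[u][x][y]    : assumed time-homogeneous transition probability q(y | x, u)
    cost(t, x, u) : immediate cost L_t(x, u)
    Returns (V, policy) with V[t][x] for t = 0..T+1, where V[T+1] = 0 by assumption.
    """
    n_u, n_x = len(q), len(q[0])
    V = [[0.0] * n_x for _ in range(T + 2)]
    policy = [[0] * n_x for _ in range(T + 1)]
    for t in range(T, -1, -1):
        for x in range(n_x):
            best_u, best = 0, float("inf")
            for u in range(n_u):
                # L_t(x, u) + sum_x' q(x'|x, u) V_{t+1}(x')
                total = cost(t, x, u) + sum(
                    q[u][x][y] * V[t + 1][y] for y in range(n_x)
                )
                if total < best:
                    best_u, best = u, total
            V[t][x], policy[t][x] = best, best_u
    return V, policy
```

For instance, with two states, a "stay" control and a "switch" control that costs 0.5, and a state cost equal to the state index, the recursion switches out of the costly state at the first step.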

  

SLIDE 4

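Continuous time: SDEs

  • (Ito) stochastic differential equation for state $X(t) \in \mathbb{R}^d$

$$dX(t) = \underbrace{\big(u(X(t), t) + f(X(t))\big)}_{\text{Drift}}\, dt + \underbrace{D^{1/2}(X(t))}_{\text{Diffusion}}\, dW(t)$$

where $W(t)$ is a vector of independent Wiener processes.

  • Limit of discrete time process $X_k$

$$\Delta X_k \equiv X_{k+1} - X_k = \big(u_t + f(X_k)\big)\Delta t + D^{1/2}(X_k)\sqrt{\Delta t}\,\epsilon_k, \qquad \epsilon_k \text{ i.i.d. Gaussian.}$$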
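The discrete-time limit is exactly the Euler-Maruyama update, so a scalar path can be simulated directly from it. The sketch below is an assumption-laden illustration: the function name `euler_maruyama`, the scalar state, and the convention that `drift(x, t)` already contains the sum u + f are choices made here, not taken from the slides.

```python
import math
import random


def euler_maruyama(x0, drift, diffusion, dt, n_steps, rng):
    """Simulate one scalar path of dX = drift(X, t) dt + sqrt(diffusion(X)) dW.

    drift(x, t)  : total drift u(x, t) + f(x) (assumed combined by the caller)
    diffusion(x) : scalar D(x) >= 0
    rng          : a random.Random instance supplying standard Gaussian noise
    """
    x, t, path = x0, 0.0, [x0]
    for _ in range(n_steps):
        eps = rng.gauss(0.0, 1.0)
        # Delta X_k = drift * dt + D^{1/2} * sqrt(dt) * eps_k
        x = x + drift(x, t) * dt + math.sqrt(diffusion(x) * dt) * eps
        t += dt
        path.append(x)
    return path


# Example call: mean-reverting drift with unit diffusion.
example_path = euler_maruyama(1.0, lambda x, t: -x, lambda x: 1.0, 0.01, 100,
                              random.Random(0))
```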

SLIDE 5

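Continuous time ctd

  • Try to minimise total expected costs

$$V_0(x) = \int_{0}^{T} E_u\left[L_t(X(t), u(t)) \mid X(0) = x\right] dt$$

  • Define 'value' of state $x$

$$V_t(x) = \int_{t}^{T} E_u\left[L_s(X(s), u(s)) \mid X(t) = x\right] ds$$

  • Solution via Hamilton-Jacobi-Bellman equation

$$-\frac{\partial V_t(x)}{\partial t} = \min_u \left\{ L_t(x, u) + (u + f)^\top \nabla V_t(x) + \frac{1}{2}\operatorname{Tr}(D\nabla^\top\nabla) V_t(x) \right\}$$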

SLIDE 6
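  • Specialise to

$$L_t(x, u) = \frac{1}{2} u(t)^\top R\, u(t) + U(x(t), t)$$

  • Minimisation leads to $u_t = -R^{-1}\nabla V_t$

  • and a nonlinear PDE!

$$-\frac{\partial V_t(x)}{\partial t} = -\frac{1}{2}(\nabla V_t)^\top R^{-1}(\nabla V_t) + f^\top(x)\nabla V_t + \frac{1}{2}\operatorname{Tr}(D\nabla^\top\nabla) V_t + U(x, t)$$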
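The minimisation step can be spelled out as follows, assuming (as is implicit in the quadratic cost) that R is symmetric and positive definite. Only the u-dependent terms of the Hamilton-Jacobi-Bellman bracket matter:

```latex
\begin{aligned}
\frac{\partial}{\partial u}\Big[\tfrac{1}{2} u^\top R\, u + u^\top \nabla V_t\Big]
  &= R\, u + \nabla V_t = 0
  \quad\Longrightarrow\quad u_t = -R^{-1} \nabla V_t, \\
\tfrac{1}{2} u_t^\top R\, u_t + u_t^\top \nabla V_t
  &= \tfrac{1}{2} (\nabla V_t)^\top R^{-1} \nabla V_t - (\nabla V_t)^\top R^{-1} \nabla V_t
   = -\tfrac{1}{2} (\nabla V_t)^\top R^{-1} \nabla V_t .
\end{aligned}
```

The second line is the quadratic term that makes the resulting PDE nonlinear.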

SLIDE 7

Exact Linearisation (Kappen, 2005)

  • Assuming that $D = R^{-1}$ and using the transformation $V_t(x) = -\ln Z_t(x)$, we get the equation

$$\left(\partial_t + L^\dagger\right) Z_t(x) = 0$$

with the linear operator $L^\dagger = f^\top\nabla + \frac{1}{2}\operatorname{Tr}(D\nabla^\top\nabla) - U(x, t)$

  • and a path integral representation

$$Z_t(x) = E_{u=0}\left[ e^{-\int_t^T U_\tau(X(\tau), \tau)\, d\tau} \,\middle|\, X(t) = x \right]$$

  • Now all kinds of inference tricks apply!
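A short worked derivation shows why the log transformation linearises the PDE. It is a sketch assuming D is symmetric and Z_t(x) > 0; the notation follows the slides.

```latex
\begin{aligned}
V_t = -\ln Z_t \;\Rightarrow\;\;
  & \partial_t V_t = -\frac{\partial_t Z_t}{Z_t}, \qquad
    \nabla V_t = -\frac{\nabla Z_t}{Z_t}, \\
  & \operatorname{Tr}(D\nabla^\top\nabla) V_t
    = -\frac{\operatorname{Tr}(D\nabla^\top\nabla) Z_t}{Z_t}
      + \frac{(\nabla Z_t)^\top D\, \nabla Z_t}{Z_t^2}. \\[4pt]
\text{Substituting:}\quad
\frac{\partial_t Z_t}{Z_t}
  &= -\frac{(\nabla Z_t)^\top R^{-1} \nabla Z_t}{2 Z_t^2}
     - \frac{f^\top \nabla Z_t}{Z_t}
     - \frac{\operatorname{Tr}(D\nabla^\top\nabla) Z_t}{2 Z_t}
     + \frac{(\nabla Z_t)^\top D\, \nabla Z_t}{2 Z_t^2}
     + U. \\[4pt]
\text{With } D = R^{-1}:\quad
0 &= \partial_t Z_t + f^\top \nabla Z_t
     + \tfrac{1}{2}\operatorname{Tr}(D\nabla^\top\nabla) Z_t - U Z_t .
\end{aligned}
```

The two quadratic terms cancel only when D = R^{-1}, which is why that condition is needed for the exact linearisation.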

SLIDE 8

Todorov’s solvable MDPs (2006)

$$L_t(x, u) = \sum_{x'} q(x'|x, u) \ln\frac{q(x'|x, u)}{p(x'|x)} + U_t(x)$$

Bellman equation

$$V_t(x) = \min_u \left\{ U_t(x) + \sum_{x'} q(x'|x, u) \left[ \ln\frac{q(x'|x, u)}{p(x'|x)} + V_{t+\Delta t}(x') \right] \right\}$$

The controlled transition probabilities:

$$q(x'|x, u) \propto p(x'|x)\, e^{-V_{t+\Delta t}(x')}$$

With $V_t(x) = -\ln Z_t(x)$ we get the linear equation (Todorov, 2005)

$$Z_t(x) = e^{-U_t(x)} \sum_{x'} p(x'|x)\, Z_{t+\Delta t}(x')$$
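The linear recursion for Z and the resulting controlled transitions can be sketched for a small finite chain. The names `backward_Z` and `controlled_transitions`, the time-independent state cost `U`, and the terminal condition Z = 1 (zero value after the last cost) are assumptions made for illustration.

```python
import math


def backward_Z(p, U, n_steps):
    """Iterate Z_t(x) = exp(-U(x)) * sum_x' p(x'|x) Z_{t+1}(x') backwards.

    p[x][y] : uncontrolled transition probability p(y | x)
    U[x]    : state cost, assumed time independent here
    Returns Z[t][x] for t = 0..n_steps, with Z[n_steps] = 1 by assumption.
    """
    n = len(p)
    Z = [[1.0] * n for _ in range(n_steps + 1)]
    for t in range(n_steps - 1, -1, -1):
        for x in range(n):
            Z[t][x] = math.exp(-U[x]) * sum(
                p[x][y] * Z[t + 1][y] for y in range(n)
            )
    return Z


def controlled_transitions(p, Z_next):
    """Optimal q(x'|x) proportional to p(x'|x) * Z_{t+1}(x')."""
    q = []
    for row in p:
        weights = [pxy * z for pxy, z in zip(row, Z_next)]
        norm = sum(weights)
        q.append([w / norm for w in weights])
    return q
```

A useful check is that the state cost plus the KL cost plus the expected next value under the optimal q reproduces -ln Z_t(x), which is the Bellman equation evaluated at its minimiser.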

SLIDE 9

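Relation to continuous case

  • Short time transition probability

$$p(x', t + \Delta t \mid x, t) \propto \exp\left( -\frac{1}{2\Delta t} \left\| \Delta x - f(x)\Delta t \right\|_D^2 \right)$$

as $\Delta t \to 0$, with $\|F\|_D^2 = F^\top D^{-1} F$.

  • Let $p_g$ and $p_f$ be the short-time transition probabilities for diffusion processes with drifts $g$ and $f$ and the same diffusion $D$. Then

$$\int p_g(x', t + \Delta t \mid x, t) \ln\frac{p_g(x', t + \Delta t \mid x, t)}{p_f(x', t + \Delta t \mid x, t)}\, dx' \simeq \frac{1}{2} \left\| g(x, t) - f(x, t) \right\|_D^2\, \Delta t$$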
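The short-time KL result follows from the standard KL divergence between two Gaussians with equal covariance. As a sketch, treating both transition densities as exactly Gaussian for small Δt:

```latex
\begin{aligned}
p_g &= \mathcal{N}\big(x + g\,\Delta t,\; D\,\Delta t\big), \qquad
p_f = \mathcal{N}\big(x + f\,\Delta t,\; D\,\Delta t\big), \\
\mathrm{KL}\left[p_g \,\|\, p_f\right]
  &= \tfrac{1}{2}\,\big((g - f)\Delta t\big)^\top (D\,\Delta t)^{-1} \big((g - f)\Delta t\big)
   = \tfrac{1}{2}\,(g - f)^\top D^{-1} (g - f)\,\Delta t
   = \tfrac{1}{2}\,\| g - f \|_D^2\,\Delta t .
\end{aligned}
```

With g = u + f this is the quadratic control cost per time step when R = D^{-1}, which links the KL cost to the continuous-time problem.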

SLIDE 10

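The KL divergence for Markov processes

Consider probabilities $p(X_{0:T})$, $q(X_{0:T})$ over entire paths $X_{0:T}$. The total KL divergence

$$\begin{aligned}
\mathrm{KL}\left[q(X_{0:T}) \,\|\, p(X_{0:T})\right] &= \int dx_{0:T}\, q(x_{0:T}) \ln\frac{q(x_{0:T})}{p(x_{0:T})} \\
&= \sum_{k=1}^{T-1} \int dx_k\, q(x_k) \int dx_{k+1}\, q(x_{k+1}|x_k) \ln\frac{q(x_{k+1}|x_k)}{p(x_{k+1}|x_k)} \\
&= \sum_{k=1}^{T-1} \int dx_k\, q(x_k)\, \mathrm{KL}\left[q(\cdot|x_k) \,\|\, p(\cdot|x_k)\right]
\end{aligned}$$

is the expected sum of KLs for transition probabilities.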
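The decomposition can be checked numerically on a small discrete chain by comparing a brute-force sum over all paths with the expected sum of transition KLs. This sketch assumes a finite state space, time-homogeneous transition matrices, and that q and p share the same initial distribution (so the initial term drops out); the function names are illustrative.

```python
import math
from itertools import product


def path_prob(init, trans, path):
    """Probability of one path under a Markov chain."""
    pr = init[path[0]]
    for a, b in zip(path, path[1:]):
        pr *= trans[a][b]
    return pr


def path_kl(init, q, p, n_steps):
    """Brute-force KL between path measures over all paths with n_steps transitions."""
    total = 0.0
    for path in product(range(len(init)), repeat=n_steps + 1):
        pq = path_prob(init, q, path)
        if pq > 0.0:
            total += pq * math.log(pq / path_prob(init, p, path))
    return total


def summed_transition_kl(init, q, p, n_steps):
    """Sum over steps of E_q[ KL(q(.|x_k) || p(.|x_k)) ] using the marginals of q."""
    n = len(init)
    marginal, total = list(init), 0.0
    for _ in range(n_steps):
        for x in range(n):
            kl_x = sum(
                q[x][y] * math.log(q[x][y] / p[x][y])
                for y in range(n) if q[x][y] > 0.0
            )
            total += marginal[x] * kl_x
        # Propagate the marginal of q one step forward.
        marginal = [sum(marginal[x] * q[x][y] for x in range(n)) for y in range(n)]
    return total
```

The brute-force version enumerates every path, so it is only practical for very small chains; the summed version scales linearly in the number of steps.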

SLIDE 11

The global solution

The Kappen / Todorov control problems are of the form: minimise the variational free energy

$$V_t(x) = \mathrm{KL}\left[q(X_{t:T}) \,\|\, p(X_{t:T})\right] + E_q\left[\sum_{\tau \ge t} U_\tau(X_\tau, \tau)\right]$$

for fixed $X_t = x$ with respect to $q$. The optimal controlled probability over paths is

$$q^*(X_{t:T}) = \frac{1}{Z_t(x)}\, p(X_{t:T})\, e^{-\sum_{\tau \ge t} U_\tau(X_\tau, \tau)}$$

with the minimal cost (free energy)

$$V_t(x) = -\ln Z_t(x) = -\ln E_p\left[ e^{-\sum_{\tau \ge t} U_\tau(X_\tau, \tau)} \,\middle|\, X_t = x \right]$$

  • This looks like a HMM with 'likelihood' $e^{-\sum_{\tau \ge t} U_\tau(X_\tau, \tau)}$.

SLIDE 12

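Comments

  • For HMMs

$$Z_t(x) = E_p\left[ e^{-\sum_{\tau \ge t} U_\tau(X_\tau, \tau)} \,\middle|\, X_t = x \right] \propto P(\text{future data} \mid X_t = x)$$

fulfils a linear backward equation.

  • The posterior = controlled process has transition probabilities

$$\frac{q(x_{t+1}|x_t)}{p(x_{t+1}|x_t)} = \frac{Z_{t+1}(x_{t+1})}{Z_t(x_t)}\, e^{-U_t(x_t, t)}$$

  • Similar things happen for the continuous case:

$$\left( \partial_t + f^\top\nabla + \frac{1}{2}\operatorname{Tr}(D\nabla^\top\nabla) - U(x, t) \right) Z_t(x) = 0$$

  • Posterior is a diffusion with 'controlled' drift $u_t(x) = D\nabla \ln Z_t(x)$.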

SLIDE 13

A ’real’ likelihood for continuous time paths

Consider an inhomogeneous Poisson process with rate function $U(X_t)$. Then

$$\Pr\{\text{no event} \in [t, T]\} = e^{-\int_t^T U_s(X_s, s)\, ds}$$

SLIDE 14

Application: Simulate diffusions with constraints

Wiener process with fixed endpoints x(t = T) = 0

SLIDE 15

Solution

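$$u_t(x) = \frac{\partial \ln Z_t(x)}{\partial x}$$

$$\frac{\partial Z_t(x)}{\partial t} + \frac{1}{2}\frac{\partial^2 Z_t(x)}{\partial x^2} = 0, \qquad Z_T(x) = \delta(x)$$

is solved by

$$Z_t(x) \propto e^{-\frac{x^2}{2(T - t)}}$$

and leads to $u_t(x) = -\frac{x}{T - t}$ for $0 < t < T$.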
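The controlled drift can be tried out in a simple simulation: a Wiener process with the extra drift -x/(T - t) is steered to 0 at time T. The sketch below uses an Euler discretisation with an assumed function name `simulate_pinned_wiener`; with this scheme the last step lands within noise of order sqrt(Δt) of the target rather than exactly on it.

```python
import math
import random


def simulate_pinned_wiener(x0, T, n_steps, rng):
    """Euler simulation of dX = -X / (T - t) dt + dW on [0, T] (scalar sketch)."""
    dt = T / n_steps
    x, path = x0, [x0]
    for k in range(n_steps):
        t = k * dt                    # t < T for every step, so T - t > 0
        u = -x / (T - t)              # controlled drift u_t(x) = d/dx ln Z_t(x)
        x = x + u * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Example call: start at x = 2 and pin the endpoint at time T = 1.
bridge = simulate_pinned_wiener(2.0, 1.0, 1000, random.Random(1))
```

In this discretisation the drift of the final step cancels the current position, so the endpoint is just the last noise increment.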

SLIDE 16

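Diffusions with constraints on domain: $X(t) \in \Omega$.

  • Method I: Kill trajectory if $X(t) \in \partial\Omega$ for some $t$.

  • Method II: Simulate SDEs with drift $u_t(x) = \nabla \ln Z_t(x)$ where

$$Z_t(x) = E\left[ e^{-\sum_{\tau \ge t} U_\tau(X_\tau, \tau)} \,\middle|\, X_t = x \right]$$

with $U = \infty$ if $x \notin \Omega$ and $U = 0$ else. Hence

$$\frac{\partial Z_t(x)}{\partial t} + \frac{1}{2}\frac{\partial^2 Z_t(x)}{\partial x^2} = 0 \quad \text{with } Z_t(x) = 0 \text{ for } x \in \partial\Omega.$$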

SLIDE 17

Possible approximations if we haven’t got KL losses?

  • Approximate solution to control problem

$$V_0(x) = \int_{0}^{T} E_u\left[ \frac{1}{2} u(t)^\top R\, u(t) + U(x(t), t) \,\middle|\, X(0) = x \right] dt$$

for general matrix $R$.

  • Gaussian measure over paths $X_{0:T}$ induced by linear (approximate) posterior SDE (Archambeau, Cornford, Opper & Shawe-Taylor, 2007)

$$dX(t) = \{-A(t)X + b(t)\}\, dt + D^{1/2} dW$$

as an approximation to

$$dX(t) = \{u(X, t) + f(X)\}\, dt + D^{1/2} dW$$

Replace $u(X, t) \approx -A(t)X + b(t) - f(X)$ → nonlinear ODEs for moments instead of linear PDEs!
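To indicate what "ODEs for moments" means, the sketch below propagates the mean m(t) and variance S(t) of the scalar linear SDE using the standard moment equations dm/dt = -A(t) m + b(t) and dS/dt = -2 A(t) S + D. It only covers this forward propagation for given A(t) and b(t); the variational optimisation of A and b, which is what makes the full system nonlinear, is not shown. The function name and the Euler integration are assumptions for illustration.

```python
def gaussian_moments(m0, s0, A, b, D, dt, n_steps):
    """Euler integration of the moment ODEs of dX = (-A(t) X + b(t)) dt + sqrt(D) dW.

    A, b : callables of time t (scalar case, assumed for simplicity)
    D    : constant scalar diffusion
    Returns a list of (mean, variance) pairs, one per time point.
    """
    m, s = m0, s0
    moments = [(m, s)]
    for k in range(n_steps):
        t = k * dt
        dm = (-A(t) * m + b(t)) * dt        # dm/dt = -A m + b
        ds = (-2.0 * A(t) * s + D) * dt     # dS/dt = -2 A S + D
        m, s = m + dm, s + ds
        moments.append((m, s))
    return moments
```

For constant A > 0 the moments relax to the stationary values b/A and D/(2A), which gives a quick sanity check of the integration.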

SLIDE 18

Possible extensions to other losses

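Simple non-KL losses

$$\mathrm{KL}(q \,\|\, p) \;\to\; \alpha\, \mathrm{KL}(q \,\|\, p_1) - \beta\, \mathrm{KL}(q \,\|\, p_2) \quad \text{with } \alpha, \beta > 0.$$

Use an iterative method (CCCP style), upper bounding

$$-\mathrm{KL}(q \,\|\, p_2) \le -E_q\left[\ln\frac{q_n}{p_2}\right]$$

where $q_n$ is the present optimiser.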
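The upper bound follows from the non-negativity of the KL divergence. A one-line justification, in the notation above:

```latex
\mathrm{KL}(q \,\|\, p_2)
  = E_q\!\left[\ln\frac{q}{q_n}\right] + E_q\!\left[\ln\frac{q_n}{p_2}\right]
  = \mathrm{KL}(q \,\|\, q_n) + E_q\!\left[\ln\frac{q_n}{p_2}\right]
  \;\ge\; E_q\!\left[\ln\frac{q_n}{p_2}\right],
```

with equality at q = q_n, so the bound is tight at the current iterate, as a CCCP-style scheme requires.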
