On the relation between certain stochastic control problems and probabilistic inference


  1. On the relation between certain stochastic control problems and probabilistic inference. Manfred Opper, Computer Science

  2. • Control problem $\underbrace{\longrightarrow}_{\text{Bellman}}$ inference problem (Bert Kappen & Emanuel Todorov)
     • Inference problem $\underbrace{\longrightarrow}_{\text{Var method}}$ control problems
     • Things that could be learnt from this and possible extensions

  3. Discrete times: MDPs
     • Assume a Markov process with transition probabilities $q(x' \mid x, u)$ tuned by a 'control' variable $u$.
     • Try to minimise the total expected costs
       $$V_0(x) = \sum_{t=0}^{T} E_u\left[ L_t(X_t, u_t) \mid x_0 = x \right]$$
     • Define the 'Value' of state $x$
       $$V_t(x) = \sum_{\tau \ge t} E_u\left[ L_\tau(X_\tau, u_\tau) \mid X_t = x \right]$$
     • Solution via the Bellman equation
       $$V_t(x) = \min_u \left[ L_t(x, u) + \sum_{x'} q_t(x' \mid x, u)\, V_{t+\Delta t}(x') \right]$$
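
The Bellman recursion above can be sketched as a backward induction on a finite state space. This is only an illustrative sketch: the names `bellman_backward`, `q`, `cost` and the two-state toy problem are assumptions made here, the running cost is taken to be time-independent, and the value beyond the horizon is set to zero.

```python
def bellman_backward(q, cost, horizon):
    """Finite-horizon backward induction (illustrative sketch).

    q[u][x][y] : transition probability x -> y under control u
    cost[u][x] : running cost L(x, u), assumed time-independent here
    Returns (V, policy) with V[t][x] and policy[t][x] for t = 0..horizon.
    """
    n_u, n_x = len(q), len(q[0])
    V = [[0.0] * n_x for _ in range(horizon + 2)]  # V[horizon + 1] = 0 (assumption)
    policy = [[0] * n_x for _ in range(horizon + 1)]
    for t in range(horizon, -1, -1):
        for x in range(n_x):
            # Immediate cost plus expected cost-to-go, for every control
            candidates = [
                cost[u][x] + sum(q[u][x][y] * V[t + 1][y] for y in range(n_x))
                for u in range(n_u)
            ]
            best = min(range(n_u), key=lambda u: candidates[u])
            V[t][x] = candidates[best]
            policy[t][x] = best
    return V[: horizon + 1], policy


# Hypothetical toy problem: state 0 is free, state 1 costs 1 per step.
# Control 0 stays put; control 1 swaps the state at an extra cost of 0.5.
stay = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
V, policy = bellman_backward([stay, swap], [[0.0, 1.0], [0.5, 1.5]], horizon=3)
```

In this toy problem the optimal control in the expensive state is to swap early and to stay at the final step, when paying for a swap can no longer be recovered.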

  4. Continuous time: SDEs
     • (Ito) stochastic differential equation for the state $X(t) \in R^d$
       $$dX(t) = \underbrace{\left( u(X_t, t) + f(X(t)) \right)}_{\text{Drift}} dt + \underbrace{D^{1/2}(X(t))}_{\text{Diffusion}}\, dW(t)$$
       with $W(t)$ a vector of independent Wiener processes.
     • Limit of the discrete time process $X_k$
       $$\Delta X_k \equiv X_{k+1} - X_k = \left( u_t + f(X_k) \right) \Delta t + D^{1/2}(X_k) \sqrt{\Delta t}\, \epsilon_k$$
       with $\epsilon_k$ i.i.d. Gaussian.
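
The discrete time limit above is the usual Euler scheme for simulating the SDE. A minimal scalar sketch follows; the function name `euler_maruyama`, its arguments and the example drift are assumptions for illustration, with `drift(x, t)` standing for the sum $u + f$ and `diffusion(x)` for a scalar $D(x)$.

```python
import math
import random


def euler_maruyama(x0, drift, diffusion, dt, n_steps, rng):
    """Simulate X_{k+1} = X_k + drift(X_k, t_k) dt + sqrt(diffusion(X_k) dt) eps_k."""
    x = x0
    path = [x]
    for k in range(n_steps):
        t = k * dt
        eps = rng.gauss(0.0, 1.0)  # i.i.d. standard Gaussian noise
        x = x + drift(x, t) * dt + math.sqrt(diffusion(x) * dt) * eps
        path.append(x)
    return path


# Hypothetical example: uncontrolled (u = 0) process with f(x) = -x and D = 1
noisy_path = euler_maruyama(1.0, lambda x, t: -x, lambda x: 1.0, 0.01, 500,
                            random.Random(0))
```

Setting the diffusion to zero reduces the scheme to the deterministic Euler method, which gives a simple sanity check.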

  5. Continuous time (ctd)
     • Try to minimise the total expected costs
       $$V_0(x) = \int_{t=0}^{T} E_u\left[ L_t(X(t), u(t)) \mid X(0) = x \right] dt$$
     • Define the 'Value' of state $x$
       $$V_t(x) = \int_t^T E_u\left[ L_s(X(s), u(s)) \mid X(t) = x \right] ds$$
     • Solution via the Hamilton-Jacobi-Bellman equation
       $$-\frac{\partial V_t(x)}{\partial t} = \min_u \left[ L_t(x, u) + (u + f)^\top \nabla V_t(x) + \frac{1}{2} \mathrm{Tr}(D \nabla^\top \nabla)\, V_t(x) \right]$$

  6. • Specialise to
       $$L_t(x, u) = \frac{1}{2} u(t)^\top R\, u(t) + U(x(t), t)$$
     • Minimisation leads to $u_t = -R^{-1} \nabla V_t$
     • and a nonlinear PDE!
       $$-\frac{\partial V_t(x)}{\partial t} = -\frac{1}{2} (\nabla V_t)^\top R^{-1} (\nabla V_t) + f^\top(x) \nabla V_t + \frac{1}{2} \mathrm{Tr}(D \nabla^\top \nabla)\, V_t + U(x, t)$$

  7. Exact Linearisation (Kappen, 2005)
     • Assume that $D = R^{-1}$. Using the transformation $V_t(x) = -\ln Z_t(x)$ we get the equation
       $$\left( \frac{\partial}{\partial t} + L^\dagger \right) Z_t(x) = 0$$
       with the linear operator
       $$L^\dagger = f^\top \nabla + \frac{1}{2} \mathrm{Tr}(D \nabla^\top \nabla) - U(x, t)$$
     • and a path integral representation
       $$Z_t(x) = E_{u=0}\left[ e^{-\int_t^T U_\tau(X(\tau), \tau)\, d\tau} \mid X(t) = x \right]$$
     Now all kinds of inference tricks apply!
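
The linearisation step can be checked by substituting the log transform into the nonlinear PDE. The following is a sketch of that calculation, reading $\nabla^\top \nabla$ as the operator that produces the matrix of second derivatives, as the trace notation of the slides suggests.

```latex
% Derivatives of V_t = -\ln Z_t
\frac{\partial V_t}{\partial t} = -\frac{1}{Z_t}\frac{\partial Z_t}{\partial t}, \qquad
\nabla V_t = -\frac{\nabla Z_t}{Z_t}, \qquad
\mathrm{Tr}(D \nabla^\top \nabla)\, V_t
  = -\frac{\mathrm{Tr}(D \nabla^\top \nabla)\, Z_t}{Z_t}
    + \frac{(\nabla Z_t)^\top D\, (\nabla Z_t)}{Z_t^2}

% Substituting into the nonlinear PDE for V_t
\frac{1}{Z_t}\frac{\partial Z_t}{\partial t}
  = -\frac{(\nabla Z_t)^\top R^{-1} (\nabla Z_t)}{2 Z_t^2}
    - \frac{f^\top \nabla Z_t}{Z_t}
    - \frac{\mathrm{Tr}(D \nabla^\top \nabla)\, Z_t}{2 Z_t}
    + \frac{(\nabla Z_t)^\top D\, (\nabla Z_t)}{2 Z_t^2}
    + U

% With D = R^{-1} the two quadratic terms cancel; multiplying by Z_t gives
\frac{\partial Z_t}{\partial t} + f^\top \nabla Z_t
  + \frac{1}{2}\mathrm{Tr}(D \nabla^\top \nabla)\, Z_t - U Z_t = 0
```

The cancellation of the quadratic terms is exactly where the assumption $D = R^{-1}$ enters.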

  8. Todorov's solvable MDPs (2006)
     $$L_t(x, u) = \sum_{x'} q(x' \mid x, u) \ln \frac{q(x' \mid x, u)}{p(x' \mid x)} + U_t(x)$$
     Bellman equation
     $$V_t(x) = \min_u \left[ U_t(x) + \sum_{x'} q(x' \mid x, u) \left( \ln \frac{q(x' \mid x, u)}{p(x' \mid x)} + V_{t+\Delta t}(x') \right) \right]$$
     The controlled transition probabilities:
     $$q(x' \mid x, u) \propto p(x' \mid x)\, e^{-V_{t+\Delta t}(x')}$$
     With $V_t(x) = -\ln Z_t(x)$ we get the linear equation (Todorov, 2005)
     $$Z_t(x) = e^{-U_t(x)} \sum_{x'} p(x' \mid x)\, Z_{t+\Delta t}(x')$$
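
The linear recursion can be checked numerically against the Bellman equation. The sketch below is illustrative only: the function names, the two-state chain and the terminal condition $Z = 1$ beyond the horizon (zero cost-to-go) are assumptions, and $U$ is taken time-independent.

```python
import math


def todorov_backward(p, U, horizon):
    """Z_t(x) = exp(-U(x)) * sum_x' p(x'|x) Z_{t+1}(x'), with Z[horizon + 1] = 1."""
    n = len(p)
    Z = [[1.0] * n for _ in range(horizon + 2)]
    for t in range(horizon, -1, -1):
        for x in range(n):
            Z[t][x] = math.exp(-U[x]) * sum(p[x][y] * Z[t + 1][y] for y in range(n))
    return Z


def controlled_transition(p, Z_next, x):
    """Optimal q(x'|x), proportional to p(x'|x) * Z_{t+1}(x')."""
    weights = [p[x][y] * Z_next[y] for y in range(len(Z_next))]
    total = sum(weights)
    return [w / total for w in weights]


def bellman_cost(p, U, V_next, q_row, x):
    """U(x) + sum_x' q(x') * (ln(q(x') / p(x'|x)) + V_{t+1}(x')) for a given q(.|x)."""
    return U[x] + sum(
        qy * (math.log(qy / p[x][y]) + V_next[y])
        for y, qy in enumerate(q_row) if qy > 0.0
    )


# Hypothetical two-state example: state 1 is costly
p = [[0.9, 0.1], [0.5, 0.5]]
U = [0.0, 1.0]
Z = todorov_backward(p, U, horizon=3)
V = [[-math.log(z) for z in row] for row in Z]
```

Evaluating `bellman_cost` at the controlled transitions reproduces $V_t(x) = -\ln Z_t(x)$, while any other choice, such as the uncontrolled $q = p$, gives a cost that is at least as large.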

  9. Relation to continuous case
     • Short time transition probability
       $$p(x', t + \Delta t \mid x, t) \propto \exp\left( -\frac{1}{2 \Delta t} \left\| \Delta x - f(x) \Delta t \right\|_D^2 \right)$$
       as $\Delta t \to 0$, with $\| F \|_D^2 = F^\top D^{-1} F$.
     • Let $p_g$ and $p_f$ be the short term transition probabilities for diffusion processes with drifts $g$ and $f$ and the same diffusion $D$. Then
       $$\int p_g(x', t + \Delta t \mid x, t) \ln \frac{p_g(x', t + \Delta t \mid x, t)}{p_f(x', t + \Delta t \mid x, t)}\, dx' \simeq \frac{1}{2} \left\| g(x, t) - f(x, t) \right\|_D^2 \Delta t$$
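
The short time KL divergence follows from the standard formula for two Gaussians with a common covariance. As a sketch, treating both transition densities as exactly Gaussian over one step $\Delta t$:

```latex
% Short-time transition densities with common covariance D \Delta t
p_g = \mathcal{N}\left( x + g\,\Delta t,\; D\,\Delta t \right), \qquad
p_f = \mathcal{N}\left( x + f\,\Delta t,\; D\,\Delta t \right)

% KL divergence between Gaussians with equal covariance \Sigma and means \mu_g, \mu_f
KL\left[ p_g \,\|\, p_f \right]
  = \frac{1}{2} (\mu_g - \mu_f)^\top \Sigma^{-1} (\mu_g - \mu_f)
  = \frac{1}{2} (g - f)^\top \Delta t\, (D\,\Delta t)^{-1} (g - f)\, \Delta t
  = \frac{1}{2} \left\| g - f \right\|_D^2\, \Delta t
```

With $g = u + f$ and $D = R^{-1}$ this is the quadratic control cost $\frac{1}{2} u^\top R\, u\, \Delta t$, which is how the KL cost of the discrete case matches the continuous one.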

  10. The KL divergence for Markov processes
      Consider probabilities $p(X_{0:T})$, $q(X_{0:T})$ over entire paths $X_{0:T}$. The total KL divergence
      $$KL\left[ q(X_{0:T}) \,\|\, p(X_{0:T}) \right] = \int dx_{0:T}\; q(x_{0:T}) \ln \frac{q(x_{0:T})}{p(x_{0:T})}$$
      $$= \sum_{k=1}^{T-1} \int dx_k\; q(x_k) \int dx_{k+1}\; q(x_{k+1} \mid x_k) \ln \frac{q(x_{k+1} \mid x_k)}{p(x_{k+1} \mid x_k)}$$
      $$= \sum_{k=1}^{T-1} \int dx_k\; q(x_k)\; KL\left[ q(\cdot \mid x_k) \,\|\, p(\cdot \mid x_k) \right]$$
      is the expected sum of KLs for transition probabilities.
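
The decomposition can be verified by brute force on a small discrete chain. The sketch below makes two assumptions that are not stated on the slide: both chains share the same initial distribution, and a path consists of `n_states_in_path` states, so it has one transition fewer. All names are hypothetical.

```python
import itertools
import math


def path_prob(init, trans, path):
    """Probability of a state sequence under a Markov chain."""
    prob = init[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= trans[a][b]
    return prob


def path_kl(init, q, p, n_states_in_path):
    """KL between path distributions, by enumerating every path."""
    total = 0.0
    for path in itertools.product(range(len(init)), repeat=n_states_in_path):
        qp = path_prob(init, q, path)
        if qp > 0.0:
            total += qp * math.log(qp / path_prob(init, p, path))
    return total


def summed_transition_kl(init, q, p, n_states_in_path):
    """Sum over time of E_q[ KL(q(.|x_k) || p(.|x_k)) ] using the marginals of q."""
    n = len(init)
    marginal = list(init)
    total = 0.0
    for _ in range(n_states_in_path - 1):
        for x in range(n):
            kl_x = sum(q[x][y] * math.log(q[x][y] / p[x][y])
                       for y in range(n) if q[x][y] > 0.0)
            total += marginal[x] * kl_x
        # Propagate the marginal of q one step forward
        marginal = [sum(marginal[x] * q[x][y] for x in range(n)) for y in range(n)]
    return total


init = [0.3, 0.7]
q = [[0.8, 0.2], [0.4, 0.6]]
p = [[0.5, 0.5], [0.1, 0.9]]
lhs = path_kl(init, q, p, 5)
rhs = summed_transition_kl(init, q, p, 5)
```

The enumeration grows exponentially with the path length, whereas the summed form only needs the marginals of $q$, which is what makes the decomposition useful in practice.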

  11. The global solution
      The Kappen / Todorov control problems are of the form: minimise the variational free energy
      $$V_t(x) = KL\left[ q(X_{t:T}) \,\|\, p(X_{t:T}) \right] + \sum_{\tau \ge t} E_q\left[ U_\tau(X_\tau, \tau) \right]$$
      for fixed $X_t = x$ with respect to $q$. The optimal controlled probability over paths is
      $$q^*(X_{t:T}) = \frac{1}{Z_t(x)}\, p(X_{t:T})\, e^{-\sum_{\tau \ge t} U_\tau(X_\tau, \tau)}$$
      with the minimal cost (free energy)
      $$V_t(x) = -\ln Z_t(x) = -\ln E_p\left[ e^{-\sum_{\tau \ge t} U_\tau(X_\tau, \tau)} \mid X_t = x \right]$$
      This looks like an HMM with 'likelihood' $e^{-\sum_{\tau \ge t} U_\tau(X_\tau, \tau)}$.
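
The global solution can be checked by enumerating all paths of a small chain. The sketch below assumes a time-independent cost $U$, a hypothetical two-state prior `p`, and hypothetical helper names. It evaluates the free energy at the tilted distribution $q^*$ and at the prior itself.

```python
import itertools
import math


def enumerate_paths(p, U, x_start, n_steps):
    """List (prior probability, summed cost) for every path starting at x_start."""
    out = []
    for tail in itertools.product(range(len(p)), repeat=n_steps):
        path = (x_start,) + tail
        prob = 1.0
        for a, b in zip(path, path[1:]):
            prob *= p[a][b]
        out.append((prob, sum(U[s] for s in path)))
    return out


def free_energy(q_paths, p_paths, costs):
    """KL[q || p] + E_q[cost], with distributions given over enumerated paths."""
    return sum(qv * (math.log(qv / pv) + c)
               for qv, pv, c in zip(q_paths, p_paths, costs) if qv > 0.0)


p = [[0.9, 0.1], [0.5, 0.5]]
U = [0.0, 1.0]
paths = enumerate_paths(p, U, x_start=1, n_steps=3)
p_paths = [prob for prob, _ in paths]
costs = [c for _, c in paths]
Z = sum(prob * math.exp(-c) for prob, c in paths)        # E_p[exp(-sum U)]
q_star = [prob * math.exp(-c) / Z for prob, c in paths]  # tilted path measure
F_star = free_energy(q_star, p_paths, costs)
F_prior = free_energy(p_paths, p_paths, costs)
```

Here `F_star` equals $-\ln Z$, and the uncontrolled choice $q = p$ gives the larger value `F_prior`, the plain expected cost.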

  12. Comments
      • For HMMs,
        $$Z_t(x) = E_p\left[ e^{-\sum_{\tau \ge t} U_\tau(X_\tau, \tau)} \mid X_t = x \right] \propto P(\text{future data} \mid X_t = x)$$
        fulfils a linear backward equation.
      • The posterior = controlled process has transition probabilities
        $$q(x_{t+1} \mid x_t) = \frac{Z_{t+1}(x_{t+1})}{Z_t(x_t)}\, p(x_{t+1} \mid x_t)\, e^{-U_t(x_t, t)}$$
      • Similar things happen for the continuous case:
        $$\left( \frac{\partial}{\partial t} + f^\top \nabla + \frac{1}{2} \mathrm{Tr}(D \nabla^\top \nabla) - U(x, t) \right) Z_t(x) = 0$$
      • The posterior is a diffusion with 'controlled' drift $u_t(x) = D \nabla \ln Z_t(x)$.

  13. A 'real' likelihood for continuous time paths
      Consider an inhomogeneous Poisson process with rate function $U(X_t)$. Then
      $$\Pr\{ \text{No event} \in [t, T] \} = e^{-\int_t^T U_s(X_s, s)\, ds}$$

  14. Application: Simulate diffusions with constraints
      Wiener process with fixed endpoints $x(t = T) = 0$

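  15. Solution
      $$u_t(x) = \frac{\partial \ln Z_t(x)}{\partial x}$$
      The backward equation
      $$\frac{\partial Z_t(x)}{\partial t} + \frac{1}{2} \frac{\partial^2 Z_t(x)}{\partial x^2} = 0$$
      with terminal condition $Z_T(x) = \delta(x)$ is solved by
      $$Z_t(x) \propto e^{-\frac{x^2}{2(T - t)}}$$
      and leads to
      $$u_t(x) = -\frac{x}{T - t}$$
      for $0 < t < T$.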
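
The controlled drift can be tried out directly in an Euler scheme. This is a minimal sketch with hypothetical names and parameters; because of the time discretisation the endpoint is only pinned up to a spread of order $\sqrt{\Delta t}$.

```python
import math
import random


def pinned_wiener_path(x0, T, n_steps, rng):
    """Euler scheme for dX = -X / (T - t) dt + dW, which steers X towards 0 at time T."""
    dt = T / n_steps
    x = x0
    path = [x]
    for k in range(n_steps):
        t = k * dt
        drift = -x / (T - t)  # u_t(x) = -x / (T - t); t < T on the grid
        x = x + drift * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


rng = random.Random(0)
endpoints = [pinned_wiener_path(2.0, 1.0, 1000, rng)[-1] for _ in range(20)]
```

Every simulated path ends close to zero, so no trajectories have to be rejected, in contrast to naively simulating free Wiener paths and keeping only those that happen to end near the target.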

  16. Diffusions with constraints on domain: $X(t) \in \Omega$.
      1. Method I: Kill the trajectory if $X(t) \in \partial\Omega$ for some $t$.
      2. Method II: Simulate SDEs with drift $u_t(x) = \nabla \ln Z_t(x)$, where
         $$Z_t = E\left[ e^{-\sum_{\tau \ge t} U_\tau(X_\tau, \tau)} \mid X_t = x \right]$$
         with $U = \infty$ if $x \notin \Omega$ and $U = 0$ else. Hence
         $$\frac{\partial Z_t(x)}{\partial t} + \frac{1}{2} \frac{\partial^2 Z_t(x)}{\partial x^2} = 0$$
         with $Z_t(x) = 0$ for $x \in \partial\Omega$.

  17. Possible approximations if we haven't got KL losses?
      • Approximate solution to the control problem
        $$V_0(x) = \int_{t=0}^{T} E_u\left[ \frac{1}{2} u(t)^\top R\, u(t) + U(x(t), t) \mid X(0) = x \right] dt$$
        for a general matrix $R$.
      • Gaussian measure over paths $X_{0:T}$ induced by a linear (approximate) posterior SDE (Archambeau, Cornford, Opper & Shawe-Taylor, 2007)
        $$dX(t) = \{ -A(t) X + b(t) \}\, dt + D^{1/2}\, dW$$
        as an approximation to
        $$dX(t) = \{ u(X, t) + f(X) \}\, dt + D^{1/2}\, dW$$
        Replace $u(X, t) \approx -A(t) X + b(t) - f(X)$ → nonlinear ODEs for moments instead of linear PDEs!

  18. Possible extensions to other losses
      Simple non-KL losses
      $$KL(q \,\|\, p) \to \alpha\, KL(q \,\|\, p_1) - \beta\, KL(q \,\|\, p_2)$$
      with $\alpha, \beta > 0$. Use an iterative method (CCCP style), upper bounding
      $$-KL(q \,\|\, p_2) \le -E_q\left[ \ln \frac{q_n}{p_2} \right]$$
      where $q_n$ is the present optimiser.
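
The upper bound holds because its gap is itself a KL divergence, $KL(q \,\|\, q_n) \ge 0$, and it is tight at $q = q_n$. A small numerical check on discrete distributions is sketched below; the distributions and function names are made up for illustration.

```python
import math


def kl(q, p):
    """KL divergence between two discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def cccp_upper_bound(q, q_n, p2):
    """-E_q[ln(q_n / p2)], which is linear in q."""
    return -sum(qi * math.log(qni / p2i) for qi, qni, p2i in zip(q, q_n, p2))


p2 = [0.2, 0.3, 0.5]
q_n = [0.6, 0.3, 0.1]  # present optimiser
q = [0.1, 0.1, 0.8]    # candidate distribution
gap = cccp_upper_bound(q, q_n, p2) - (-kl(q, p2))
```

Since the bound is linear in $q$, the surrogate objective built from $\alpha\, KL(q \,\|\, p_1)$ plus the scaled bound is again a KL type problem in $q$ at every iteration.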
