
The separation principle in stochastic control, revisited

Workshop in honor of Eduardo Sontag on the occasion of his 60th birthday

Tryphon T. Georgiou, joint work with Anders Lindquist

linear stochastic system

[Block diagram: linear stochastic system with noise input w, control input u and output y, in feedback with the controller π]

$$dx = A(t)x(t)\,dt + B_1(t)u(t)\,dt + B_2(t)\,dw$$
$$dy = C(t)x(t)\,dt + D(t)\,dw$$

$w(t)$ is a vector-valued Wiener process
$x(0)$ is a Gaussian random vector independent of $w(t)$, $y(0) = 0$
$A$, $B_1$, $B_2$, $C$, $D$ are matrix-valued functions

Goal: Design nonanticipatory control $\pi : y \mapsto u$ that minimizes

$$J(u) = E\left\{ \int_0^T x(t)'Q(t)x(t)\,dt + \int_0^T u(t)'R(t)u(t)\,dt + x(T)'Sx(T) \right\}$$

separation principle

under suitable assumptions on the class of admissible controls $\pi : y \mapsto u$, the “optimal control” is

$$u(t) = K(t)\hat{x}(t)$$

where $\hat{x}(t) = E\{x(t) \mid \mathcal{Y}_t\}$,

$$d\hat{x} = A(t)\hat{x}(t)\,dt + B_1(t)u(t)\,dt + L(t)\big(dy - C(t)\hat{x}(t)\,dt\big), \qquad \hat{x}(0) = 0,$$

with $K(t)$ and $L(t)$ computed via a pair of dual Riccati equations

NB:
– attempts to prove separation for $u(t)$ is $\mathcal{Y}_t$ measurable (a.s.)...
– too big a class; we know no proof which is correct (strong solutions)
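
The slide states only that K(t) and L(t) come from a pair of dual Riccati equations. The sketch below is one way to compute them in the scalar, constant-coefficient case by Euler integration. The control equation is the one given on the "completion of squares" slide; the filter equation is the standard Kalman-Bucy form for noise entering both dx and dy through the same w, which is an assumption here since the slides do not write it out. The function name `riccati_gains` and all numerical values are hypothetical.

```python
def riccati_gains(a, b1, b2, c, d, q, r, s, sigma0, T=1.0, n=1000):
    """Euler integration of the control and filter Riccati equations (scalar case)."""
    h = T / n

    # Control Riccati equation, integrated backward from P(T) = s:
    #   dP/dt = -2 a P + (b1^2 / r) P^2 - q
    P = [0.0] * (n + 1)
    P[n] = s
    for k in range(n, 0, -1):
        dP = -2.0 * a * P[k] + (b1 ** 2 / r) * P[k] ** 2 - q
        P[k - 1] = P[k] - h * dP
    K = [-b1 * p / r for p in P]  # K(t) = -R^{-1} B1' P(t)

    # Filter Riccati equation (assumed standard form, not written on the slides),
    # integrated forward from Sigma(0) = sigma0:
    #   dSigma/dt = 2 a Sigma + b2^2 - (Sigma c + b2 d)^2 / d^2
    Sig = [0.0] * (n + 1)
    Sig[0] = sigma0
    for k in range(n):
        num = Sig[k] * c + b2 * d
        dSig = 2.0 * a * Sig[k] + b2 ** 2 - num ** 2 / d ** 2
        Sig[k + 1] = Sig[k] + h * dSig
    L = [(sg * c + b2 * d) / d ** 2 for sg in Sig]  # L(t) = (Sigma C' + B2 D')(D D')^{-1}
    return K, L, P, Sig


# Hypothetical example data
K, L, P, Sig = riccati_gains(a=-1.0, b1=1.0, b2=1.0, c=1.0, d=1.0,
                             q=1.0, r=1.0, s=0.0, sigma0=1.0)
```

Neither recursion involves u or y, so both gains can be tabulated offline; this is the sense in which the control design and the filter design are "separated".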

historical remarks

Wonham, Kushner, Lindquist, Fleming & Rishel

• treatment overburdened with technicalities
• folk accounts not supported by existing proofs
• non-Gaussian nature due to an a-priori nonlinear π is often overlooked
• herein, separation principle for:
  – the most natural class of controls: all linear/nonlinear and even discontinuous such that feedback loop makes “engineering” sense
  – engineering view point: signals = sample functions
  – general semimartingale driving noise, with jumps
  – delay-differential linear systems, etc.

[Block diagram: system with inputs w, u and output y, in feedback with π]

the standard “completion of squares”

$$J(u) = E\left\{ x(0)'P(0)x(0) + \int_0^T (u - Kx)'R(u - Kx)\,dt + \int_0^T \operatorname{tr}(B_2'PB_2)\,dt \right\}$$

where

$$\dot{P} = -A'P - PA + PB_1R^{-1}B_1'P - Q, \qquad P(T) = S,$$
$$K(t) := -R(t)^{-1}B_1(t)'P(t).$$

using Itô's rule:

$$d(x'Px) = x'\dot{P}x\,dt + 2x'P\,dx + \operatorname{tr}(B_2'PB_2)\,dt$$
$$= \big[-x'Qx - u'Ru + (u - Kx)'R(u - Kx) + \operatorname{tr}(B_2'PB_2)\big]\,dt + 2x'PB_2\,dw$$

with “complete state-information”: $u_{\text{optimal}}(t) = K(t)x(t)$
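
The drift identity behind the completion of squares can be checked numerically. The sketch below draws random matrices (hypothetical data, not from the slides), forms K and the right-hand side of the Riccati equation at a single time instant, and compares the two expressions for the dt-coefficient of d(x'Px), leaving out the trace term that appears on both sides. The function name `drift_identity_gap` is an assumption made for illustration.

```python
import numpy as np


def drift_identity_gap(seed=0, n=3, m=2):
    """Return lhs - rhs of the pointwise completion-of-squares identity."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    B1 = rng.standard_normal((n, m))
    Mq = rng.standard_normal((n, n))
    Q = Mq @ Mq.T                      # symmetric, positive semidefinite
    Mr = rng.standard_normal((m, m))
    R = Mr @ Mr.T + np.eye(m)          # symmetric, positive definite
    Mp = rng.standard_normal((n, n))
    P = Mp + Mp.T                      # any symmetric P(t) will do
    K = -np.linalg.solve(R, B1.T @ P)  # K = -R^{-1} B1' P
    Pdot = -A.T @ P - P @ A + P @ B1 @ np.linalg.solve(R, B1.T @ P) - Q

    x = rng.standard_normal(n)
    u = rng.standard_normal(m)
    # dt-coefficient of x' Pdot x dt + 2 x' P dx (trace term omitted on both sides)
    lhs = x @ Pdot @ x + 2.0 * x @ P @ (A @ x + B1 @ u)
    e = u - K @ x
    rhs = -x @ Q @ x - u @ R @ u + e @ R @ e
    return lhs - rhs


gap = drift_identity_gap()
```

The identity holds for every x and u, so `gap` should vanish up to rounding error; integrating it from 0 to T and taking expectations yields the expression for J(u).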

incomplete state information

$u(t)$ needs to be a function of $\{y(s);\ 0 \le s \le t\}$

Standard recipe:

$$u(t) = K(t)\hat{x}(t) \quad \text{where} \quad \hat{x}(t) = E\{x(t) \mid \mathcal{Y}_t\}$$

justification ⇔ separation theorem

where is the potential problem?

set $\tilde{x}(t) := x(t) - \hat{x}(t)$, then

$$E\int_0^T (u - Kx)'R(u - Kx)\,dt = E\int_0^T \big[(u - K\hat{x})'R(u - K\hat{x}) + \operatorname{tr}(K'RK\Sigma)\big]\,dt$$

since $E\{[u(t) - K(t)\hat{x}(t)]\,\tilde{x}(t)'\} = 0$, and where $\Sigma(t) := E\{\tilde{x}(t)\tilde{x}(t)'\}$

why isn't it obvious that $u = K\hat{x}$ is optimal?

subtlety: in general, Σ may depend on the control
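
The decomposition on this slide can be spelled out in one more step. This is a sketch that assumes u(t) is measurable with respect to the observation σ-field and square integrable, so that u − K x̂ is orthogonal to the estimation error:

$$
\begin{aligned}
u - Kx &= (u - K\hat{x}) - K\tilde{x}, \\
(u - Kx)'R(u - Kx) &= (u - K\hat{x})'R(u - K\hat{x}) - 2(u - K\hat{x})'RK\tilde{x} + \tilde{x}'K'RK\tilde{x}, \\
E\{(u - K\hat{x})'RK\tilde{x}\} &= \operatorname{tr}\big(RK\,E\{\tilde{x}(u - K\hat{x})'\}\big) = 0, \\
E\{\tilde{x}'K'RK\tilde{x}\} &= \operatorname{tr}\big(K'RK\,E\{\tilde{x}\tilde{x}'\}\big) = \operatorname{tr}(K'RK\Sigma).
\end{aligned}
$$

Only the first term on the right can be driven to zero by the choice of u, which is why optimality of u = K x̂ hinges on whether Σ itself is affected by the control.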

source of fallacy (?)

due to linearity

$$x(t) = x_0(t) + \int_0^t \Phi(t,s)B_1(s)u(s)\,ds$$

the control term cancels out:

$$\tilde{x}(t) = \tilde{x}_0(t) := x_0(t) - \hat{x}_0(t), \quad \text{where} \quad \hat{x}_0(t) := E\{x_0(t) \mid \mathcal{Y}_t\}$$

how could $E\{\tilde{x}_0(t)\tilde{x}_0(t)'\}$ depend on the control?

because the filtration $\mathcal{Y}_t$, and hence $\hat{x}_0$, might depend on $u$!

• $u$ is in general a nonlinear function of $y$
• hence, $y$ may not be Gaussian
• despite the fact that $x_0$ is Gaussian, $\hat{x}_0(t) = E\{x_0(t) \mid \mathcal{Y}_t\}$ may not be linear in the data $\{y(\tau);\ \tau \in [0,t]\}$
• $\hat{x}_0(t)$ may not be given by a Kalman filter.

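generalization - notation

[Block diagram: u passes through g and is added to z_0 to give z; y = Hz is fed back through π to produce u]

$$z(t) = z_0(t) + \int_0^t G(t,\tau)u(\tau)\,d\tau$$
$$y(t) = Hz(t)$$

where

$$g : (t,u) \mapsto \int_0^t G(t,\tau)u(\tau)\,d\tau$$

E.g., $z(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix}$ and $H = [0,\ I]$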

ways out (?)

SOL: stochastic open loop (Lindquist)

limit control so as to be adapted to $\{\mathcal{Y}^0_t\}$

[Block diagram: π driven by y_0 = Hz_0; blocks g, H; signals z_0, y_0, u, z, y]

examples
• linear control
• Lipschitz feedback

e.g., control adapted to $\{\mathcal{Y}^0_t\}$ via

[Block diagram: blocks g, H, π and a subtraction node; signals z_0, y_0, u, z, y]

example: linear feedback

$$u(t) = u_{\text{deterministic}} + \int_0^t F(t,\tau)\,dy(\tau)$$

then the Gaussian character is preserved. It can be shown that $\mathcal{Y}_t = \mathcal{Y}^0_t$. Hence,

$$d\tilde{x} = (A - LC)\tilde{x}\,dt + (B_2 - LD)\,dw, \qquad \tilde{x}(0) = x(0)$$

$\Sigma(t) := E\{\tilde{x}(t)\tilde{x}(t)'\}$ is independent of $u$

$$u(t) = \int_0^t F(t,\tau)\,dy(\tau) \;\Rightarrow\; dy = dy_0 + \int_0^t M(t,s)u(s)\,ds\,dt$$
$$\Rightarrow\; dy = dy_0 + \int_0^t N(t,\tau)\,dy(\tau)\,dt$$

where

$$N(t,\tau) = \int_\tau^t M(t,s)F(s,\tau)\,ds$$

Volterra resolvent:

$$R(t,\tau) = \int_\tau^t R(t,s)N(s,\tau)\,ds + N(t,\tau)$$

Then

$$\int_0^t N(t,\tau)\,dy(\tau) = \int_0^t R(t,\tau)\,dy_0(\tau)$$
$$\Rightarrow\; dy = dy_0 + \int_0^t R(t,\tau)\,dy_0(\tau)\,dt$$
$$\Rightarrow\; \sigma\{y(\tau);\ 0 \le \tau \le t\} = \sigma\{y_0(\tau);\ 0 \le \tau \le t\}$$
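
A discretized version of the resolvent argument may make the last implication more concrete. In the sketch below the Volterra kernel becomes a strictly lower-triangular matrix acting on vectors of increments, so causality is built in. The exponential kernel, the grid size and the function name `resolvent` are hypothetical choices for illustration, not taken from the slides.

```python
import numpy as np


def resolvent(N):
    """Discrete Volterra resolvent R = N (I - N)^{-1} of a strictly lower-triangular N."""
    I = np.eye(N.shape[0])
    return N @ np.linalg.inv(I - N)


n, h = 50, 0.02
t = h * np.arange(n)
# Hypothetical kernel N(t, tau) = exp(-(t - tau)); only the strict lower triangle
# is kept (tau < t), scaled by the step h that stands in for "dt"
N = h * np.tril(np.exp(-(t[:, None] - t[None, :])), k=-1)
R = resolvent(N)

rng = np.random.default_rng(0)
dy0 = np.sqrt(h) * rng.standard_normal(n)  # increments of y_0
dy = dy0 + R @ dy0                         # dy = dy_0 + R dy_0
dy0_back = dy - N @ dy                     # dy_0 = dy - N dy
```

Both I + R and I − N are lower triangular with unit diagonal, so each increment sequence is a causal function of the other. That is the finite-dimensional counterpart of the equality of the two σ-fields.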

example: Lipschitz continuous control

[Wonham] Assuming that

$$dy(t) = x(t)\,dt + D(t)\,dw(t)$$

i.e., $C(t) = I$ is invertible! Then among control laws of the form $u(t) = \psi(t, \hat{x}(t))$ the choice $u(t) = K(t)\hat{x}(t)$ is optimal.

[Fleming & Rishel] removed the assumption on $C(t)$; Lipschitz on $y$; simpler proof.

example: Lipschitz (cont.)

[Kushner] $\hat{\xi}_0(t) := E\{x_0(t) \mid \mathcal{Y}^0_t\}$ given by the Kalman filter

$$d\hat{\xi}_0 = A\hat{\xi}_0(t)\,dt + L(t)\,dv_0, \qquad \hat{\xi}_0(0) = 0$$
$$dv_0 = dy_0 - C\hat{\xi}_0(t)\,dt, \qquad v_0(0) = 0$$

define

$$\hat{\xi}(t) := \hat{\xi}_0(t) + \int_0^t \Phi(t,s)B_1(s)u(s)\,ds$$

and assume $u(t) = \psi(t, \hat{\xi}(t))$ is Lipschitz. Then $\hat{\xi}$ is the unique strong solution of

$$d\hat{\xi} = \big(A\hat{\xi}(t) + B_1\psi(t, \hat{\xi}(t))\big)\,dt + L(t)\,dv_0, \qquad \hat{\xi}(0) = 0.$$

This choice forces $u$ to be adapted to $\{\mathcal{Y}^0_t\}$ ⇒ $\{\mathcal{Y}^0_t\} = \{\mathcal{Y}_t\}$ ⇒ $\hat{\xi} = \hat{x}$

example: delay in the loop

when $u(t)$ is a function of $y(\tau)$; $0 \le \tau \le t - \varepsilon$, we have $\mathcal{Y}_t = \mathcal{Y}^0_t$

the possibility of a control-dependent σ-field does not arise in the usual (predictive) discrete-time formulation

• Taking $\varepsilon \to 0$ and general nonlinear feedback, there is no guarantee that $\mathcal{Y}_t$ is left-continuous
• “Proofs” of separation using such limits are circular; misleading accounts in textbooks.

signals and systems

signals: sample paths; possibly having bounded discontinuities, in $D$ (càdlàg – Skorokhod space)

systems: measurable nonanticipatory maps

examples:
i) SDE's that have strong solutions
ii) nonlinearities, hysteresis ($C \to D$), etc.

[Figure: hysteresis nonlinearity h(z), with output level 1 and width ε; z(t) → h(z(t))]

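well-posedness of feedback

Defn. a feedback loop, that is $z = z_0 + f(z)$, is well-posed if it has a unique solution in $D$ for all $z_0 \in D$ and $(1 - f)^{-1}$ is a system.

[Block diagram: feedback loop z = z_0 + f(z), equivalently z = (1 − f)^{-1} z_0]

[Figure: example in which f consists of the hysteresis nonlinearity h(z) (output level 1, width ε) and a low-pass filter; z(t) = (1 − f)^{-1} z_0(t) plotted against z_0(t)]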

well-posedness (cont.)

[Block diagram: feedback loop z = z_0 + f(z)]

$z$, $z_0$ stochastic processes: well-posedness implies that

$$\mathcal{Z}^0_t = \mathcal{Z}_t, \qquad t \in [0,T].$$

by defn, $(1 - f)$ and $(1 - f)^{-1}$ are systems ⇒ $z_0 = z - f(z)$ and $z = (1 - f)^{-1}z_0$

NB. no more information other than what is contained in $\mathcal{Z}^0_t$

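how about incomplete state-information?

[Block diagram: z → H → y]

$$z_1 = \begin{pmatrix} w \\ 0 \end{pmatrix}, \qquad z_2 = \begin{pmatrix} 0 \\ w \end{pmatrix}$$

generate the same filtrations, i.e., $\mathcal{Z}^1_t = \mathcal{Z}^2_t$, while for $H = \begin{pmatrix} 1 & 0 \end{pmatrix}$,

$$y_1 = \begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} w \\ 0 \end{pmatrix}, \qquad y_2 = \begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} 0 \\ w \end{pmatrix}$$

do not, i.e., $\mathcal{Y}^1_t \ne \mathcal{Y}^2_t$.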

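linear read-out map

[Block diagram: u passes through g and is added to z_0 to give z; y = Hz is fed back through π to produce u]

Assume

$$z(t) = z_0(t) + g \circ \pi(y(t))$$
$$y(t) = Hz(t)$$

is well-posed with $H$ linear; it follows that

$$\mathcal{Y}_t = \mathcal{Y}^0_t, \qquad t \in [0,T].$$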

Proof:

$$(1 - Hg\pi)H = H - Hg\pi H = H(1 - g\pi H)$$
$$\Rightarrow\; H(1 - g\pi H)^{-1} = (1 - Hg\pi)^{-1}H$$
$$\Rightarrow\; y = (1 - Hg\pi)^{-1}y_0, \quad \text{and} \quad y_0 = (1 - Hg\pi)y.$$
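
The operator identity in the proof can be illustrated in a finite-dimensional special case. The sketch below assumes that g and π are constant matrices, which is far more restrictive than the nonlinear maps allowed on the slides and serves only as a sanity check. The names `G` and `Pi`, the dimensions and the scaling are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 4, 2, 3                       # dimensions of z, y and u
H = rng.standard_normal((p, n))
G = rng.standard_normal((n, m))         # matrix stand-in for g
Pi = rng.standard_normal((m, p))        # matrix stand-in for the control law
# Rescale so that the loop gain has norm 1/2, which makes I - G Pi H invertible
Pi *= 0.5 / np.linalg.norm(G @ Pi @ H, 2)

In, Ip = np.eye(n), np.eye(p)
lhs = H @ np.linalg.inv(In - G @ Pi @ H)   # H (1 - g pi H)^{-1}
rhs = np.linalg.inv(Ip - H @ G @ Pi) @ H   # (1 - H g pi)^{-1} H

# Closed loop: z = z0 + G Pi H z, y = H z, y0 = H z0
z0 = rng.standard_normal(n)
z = np.linalg.solve(In - G @ Pi @ H, z0)
y, y0 = H @ z, H @ z0
y0_from_y = (Ip - H @ G @ Pi) @ y          # y0 = (1 - H g pi) y
```

In this linear setting y and y0 determine each other through an invertible map that never refers to z0 directly; the lemma asserts the same for well-posed nonlinear loops, with the inverse required to be a nonanticipatory system.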

essence of the lemma

well-posedness resolves the issue of circular control dependence

[Two block diagrams related by ≃, built from the blocks g, H, π and the signals z_0, u, z, y]

the separation principle

thm: assuming

$$dx = A(t)x(t)\,dt + B_1(t)u(t)\,dt + B_2(t)\,dw$$
$$dy = C(t)x(t)\,dt + D(t)\,dw$$

$w(t)$ is a vector-valued Wiener process
$x(0)$ is a Gaussian random vector independent of $w(t)$, $y(0) = 0$
$A$, $B_1$, $B_2$, $C$, $D$ are matrix-valued functions

there is a unique control law $\pi : y \mapsto u$ minimizing

$$J(u) = E\left\{ \int_0^T x(t)'Q(t)x(t)\,dt + \int_0^T u(t)'R(t)u(t)\,dt + x(T)'Sx(T) \right\}$$

in the class of well-posed control laws, and it has the form $u(t) = K(t)\hat{x}(t)$

the separation principle (general)

thm: for the same linear system, assuming $w$ is a semimartingale and $x(0)$ an independent random vector, the unique optimal control in the class of well-posed controllers is given by

$$u(t) = K(t)\hat{x}(t)$$

where $\hat{x}$ is the conditional mean.

remarks:
• no need for Lipschitz continuity
• allows jump processes
• $K(t)$ is still given by a Riccati equation
• in general, the difficult part is constructing $\hat{x}(t) = E\{x(t) \mid \mathcal{Y}_t\}$.
