4SC000 Q2 2017-2018
Optimal Control and Dynamic Programming
Duarte Antunes

Outline
- Stochastic dynamic programming and linear quadratic control
- Output feedback linear quadratic control
- Separation principle
- Kalman filter
- LQG
Dynamic model
xk+1 = fk(xk, uk, wk)

Stochastic disturbances
wk ∈ R^{nw}. We assume that the stochastic disturbances have zero mean and are statistically independent (white).

Cost
Find a policy π = {µ0, …, µh−1}, uk = µk(xk), that minimizes
Jπ(x0) = E[ Σ_{k=0}^{h−1} gk(xk, µk(xk), wk) + gh(xh) ].
Start with Jh(xh) = gh(xh) for every xh ∈ Xh and, for each decision stage k ∈ {h−1, h−2, …, 0}, starting from the last and moving backwards, compute Jk and µk from the DP equation
Jk(xk) = min_{uk ∈ Uk(xk)} E_{wk}[ gk(xk, uk, wk) + Jk+1(fk(xk, uk, wk)) ],
where µk(xk) = uk is the minimizer in the DP equation. Then {µ0, …, µh−1} is an optimal policy.
To be more precise we can write E_{wk}[ gk(xk, uk, wk) + Jk+1(fk(xk, uk, wk)) | xk ] on the right-hand side of the DP equation, which reinforces the fact that xk is assumed to be constant while computing the expected value and taking the min.
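The backward recursion can be sketched on a toy problem. The binary state/action system, the costs, and the fair-coin disturbance below are made-up illustrations (not from the lecture); only the structure of the recursion follows the DP equation above.

```python
# Sketch of the stochastic DP recursion on a made-up toy problem:
# binary state and action, fair-coin disturbance, horizon h = 2.
h = 2
states, actions, dists = [0, 1], [0, 1], [0, 1]
p_w = {0: 0.5, 1: 0.5}                      # disturbance distribution (white)

def f(x, u, w):  return (x + u + w) % 2     # dynamics f_k
def g(x, u, w):  return x + u               # stage cost g_k
def g_h(x):      return float(x)            # terminal cost

J = {x: g_h(x) for x in states}             # J_h(x_h) = g_h(x_h)
policy = []
for k in range(h - 1, -1, -1):              # k = h-1, ..., 0
    Jk, mu = {}, {}
    for x in states:
        # DP equation: J_k(x) = min_u E_w[g(x,u,w) + J_{k+1}(f(x,u,w))]
        costs = {u: sum(p_w[w] * (g(x, u, w) + J[f(x, u, w)]) for w in dists)
                 for u in actions}
        mu[x] = min(costs, key=costs.get)
        Jk[x] = costs[mu[x]]
    policy.insert(0, mu)
    J = Jk

print(J)        # -> {0: 1.0, 1: 2.0}
print(policy)   # -> [{0: 0, 1: 0}, {0: 0, 1: 0}]
```

Note that the expectation over wk is taken inside the minimization over uk, exactly as in the DP equation.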
Dynamic model
xk+1 = Ak xk + Bk uk + ωk, where the ωk are zero mean and white, E[ωk] = 0.

Cost function
E[ Σ_{k=0}^{h−1} [xk^T uk^T] [Qk Sk; Sk^T Rk] [xk; uk] + xh^T Qh xh ]

Find a policy π = {µ0, …, µh−1}, uk = µk(xk), k ∈ {0, …, h−1}, which minimizes this cost.
Theorem: The optimal policy is uk = Kk xk, k ∈ {0, …, h−1}, with
Kk = −(Bk^T Pk+1 Bk + Rk)^{−1} (Sk^T + Bk^T Pk+1 Ak),
where the Pk are computed by the Riccati iterations, for k ∈ {h−1, …, 0},
Pk = Ak^T Pk+1 Ak + Qk − (Sk + Ak^T Pk+1 Bk)(Bk^T Pk+1 Bk + Rk)^{−1}(Sk^T + Bk^T Pk+1 Ak),
initialized with Ph = Qh, and the resulting expected cost is given by
J0(x0) = x0^T P0 x0 + Σ_{k=0}^{h−1} trace(Pk+1 E[wk wk^T]).
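The theorem translates directly into code. A minimal sketch, using the double-integrator data from the example that follows (τ = 0.2; Q = I, R = 1, S = 0 and the horizon h = 20 are assumed values for illustration). As a consistency check, with wk = 0 the cost of simulating uk = Kk xk equals x0^T P0 x0 exactly.

```python
import numpy as np

# Backward Riccati iterations for the stochastic LQR theorem above.
# System and weights: double integrator (assumed Q = I, R = 1, S = 0).
tau = 0.2
A = np.array([[1.0, tau], [0.0, 1.0]])
B = np.array([[tau**2 / 2], [tau]])
Q, R, S = np.eye(2), np.array([[1.0]]), np.zeros((2, 1))

h = 20
P = Q.copy()                                  # P_h = Q_h (here Q_h = Q)
gains = []
for k in range(h - 1, -1, -1):
    Kk = -np.linalg.solve(B.T @ P @ B + R, S.T + B.T @ P @ A)   # K_k
    # P_k = A'PA + Q - (S + A'PB)(B'PB + R)^{-1}(S' + B'PA)
    P = A.T @ P @ A + Q + (S + A.T @ P @ B) @ Kk
    gains.insert(0, Kk)

x0 = np.array([[1.0], [0.0]])
J0 = float(x0.T @ P @ x0)    # expected cost; trace terms vanish since w_k = 0 here

# Consistency check: simulate the closed loop with w_k = 0 and sum the stage costs
x, cost = x0.copy(), 0.0
for k in range(h):
    u = gains[k] @ x
    cost += float(x.T @ Q @ x + u.T @ R @ u)
    x = A @ x + B @ u
cost += float(x.T @ Q @ x)   # terminal cost x_h' Q_h x_h
print(J0, cost)              # the two values coincide
```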
Remark: since the model is linear and the cost is quadratic, the deterministic DP (obtained by setting wk = 0) and the stochastic DP yield the same policy; to obtain the expected cost we just need to add a constant.
Under the same assumptions that we have considered in the deterministic case, the stationary policy uk = K̄xk is optimal, where
K̄ = −(B^T P̄ B + R)^{−1}(S^T + B^T P̄ A)
and P̄ satisfies the algebraic Riccati equation.
If wℓ = 0 for ℓ > L > 0, the total cost
E[ Σ_{k=0}^{∞} xk^T Q xk + 2 xk^T S uk + uk^T R uk ]
is given by x0^T P̄ x0 + Σ_{k=0}^{∞} trace(P̄ E[wk wk^T]).
Otherwise, one considers the average cost
lim_{T→∞} (1/T) E[ Σ_{k=0}^{T−1} xk^T Q xk + 2 xk^T S uk + uk^T R uk ],
which converges to trace(P̄ W), assuming W = E[wk wk^T] for all k.
Example: double integrator considered previously,
xk+1 = [1 τ; 0 1] xk + [τ²/2; τ] uk, τ = 0.2,
with cost Σ_{k=0}^{h−1} (xk^T Q xk + uk^T R uk) + xh^T Qh xh and Q = I.
Optimal policy: uk = K xk, k ≥ 0, with K = [−0.8412 −1.54].
[Figures: states y(t), v(t) and optimal control input u(t) for x0 = [y0 v0]^T = [1 0]^T.]
Consider now a disturbance at time 2 sec (k = 10),
xk+1 = [1 τ; 0 1] xk + [τ²/2; τ] uk + wk, with wk = [0; a] if k = 10 and wk = 0 otherwise,
where a is a uniform random variable in the interval [−0.5, 0.5].
[Figures: closed-loop and open-loop responses y(t), v(t), u(t).]
We can compute the expected cost based on the expression
x0^T P̄ x0 + Σ_{k=0}^{∞} trace(P̄ E[wk wk^T]),
where P̄ is the solution of the algebraic Riccati equation, which for the considered parameters is
P̄ = [9.1890 5.0249; 5.0249 9.2324].
Since x0 = [1 0]^T and E[wk wk^T] = [0 0; 0 E[a²]] for k = 10 (and zero otherwise), where
E[a²] = ∫_{−0.5}^{0.5} t² dt = 1/12,
we obtain the cost x0^T P̄ x0 + trace(P̄ [0 0; 0 E[a²]]) ≈ 9.95.
Alternatively, we can simulate the system several times, computing the cost for each simulation and then averaging (Monte Carlo method).
[Figures: two sample simulations y(t), v(t), u(t); Simulation 1 cost: 9.534, Simulation 2 cost: 9.097.]
Average cost of 5000 simulations: 9.95.
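The Monte Carlo comparison can be sketched as follows. P̄ and the gain are recomputed by iterating the Riccati recursion to a fixed point (Q = I and R = 1 are assumed for this example), so both the analytic value and the simulated average come out of the code rather than being copied from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Double integrator with a single disturbance at k = 10, a ~ U[-0.5, 0.5].
tau = 0.2
A = np.array([[1.0, tau], [0.0, 1.0]])
B = np.array([[tau**2 / 2], [tau]])
Q, R = np.eye(2), np.array([[1.0]])

# Solve the algebraic Riccati equation by iterating to a fixed point (S = 0)
P = np.eye(2)
for _ in range(2000):
    P = A.T @ P @ A + Q - (A.T @ P @ B) @ np.linalg.solve(B.T @ P @ B + R,
                                                          B.T @ P @ A)
K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)

x0 = np.array([[1.0], [0.0]])
analytic = float(x0.T @ P @ x0) + P[1, 1] / 12      # E[a^2] = 1/12

def run_cost(a, T=150):
    """Simulate u_k = K x_k with the disturbance [0; a] entering at k = 10."""
    x, cost = x0.copy(), 0.0
    for k in range(T):
        u = K @ x
        cost += float(x.T @ Q @ x + u.T @ R @ u)
        x = A @ x + B @ u
        if k == 10:
            x = x + np.array([[0.0], [a]])
    return cost

avg = np.mean([run_cost(rng.uniform(-0.5, 0.5)) for _ in range(2000)])
print(avg, analytic)    # compare with the 9.95 reported above
```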
Dynamic model
xk+1 = fk(xk, uk, wk), with the initial state unknown.

Output equation
yk = hk(xk, uk−1, nk), with output noise nk.
We assume that the initial state, the stochastic disturbances and the output noise have zero mean, are statistically independent and have a known probability distribution.

Information set
I0 = (y0), Ik = (y0, y1, …, yk, u0, u1, …, uk−1), k ≥ 1.

Cost
Find a policy π = {µ0, …, µh−1}, uk = µk(Ik), which minimizes
Jπ = E[ Σ_{k=0}^{h−1} gk(xk, µk(Ik), wk) + gh(xh) ].

We can reformulate a partial information optimal control problem as a standard full information optimal control problem by considering the state to be the information set,
I0 = (y0), I1 = (y0, y1, u0), I2 = (y0, y1, y2, u0, u1), …
The problem is that the state-space dimension increases exponentially with the stage.

It is possible to show that the knowledge of the probability distribution of the state given the information obtained so far, denoted by P_{xk|Ik}, is sufficient to determine optimal decisions, uk = µk(P_{xk|Ik}). The state-space dimension is now fixed. However, it is typically not easy to store P_{xk|Ik}.
Output feedback linear quadratic control

Dynamic model
xk+1 = A xk + B uk + ωk, with ωk zero mean and independent, E[ωk] = 0.

Output equation
yk = C xk + nk, with noise nk zero mean and independent, with covariance V = E[nk nk^T].

Information set
Ik = (y0, y1, …, yk, u0, u1, …, uk−1).

Cost function
E[ Σ_{k=0}^{h−1} [xk^T uk^T] [Q S; S^T R] [xk; uk] + xh^T Qh xh ]

Find a policy π = {µ0, …, µh−1}, uk = µk(Ik), k ∈ {0, …, h−1}, which minimizes this cost.
We further assume a Gaussian setting:
- the initial state x0 is Gaussian with mean x̄0 and covariance Φ̄0 = E[(x0 − x̄0)(x0 − x̄0)^T], denoted x0 ∼ N(x̄0, Φ̄0); its density is
f(x) = (1/√((2π)^n det Φ̄0)) e^{−(1/2)(x − x̄0)^T Φ̄0^{−1}(x − x̄0)}, Prob[x0 ∈ A] = ∫_A f(x) dx.
Example: x̄0 = [1 2]^T, Φ̄0 = [2 0.5; 0.5 2]. [Figure: probability density over (x1, x2).]
- the disturbances ωk are Gaussian with zero mean and covariance W = E[ωk ωk^T], i.e., wk ∼ N(0, W);
- the noise nk is Gaussian with zero mean and covariance V, i.e., nk ∼ N(0, V).
[Block diagram: process/plant xk+1 = A xk + B uk + wk, yk = C xk + nk, followed by an estimator producing P_{xk|Ik} and a decision maker/controller producing uk.]
From our general discussion, we know that we can first estimate P_{xk|Ik} and then make decisions (control).
It turns out that:
- P_{xk|Ik} is Gaussian with mean x̂k|k and covariance Φk|k = E[(xk − x̂k|k)(xk − x̂k|k)^T | Ik], which can be obtained by the Kalman filter;
- the optimal control is uk = Kk x̂k|k, where the gains Kk are obtained from the Riccati equations.
[Block diagram: plant, Kalman filter producing the estimate x̂k|k, and controller uk = Kk x̂k|k.]
The controller is called the linear quadratic Gaussian (LQG) compensator.
- It is optimal among all controllers (including non-linear controllers) when the noise, disturbances and initial conditions are Gaussian (non-Gaussian noise is mentioned later).
- The fact that the estimator and the controller can be designed separately is known as the separation theorem.
- When only output feedback is available and the input is known, the Kalman filter* can be shown to be the optimal estimator in the sense that, for any vector F and output zk = F xk, there is no other estimator achieving a smaller output variance E[(zk − F x̂k|k)² | Ik].
*LQE, linear quadratic estimator, is another term for the Kalman filter.
[Block diagram: plant, stationary Kalman filter producing x̂k, and controller uk = K̄ x̂k.]
When we consider the average cost
lim_{T→∞} (1/T) Σ_{k=0}^{T−1} E[ [xk^T uk^T] [Q S; S^T R] [xk; uk] ],
the control is the LQR controller with static gains and the estimator is the stationary Kalman filter.
Kalman filter

Dynamic model: xk+1 = A xk + B uk + ωk
Output equation: yk = C xk + nk
Information set: Ik = (y0, y1, …, yk, u0, u1, …, uk−1)
Given: wk ∼ N(0, W), nk ∼ N(0, V), x0 ∼ N(x̄0, Φ̄0).
Then: P_{xk|Ik} ∼ N(x̂k|k, Φk|k) is Gaussian with mean and covariance iteratively computed, for k ≥ 0, by
Lk = Φk|k−1 C^T (C Φk|k−1 C^T + V)^{−1}
x̂k|k = x̂k|k−1 + Lk (yk − C x̂k|k−1)
Φk|k = Φk|k−1 − Φk|k−1 C^T (C Φk|k−1 C^T + V)^{−1} C Φk|k−1
x̂k+1|k = A x̂k|k + B uk
Φk+1|k = A Φk|k A^T + W
Initial condition: x̂0|−1 = x̄0, Φ0|−1 = Φ̄0.
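The recursions can be transcribed directly. The 2-state system and noise levels below are made up for illustration; the checks are structural facts of the filter: the measurement update never increases the covariance trace, and the covariance recursion converges.

```python
import numpy as np

# Direct transcription of the Kalman filter recursions above on a made-up
# 2-state example (A, B, C, W, V below are illustrative, not from the slides).
rng = np.random.default_rng(1)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
C = np.array([[1.0, 0.0]])
W = 0.01 * np.eye(2)                 # E[w_k w_k^T]
V = np.array([[0.04]])               # E[n_k n_k^T]

xhat = np.zeros((2, 1))              # xhat_{0|-1} = xbar_0
Phi = np.eye(2)                      # Phi_{0|-1} = Phibar_0
x = np.array([[0.5], [-0.2]])        # true (unknown) state
traces = []
for k in range(200):
    u = np.zeros((1, 1))                                  # no input applied here
    y = C @ x + rng.normal(0.0, 0.2, (1, 1))              # y_k = C x_k + n_k
    L = Phi @ C.T @ np.linalg.inv(C @ Phi @ C.T + V)      # L_k
    xhat = xhat + L @ (y - C @ xhat)                      # update: xhat_{k|k}
    Phi_upd = Phi - L @ C @ Phi                           # Phi_{k|k}
    traces.append((np.trace(Phi), np.trace(Phi_upd)))
    xhat = A @ xhat + B @ u                               # predict: xhat_{k+1|k}
    Phi = A @ Phi_upd @ A.T + W                           # Phi_{k+1|k}
    w = rng.multivariate_normal([0.0, 0.0], W).reshape(2, 1)
    x = A @ x + B @ u + w
print(traces[-1])
```

Note that Phi − L C Phi equals the stated formula for Φk|k, since L is precisely Φk|k−1 C^T (C Φk|k−1 C^T + V)^{−1}.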
Why is P_{xk|Ik} ∼ N(x̂k|k, Φk|k) Gaussian? Bayes' rule! For simplicity consider the initial stage, suppose x0 ∈ R, y0 = x0 + n0, and let the a priori probability distributions be
f_{x0}(x0) = (1/(√(2π) σ0)) e^{−(x0 − x̄0)²/(2σ0²)}, x̄0 = 1, σ0 = 1/3,
f_{y0|x0}(y0) = (1/(√(2π) σn)) e^{−(y0 − x0)²/(2σn²)}, σn = 0.1.
Think of x0 as the position of a robot and y0 as a sensor measurement. [Figures: the two densities.]
Then, from Bayes' rule, Prob[x0 ∈ A | y0 ∈ B] = α Prob[y0 ∈ B | x0 ∈ A] Prob[x0 ∈ A], and
f_{x0|y0}(x0) = α f_{y0|x0}(y0) f_{x0}(x0) = α (1/(√(2π) σn)) e^{−(y0 − x0)²/(2σn²)} (1/(√(2π) σ0)) e^{−(x0 − x̄0)²/(2σ0²)} = (1/(√(2π) σ0|0)) e^{−(x0 − x̂0|0)²/(2σ0|0²)},
also Gaussian! After some calculations (α depends on x̄0, y0),
x̂0|0 = x̄0 + L0 (y0 − x̄0), L0 = σ0²/(σ0² + σn²), σ0|0² = σ0² − σ0⁴/(σ0² + σn²).
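These formulas can be checked numerically against a brute-force application of Bayes' rule on a grid, with the σ0 = 1/3, σn = 0.1 above and one of the example measurements (y0 = 1.2):

```python
import numpy as np

# Closed-form fusion (formulas above) vs. brute-force Bayes on a grid.
xbar0, s0, sn = 1.0, 1.0 / 3.0, 0.1
y0 = 1.2

L0 = s0**2 / (s0**2 + sn**2)
xhat = xbar0 + L0 * (y0 - xbar0)             # posterior mean
var = s0**2 - s0**4 / (s0**2 + sn**2)        # posterior variance

# Brute force: posterior proportional to likelihood * prior, normalized on a grid
x = np.linspace(-2.0, 4.0, 200001)
dx = x[1] - x[0]
prior = np.exp(-((x - xbar0) ** 2) / (2 * s0**2))
lik = np.exp(-((y0 - x) ** 2) / (2 * sn**2))
post = prior * lik
post /= post.sum() * dx
mean_num = (x * post).sum() * dx
var_num = ((x - mean_num) ** 2 * post).sum() * dx
print(xhat, mean_num)    # ~1.1835 in both cases
print(var, var_num)
```

Because σn is much smaller than σ0, the posterior mean is pulled almost all the way to the measurement (L0 ≈ 0.917).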
Based on the measurement y0 we update our belief on x0, going from the prior P_{x0} to the posterior P_{x0|y0} = P_{x0|I0} via the likelihood P_{y0|x0}.
[Figures: posterior densities for the measurements y = 1.2, y = 3 and y = 0.5, with σ0 = 1/3, σn = 0.1.]
What about x1? Since x1 = a x0 + b u0 + w0, the distribution P_{x1|I1} = P_{x1|u0,y0} is therefore also Gaussian. If we continue this procedure we obtain that P_{xk|Ik} is Gaussian for every k.
The Kalman filter has a prediction/update interpretation.
- Prediction step: based on the previous estimate, predict in open loop the state at the next step,
x̂k+1|k = A x̂k|k + B uk, Φk+1|k = A Φk|k A^T + W.
In the prediction step the covariance (typically) increases.
- Update step: based on the new measurement, update the estimate,
x̂k|k = x̂k|k−1 + Lk (yk − C x̂k|k−1), Φk|k = Φk|k−1 − Φk|k−1 C^T (C Φk|k−1 C^T + V)^{−1} C Φk|k−1.
The term yk − C x̂k|k−1 is called the innovation.
The covariance iterations do not depend on the measurements and on the control input and can be computed a priori. Combining the prediction and update steps,
Φk+1|k = A Φk|k−1 A^T − A Φk|k−1 C^T (C Φk|k−1 C^T + V)^{−1} C Φk|k−1 A^T + W,
which coincides with the Riccati equations from last lecture (with S = 0) if we replace (A, B, Q, R) → (A^T, C^T, W, V). Moreover, Φk+1|k → Φ̄ and Lk → L̄ as k → ∞, where
Φ̄ = A Φ̄ A^T − A Φ̄ C^T (C Φ̄ C^T + V)^{−1} C Φ̄ A^T + W, L̄ = Φ̄ C^T (C Φ̄ C^T + V)^{−1}.
The resulting stationary filter,
x̂k|k = x̂k|k−1 + L̄ (yk − C x̂k|k−1), x̂k+1|k = A x̂k|k + B uk,
has error dynamics governed by A − L̃C, where L̃ = A L̄, which is stable if (A, C) is observable*.
*Less stringent conditions for convergence can be inferred from the condition of the dual problem (slide 34, II_1).
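A sketch of computing Φ̄ and L̄ by simply running the covariance recursion to convergence, using the mass-spring system from the appendix example (V = 0.01, i.e. σn = 0.1, is an assumed value, consistent with the numbers used there). The checks confirm the fixed point and the stability of the error dynamics A − L̃C.

```python
import numpy as np

# Stationary Kalman filter gains by iterating the one-step covariance recursion.
# A, C, W are from the appendix mass-spring example; V = 0.01 is assumed.
A = np.array([[0.8090, 0.0935], [-3.6932, 0.8090]])
C = np.array([[1.0, 0.0]])
W = np.diag([0.0, 0.25])          # sigma_w = 0.5
V = np.array([[0.01]])

Phi = np.eye(2)                   # any PSD initial condition works
for _ in range(500):
    G = np.linalg.solve(C @ Phi @ C.T + V, C @ Phi @ A.T)
    Phi = A @ Phi @ A.T - (A @ Phi @ C.T) @ G + W

Lbar = Phi @ C.T @ np.linalg.inv(C @ Phi @ C.T + V)
Ltilde = A @ Lbar
rho = max(abs(np.linalg.eigvals(A - Ltilde @ C)))   # spectral radius, error dynamics
print(Phi)
print(Lbar.ravel(), rho)
```

Even though A itself has eigenvalues on the unit circle (undamped oscillator), the filter's error dynamics are strictly stable, as asserted below.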
Historical note
Rudolf E. Kálmán (1930–2016). Kálmán's ideas on filtering were initially met with vast skepticism, so much so that he was forced to do the first publication of his results in mechanical engineering, rather than in electrical engineering or systems engineering. Kálmán had more success in presenting his ideas, however, while visiting Stanley F. Schmidt at the NASA Ames Research Center in 1960. This led to the use of Kálmán filters during the Apollo program, and furthermore, in the NASA Space Shuttle, in Navy submarines, and in unmanned aerospace vehicles and weapons, such as cruise missiles (source: Wikipedia).
LQG controller

Dynamic model: xk+1 = A xk + B uk + ωk
Output equation: yk = C xk + nk
Given: wk ∼ N(0, W), nk ∼ N(0, V), x0 ∼ N(x̄0, Φ̄0).

Cost function
lim_{T→∞} (1/T) Σ_{k=0}^{T−1} E[ [xk^T uk^T] [Q S; S^T R] [xk; uk] ]

Obtain:
Observer gains
Φ̄ = A Φ̄ A^T − A Φ̄ C^T (C Φ̄ C^T + V)^{−1} C Φ̄ A^T + W, L̄ = Φ̄ C^T (C Φ̄ C^T + V)^{−1}
Controller gains
P̄ = A^T P̄ A + Q − (S + A^T P̄ B)(B^T P̄ B + R)^{−1}(S^T + B^T P̄ A), K̄ = −(B^T P̄ B + R)^{−1}(S^T + B^T P̄ A)

LQG controller:
x̂k|k = x̂k|k−1 + L̄ (yk − C x̂k|k−1)
uk = K̄ x̂k|k
x̂k+1|k = A x̂k|k + B uk
In this implementation, at each time instant k the sensor measurement yk is acquired, the control input uk = K̄ x̂k|k is computed, and the actuation is updated. An alternative implementation uses the 'delayed' estimate,
uk = K̄ x̂k|k−1, x̂k+1|k = A x̂k|k−1 + B uk + L̃ (yk − C x̂k|k−1), L̃ = A L̄,
which can be computed between time steps k and k + 1. In both cases the closed loop is stable, since A − L̃C and A + B K̄ are stable.
Example: inverted pendulum on a cart (cart mass M, position x, friction coefficient b, input force u; pendulum mass m, inertia I, length ℓ, angle θ):
(M + m) ẍ + b ẋ − mℓ θ̈ = u, (I + mℓ²) θ̈ − mgℓ θ = mℓ ẍ.
Linearized model in state-space form (see [1, p. 32]), with q = (I + mℓ²)(M + m) − m²ℓ²:
d/dt [x; ẋ; θ; θ̇] = [0 1 0 0; 0 −(I+mℓ²)b/q m²gℓ²/q 0; 0 0 0 1; 0 −mℓb/q mgℓ(M+m)/q 0] [x; ẋ; θ; θ̇] + [0; (I+mℓ²)/q; 0; mℓ/q] u(t).
Discretizing with sampling period τ, the position is measured with noise:
yk = [1 0 0 0] [x(kτ); ẋ(kτ); θ(kτ); θ̇(kτ)] + nk, nk ∼ N(0, σn).
[1] Feedback control of dynamic systems, Franklin, Powell, Emami-Naeini.
Model definition and controller synthesis (MATLAB):

clear all, close all, clc
rand('seed',1)
% definition of the continuous-time model
m = 0.2; M = 1; b = 0.05; I = 0.01; g = 9.8; l = 0.5;
p = (I+m*l^2)*(M+m)-m^2*l^2;
Ac = [0 1 0 0;
      0 -(I+m*l^2)*b/p (m^2*g*l^2)/p 0;
      0 0 0 1;
      0 -(m*l*b)/p m*g*l*(M+m)/p 0];
Bc = [0; (I+m*l^2)/p; 0; m*l/p];
Cc = [1 0 0 0];
% discretization
n = 4; tau = 0.1;
sysd = c2d(ss(Ac,Bc,Cc,0),tau);
A = sysd.a; B = sysd.b; C = sysd.c;
% LQG control
Q = diag([1 1 1 1]); S = zeros(4,1); R = 1;
K = dlqr(A,B,Q,R,S); K = -K;   % dlqr returns the gain for u = -Kx
W = diag([1 1 1 1]); V = 1;
[~,~,~,L] = kalman(ss(sysd.a,[sysd.b eye(n)],sysd.c,[sysd.d zeros(1,n)],tau),W,V,'current');
Simulation (MATLAB):

% simulation
kend = 10/tau;
x0 = [1 0 0 0]';
x(:,1) = x0; xhat_1(:,1) = x0;
sigman = 0.0001;
for k=1:kend
    y(:,k) = C*x(:,k)+sigman*randn(1);
    xhat(:,k) = xhat_1(:,k) + L*(y(:,k)-C*xhat_1(:,k));
    u(:,k) = K*xhat(:,k);
    xhat_1(:,k+1) = A*xhat(:,k) + B*u(:,k);
    x(:,k+1) = A*x(:,k)+B*u(:,k);
end
plot((1:kend)*tau,u), figure,
plot((1:kend)*tau,x(3,1:end-1)), figure,
plot((1:kend)*tau,x(1,1:end-1))
Simulation results for Q = I, S = 0, R = 1, τ = 0.1, W = I, V = 1, σn = 0.0001 and x0 = [1 0 0 0]^T.
[Figures: y, θ and u as functions of t ∈ [0, 10].]
The covariances V = E[nk nk^T] and W = E[ωk ωk^T] can be seen as tuning knobs to improve the state estimation. They are typically chosen as diagonal matrices.
- If the estimates are very noisy, pick weights so as to trust the sensors less and the model more, that is, increase V and decrease W.
- In the opposite case, the estimator reacts slowly and introduces lag in the control loop, which is in general bad. In this case decrease V and increase W.
Summary
After this lecture, you should be able to:
- solve stochastic optimal control problems subject to disturbances and compute the corresponding expected cost;
- compute the conditional distribution P_{xk|Ik} with the Kalman filter, under the assumption that the disturbances are Gaussian, and compute the stationary solution of the Kalman filter;
- design LQG controllers minimizing the expected value of a finite-horizon or infinite-horizon quadratic cost.
Example: Kalman filter

Consider a mass-spring system with unitary mass, ÿ(t) = −ks y(t), and state x(t) = [y(t); ẏ(t)]. We wish to obtain an estimate of the state at a rate τ = 0.1 from sensor measurements of the position, where wn := √ks = 2π:
ẋ(t) = [0 1; −wn² 0] x(t) = Ac x(t).
To this effect we discretize the system, obtaining
xk+1 = [0.8090 0.0935; −3.6932 0.8090] xk + [0; σω] dk = A xk + wk, A = e^{Ac τ},
where the dk are independent Gaussian random variables with unitary variance modeling disturbances, and σω = 0.5. The output equation is
yk = [1 0] xk + σn vk = C xk + nk,
where the vk are independent Gaussian random variables with unitary variance and σn = 0.1. Thus
V = E[nk nk^T] = σn², W = E[ωk ωk^T] = [0 0; 0 σω²].
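The discretization can be checked: for the undamped oscillator Ac the matrix exponential has a closed form, and a truncated power series gives the same matrix. This is a verification sketch, not part of the original example.

```python
import numpy as np

# Check A = exp(Ac * tau) for Ac = [[0, 1], [-wn^2, 0]], wn = 2*pi, tau = 0.1.
wn, tau = 2.0 * np.pi, 0.1
Ac = np.array([[0.0, 1.0], [-wn**2, 0.0]])

# Closed form for the harmonic oscillator: [[cos, sin/wn], [-wn*sin, cos]]
s, c = np.sin(wn * tau), np.cos(wn * tau)
A_closed = np.array([[c, s / wn], [-wn * s, c]])

# Independent check: truncated power series of the matrix exponential
A_series, term = np.eye(2), np.eye(2)
for k in range(1, 30):
    term = term @ Ac * (tau / k)
    A_series = A_series + term

print(np.round(A_closed, 4))   # matches the matrix used above
```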
Assume that the covariance of the initial state, Φ̄0 = E[(x0 − x̄0)(x0 − x̄0)^T], is given by Φ0|−1 = Φ̄0 = [1 0; 0 1], and obtain the gains from
Lk = Φk|k−1 C^T (C Φk|k−1 C^T + V)^{−1}
Φk|k = Φk|k−1 − Φk|k−1 C^T (C Φk|k−1 C^T + V)^{−1} C Φk|k−1
Φk+1|k = A Φk|k A^T + W.

For k = 0:
L0 = [1; 0](1 + 0.01)^{−1} = [0.9901; 0]
Φ0|0 = [1 0; 0 1] − [1; 0](1 + 0.01)^{−1}[1 0] = [0.0099 0; 0 1]
Φ1|0 = A Φ0|0 A^T + [0 0; 0 0.25] = [0.0152 0.0461; 0.0461 1.0396]
For k = 1:
L1 = [0.0152; 0.0461](0.0152 + 0.01)^{−1} = [0.6037; 1.8271]
Φ1|1 = [0.0152 0.0461; 0.0461 1.0396] − [0.0152; 0.0461](0.0152 + 0.01)^{−1}[0.0152 0.0461] = [0.0060 0.0183; 0.0183 0.9553]
Φ2|1 = A Φ1|1 A^T + [0 0; 0 0.25] = [0.0151 0.0599; 0.0599 0.8484]
For k = 2: …
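The iterations can be reproduced programmatically. A is rebuilt from wn = 2π and τ = 0.1 rather than from the rounded entries (which lose the fourth decimal), and V = 0.01 (σn = 0.1) is an assumed value consistent with the numbers in this example.

```python
import numpy as np

# Reproduce the covariance/gain iterations of the mass-spring example.
wn, tau = 2.0 * np.pi, 0.1
s, c = np.sin(wn * tau), np.cos(wn * tau)
A = np.array([[c, s / wn], [-wn * s, c]])   # exact e^{Ac tau}
C = np.array([[1.0, 0.0]])
W = np.diag([0.0, 0.25])
V = np.array([[0.01]])

Phi_pred = np.eye(2)            # Phi_{0|-1} = Phibar_0
Ls, Phis = [], []
for k in range(2):
    L = Phi_pred @ C.T @ np.linalg.inv(C @ Phi_pred @ C.T + V)   # L_k
    Ls.append(L)
    Phi_upd = Phi_pred - L @ C @ Phi_pred                        # Phi_{k|k}
    Phi_pred = A @ Phi_upd @ A.T + W                             # Phi_{k+1|k}
    Phis.append(Phi_pred)

print(Ls[0].ravel(), Ls[1].ravel())   # ~[0.9901, 0] and ~[0.6037, 1.8271]
print(Phis[0])                        # Phi_{1|0}
print(Phis[1])                        # Phi_{2|1}
```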
Suppose that the true initial condition is x0 = [1; 0] and that the mean of the initial estimate is x̂0|−1 = x̄0 = [1.3; 0]. The estimates are computed from
x̂k|k = x̂k|k−1 + Lk (yk − C x̂k|k−1), x̂k+1|k = A x̂k|k (there is no input in this example).
For the first measured output y0:
x̂0|0 = [1.3; 0] + [0.9901; 0](y0 − 1.3) = [1.0999; 0]
x̂1|0 = A x̂0|0 = [0.8090 0.0935; −3.6932 0.8090][1.0999; 0] = [0.8899; −4.0623]
For an output y1 = 0.7542:
x̂1|1 = [0.8899; −4.0623] + [0.6037; 1.8271](0.7542 − 0.8899) = [0.8080; −4.3102]
x̂2|1 = A x̂1|1 = [0.2504; −6.4710]
A8
1 2
2 4 6 8 10
x xk|k−1 xk|k × +
˙ y
k = 0 k = 1 k = 2 k = 9
[Figure: confidence ellipses associated with Φk|k−1 and Φk|k for k = 0 and k = 1.]
pΦk|kx|kxk = 1}