Optimal Control and Dynamic Programming 4SC000 Q2 2017-2018 Duarte Antunes

Outline
• Loop transfer recovery

Recall (slide 1)

The LQR loop for the system $\dot{x}(t) = Ax(t) + Bu(t)$, $u(t) \in \mathbb{R}$,

[Block diagram: feedback loop with reference $0$, a $+$ summing junction (positive feedback), signal $u(s)$, and loop block $K(sI - A)^{-1}B$]

where $K$ is such that $u(t) = Kx(t)$ is the optimal policy for the problem

$$\min_{u(t)} \int_0^\infty \left( x(t)^\top Q\, x(t) + u(t)^\top R\, u(t) \right) dt,$$

has the following properties:
- Guaranteed gain margins $[\tfrac{1}{2}, \infty)$ and phase margins $(-60^\circ, 60^\circ)$ for any matrices $Q$ and $R$.
- Guarantees on the sensitivity (<1) and complementary sensitivity (<2) for any $Q$, $R$.
- Root-square locus: place the poles of the loop by tuning $Q$, $R$.
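The margin and sensitivity guarantees listed above all follow from Kalman's return-difference inequality $|1 + G(j\omega)| \ge 1$, where $G(s) = -K(sI-A)^{-1}B$ is the loop gain written in negative-feedback form. A minimal numerical sketch of that inequality, assuming the Control System Toolbox function `lqr`; the plant and weights are illustrative choices, not taken from the slides:

```matlab
% Illustrative plant (an assumption for this sketch): a double integrator.
A = [0 1; 0 0];
B = [0; 1];
Q = eye(2);          % state weight (illustrative)
R = 0.1;             % scalar control weight (illustrative)

% MATLAB convention: u = -Klqr*x, so the lecture's K equals -Klqr.
Klqr = lqr(A, B, Q, R);

w = logspace(-2, 3, 400);            % frequency grid in rad/s
retdiff = zeros(size(w));
for i = 1:numel(w)
    % Negative-feedback loop gain G(jw) = Klqr*(jw*I - A)^{-1}*B
    G = Klqr * ((1j*w(i)*eye(2) - A) \ B);
    retdiff(i) = abs(1 + G);         % distance of G(jw) from the point -1
end

minRetDiff = min(retdiff);           % return difference, expected >= 1
maxSens = max(1 ./ retdiff);         % peak of the sensitivity, expected <= 1
fprintf('min |1+G| = %.4f, max |S| = %.4f\n', minRetDiff, maxSens);
```

Because the Nyquist curve of $G$ stays outside the unit disc centred at $-1$, the gain can be scaled by any factor in $[\tfrac{1}{2}, \infty)$ and the phase shifted by up to $60^\circ$ without encircling $-1$ differently.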

Recall (slide 2)

By duality, the Kalman loop for the system

$$\dot{x}(t) = Ax(t) + w(t), \qquad y(t) = Cx(t) + n(t), \qquad y(t) \in \mathbb{R},$$

$$E[w(t)\,w(t+\tau)^\top] = W\delta(\tau), \qquad E[n(t)\,n(t+\tau)^\top] = V\delta(\tau),$$

[Block diagram: feedback loop with input $y(s)$, output $\hat{y}(s)$, a summing junction with signs $+$ and $-$ (negative feedback), and loop block $C(sI - A)^{-1}L$]

where $L$ is such that

$$\dot{\hat{x}}(t) = A\hat{x}(t) + Bu(t) + L\,(y(t) - C\hat{x}(t))$$

is the optimal estimator (in the sense of slide 11, lecture 11), has the following properties:
- Guaranteed gain margins $[\tfrac{1}{2}, \infty)$ and phase margins $(-60^\circ, 60^\circ)$ for any matrices $W$ and $V$.
- Guarantees on the sensitivity (<1) and complementary sensitivity (<2) for any $W$, $V$.
- Root-square locus: place the poles of the loop by tuning $W$, $V$.
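The duality can be made concrete: the Kalman gain for $(A, C, W, V)$ is the transpose of the LQR gain for $(A^\top, C^\top, W, V)$. A minimal sketch, assuming the Control System Toolbox function `lqr`; the matrices below are illustrative and not from the slides:

```matlab
% Illustrative data (assumptions for this sketch).
A = [0 1; -2 -3];
C = [1 0];
W = eye(2);     % process noise intensity
V = 0.5;        % measurement noise intensity (scalar output)

% lqr on the dual system returns the dual gain and the Riccati solution.
[Kdual, Phi] = lqr(A', C', W, V);
L = Kdual';                            % Kalman gain, L = Phi*C'/V

% Residual of the filter Riccati equation
%   A*Phi + Phi*A' + W - Phi*C'*inv(V)*C*Phi = 0
residual = A*Phi + Phi*A' + W - (Phi*C')/V*(C*Phi);

estimatorPoles = eig(A - L*C);         % expected in the open left half plane
fprintf('Riccati residual norm: %.2e\n', norm(residual));
disp(estimatorPoles.');
```

The same call pattern, `lqr(A', C', W, V)'`, can stand in for `kalman` whenever only the gain $L$ is needed.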

Motivation for today's lecture (slide 3)

Since there are guaranteed frequency-domain properties for the LQR and Kalman loops (discovered by Kalman in the 1960s), it would be reasonable to search for such properties for LQG loops.

However, in 1978 John Doyle published the paper "Guaranteed Margins for LQG Regulators" (IEEE Transactions on Automatic Control), which gives an example of matrices $Q$, $R$, $W$, $V$ for which the LQG loop has no such guarantees.

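John Doyle's Example (slide 4)

Special case of John Doyle's example:

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 \end{bmatrix}, \quad Q = W = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad R = 0.01$$

[Block diagram: feedback loop with reference $0$, a $+$ summing junction (positive feedback), controller $K(sI - (A + BK - LC))^{-1}L$ from $y(s)$ to $u(s)$, and plant $C(sI - A)^{-1}B$]

    A = [1 1; 0 1];
    B = [0; 1];
    C = [1 0];
    n = size(A,1);
    Q = 1*[1 1]'*[1 1];
    R = 0.001;                 % the formula on the slide lists R = 0.01
    W = [1 1]'*[1 1];
    V = 1;
    K = lqr(A,B,Q,R); K = -K;  % lecture convention: u = K*x
    % Kalman gain L; the process noise enters through eye(n)
    [~,L] = kalman(ss(A,[B eye(n)],C,[0 zeros(1,n)]),W,V);
    % Controller from y to -u, so the product below is the negative-feedback loop gain
    [numLQG,denLQG] = ss2tf(A+B*K-L*C,L,-K,0);
    [numplant,denplant] = ss2tf(A,B,C,0);
    tflqg = tf(numLQG,denLQG);
    tfplant = tf(numplant,denplant);
    nyquist(tfplant*tflqg)

[Figure: Nyquist diagram of the LQG loop `tfplant*tflqg`; axes: Real Axis, Imaginary Axis]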
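How quickly the gain margin of this loop shrinks can be checked numerically. The sketch below is an illustration under stated assumptions, not the code behind the slide: it uses the weights $Q = W = q\,[1\ 1]^\top [1\ 1]$ with $R = V = 1$ (the scalar `q`, the plant-input gain `m` and the helper `isStable` are names introduced here), and bisects for the largest `m` that keeps the closed loop stable. For this plant the gains have the form $K = -f\,[1\ 1]$ and $L = d\,[1\ 1]^\top$, and the determinant of the closed-loop matrix works out to $1 + (1 - m)\,d f$, so stability is lost no later than $m = 1 + 1/(d f)$.

```matlab
A = [1 1; 0 1]; B = [0; 1]; C = [1 0];
c = [1 1];                         % Q = W = q*c'*c, R = V = 1 (assumption)
qs = [1 1e2 1e4];
gmUp = zeros(size(qs));            % largest tolerated plant-gain increase
bound = zeros(size(qs));           % analytic upper bound 1 + 1/(d*f)

for i = 1:numel(qs)
    q = qs(i);
    K = -lqr(A, B, q*(c'*c), 1);       % lecture convention: u = K*xhat
    L = lqr(A', C', q*(c'*c), 1)';     % Kalman gain via duality

    % Closed loop with states [x; xhat] when the plant input is scaled by m;
    % the controller still uses the nominal model (m = 1).
    isStable = @(m) all(real(eig([A, m*B*K; L*C, A + B*K - L*C])) < 0);

    lo = 1; hi = 2;                    % stable at lo, unstable at hi
    for it = 1:50
        mid = (lo + hi)/2;
        if isStable(mid), lo = mid; else, hi = mid; end
    end
    gmUp(i) = lo;
    bound(i) = 1 + 1/((-K(1))*L(1));   % f = -K(1), d = L(1)
end

disp([qs; gmUp; bound]);           % rows: q, computed margin, bound
```

Both the computed margin and the bound approach 1 as `q` grows, which is what an arbitrarily small gain margin means in practice.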

Discussion (slide 5)
• While, as John Doyle's example shows, there are LQG loops whose gain and phase margins can be arbitrarily small, it is possible to avoid these examples via loop transfer recovery.
• Loop transfer recovery states that, for minimum-phase systems, the LQG loop transfer function converges to the open-loop transfer function of the Kalman filter loop when the LQR control penalty converges to zero (cheap control).
• Since the Kalman filter loop has good frequency-domain properties, LQG control tuned in this way can also have good frequency-domain properties (e.g. large gain and phase margins).
• Minimum-phase single-input single-output (SISO) systems are those whose zeros (if any) lie in the left half of the complex plane (stable zeros).
• Non-minimum-phase systems have limitations in the achievable closed-loop performance, and although there are some results for these systems as well, we will not address them here.
• We first need a preliminary result on LQR and Kalman loops; we then state the general loop transfer recovery result for model-based controllers and specialise it to LQG loops.
• For simplicity we consider SISO systems, although the results extend to MIMO.

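Preliminary result (slide 6)

Consider the family of gains $K_\rho$ obtained from the optimal policy $u(t) = K_\rho x(t)$ for the problem

$$\min_{u(t)} \int_0^\infty \left( x(t)^\top Q\, x(t) + u(t)^\top R\, u(t) \right) dt, \qquad \dot{x}(t) = Ax(t) + Bu(t), \quad u(t) \in \mathbb{R},$$

with the weights $Q = C^\top C$ and $R = \rho$. Then

$$\lim_{\rho \to 0} \left( \sqrt{\rho}\, K_\rho \right) = -C \quad \text{or} \quad \lim_{\rho \to 0} \left( \sqrt{\rho}\, K_\rho \right) = C.$$

Note that, by duality, if $W = HH^\top$ and $V = \theta \to 0$, then the Kalman gains $L_\theta$ satisfy

$$\lim_{\theta \to 0} \left( \sqrt{\theta}\, L_\theta \right) = -H \quad \text{or} \quad \lim_{\theta \to 0} \left( \sqrt{\theta}\, L_\theta \right) = H.$$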
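The limit can be observed numerically. A minimal sketch, assuming the Control System Toolbox function `lqr` and reusing the plant of John Doyle's example purely as a convenient test case (the variable names are introduced here):

```matlab
A = [1 1; 0 1]; B = [0; 1]; C = [1 0];   % C*(sI-A)^{-1}*B = 1/(s-1)^2, no zeros

rhos = [1e-2 1e-4 1e-6 1e-8];            % decreasing control penalties
scaled = zeros(numel(rhos), 2);          % rows hold sqrt(rho)*K_rho
for i = 1:numel(rhos)
    K = -lqr(A, B, C'*C, rhos(i));       % lecture convention: u = K*x
    scaled(i, :) = sqrt(rhos(i)) * K;
end

disp(scaled);                            % rows approach -C = [-1 0]
```

For this plant the limit is $-C$, and the second entry decays only like $\rho^{1/4}$, so very small values of $\rho$ are needed before the limit is visible.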

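Loop transfer recovery (slide 8)

Consider a linear system

$$\dot{x}(t) = Ax(t) + Bu(t), \qquad y(t) = Cx(t), \qquad y(t) \in \mathbb{R}, \quad u(t) \in \mathbb{R},$$

with no unstable zeros and a model-based controller, parameterized by gains $L$ and $K_\alpha$,

$$\dot{\bar{x}}(t) = A\bar{x}(t) + Bu(t) + L\,(y(t) - C\bar{x}(t)), \qquad u(t) = K_\alpha \bar{x}(t),$$

where:
(i) $L$ is a fixed gain such that $(A - LC)$ is Hurwitz*.
(ii) $K_\alpha$ is a family of gains such that $(A + BK_\alpha)$ is Hurwitz and

$$\lim_{\alpha \to 0} \left( \sqrt{\alpha}\, K_\alpha \right) = -C \quad \text{or} \quad \lim_{\alpha \to 0} \left( \sqrt{\alpha}\, K_\alpha \right) = C.$$

Then, for each $s \in \mathbb{C}$,

$$\lim_{\alpha \to 0} C(sI - A)^{-1}B\,(-K_\alpha)\,(sI - (A + BK_\alpha - LC))^{-1}L = C(sI - A)^{-1}L.$$

The factor $-K_\alpha$ converts the positive-feedback convention $u = K_\alpha \bar{x}$ into a negative-feedback loop gain, which is the quantity comparable with the Kalman loop gain $C(sI - A)^{-1}L$.

*A matrix is Hurwitz if all its eigenvalues have negative real part.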
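A scalar case makes the limit, and its sign convention, easy to verify by hand. The symbols $a$, $b$, $c$, $l$, $k$ below are introduced for this illustration only: take $A = a$, $B = b > 0$, $C = c > 0$, a fixed $L = l$ with $a - lc < 0$, and $K_\alpha = -k$ with $k = c/\sqrt{\alpha} \to \infty$.

```latex
\begin{align*}
C(sI-A)^{-1}B \; K_\alpha \; (sI-(A+BK_\alpha-LC))^{-1}L
  &= \frac{c\,b}{s-a}\cdot\frac{-k\,l}{s-a+bk+lc} \\
  &= -\frac{c\,l}{s-a}\cdot\frac{b\,k}{s-a+bk+lc}
  \;\xrightarrow[\;k\to\infty\;]{}\; -\frac{c\,l}{s-a}
  = -\,C(sI-A)^{-1}L .
\end{align*}
```

The product written with $K_\alpha$ is the loop gain of a positive-feedback loop ($u = K_\alpha \bar{x}$), so the same loop redrawn with negative feedback has gain $c\,l/(s-a)$, which is exactly the Kalman loop gain $C(sI-A)^{-1}L$.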

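Loop transfer recovery for LQG (slide 9)

Consider a linear system

$$\dot{x}(t) = Ax(t) + Bu(t), \qquad y(t) = Cx(t), \qquad y(t) \in \mathbb{R}, \quad u(t) \in \mathbb{R},$$

with no unstable zeros and an LQG controller

$$\dot{\hat{x}}(t) = A\hat{x}(t) + Bu(t) + L\,(y(t) - C\hat{x}(t)), \qquad u(t) = K_\rho \hat{x}(t),$$

where

$$K_\rho = -\rho^{-1} B^\top P_\rho, \qquad A^\top P_\rho + P_\rho A - P_\rho B \rho^{-1} B^\top P_\rho + C^\top C = 0,$$

$$L = \Phi C^\top V^{-1}, \qquad A\Phi + \Phi A^\top + W - \Phi C^\top V^{-1} C \Phi = 0.$$

Then, for each $s \in \mathbb{C}$,

$$\lim_{\rho \to 0} C(sI - A)^{-1}B\,(-K_\rho)\,(sI - (A + BK_\rho - LC))^{-1}L = C(sI - A)^{-1}L,$$

where the factor $-K_\rho$ accounts for the positive-feedback convention $u = K_\rho \hat{x}$.

Note that this is a direct consequence of the results of slides 6 and 8 (the preliminary result and the loop transfer recovery result).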

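Interpretation of LTR for LQG (slide 10)
• Loop transfer recovery states that, for minimum-phase systems, the LQG loop, with loop transfer function $K(sI - (A + BK - LC))^{-1}L \; C(sI - A)^{-1}B$, converges to the Kalman filter loop, with open-loop transfer function $C(sI - A)^{-1}L$, when $Q = C^\top C$ and the LQR control penalty $R = \rho$ converges to zero (cheap control).

[Block diagram: LQG loop with reference $0$, a $+$ summing junction (positive feedback), controller $K(sI - (A + BK - LC))^{-1}L$ from $y(s)$ to $u(s)$, and plant $C(sI - A)^{-1}B$]

(as $R = \rho \to 0$)

[Block diagram: Kalman loop with input $y(s)$, output $\hat{y}(s)$, a summing junction with signs $+$ and $-$ (negative feedback), and loop block $C(sI - A)^{-1}L$]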

LTR/LQG design (slide 11)

Two steps:

1. Design the Kalman loop such that the desired frequency-domain properties are obtained, which yields the gain $L$ (see lecture 11).

[Block diagram: Kalman loop with input $y(s)$, output $\hat{y}(s)$, negative feedback, and loop block $C(sI - A)^{-1}L$]

2. Set $Q = C^\top C$ and $R = \rho$ with $\rho$ very small, and obtain the LQR gain $K$.

Then the LQG controller

$$\dot{\hat{x}}(t) = A\hat{x}(t) + Bu(t) + L\,(y(t) - C\hat{x}(t)), \qquad u(t) = K\hat{x}(t)$$

will yield a closed loop with frequency-domain properties similar to those of the designed Kalman loop.

[Block diagram: LQG loop with reference $0$, a $+$ summing junction, controller $K(sI - (A + BK - LC))^{-1}L$ from $y(s)$ to $u(s)$, and plant $C(sI - A)^{-1}B$]

Interesting: design the observer first and then a fast controller (contrary to the traditional paradigm of designing the controller first and then a fast observer).

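Example (slide 12)

Let us use the LTR/LQG procedure to design a controller with good gain and phase margins for the following system.

Transfer function:

$$\frac{1}{(s+1)(s-2)(s+3)}$$

State-space:

$$A = \begin{bmatrix} -2 & 5 & 6 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}$$

[Block diagram: feedback loop with reference $0$, a $+$ summing junction, an unknown controller "?" from $y(s)$ to $u(s)$, and plant $\frac{1}{(s+1)(s-2)(s+3)}$]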
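A quick consistency check between the transfer function and the state-space form, as a sketch in base MATLAB (the test point `s0` and the variable names are arbitrary choices made here):

```matlab
A = [-2 5 6; 1 0 0; 0 1 0];
B = [1; 0; 0];
C = [0 0 1];

% Denominator (s+1)(s-2)(s+3) expanded into polynomial coefficients.
den = conv(conv([1 1], [1 -2]), [1 3]);    % coefficients of s^3 + 2s^2 - 5s - 6

charPoly = poly(A);                        % characteristic polynomial of A

% Compare C*(s*I - A)^{-1}*B with 1/den(s) at an arbitrary complex point.
s0 = 1.5 + 0.7j;
Gss = C * ((s0*eye(3) - A) \ B);
Gtf = 1 / polyval(den, s0);

openLoopPoles = eig(A);                    % -1, 2 and -3: one unstable pole
fprintf('|Gss - Gtf| = %.2e\n', abs(Gss - Gtf));
```

The plant has an unstable pole at $s = 2$ but no zeros, so it is minimum phase and the loop transfer recovery result applies.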

Example (slide 13)

1. Design the Kalman loop such that the desired frequency-domain properties are obtained (gain and phase margins, sensitivity and complementary sensitivity). After tuning, these are found acceptable:

$$W = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad V = 0.1$$

[Figure: Nyquist diagram of the Kalman loop; axes: Real Axis, Imaginary Axis]

[Figure: three Bode diagrams (magnitude in dB and phase in deg versus frequency in rad/s): open loop, complementary sensitivity, sensitivity]

Example (slide 14)

2. Set $Q = C^\top C$ and $R = \rho$ with $\rho$ very small, and obtain the LQR gain $K$. Here $\rho = 1 \times 10^{-4}$.

[Figure: Nyquist diagram; axes: Real Axis, Imaginary Axis. Blue: Kalman loop. Red: LQG loop.]

Example (slide 15)

2. Set $Q = C^\top C$ and $R = \rho$ with $\rho$ very small, and obtain the LQR gain $K$. Here $\rho = 1 \times 10^{-7}$.

[Figure: Nyquist diagram; axes: Real Axis, Imaginary Axis. Blue: Kalman loop. Red: LQG loop.]
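The two-step procedure for this example can be sketched as follows, assuming the Control System Toolbox function `lqr`. The process-noise intensity is an assumption: the $W$ recovered from the slide is $2 \times 2$, which does not match the three-state model, so this sketch uses `W = eye(3)` and will not reproduce the plotted curves exactly. The loop gains are compared in negative-feedback form, hence the factor `-K`.

```matlab
A = [-2 5 6; 1 0 0; 0 1 0]; B = [1; 0; 0]; C = [0 0 1];
W = eye(3);    % ASSUMPTION: the slide's W is not recoverable for 3 states
V = 0.1;       % measurement noise intensity from the slide

% Step 1: Kalman gain (computed by duality with lqr).
L = lqr(A', C', W, V)';

% Step 2: cheap-control LQR gains for the two penalties used in the slides.
rhos = [1e-4 1e-7];
w = logspace(-1, 1, 50);                 % frequency grid in rad/s
err = zeros(size(rhos));                 % worst-case loop-gain mismatch
for j = 1:numel(rhos)
    K = -lqr(A, B, C'*C, rhos(j));       % lecture convention: u = K*xhat
    for i = 1:numel(w)
        s = 1j*w(i);
        Gkal = C * ((s*eye(3) - A) \ L);                 % Kalman loop gain
        Gctl = (-K) * ((s*eye(3) - (A + B*K - L*C)) \ L); % controller, y to -u
        Glqg = C * ((s*eye(3) - A) \ B) * Gctl;          % LQG loop gain
        err(j) = max(err(j), abs(Glqg - Gkal));
    end
end

fprintf('max |Glqg - Gkal|: rho = %g -> %.3g, rho = %g -> %.3g\n', ...
        rhos(1), err(1), rhos(2), err(2));
```

A smaller mismatch for the smaller $\rho$ is the numerical counterpart of the red curve moving onto the blue one between the two Nyquist plots.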
