Optimal Control and Dynamic Programming
4SC000 Q2 2017-2018
Duarte Antunes


SLIDE 1

4SC000 Q2 2017-2018

Optimal Control and Dynamic Programming

Duarte Antunes

SLIDE 2

Recall

Formulation
  • Stage decision problems: transition diagram
  • Discrete-time control problems: discrete-time system & additive cost function
  • Continuous-time control problems: differential equations & additive cost function

DP algorithm
  • Stage decision problems: graphical DP algorithm & DP equation
  • Discrete-time control problems: DP equation
  • Continuous-time control problems: Hamilton-Jacobi-Bellman equation

Partial information
  • Stage decision problems: Bayesian inference & decisions based on prob. distribution
  • Discrete-time control problems: Kalman filter and separation principle
  • Continuous-time control problems: continuous-time Kalman filter and separation principle

Alternative algorithms
  • Stage decision problems: Dijkstra's algorithm
  • Discrete-time control problems: static optimization
  • Continuous-time control problems: Pontryagin's maximum principle (PMP)

Today: continuous-time Kalman filter and separation principle, and a new topic - frequency domain properties of LQR.

SLIDE 3

Outline

  • Linear quadratic control, Kalman filter, separation principle
  • Frequency domain properties of LQR
SLIDE 4

Linear quadratic control

The analogous problem to discrete-time linear quadratic control for continuous-time systems would be

  min_{u(t)=µ(t,x(t))} E[ ∫₀ᵀ x(t)ᵀQx(t) + u(t)ᵀRu(t) dt + x(T)ᵀQ_T x(T) ]

subject to

  ẋ(t) = Ax(t) + Bu(t) + w(t)

However, how to define the disturbance w(t) for continuous-time systems? It is quite challenging! White noise disturbances are one of the few ways to define disturbances without ''memory'' for continuous-time systems.

SLIDE 5

White noise

Let us start with a scalar white noise process ω(t) ∈ ℝ.

Very interesting (or strange!) properties:

  • it is continuous but not differentiable anywhere
  • the total variation of a sample path over any finite interval is infinite
  • it does not exist in nature
  • even for a small time interval δ, ω(t) and ω(t + δ) are uncorrelated, that is, the auto-correlation is zero for τ ≠ 0:

      R(τ) = E[ω(t)ω(t + τ)] = 0

  • when δ = 0, E[ω(t)ω(t)] = E[ω(t)²] = ∞ (infinite power)
  • the auto-correlation is then a Dirac delta function

      R(τ) = aδ(τ)

    and the scalar white noise process is characterized by the amplitude a.
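As a quick numerical illustration (a sketch with hypothetical values a = 0.5 and step Δ = 10⁻³, stdlib Python only): in simulation, white noise with auto-correlation aδ(τ) is approximated over a grid of step Δ by independent zero-mean Gaussians with variance a/Δ, so the sample variance blows up as Δ → 0 while samples at different times stay uncorrelated.

```python
import math
import random

def sampled_white_noise(a, delta, n, seed=0):
    """Grid approximation of white noise with auto-correlation a*delta(tau):
    n independent zero-mean Gaussians with variance a/delta."""
    rng = random.Random(seed)
    sigma = math.sqrt(a / delta)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

a, delta = 0.5, 1e-3
w = sampled_white_noise(a, delta, 20000)
mean = sum(w) / len(w)
var = sum((wi - mean) ** 2 for wi in w) / len(w)
print(mean, var)  # mean near 0, sample variance near a/delta = 500
```

Halving Δ doubles the sample variance, which is the discrete shadow of E[ω(t)²] = ∞ in the limit.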

SLIDE 6

Random walk

The integral of white noise, ẋ(t) = w(t) (an integrator 1/s driven by w(t)), is called a random walk or the Wiener process.

  • We shall assume that w(t) is Gaussian for each fixed time; this implies that x(t) is also Gaussian for each fixed time.
  • x(t) is more intuitive than white noise and easier to handle mathematically.
  • x(t) has finite power.
  • x(t) and x(t + τ) are correlated.
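These properties can be checked in simulation (a sketch, assuming white noise is approximated on a grid of step Δ by independent Gaussians with variance a/Δ, so each increment of x has variance aΔ): at a fixed time t, Var[x(t)] ≈ a·t is finite, and x at different times is positively correlated.

```python
import math
import random

def wiener_paths(a, delta, steps, n_paths, seed=1):
    """Integrate discretized white noise: x_{k+1} = x_k + w_k * delta with
    Var[w_k] = a/delta, so each increment is N(0, a*delta)."""
    rng = random.Random(seed)
    sigma_inc = math.sqrt(a * delta)
    paths = []
    for _ in range(n_paths):
        x, path = 0.0, [0.0]
        for _ in range(steps):
            x += rng.gauss(0.0, sigma_inc)
            path.append(x)
        paths.append(path)
    return paths

a, delta, steps = 1.0, 0.01, 100          # final time t = steps * delta = 1
paths = wiener_paths(a, delta, steps, 4000)
end = [p[-1] for p in paths]              # x(1) over the ensemble
mid = [p[len(p) // 2] for p in paths]     # x(0.5) over the ensemble
var_end = sum(x * x for x in end) / len(end)
cov = sum(m * e for m, e in zip(mid, end)) / len(paths)
print(var_end, cov)  # Var[x(1)] near a*t = 1.0; Cov[x(0.5), x(1)] clearly positive
```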

SLIDE 7

Discussion

  ẋ(t) = Ax(t) + Bu(t) + w(t)

  • In a similar way to the Wiener process, the solution to this stochastic differential equation is more intuitive than white noise.
  • If x(t) ∈ ℝⁿ and w(t) ∈ ℝⁿ, we assume that w(t) = Nw̄(t) with w̄(t) = [w̄₁(t) w̄₂(t) … w̄_p(t)]ᵀ, where the w̄ᵢ(t) are uncorrelated scalar Gaussian white noise processes, so that E[w̄(t)w̄(t + τ)ᵀ] = Iδ(τ). Thus, w(t) satisfies

      E[w(t)w(t + τ)ᵀ] = NNᵀδ(τ) := Wδ(τ)

SLIDE 8

Discussion

  • It is possible to prove that the discretized system, with x_k := x(t_k), t_k = kτ and u(t) = u_k for t ∈ [t_k, t_{k+1}), takes the form

      x_{k+1} = A_d x_k + B_d u_k + w_k,  A_d = e^{Aτ},  B_d = ∫₀^τ e^{As}B ds

    where, as before, the w_k are zero-mean independent Gaussian random variables, now with covariance

      E[w_k w_kᵀ] = ∫₀^τ e^{As} W e^{Aᵀs} ds

  • The cost can also be written in terms of the discrete-time variables, and it is also a quadratic function.
  • Since the optimal control policy for such a system is the same linear state feedback control law as for the deterministic version of the problem, the next results come as no surprise.
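The discretization formulas above can be verified numerically in the scalar case (a sketch with hypothetical values a = −2, b = 1, W = 0.5, τ = 0.1): for ẋ = ax + bu + w the integrals have closed forms, A_d = e^{aτ}, B_d = b(e^{aτ} − 1)/a and Var[w_k] = W(e^{2aτ} − 1)/(2a), which a simple quadrature reproduces.

```python
import math

def discretize_scalar(a, b, W, tau, n=2000):
    """Scalar versions of A_d = e^{A tau}, B_d = int_0^tau e^{A s} B ds and
    Cov[w_k] = int_0^tau e^{A s} W e^{A' s} ds, via midpoint quadrature."""
    h = tau / n
    Ad = math.exp(a * tau)
    Bd = sum(math.exp(a * (i + 0.5) * h) * b * h for i in range(n))
    Qd = sum(math.exp(2 * a * (i + 0.5) * h) * W * h for i in range(n))
    return Ad, Bd, Qd

a, b, W, tau = -2.0, 1.0, 0.5, 0.1
Ad, Bd, Qd = discretize_scalar(a, b, W, tau)
# closed forms for comparison
Bd_exact = b * (math.exp(a * tau) - 1) / a
Qd_exact = W * (math.exp(2 * a * tau) - 1) / (2 * a)
print(Ad, Bd, Qd)  # quadrature matches the closed forms
```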

SLIDE 9

Finite horizon linear quadratic control

The optimal control law for the problem

  min_{u(t)=µ(t,x(t))} E[ ∫₀ᵀ x(t)ᵀQx(t) + u(t)ᵀRu(t) dt + x(T)ᵀQ_T x(T) ],  Q > 0, R > 0

subject to ẋ(t) = Ax(t) + Bu(t) + w(t), where w(t) is zero-mean Gaussian white noise with E[w(t)w(t + τ)ᵀ] = Wδ(τ), is u(t) = K(t)x(t), where

  K(t) = −R⁻¹BᵀP(t)
  Ṗ(t) = −(AᵀP(t) + P(t)A − P(t)BR⁻¹BᵀP(t) + Q),  P(T) = Q_T,  t ∈ [0, T)
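The Riccati differential equation can be integrated backward from P(T) = Q_T. A minimal sketch in the scalar case (hypothetical values a = b = q = r = 1, q_T = 0, T = 5; Euler steps, stdlib only); far from the horizon, P(t) settles at the positive root of the algebraic Riccati equation, 1 + √2 for these numbers:

```python
def riccati_backward(a, b, q, r, qT, T, steps=20000):
    """Integrate the scalar Riccati ODE
       Pdot = -(2 a P - P^2 b^2 / r + q),  P(T) = qT
    backward in time with Euler steps; returns P on a grid from t = 0 to t = T."""
    h = T / steps
    P = qT
    Ps = [qT]
    for _ in range(steps):
        Pdot = -(2 * a * P - (b * b / r) * P * P + q)
        P -= h * Pdot          # stepping from t to t - h
        Ps.append(P)
    Ps.reverse()               # Ps[0] is P(0)
    return Ps

a, b, q, r, qT, T = 1.0, 1.0, 1.0, 1.0, 0.0, 5.0
Ps = riccati_backward(a, b, q, r, qT, T)
P0 = Ps[0]
K0 = -(b / r) * P0             # K(t) = -R^{-1} B' P(t) at t = 0
print(P0, K0)  # P(0) near 1 + sqrt(2); closed loop a + b K0 < 0
```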

SLIDE 10

Infinite horizon linear quadratic control

The optimal control law for the problem

  min_{u(t)=µ(x(t))} lim_{T→∞} (1/T) E[ ∫₀ᵀ x(t)ᵀQx(t) + u(t)ᵀRu(t) dt ],  Q > 0, R > 0, (A, B) controllable

subject to ẋ(t) = Ax(t) + Bu(t) + w(t), where w(t) is zero-mean Gaussian white noise with E[w(t)w(t + τ)ᵀ] = Wδ(τ), is u(t) = Kx(t) with K = −R⁻¹BᵀP, where P is the unique positive definite solution to the (continuous-time) algebraic Riccati equation

  AᵀP + PA − PBR⁻¹BᵀP + Q = 0
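In the scalar case the algebraic Riccati equation is a quadratic in P and can be solved in closed form; a sketch (the numbers a = b = q = r = 1 are illustrative):

```python
def care_scalar(a, b, q, r):
    """Unique positive solution P of the scalar continuous-time ARE
       2 a P - P^2 b^2 / r + q = 0
    and the resulting LQR gain K = -R^{-1} B' P."""
    # Quadratic in P: (b^2/r) P^2 - 2 a P - q = 0; take the positive root.
    disc = (a * a + q * b * b / r) ** 0.5
    P = (a + disc) * r / (b * b)
    K = -(b / r) * P
    return P, K

P, K = care_scalar(1.0, 1.0, 1.0, 1.0)
print(P, K)  # P = 1 + sqrt(2); closed-loop pole a + bK = -sqrt(2) is stable
```

Note that the closed loop a + bK is stable even though the open loop a = 1 is unstable, as LQR guarantees.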

SLIDE 11

Output feedback linear quadratic control

Problem formulation

  ẋ(t) = Ax(t) + Bu(t) + w(t)
  y(t) = Cx(t) + n(t)

  • w(t): zero-mean Gaussian white noise with E[w(t)w(t + τ)ᵀ] = Wδ(τ)
  • n(t): zero-mean Gaussian white noise with E[n(t)n(t + τ)ᵀ] = Vδ(τ)
  • x(0): Gaussian random vector with mean x̄₀ and covariance Φ̄₀
  • information set I(t) = {y(s), u(s) | s ∈ [0, t)}

  min_{u(t)=µ(t,I(t))} E[ ∫₀ᵀ x(t)ᵀQx(t) + u(t)ᵀRu(t) dt + x(T)ᵀQ_T x(T) ]

SLIDE 12

Discussion

  • The solution to this problem is analogous to the discrete-time case.
  • In particular, there is also a separation principle: the optimal controller consists of an optimal estimator (Kalman filter) + an optimal state-feedback controller (LQR).
  • The derivation of the Kalman filter (which in continuous time is known as the Kalman-Bucy filter) and of this result is mathematically quite involved.
  • We simply state the results next, without further justification.
SLIDE 13

Kalman-Bucy filter

Consider the problem of finding an estimator x̂ for the state of

  ẋ(t) = Ax(t) + Bu(t) + w(t)
  y(t) = Cx(t) + n(t)

as a function of the information set, which includes the measurements y, where w(t), n(t) are zero-mean Gaussian white noise with E[w(t)w(t + τ)ᵀ] = Wδ(τ) and E[n(t)n(t + τ)ᵀ] = Vδ(τ), and the initial state is a Gaussian random variable with mean x̄₀ and covariance Φ̄₀. The optimal estimator, in the sense that it minimizes

  cᵀE[(x̂(t) − x(t))(x̂(t) − x(t))ᵀ | I(t)]c

for any constant vector c, is the Kalman-Bucy filter

  x̂̇(t) = Ax̂(t) + Bu(t) + L(t)(y(t) − Cx̂(t)),  x̂(0) = x̄₀
  L(t) = Φ(t)CᵀV⁻¹,  t ≥ 0
  Φ̇(t) = AΦ(t) + Φ(t)Aᵀ + W − Φ(t)CᵀV⁻¹CΦ(t),  Φ(0) = E[(x(0) − x̄₀)(x(0) − x̄₀)ᵀ] = Φ̄₀
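The filter covariance Φ(t) obeys a Riccati ODE integrated forward from Φ̄₀, the dual of the control Riccati equation. A scalar sketch (hypothetical values a = c = W = V = 1, Φ̄₀ = 0): Φ(t) settles at the positive root of the dual algebraic Riccati equation and the estimator pole a − Lc is stable.

```python
def kalman_bucy_covariance(a, c, W, V, phi0, T, steps=20000):
    """Integrate the scalar filter Riccati ODE
       Phidot = 2 a Phi + W - Phi^2 c^2 / V,  Phi(0) = phi0
    forward with Euler steps; returns the final covariance and gain L = Phi c / V."""
    h = T / steps
    phi = phi0
    for _ in range(steps):
        phi += h * (2 * a * phi + W - (c * c / V) * phi * phi)
    return phi, phi * c / V

a, c, W, V = 1.0, 1.0, 1.0, 1.0
phi, L = kalman_bucy_covariance(a, c, W, V, phi0=0.0, T=5.0)
print(phi, L)  # steady-state phi near 1 + sqrt(2); estimator pole a - L c < 0
```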

SLIDE 14

LQG - Separation principle

The optimal control input for the output feedback linear quadratic optimal control problem is

  u(t) = K(t)x̂(t)

where, for t ∈ [0, T),

  K(t) = −R⁻¹BᵀP(t)
  Ṗ(t) = −(AᵀP(t) + P(t)A − P(t)BR⁻¹BᵀP(t) + Q),  P(T) = Q_T

  x̂̇(t) = Ax̂(t) + Bu(t) + L(t)(y(t) − Cx̂(t))
  L(t) = Φ(t)CᵀV⁻¹
  Φ̇(t) = AΦ(t) + Φ(t)Aᵀ + W − Φ(t)CᵀV⁻¹CΦ(t),  Φ(0) = E[(x(0) − x̄₀)(x(0) − x̄₀)ᵀ] = Φ̄₀

SLIDE 15

LQG - Separation principle

If instead of the finite-horizon cost we consider

  min_{u(t)=µ(t,I(t))} lim_{T→∞} (1/T) E[ ∫₀ᵀ x(t)ᵀQx(t) + u(t)ᵀRu(t) dt ]    (1)

then the optimal control input for the output feedback linear quadratic optimal control problem with cost (1) is u(t) = Kx̂(t), where

  x̂̇(t) = Ax̂(t) + Bu(t) + L(y(t) − Cx̂(t))
  K = −R⁻¹BᵀP,  AᵀP + PA − PBR⁻¹BᵀP + Q = 0
  L = ΦCᵀV⁻¹,  AΦ + ΦAᵀ + W − ΦCᵀV⁻¹CΦ = 0
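The separation structure is easiest to see in the error coordinates e = x − x̂, where the closed loop becomes block triangular, so its poles are exactly the LQR poles together with the estimator poles. A scalar sketch (gains taken from the scalar AREs with a = b = c = 1 and q = r = W = V = 1, so both poles equal −√2; these numbers are illustrative):

```python
def lqg_closed_loop_poles(a, b, c, k, l):
    """Scalar LQG closed loop in (x, e) coordinates, e = x - xhat:
         xdot = (a + b k) x - b k e
         edot = (a - l c) e
    The triangular structure makes the separation principle explicit: the
    poles are the LQR pole a + b k and the estimator pole a - l c."""
    return a + b * k, a - l * c

a, b, c = 1.0, 1.0, 1.0
k = -(1 + 2 ** 0.5)      # LQR gain from the scalar ARE with q = r = 1
l = 1 + 2 ** 0.5         # Kalman gain from the dual ARE with W = V = 1
p_ctrl, p_est = lqg_closed_loop_poles(a, b, c, k, l)
print(p_ctrl, p_est)     # both equal -sqrt(2): the LQG loop is stable
```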

SLIDE 16

Inverted pendulum example

For the model provided in Lecture II_1, slide 32 (state feedback, for simplicity), let us compare discrete-time and continuous-time gains:

  clear all, close all, clc
  % definition of the continuous-time model
  m = 0.2; M = 1; b = 0.05; I = 0.01; g = 9.8; l = 0.5;
  p = (I+m*l^2)*(M+m)-m^2*l^2;
  Ac = [0 1 0 0;
        0 -(I+m*l^2)*b/p (m^2*g*l^2)/p 0;
        0 0 0 1;
        0 -(m*l*b)/p m*g*l*(M+m)/p 0];
  Bc = [0; (I+m*l^2)/p; 0; m*l/p];
  Q = diag([1 1 1 1]); S = zeros(4,1); R = 1;
  % discretization
  n = 4; tau = 0.01;
  sysd = c2d(ss(Ac,Bc,zeros(1,n),0),tau);
  A = sysd.a; B = sysd.b;
  % LQR control, discrete time
  K = dlqr(A,B,Q,R,S); K = -K;
  % continuous time
  Kc = lqr(Ac,Bc,Q,R,S); Kc = -Kc;

SLIDE 17

Inverted pendulum example

Continuous-time gains (policy u(t) = K_c x(t)):

  K_c = [1.0000 2.3674 −33.1623 −7.8509]

Discrete-time gains (policy u_k = K x_k), converging to the continuous-time gains as expected:

  τ = 0.1:    K = [0.5955 1.4650 −25.3322 −5.9529]
  τ = 0.01:   K = [0.9495 2.2551 −32.1930 −7.6156]
  τ = 0.001:  K = [0.9948 2.3559 −33.0632 −7.8269]

SLIDE 18

Discussion

  • We saw that the discrete-time LQR controller gains converge to the continuous-time ones.
  • We could also have concluded that the discrete-time LQG controller (output feedback) converges to the continuous-time one.
  • Therefore, from a practical point of view we only need to know the discrete-time version.
  • However, the continuous-time formulation is more elegant and actually much more widely used and known.
  • In fact, working in continuous time makes everything independent of the sampling/discretization period and allows us to gain insight into the system.
  • One good example is frequency-domain analysis, which we address next: it is much easier to work with Bode plots or root-locus plots in continuous time than in discrete time.

SLIDE 19

Outline

  • Linear quadratic control, Kalman filter, separation principle
  • Frequency domain properties of LQR
SLIDE 20

Discussion

  • While our optimal control formulation was in the time domain, linear quadratic control has very interesting properties in the frequency domain (impressive theory!).
  • First, there is a set of results pertaining to the LQR (state feedback) loop:
      • guaranteed large gain and phase margins for any matrices Q and R;
      • guarantees on the sensitivity and complementary sensitivity for any Q, R;
      • root square locus: place the poles of the loop by tuning Q, R.
  • All these results follow from the Frequency Domain Equality (FDE).
  • By duality, these results are also available for the Kalman filter loop.
  • Then there is one of the most interesting formal results for LQG control, which is loop transfer recovery (LTR).
  • In order to understand these results we need to write the LQR, Kalman and LQG loops in the Laplace domain.
  • For simplicity, throughout our discussion (the remainder of this lecture) we consider SISO systems.

SLIDE 21

Laplace domain, LQR

Time domain:

  ẋ(t) = Ax(t) + Bu(t),  u(t) = Kx(t)

Laplace domain:

  u(s) = ∫₀^∞ u(t)e^{−st}dt,  x(s) = ∫₀^∞ x(t)e^{−st}dt
  x(s) = (sI − A)⁻¹Bu(s),  u(s) = Kx(s)

The loop transfer function is K(sI − A)⁻¹B.

SLIDE 22

Laplace domain, Kalman filter

Time domain:

  x̂̇(t) = Ax̂(t) + L(y(t) − ŷ(t)),  ŷ(t) = Cx̂(t)

Laplace domain:

  y(s) = ∫₀^∞ y(t)e^{−st}dt,  x̂(s) = ∫₀^∞ x̂(t)e^{−st}dt
  x̂(s) = (sI − A)⁻¹L(y(s) − ŷ(s)),  ŷ(s) = Cx̂(s)

The loop transfer function from y(s) − ŷ(s) to ŷ(s) is C(sI − A)⁻¹L.

SLIDE 23

Laplace domain, LQG

Time domain (LQG controller):

  u(t) = Kx̂(t),  x̂̇(t) = (A + BK − LC)x̂(t) + Ly(t)

Laplace domain:

  u(s)/y(s) = K(sI − (A + BK − LC))⁻¹L

i.e. the LQG controller K(sI − (A + BK − LC))⁻¹L in feedback with the process C(sI − A)⁻¹B.

SLIDE 24

Discussion

  • Loop transfer recovery states that, for certain systems, the LQG loop transfer function

      K(sI − (A + BK − LC))⁻¹L · C(sI − A)⁻¹B

    converges to the open-loop transfer function of the Kalman filter loop, C(sI − A)⁻¹L, when Q = CᵀC and the LQR control penalty converges to zero (cheap control), R = ρ → 0.
  • Then it suffices to tune the Kalman loop and obtain the cheap-control solution.
  • After reviewing some basic notions, we introduce the frequency-domain properties of the LQR loop and the dual properties of the Kalman loop; next lecture we address LTR.

SLIDE 25

Transfer functions

Standard feedback loop: controller K(s) and process P(s), with reference r, error e, control u, output y, input disturbance d and output noise n.

  K(s) = n_K(s)/d_K(s),  P(s) = n_P(s)/d_P(s)
  K(s)P(s) = (n_K(s)/d_K(s)) · (n_P(s)/d_P(s)) := n(s)/d(s)    (open-loop t.f.)
  C(s) = K(s)P(s)/(1 + K(s)P(s))    (closed-loop t.f.)
  S(s) = 1/(1 + K(s)P(s))    (sensitivity t.f.)

Two of the most basic analysis tools are the Nyquist diagram and root-locus analysis; both allow us to infer stability of the closed loop from the open-loop t.f.

SLIDE 26

Nyquist plot and stability criterion

Plot the open-loop transfer function K(jω)P(jω): magnitude |K(jω)P(jω)| and phase arg(K(jω)P(jω)) versus ω (Bode plot), or its image in the complex plane Re{K(jω)P(jω)}, Im{K(jω)P(jω)} (Nyquist plot).

Nyquist stability criterion:

  N = Z − P

where N is the number of clockwise encirclements of −1 by the Nyquist plot, Z is the number of zeros of 1 + K(s)P(s) in the right half plane (unstable closed-loop poles) and P is the number of poles of 1 + K(s)P(s) in the right half plane (unstable open-loop poles).

If there are no unstable open-loop poles: stability ≡ the Nyquist curve does not encircle −1.

SLIDE 27

Gain margin

Let I be the interval of positive gains G by which one can multiply the open-loop t.f. without destabilizing the closed loop. Then the downward gain margin is GM⁻ = min{G | G ∈ I} and the upward gain margin is GM⁺ = max{G | G ∈ I}.

Both can be computed from the Bode plot: the magnitude |K(jω)P(jω)| at the frequencies where the phase arg(K(jω)P(jω)) crosses −180°.

SLIDE 28

Phase margin

Let D be the interval of phases d such that multiplying the open-loop frequency response K(jω)P(jω) by e^{jd} does not destabilize the closed loop. The negative phase margin is PM⁻ = min{d | d ∈ D} and the positive phase margin is PM⁺ = max{d | d ∈ D}.

Both can be computed from the Bode plot: the phase arg(K(jω)P(jω)) relative to −180° at the frequencies where the magnitude crosses 0 dB.
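Both margins can be computed by sweeping the open-loop frequency response, exactly as one reads them off the Bode plot. A sketch for a hypothetical example loop L(s) = 1/(s(s+1)(s+2)) (phase crossover at ω = √2, so the upward gain margin is 6; stdlib only):

```python
import cmath
import math

def loop(s, gain=1.0):
    """Open-loop t.f. L(s) = gain / (s (s+1) (s+2)); a hypothetical example."""
    return gain / (s * (s + 1) * (s + 2))

def margins(gain, w_lo=1e-3, w_hi=1e3, n=50000):
    """Scan frequencies for the phase crossover (arg = -180 deg -> gain margin)
    and the gain crossover (|L| = 1 -> phase margin)."""
    gm = pm = None
    prev_L = loop(1j * w_lo, gain)
    for i in range(1, n + 1):
        w = w_lo * (w_hi / w_lo) ** (i / n)      # log-spaced sweep
        L = loop(1j * w, gain)
        # phase crossover: Im L changes sign while Re L < 0
        if prev_L.imag <= 0 <= L.imag and L.real < 0:
            gm = 1.0 / abs(L)
        # gain crossover: |L| falls through 1
        if abs(prev_L) >= 1 >= abs(L):
            pm = 180 + math.degrees(cmath.phase(L))
        prev_L = L
    return gm, pm

gm, pm = margins(gain=1.0)
print(gm, pm)  # gain margin near 6, phase margin near 53 degrees
```

The crossing tests are written for this particular loop shape (phase falling monotonically from −90° to −270°); a general tool would track all crossings.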

SLIDE 29

Outline

  • Linear quadratic control, Kalman filter, separation principle
  • Frequency domain properties of LQR
      • Gain and phase margins
      • Sensitivity and complementary sensitivity
      • Root square locus
      • Duality - properties of Kalman filter
SLIDE 30

Gain and phase margins of LQR

Consider the LQR gain K resulting from the optimal policy for the problem

  min ∫₀^∞ xᵀQx + uᵀRu dt  subject to ẋ = Ax + Bu

and consider the LQR closed loop with control u = Kx and open-loop t.f.

  n(s)/d(s) = −K(sI − A)⁻¹B

The gain and phase margins for this closed-loop system are at least

  I = (1/2, ∞):  GM⁻ = 1/2,  GM⁺ = ∞
  D = (−60°, 60°):  PM⁻ = −60°,  PM⁺ = 60°

These margins hold for any system (A, B) and any (!) Q, R, and follow from the frequency domain equality.

SLIDE 31

Frequency domain equality (FDE)

Consider the LQR gain K = −R⁻¹BᵀP resulting from a standard linear quadratic control problem with Q = MᵀM, so that

  0 = AᵀP + PA + MᵀM − PBR⁻¹BᵀP

Then, defining Φ(s) = (sI − A)⁻¹, the following holds:

  [I − KΦ(−s)B]ᵀ R [I − KΦ(s)B] = R + [MΦ(−s)B]ᵀ[MΦ(s)B]

or, in the special case where m = 1 (SISO),

  (1 − KΦ(−s)B)(1 − KΦ(s)B) = 1 + (1/R)(MΦ(−s)B)(MΦ(s)B)

where −KΦ(s)B is the open-loop transfer function.
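The SISO equality is easy to spot-check numerically in the scalar case, where Φ(s) = 1/(s − a). A sketch (illustrative data a = b = m = r = 1, with P from the scalar ARE): both sides agree at arbitrary complex s, and on the imaginary axis the right-hand side is ≥ 1, which is the return-difference bound behind the margin results.

```python
# Scalar LQR data: Q = m*m, with P the positive ARE solution and K = -P b / r.
a, b, m, r = 1.0, 1.0, 1.0, 1.0
P = (a + (a * a + m * m * b * b / r) ** 0.5) * r / (b * b)
K = -(b / r) * P

def phi(s):
    """Phi(s) = (sI - A)^{-1} = 1/(s - a) in the scalar case."""
    return 1.0 / (s - a)

def fde_sides(s):
    """Both sides of the SISO frequency domain equality."""
    lhs = (1 - K * phi(-s) * b) * (1 - K * phi(s) * b)
    rhs = 1 + (1.0 / r) * (m * phi(-s) * b) * (m * phi(s) * b)
    return lhs, rhs

for s in (0.3j, 1j, 2.0 + 0.5j):
    lhs, rhs = fde_sides(s)
    print(abs(lhs - rhs))   # ~ 0 at every test point

for w in (0.1, 1.0, 10.0):
    print(abs(1 - K * phi(1j * w) * b))  # always >= 1 on the imaginary axis
```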

SLIDE 32

Proof of the FDE

Start with the continuous-time algebraic Riccati equation, with Q = MᵀM and K = −R⁻¹BᵀP (so that PBR⁻¹BᵀP = KᵀRK):

  0 = −AᵀP − PA − MᵀM + KᵀRK

Add and subtract sP:

  0 = (−sI − A)ᵀP + P(sI − A) − MᵀM + KᵀRK

Premultiply by BᵀΦ(−s)ᵀ and postmultiply by Φ(s)B, with Φ(s) = (sI − A)⁻¹, and use BᵀP = −RK, PB = −KᵀR:

  0 = BᵀP Φ(s)B + BᵀΦ(−s)ᵀ PB − BᵀΦ(−s)ᵀMᵀMΦ(s)B + BᵀΦ(−s)ᵀKᵀRKΦ(s)B
    = −RKΦ(s)B − BᵀΦ(−s)ᵀKᵀR − BᵀΦ(−s)ᵀMᵀMΦ(s)B + BᵀΦ(−s)ᵀKᵀRKΦ(s)B

Rearrange and add R to both sides of the equation to arrive at the FDE:

  R + BᵀΦ(−s)ᵀMᵀMΦ(s)B = R − RKΦ(s)B − BᵀΦ(−s)ᵀKᵀR + BᵀΦ(−s)ᵀKᵀRKΦ(s)B
                        = [I − KΦ(−s)B]ᵀ R [I − KΦ(s)B]

SLIDE 33

FDE & gain/phase margins

Making s = jω in the FDE we have

  (1 − KΦ(−jω)B)(1 − KΦ(jω)B) = 1 + (1/R)(MΦ(−jω)B)(MΦ(jω)B)

or, equivalently,

  |1 − KΦ(jω)B|² = 1 + (1/R)|MΦ(jω)B|²

from which we conclude that |1 − KΦ(jω)B| ≥ 1. The Nyquist plot of the open loop, the curve φ(ω) = −KΦ(jω)B, is therefore always outside the region

  {φ ∈ ℂ : |1 + φ| < 1}

i.e. the unit disk centered at −1. Geometrically we can then infer the gain and phase margins given before.

SLIDE 34

Outline

  • Linear quadratic control, Kalman filter, separation principle
  • Frequency domain properties of LQR
      • Gain and phase margins
      • Sensitivity and complementary sensitivity
      • Root square locus
      • Duality - properties of Kalman filter
SLIDE 35

Sensitivity and complementary sensitivity

From the FDE, with Φ(s) = (sI − A)⁻¹,

  |1 − KΦ(jω)B|² = 1 + (1/R)|MΦ(jω)B|²  ⇒  |1 − KΦ(jω)B| ≥ 1

we conclude that the sensitivity function of the LQR loop satisfies

  |1/(1 − KΦ(jω)B)| ≤ 1

and from the identity

  1/(1 − KΦ(jω)B) + (−KΦ(jω)B)/(1 − KΦ(jω)B) = 1

we conclude that the complementary sensitivity (closed-loop transfer function) satisfies

  |(−KΦ(jω)B)/(1 − KΦ(jω)B)| ≤ 2

SLIDE 36

Cheap control approximation and roll-off

From the FDE, with Φ(s) = (sI − A)⁻¹,

  |1 − KΦ(jω)B|² = 1 + (1/R)|MΦ(jω)B|²

we conclude that if R = ρ is very small (cheap control), then (since K is also very large)

  |KΦ(jω)B| ≈ (1/√ρ)|MΦ(jω)B|

Moreover, note that at high frequencies Φ(jω) ≈ (1/jω)I, so MΦ(jω)B ≈ (1/jω)MB, and therefore, as ω → ∞,

  |KΦ(jω)B| ≈ (1/√ρ)|MB|/ω

which implies that the complementary sensitivity also decays as 1/ω:

  |(−KΦ(jω)B)/(1 − KΦ(jω)B)| ≈ (1/√ρ)|MB|/ω  as ω → ∞

Conclusion: cheap-control LQR loops have −20 dB/dec frequency roll-off.

SLIDE 37

Example

Suppose that the process is the state-space representation of

  P(s) = 1/((s+1)(s+2)(s+3))

obtained with the Matlab function tf2ss.m, and consider Q = CᵀC, R = 0.0001.

  tf1 = tf([1],[1 1])*tf([1],[1 2])*tf([1],[1 3]);
  [A,B,C,D] = tf2ss(tf1.num{1},tf1.den{1});
  Q = C'*C; R = 0.0001;
  W = [0:0.01:1e3];
  K = lqr(A,B,Q,R); K = -K;
  [num,den] = ss2tf(A,B,-K,0);
  tflqr = tf(num,den);
  figure(1), bode(tflqr), grid on
  figure(2), nyquist(tflqr), grid on
  figure(3), bode(tflqr/(1+tflqr),W), grid on
  figure(4), bode(1/(1+tflqr)), grid on

[Bode diagram of the LQR loop for R = 0.0001, showing the −20 dB/dec roll-off.]
SLIDE 38

Example: Nyquist plot

[Nyquist diagram of the LQR loop, with a zoomed-in view: the Nyquist plot does not enter the unit circle around (−1, 0).]

SLIDE 39

Example: sensitivity and comp. sensitivity

[Bode diagrams of the LQR loop: the sensitivity magnitude stays below 1 (0 dB) and the complementary sensitivity magnitude stays below 2 (6 dB), with a −20 dB/dec roll-off.]

SLIDE 40

Outline

  • Linear quadratic control, Kalman filter, separation principle
  • Frequency domain properties of LQR
      • Gain and phase margins
      • Sensitivity and complementary sensitivity
      • Root square locus
      • Duality - properties of Kalman filter
SLIDE 41

Root-locus

Another interesting fact that we can conclude from the FDE is the geometric locus of the closed-loop poles! Let us start by reviewing the root locus.

Root locus: procedure to determine the closed-loop poles (solutions to d(s) + Gn(s) = 0) as a function of the gain G, for the unity-feedback loop with open-loop t.f. G·n(s)/d(s).

  • 1. For G = 0 the closed-loop poles coincide with the open-loop poles.
  • 2. For G → ∞ the closed-loop poles coincide with the open-loop zeros; if the number of zeros is less than the number of poles, the remaining poles go to infinity in a Butterworth pattern.
  • 3. ... (about 7 rules; see textbooks such as Feedback Control of Dynamic Systems, G. Franklin et al. We will only need the first two.)

SLIDE 42

Example

  n(s)/d(s) = ((s − 2)(s + 4)) / ((s² + s + 1)(s + 5))

  tf1 = tf([1 -2],[1 1 1])*tf([1 4],[1 5])
  rlocus(tf1)

[Root-locus plot produced by rlocus.]

SLIDE 43

Root square locus for LQR

From the FDE

  (1 − KΦ(−s)B)(1 − KΦ(s)B) = 1 + (1/R)(MΦ(−s)B)(MΦ(s)B)

we conclude that we can obtain the closed-loop poles as a function of R by a procedure identical to the root locus: writing p(s) = MΦ(s)B, we mirror (with respect to the imaginary axis) the zeros and poles of p(s), run the root locus for 1 + (1/R)p(s)p(−s) = 0, and read the stable closed-loop poles off the diagram. The root locus will always be symmetric with respect to the imaginary axis.

[Example of a root square locus: poles and zeros of p(s) together with the mirrored poles and zeros of p(−s).]
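In the scalar case the root square locus can be written out by hand: p(s) = mb/(s − a), so the candidate poles solve s² = a² + m²b²/R, and the stable root must coincide with the LQR closed-loop pole a + bK. A sketch (illustrative data a = b = 1, m = 2):

```python
# Scalar root square locus: p(s) = m b / (s - a), and the candidate closed-loop
# poles solve 1 + (1/R) p(s) p(-s) = 0, i.e. s^2 = a^2 + m^2 b^2 / R.
# The stable (left half plane) root must match the LQR closed-loop pole a + bK.
a, b, m = 1.0, 1.0, 2.0

def stable_locus_root(R):
    return -((a * a + m * m * b * b / R) ** 0.5)

def lqr_pole(R):
    """Closed-loop pole a + bK from the scalar ARE with q = m^2."""
    P = (a + (a * a + m * m * b * b / R) ** 0.5) * R / (b * b)
    K = -(b / R) * P
    return a + b * K

for R in (10.0, 1.0, 0.01):
    print(stable_locus_root(R), lqr_pole(R))   # the two agree for every R
```

As R decreases (cheap control) the pole moves further into the left half plane, matching the slide's claim that unassigned poles become very fast.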

SLIDE 44

Example

For the model matrices A, B and output vector M given on the slide, with open-loop zeros 3 ± 1i and open-loop poles 5, −2, −1, analyzing the root square locus we can check the closed-loop poles, for instance, when the control penalty is ρ = 1/10 ("open-loop gain" 1/ρ = 10), and confirm the location of the poles with these Matlab instructions:

  K = lqr(A,B,M'*M,1/10); K = -K;
  eig(A+B*K)

[Root square locus plot.]

SLIDE 45

Pole placement for LQR loop

From the root square locus we conclude that (with cheap control) we can assign the location of m < n closed-loop poles; the remaining poles will be stable and very fast.

Given:
  • A. Plant model ẋ(t) = Ax(t) + Bu(t)
  • B. Desired locations of the closed-loop poles p_i, i ∈ {1, 2, …, m}, m < n.

Procedure:
  • 1. Define a vector α = [α₀ α₁ … α_{n−1}] such that

      det(sI − A) = sⁿ + α_{n−1}s^{n−1} + ⋯ + α₁s + α₀

    and write the model in companion form x̄̇(t) = Āx̄(t) + B̄ū(t), x̄ = Tx, with

      Ā = [0 1 0 … 0; 0 0 1 … 0; …; 0 0 0 … 1; −α₀ −α₁ −α₂ … −α_{n−1}],  B̄ = [0; …; 0; b]
      T = [B̄ ĀB̄ … Ā^{n−1}B̄][B AB … A^{n−1}B]⁻¹

  • 2. Define a vector γ = [γ₀ γ₁ … γ_{m−1} 1 0 … 0] ∈ ℝⁿ such that

      (s − p₁)(s − p₂)…(s − p_m) = s^m + γ_{m−1}s^{m−1} + ⋯ + γ₁s + γ₀

  • 3. Solve the LQR problem min ∫₀^∞ x̄(t)ᵀQ̄x̄(t) + ū(t)ᵀR̄ū(t) dt for x̄̇(t) = Āx̄(t) + B̄ū(t) with Q̄ = γᵀγ and R̄ → 0, obtaining the law ū = K̄x̄.
  • 4. The law u = Kx with K = K̄T will yield the desired m closed-loop poles of (A + BK).

SLIDE 46

Outline

  • Linear quadratic control, Kalman filter, separation principle
  • Frequency domain properties of LQR
      • Gain and phase margins
      • Sensitivity and complementary sensitivity
      • Root square locus
      • Duality - properties of Kalman filter
SLIDE 47

Dual problem

Designing the Kalman filter loop C(sI − A)⁻¹L (obtaining the gain L from

  L = ΦCᵀV⁻¹,  AΦ + ΦAᵀ + W − ΦCᵀV⁻¹CΦ = 0)

is the same problem as designing the LQR loop (obtaining the gain K) if we make the transformation

  Ā = Aᵀ,  B̄ = Cᵀ,  Q̄ = W,  R̄ = V,  K̄ = −Lᵀ

since then K̄ = −R̄⁻¹B̄ᵀP̄ with

  ĀᵀP̄ + P̄Ā − P̄B̄R̄⁻¹B̄ᵀP̄ + Q̄ = 0,  P̄ = Φ

and the loop transfer functions are related by K̄(sI − Ā)⁻¹B̄ ≡ B̄ᵀ(sI − Āᵀ)⁻¹K̄ᵀ.

SLIDE 48

FDE of Kalman filter loop

FDE when m = 1, with Φ(s) = (sI − A)⁻¹ and W = FFᵀ:

  (1 + CΦ(−s)L)(1 + CΦ(s)L) = 1 + (1/V)(CΦ(−s)F)(CΦ(s)F)

From this FDE it follows:

  • analogous gain and phase margins for the Kalman filter loop;
  • analogous bounds on the sensitivity and complementary sensitivity of the Kalman filter loop;
  • an analogous root square locus for pole placement.

SLIDE 49

Concluding remarks

To summarize:

  • It is not easy to define white-noise disturbances in continuous time, but it is easy for an arbitrarily close discrete-time approximation.
  • Following a temporal discretization approach, we obtain continuous-time analogues of the Kalman filter, LQR (finite and infinite horizon), and LQG control.
  • LQR loops have very interesting frequency-domain properties, most of them following from the frequency domain equality.
  • Duality allows us to translate these properties to the Kalman filter loop.

After this lecture, you should be able to:

  • Synthesise Kalman filters, LQR and LQG controllers in discrete time.
  • Choose the gain matrices of the LQR/Kalman/LQG framework to assign poles of the closed loop.

SLIDE 50

Appendix A

White noise, continuous-time

SLIDE 51

Stochastic process

A stochastic process x(ω, t) is a function of an outcome ω ∈ Ω (a probability space) and of time t.

  • For fixed t, x(ω, t) is a random variable.
  • For each fixed ω, x(ω, t) is a function of time.
  • The dependence on ω is typically omitted.

SLIDE 52

Auto-correlation and power spectral density

Let us assume that x(t) ∈ ℝ and E[x(t)] = 0. Then the auto-correlation function of the process is defined by

  R(r, s) := E[x(r)x(s)]

For signals of interest (stationary) it is only a function of τ = s − r:

  R(τ) := E[x(r)x(r + τ)]

The power spectral density is defined as the Fourier transform of R(τ):

  S(ω) = ∫_{−∞}^{∞} R(τ)e^{−jωτ}dτ,  R(τ) = (1/2π)∫_{−∞}^{∞} S(ω)e^{jωτ}dω

SLIDE 53

Example

A process with auto-correlation

  R(τ) = (a/2)e^{−a|τ|}

has power spectral density

  S(ω) = a²/(ω² + a²)

[Sample paths for a = 0.01, a = 1, a = 100: the larger a, the faster the process decorrelates; the correlation time is roughly 1/a.]
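This Fourier pair can be checked numerically: since R is even, S(ω) reduces to a cosine transform, 2∫₀^∞ R(τ)cos(ωτ)dτ. A sketch for a = 1 (truncating the integral at a hypothetical cutoff T = 60, where e^{−aT} is negligible; stdlib only):

```python
import math

# Check numerically that R(tau) = (a/2) e^{-a|tau|} and S(w) = a^2/(w^2 + a^2)
# are a Fourier transform pair: S(w) = int R(tau) e^{-j w tau} dtau.
a = 1.0

def S_from_R(w, T=60.0, n=100000):
    """Midpoint quadrature of 2 * int_0^T (a/2) e^{-a tau} cos(w tau) dtau."""
    h = T / n
    total = 0.0
    for i in range(n):
        tau = (i + 0.5) * h
        total += (a / 2) * math.exp(-a * tau) * math.cos(w * tau) * h
    return 2 * total

for w in (0.0, 1.0, 3.0):
    print(S_from_R(w), a * a / (w * w + a * a))  # the two columns agree
```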

SLIDE 54

Power

The power of the process is

  R(0) = E[x(t)²]

or (using the inverse Fourier transform)

  R(0) = (1/2π)∫_{−∞}^{∞} S(ω)dω

For signals of interest (ergodic) the power is equal, for a given sample path, to

  lim_{T→∞} (1/2T) ∫_{−T}^{T} x(t)²dt

SLIDE 55

White noise

White noise has power spectral density equal to one at every frequency (a → ∞ in the previous example):

  S(ω) = 1,  R(τ) = δ(τ)

Very interesting (or strange!) properties:

  • infinite variance, infinite power: R(0) = E[x(t)²] = ∞
  • it is continuous but not differentiable anywhere
  • the total variation of a sample path over any finite interval is infinite
  • it does not exist in nature

SLIDE 56

Random walk

The integral of white noise, ẋ(t) = w(t) (an integrator 1/s driven by w(t)), is called a random walk or the Wiener process.

  • We shall assume that w(t) is Gaussian for each fixed time; this implies that x(t) is also Gaussian for each fixed time.
  • x(t) is more intuitive than white noise and easier to handle mathematically.
  • x(t) has finite power.
  • x(t) and x(t + τ) are correlated.

SLIDE 57

Noise model for continuous-time systems

For a differential equation ẋ(t) = Ax(t) + Bu(t) + w(t) (for now assume x(t) ∈ ℝ) we would like E[w(r)w(s)] = 0 for r ≠ s (uncorrelated noise). Otherwise the state would not summarize all the information needed to make decisions.*

Two options for the auto-correlation of w(t):

  • R(τ) = 0 for τ ≠ 0 with R(0) = constant (finite): the power spectral density is zero — no power, zero signal. Not interesting!
  • R(τ) = δ(τ): white noise!

*e.g. if E[w(r)w(r + τ)] > 0, then if w(r) is positive and large it is likely that w(r + τ) is also positive and large.

SLIDE 58

Discussion

  ẋ(t) = Ax(t) + Bu(t) + w(t)

  • In a similar way to the Wiener process, the solution to this stochastic differential equation is more intuitive than white noise (e.g. it has finite power).
  • If x(t) ∈ ℝⁿ and w(t) ∈ ℝⁿ, we assume that w(t) = Nw̄(t) with w̄(t) = [w̄₁(t) w̄₂(t) … w̄_p(t)]ᵀ, where the w̄ᵢ(t) are uncorrelated scalar Gaussian white noise processes, so that E[w̄(t)w̄(t + τ)ᵀ] = Iδ(τ). Thus, w(t) satisfies

      E[w(t)w(t + τ)ᵀ] = NNᵀδ(τ) := Wδ(τ)