LQ optimal control for partially specified input noise



slide-1
SLIDE 1

LQ optimal control for partially specified input noise

Alexander Erreygers, Jasper De Bock, Gert de Cooman, Arthur Van Camp

Ghent University

28th European Conference on Operational Research

1 / 13

slide-2
SLIDE 2

Scalar linear systems

The controller is interested in the system

\[ X_{k+1} = a X_k + b u_k + W_k \tag{1} \]

for $k \in N := \{0, 1, \dots, n\}$, where $n \in \mathbb{N}$, $a \in \mathbb{R}$ and $b \in \mathbb{R} \setminus \{0\}$. Here $X_{k+1}$ is the real-valued state, $u_k$ is the real-valued control input and $W_k$ is the real-valued stochastic noise.

In general, the system parameters $a$ and $b$ can be time-dependent.

2 / 13

slide-4
SLIDE 4

Scalar linear systems

The controller is interested in the system

\[ X_{k+1} = a X_k + b \phi_k(X_{0:k}) + W_k. \tag{1} \]

Observation assumptions

1. Before applying $u_k$, the controller observes the actual value $x_k$ of $X_k$ (hence $X_0 \equiv x_0$).
2. The controller has perfect recall.

The controller determines $u_k$ from the state history $x_{0:k} := (x_0, \dots, x_k)$:

\[ u_k = \phi_k(x_{0:k}), \]

where $\phi_k \colon \mathbb{R}^{k+1} \to \mathbb{R}$ is a feedback function, $\phi := (\phi_0, \dots, \phi_n)$ is a control policy and $\Phi$ denotes the set of all control policies.

2 / 13

slide-5
SLIDE 5

Scalar linear systems

Because the controller knows the state history $x_{0:k} := (x_0, \dots, x_k)$ and the control policy $\phi$, it can calculate the noise history $w_{0:k-1}$: rearranging the system dynamics (1) gives $w_j = x_{j+1} - a x_j - b \phi_j(x_{0:j})$ for every $j < k$.
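The noise reconstruction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `recover_noise`, the `Feedback` type alias and the proportional example policy are hypothetical and do not come from the slides.

```python
from typing import Callable, List, Sequence

# A feedback function maps the state history (x_0, ..., x_k) to a control u_k.
Feedback = Callable[[Sequence[float]], float]


def recover_noise(a: float, b: float, policy: Sequence[Feedback],
                  states: Sequence[float]) -> List[float]:
    """Return (w_0, ..., w_{k-1}) implied by the states (x_0, ..., x_k)."""
    noise = []
    for j in range(len(states) - 1):
        u_j = policy[j](states[: j + 1])  # control applied at time j
        # Rearranged dynamics: w_j = x_{j+1} - a * x_j - b * u_j
        noise.append(states[j + 1] - a * states[j] - b * u_j)
    return noise


# Hypothetical example: proportional feedback u_k = -0.5 * x_k at every step.
policy = [lambda hist: -0.5 * hist[-1]] * 3
print(recover_noise(a=1.0, b=2.0, policy=policy, states=[1.0, 0.25, 0.5]))
# prints [0.25, 0.5]
```

Each feedback function receives the whole state history, which mirrors the perfect-recall assumption, even though the example policy only uses the latest state.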

2 / 13

slide-6
SLIDE 6

Optimality of a control policy

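For any control policy $\phi \in \Phi$, any $k \in N$ and any state history $x_{0:k} \in \mathbb{R}^{k+1}$ we define the quadratic cost functional as

\[ J[\phi \mid x_{0:k}] := \sum_{\ell=k}^{n} \left( r \, \phi_\ell(x_{0:k}, X_{k+1:\ell})^2 + q \, X_{\ell+1}^2 \right), \]

where $q \geq 0$ and $r > 0$ are real-valued coefficients.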
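The cost functional is a random variable, because it depends on the future states. For a single realised trajectory it reduces to a plain sum, which the following sketch evaluates. The function name `realised_cost` and the argument layout are assumptions made for this illustration.

```python
from typing import Sequence


def realised_cost(r: float, q: float, controls: Sequence[float],
                  next_states: Sequence[float]) -> float:
    """Sum of r * u_l**2 + q * x_{l+1}**2 over l = k, ..., n.

    controls[i] is the control u_{k+i}; next_states[i] is the state
    x_{k+i+1} that this control led to.
    """
    if len(controls) != len(next_states):
        raise ValueError("need exactly one successor state per control")
    return sum(r * u * u + q * x * x for u, x in zip(controls, next_states))


# Hypothetical two-step trajectory with r = 2 and q = 1.
print(realised_cost(2.0, 1.0, controls=[1.0, -0.5], next_states=[0.5, 0.0]))
# prints 2.75
```

Taking the expectation of this quantity over the noise is what turns it into the criterion that an optimal policy minimises.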

3 / 13

slide-7
SLIDE 7

Precise noise model

Definition (Precise noise model or PNM)

The controller’s beliefs about the noise $W_0, \dots, W_n$ are modelled using a linear expectation operator $E$.

4 / 13

slide-8
SLIDE 8

Optimality of a control policy

Definition (Optimality)

A control policy $\hat{\phi}$ is optimal if, for all $x_0$,

\[ \hat{\phi} \in \arg\min_{\phi \in \Phi} E(J[\phi \mid x_0]), \]

where $J$ is the quadratic cost functional and $E$ is the linear expectation operator of the precise noise model.

5 / 13

slide-10
SLIDE 10

Optimality of a control policy

Assume that at time $k$ the controller knows the state history $x_{0:k}$ and the noise history $w_{0:k-1}$. We should only compare control policies $\phi \in \Phi$ that could have resulted in $x_{0:k}$ and $w_{0:k-1}$, i.e. such that $x_{0:k}$, $w_{0:k-1}$ and $\phi$ are a solution of the system dynamics:

\[ \Phi(x_{0:k}, w_{0:k-1}) := \{ \phi \in \Phi : \phi,\ x_{0:k} \text{ and } w_{0:k-1} \text{ are a solution of the system dynamics} \}. \]

Definition (Optimality)

A control policy $\hat{\phi}$ is optimal for the state history $x_{0:k}$ and the noise history $w_{0:k-1}$ if

\[ \hat{\phi} \in \arg\min_{\phi \in \Phi(x_{0:k}, w_{0:k-1})} E(J[\phi \mid x_{0:k}] \mid w_{0:k-1}). \]
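Membership of a policy in the restricted set can be checked by replaying the dynamics step by step. The sketch below assumes that "being a solution of the system dynamics" means that every observed transition satisfies the system equation; the function name `is_compatible` and the tolerance parameter are hypothetical.

```python
import math
from typing import Callable, Sequence

Feedback = Callable[[Sequence[float]], float]


def is_compatible(a: float, b: float, policy: Sequence[Feedback],
                  states: Sequence[float], noise: Sequence[float],
                  tol: float = 1e-12) -> bool:
    """Check x_{j+1} = a * x_j + b * phi_j(x_0, ..., x_j) + w_j for all j < k."""
    if len(noise) != len(states) - 1:
        return False
    for j, w_j in enumerate(noise):
        predicted = a * states[j] + b * policy[j](states[: j + 1]) + w_j
        if not math.isclose(predicted, states[j + 1], abs_tol=tol):
            return False
    return True


# Hypothetical example: the policy that never applies any control.
zero_policy = [lambda hist: 0.0] * 2
print(is_compatible(2.0, 1.0, zero_policy, [1.0, 2.5, 5.0], [0.5, 0.0]))  # True
print(is_compatible(2.0, 1.0, zero_policy, [1.0, 2.5, 5.0], [0.0, 0.0]))  # False
```

Only the feedback functions up to time $k-1$ are constrained by the observed histories; the remaining ones are what the minimisation still ranges over.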

6 / 13

slide-13
SLIDE 13

The principle of optimality

Principle of optimality

A control policy that is “optimal” for the “current state” should also be optimal for the “remaining states” it can end up in.

Assume that $\hat{\phi}$ is optimal for all $x_0 \in \mathbb{R}$. The controller

1. observes $x_0$,
2. applies $u_0 = \phi_0(x_0)$,
3. observes $x_1$ and computes $w_0$.

Is $\hat{\phi}$ optimal for $(x_0, x_1)$ and $w_0$? Not necessarily!

Definition (Complete optimality)

If for all $k \in N$ the control policy $\phi \in \Phi$ is optimal for all $x_{0:k}$ and $w_{0:k-1}$ such that $x_{0:k}$, $w_{0:k-1}$ and $\phi$ are compatible, then it is completely optimal.

7 / 13

slide-15
SLIDE 15

Unique optimal control policy

Theorem

The unique completely optimal control policy $\hat{\phi}$ is given by

\[ \hat{\phi}_k(x_{0:k}) := -\tilde{r}_k b \left( m_{k+1} a x_k + h_{k \mid w_{0:k-1}} \right). \]

The coefficients $\tilde{r}_k$ and $m_{k+1}$ are derived from backwards recursive relations. The feedforward $h_{k \mid w_{0:k-1}}$ is derived from $h_{n+1 \mid w_{0:n}} := 0$ and

\[ h_{k \mid w_{0:k-1}} := a \tilde{r}_{k+1} r \, E(h_{k+1 \mid w_{0:k-1}, W_k} \mid w_{0:k-1}) + m_{k+1} E(W_k \mid w_{0:k-1}). \]

− Precise specification of the noise model is necessary.
− Calculating the feedforward is intractable.
− Backwards recursive calculations.
+ Almost immediately generalisable to time-dependent $a_k$, $b_k$, $r_k$ and $q_{k+1}$ and/or multi-dimensional systems.
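The slides do not spell out the backwards recursive relations for the two coefficients. The sketch below therefore ASSUMES the standard scalar Riccati recursion for this cost, namely $m_{n+1} = q$, $\tilde{r}_k = 1/(r + b^2 m_{k+1})$ and $m_k = q + a^2 r \tilde{r}_k m_{k+1}$, which is consistent with the form of the control law in the theorem but is not quoted from the source. The names `lq_gains` and `control` are hypothetical.

```python
from typing import List, Tuple


def lq_gains(a: float, b: float, q: float, r: float,
             n: int) -> Tuple[List[float], List[float]]:
    """Return (m, r_tilde): m[k] for k = 0..n+1, r_tilde[k] for k = 0..n.

    ASSUMED recursion (standard scalar Riccati form, not from the slides):
        m[n+1]     = q
        r_tilde[k] = 1 / (r + b**2 * m[k+1])
        m[k]       = q + a**2 * r * r_tilde[k] * m[k+1]
    """
    m = [0.0] * (n + 2)
    r_tilde = [0.0] * (n + 1)
    m[n + 1] = q
    for k in range(n, -1, -1):  # backwards in time
        r_tilde[k] = 1.0 / (r + b * b * m[k + 1])
        m[k] = q + a * a * r * r_tilde[k] * m[k + 1]
    return m, r_tilde


def control(k: int, x_k: float, h_k: float, a: float, b: float,
            m: List[float], r_tilde: List[float]) -> float:
    """Control law of the theorem: -r_tilde_k * b * (m_{k+1} * a * x_k + h_k)."""
    return -r_tilde[k] * b * (m[k + 1] * a * x_k + h_k)


m, r_tilde = lq_gains(a=1.0, b=1.0, q=1.0, r=1.0, n=1)
print(control(k=1, x_k=2.0, h_k=0.0, a=1.0, b=1.0, m=m, r_tilde=r_tilde))
# prints -1.0
```

In the example, the last-step control $u = -1$ minimises $u^2 + (2 + u)^2$, which is the remaining cost when the noise has zero mean, so the assumed recursion at least reproduces the one-step optimum.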

8 / 13

slide-18
SLIDE 18

Unique optimal control policy

Disadvantages

− Calculating the feedforward is intractable.
  → White noise model: $W_0, \dots, W_n$ are mutually independent. The feedforward $h_k$ is then derived from $h_{n+1} := 0$ and

\[ h_k := a \tilde{r}_{k+1} r h_{k+1} + m_{k+1} E(W_k). \]

− Backwards recursive calculations.
  → White noise model and stationarity simplify these calculations. If $E(W_k) \equiv E(W)$ for all $k \in N$, then

\[ m_{k+1} \xrightarrow{n \to \infty} m, \qquad \tilde{r}_k \xrightarrow{n \to \infty} \tilde{r}, \qquad h_k \xrightarrow{n \to \infty} h. \]
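Under the white noise model the feedforward no longer depends on the noise history, so it can be computed in one backward pass. The sketch below implements the recursion for $h_k$ stated above. The updates for $m_{k+1}$ and $\tilde{r}_k$ are an ASSUMPTION (the standard scalar Riccati form), since the slides only say that they follow from backwards recursive relations; the function name is hypothetical.

```python
from typing import List, Sequence


def white_noise_feedforward(a: float, b: float, q: float, r: float,
                            mean_noise: Sequence[float]) -> List[float]:
    """Return h[0..n+1], where mean_noise[k] = E(W_k) for k = 0..n."""
    n = len(mean_noise) - 1
    h = [0.0] * (n + 2)      # h[n+1] = 0 as on the slide
    m_next = q               # ASSUMED terminal value m_{n+1} = q
    r_tilde_next = 0.0       # placeholder: it only multiplies h[n+1] = 0
    for k in range(n, -1, -1):
        # Slide recursion: h_k = a * r_tilde_{k+1} * r * h_{k+1} + m_{k+1} * E(W_k)
        h[k] = a * r_tilde_next * r * h[k + 1] + m_next * mean_noise[k]
        # ASSUMED Riccati step producing r_tilde_k and m_k for the next pass.
        r_tilde_k = 1.0 / (r + b * b * m_next)
        m_next = q + a * a * r * r_tilde_k * m_next
        r_tilde_next = r_tilde_k
    return h


print(white_noise_feedforward(1.0, 1.0, 1.0, 1.0, mean_noise=[1.0, 1.0]))
# prints [2.0, 1.0, 0.0]
```

The cost of the pass is linear in the horizon, in contrast with the general recursion, which needs a conditional expectation for every possible noise history.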

9 / 13

slide-21
SLIDE 21

Partially specified noise model

− Precise specification of the noise model is necessary.

Definition (Partially specified noise model or PSNM)

The partially specified noise model $\mathcal{E}$ is the largest subset of the set of all precise noise models such that for all $E \in \mathcal{E}$, all $k \in N$ and all $w_{0:k-1}$

\[ \underline{E}(W_k) \leq E(W_k \mid w_{0:k-1}) \leq \overline{E}(W_k), \]

where $\underline{E}(W_k)$ and $\overline{E}(W_k)$ are the specified lower and upper bounds on the expected noise.

Note: $\mathcal{E}$ does not assume independence!

Definition (E-admissibility)

A control policy is E-admissible if it is completely optimal for at least one precise noise model in the partially specified noise model.

10 / 13

slide-22
SLIDE 22

E-admissible control policies

From the definition of E-admissibility, it follows immediately that any E-admissible control policy has the form

\[ \phi_k(x_{0:k}) = -\tilde{r}_k b \left( m_{k+1} a x_k + h_{k \mid w_{0:k-1}} \right). \]

11 / 13

slide-23
SLIDE 23

E-admissible control policies

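Theorem

For any E-admissible control policy, the feedforward term $h_{k \mid w_{0:k-1}}$ is bounded: for all $k \in N$ and for all noise histories $w_{0:k-1}$,

\[ \underline{h}_k \leq h_{k \mid w_{0:k-1}} \leq \overline{h}_k. \]

Moreover, any $h_{k \mid w_{0:k-1}} \in [\underline{h}_k, \overline{h}_k]$ is reached by some $E \in \mathcal{E}$. The tight bounds $\underline{h}_k$ and $\overline{h}_k$ are derived from $[\underline{h}_{n+1}, \overline{h}_{n+1}] := 0$ and

\[ [\underline{h}_k, \overline{h}_k] := a \tilde{r}_{k+1} r \, [\underline{h}_{k+1}, \overline{h}_{k+1}] + m_{k+1} [\underline{E}(W_k), \overline{E}(W_k)]. \]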
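The interval recursion can be evaluated with elementary interval arithmetic. The sketch below makes two ASSUMPTIONS that are not stated on the slides: the sum and scalar product of intervals are read as ordinary interval arithmetic (so a negative factor swaps the end points), and the coefficients follow the standard scalar Riccati recursion. The function name `feedforward_bounds` is hypothetical.

```python
from typing import List, Sequence, Tuple


def feedforward_bounds(a: float, b: float, q: float, r: float,
                       lower_mean: Sequence[float],
                       upper_mean: Sequence[float]) -> List[Tuple[float, float]]:
    """Return [(h_lower_k, h_upper_k)] for k = 0..n+1.

    lower_mean[k] and upper_mean[k] are the bounds on E(W_k), k = 0..n.
    """
    n = len(lower_mean) - 1
    bounds = [(0.0, 0.0)] * (n + 2)   # terminal interval is {0}
    m_next = q                        # ASSUMED terminal value m_{n+1} = q
    coeff = 0.0                       # a * r_tilde_{k+1} * r; unused at k = n
    for k in range(n, -1, -1):
        lo_next, hi_next = bounds[k + 1]
        # Scaling an interval by a negative number swaps its end points.
        scaled = sorted((coeff * lo_next, coeff * hi_next))
        # m_next >= 0 because q >= 0, so this product keeps the order.
        bounds[k] = (scaled[0] + m_next * lower_mean[k],
                     scaled[1] + m_next * upper_mean[k])
        # ASSUMED Riccati step for r_tilde_k and m_k.
        r_tilde_k = 1.0 / (r + b * b * m_next)
        coeff = a * r_tilde_k * r
        m_next = q + a * a * r * r_tilde_k * m_next
    return bounds


print(feedforward_bounds(1.0, 1.0, 1.0, 1.0, [0.0, 0.0], [1.0, 1.0]))
# prints [(0.0, 2.0), (0.0, 1.0), (0.0, 0.0)]
```

Only two numbers per time step are propagated, which is why these bounds stay cheap to compute even though the underlying set of noise models does not assume independence.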

11 / 13

slide-24
SLIDE 24

E-admissible control policies

+ Imprecise specification.
+ Computation of $\underline{h}_k$ and $\overline{h}_k$ is tractable.
+ Easily generalised to $a_k$, $b_k$, $r_k$ and $q_{k+1}$.
− Backwards recursive calculations.
? Which control policy to apply?
? Generalisation to multi-dimensional systems is not immediate.

11 / 13

slide-26
SLIDE 26

E-admissible control policies

Stationarity and open questions

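− Backwards recursive calculations.
  → Stationarity of the bounds on the expectation simplifies these calculations. If $\underline{E}(W_k) \equiv \underline{E}(W)$ and $\overline{E}(W_k) \equiv \overline{E}(W)$ for all $k \in N$, then

\[ m_{k+1} \xrightarrow{n \to \infty} m, \qquad \tilde{r}_k \xrightarrow{n \to \infty} \tilde{r}, \qquad \underline{h}_k \xrightarrow{n \to \infty} \underline{h}, \qquad \overline{h}_k \xrightarrow{n \to \infty} \overline{h}. \]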
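One way to approximate these limits numerically is to run the backward recursion until nothing changes any more. The sketch below does this for stationary bounds. It ASSUMES the standard scalar Riccati recursion for the coefficients and ordinary interval arithmetic for the feedforward bounds, neither of which is spelled out on the slides; the function name and the stopping rule are hypothetical.

```python
from typing import Tuple


def stationary_limits(a: float, b: float, q: float, r: float,
                      lower_mean: float, upper_mean: float,
                      tol: float = 1e-12,
                      max_iter: int = 10_000) -> Tuple[float, float, float, float]:
    """Return approximations (m, r_tilde, h_lower, h_upper) of the limits."""
    m, lo, hi, coeff = q, 0.0, 0.0, 0.0   # values at time n+1
    for _ in range(max_iter):
        scaled = sorted((coeff * lo, coeff * hi))
        new_lo = scaled[0] + m * lower_mean
        new_hi = scaled[1] + m * upper_mean
        r_tilde = 1.0 / (r + b * b * m)        # ASSUMED Riccati step
        new_m = q + a * a * r * r_tilde * m
        change = max(abs(new_m - m), abs(new_lo - lo), abs(new_hi - hi))
        m, lo, hi, coeff = new_m, new_lo, new_hi, a * r_tilde * r
        if change < tol:
            return m, r_tilde, lo, hi
    raise RuntimeError("no convergence within max_iter iterations")


m, r_tilde, h_lower, h_upper = stationary_limits(1.0, 1.0, 1.0, 1.0, 0.0, 1.0)
print(m, r_tilde, h_lower, h_upper)
```

For $a = b = q = r = 1$ the assumed fixed-point equation $m = q + a^2 r m / (r + b^2 m)$ reduces to $m^2 - m - 1 = 0$, so the computed $m$ should approach the golden ratio.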

− Which control policy to apply?
  ? Possibility of using a secondary decision criterion.

12 / 13

slide-30
SLIDE 30

Summary

The partially specified noise model only assumes bounds on the conditional expectation of the noise.

Every E-admissible control policy is a combination of the same state feedback and possibly different noise feedforward.

Tight bounds on E-admissible noise feedforward can be easily calculated.

How to choose which element in the feedforward interval to apply remains an open question.

Unfortunately, these results are not immediately generalised to multi-dimensional systems.

13 / 13