RECENT ADVANCES IN SUBSPACE IDENTIFICATION
GIORGIO PICCI
Dept. of Information Engineering, Università di Padova, Italy

MTNS 2006, KYOTO, July 2006

OUTLINE OF THE TALK

BASIC IDEA OF SUBSPACE IDENTIFICATION: STOCHASTIC REALIZATION + REGRESSION

CLASSICAL METHODS (N4SID, MOESP, ...): ILL-CONDITIONING, "WORST-CASE" INPUTS, CONSISTENCY CONDITIONS

A REMEDY (DECOUPLING + ORTHOGONALIZATION) TO ILL-CONDITIONING

A PREDICTOR-BASED STATE SPACE CONSTRUCTION THAT WORKS WITH CLOSED LOOP DATA
1
I/O data: sample paths of second order stationary processes y, u, zero mean, with rational spectrum ⇒ described by the (innovation) state space model

[ x(t+1) ]   [ A  B  K ] [ x(t) ]
[  y(t)  ] = [ C  D  I ] [ u(t) ]
                         [ e(t) ]

y(t) = yd(t) + ys(t), with

yd(t) := C(zI − A)⁻¹B u(t) + D u(t)   (deterministic component)
ys(t) := C(zI − A)⁻¹K e(t) + e(t)     (stochastic component)
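A quick way to make the model concrete is to simulate it. A minimal sketch (my own illustration; the matrices A, B, C, D, K below are made-up placeholders, not the talk's system):

```python
import numpy as np

# Simulate x(t+1) = A x(t) + B u(t) + K e(t),  y(t) = C x(t) + D u(t) + e(t)
rng = np.random.default_rng(0)
A = np.array([[0.8, 0.1], [0.0, 0.5]])   # stable: |lambda(A)| < 1
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
K = np.array([[0.3], [0.1]])

N = 2000
u = rng.standard_normal((N, 1))   # input (white, just for the sketch)
e = rng.standard_normal((N, 1))   # innovation white noise
xs = np.zeros((N, 2))             # state trajectory
y = np.zeros((N, 1))
for t in range(N - 1):
    y[t] = C @ xs[t] + D @ u[t] + e[t]
    xs[t + 1] = A @ xs[t] + B @ u[t] + K @ e[t]
```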
2
[Block diagram: u(t) → C(zI − A)⁻¹B + D → yd(t); e(t) → C(zI − A)⁻¹K + I → ys(t); the two branch outputs are summed to give y(t) = yd(t) + ys(t).]

E{e(t) u(τ)⊤} = 0 ∀ t, τ (e orthogonal to the input: absence of feedback).
3
e(t): innovation white noise, the one-step-ahead prediction error of y(t).

Stationarity and no feedback ⇒ A stable (|λ(A)| < 1). With feedback A may be unstable.

x(t): steady-state Kalman predictor based on the joint infinite past of y, u.
4
y, u second order stationary processes described by

[ x(t+1) ]   [ A  B  K ] [ x(t) ]
[  y(t)  ] = [ C  D  I ] [ u(t) ]          (1)
                         [ e(t) ]

with e(t) white noise: E{e(t) e(τ)⊤} = 0 ∀ t, τ, t ≠ τ. The parameters are covariance functionals of the joint process:

[ A  B ]
[ C  D ] = E{ [x(t+1); y(t)] [x(t); u(t)]⊤ } ( E{ [x(t); u(t)] [x(t); u(t)]⊤ } )⁻¹

Parameters are uniquely determined by choosing a basis x(t) in the state space!
5
Inner product: ⟨ξ, η⟩ := E{ξ η}. For −∞ ≤ t0 ≤ t ≤ T ≤ +∞ define the Hilbert spaces of scalar zero-mean random variables

past spaces up to time t:
U[t0,t) := span{u_k(s); k = 1, ..., p, t0 ≤ s < t}
Y[t0,t) := span{y_k(s); k = 1, ..., m, t0 ≤ s < t}

future spaces up to time T:
U[t,T] := span{u_k(s); k = 1, ..., p, t ≤ s ≤ T}
Y[t,T] := span{y_k(s); k = 1, ..., m, t ≤ s ≤ T}

When t0 = −∞ write U_t⁻, Y_t⁻ for U[−∞,t), Y[−∞,t).
6
E[ z | X ]: orthogonal projection (conditional expectation in the Gaussian case) of the components of z onto the subspace X.

Oblique projection: let A ∩ B = {0}, i.e. A + B a direct sum; then E_{‖B}[ z | A ] denotes the projection of z onto A along B.
7
From observed input-output time series

{y0, y1, y2, ..., yN}, yt ∈ R^m,   {u0, u1, u2, ..., uN}, ut ∈ R^p

find estimates (in a certain basis)

[ Â  B̂ ]
[ Ĉ  D̂ ]

such that (consistency)

lim_{N→∞} [ Â  B̂ ; Ĉ  D̂ ] = [ A  B ; C  D ]
Assume we can also observe the state trajectory {x0, x1, x2, ..., xN} corresponding to the I/O data {y0, ..., yN}, yt ∈ R^m, {u0, ..., uN}, ut ∈ R^p. Form "tail" matrices

Yt := [ yt, yt+1, yt+2, ... ],  Xt := [ xt, xt+1, xt+2, ... ],  Ut := [ ut, ut+1, ut+2, ... ]

Every sample trajectory {yt}, {xt}, {ut} of the system must satisfy the "true" model equations, so

[ X_{t+1} ]   [ A  B ] [ X_t ]   [ K ]
[   Y_t   ] = [ C  D ] [ U_t ] + [ I ] E_t
9
[ X_{t+1} ]   [ A  B ] [ X_t ]   [ K ]
[   Y_t   ] = [ C  D ] [ U_t ] + [ I ] E_t

Linear regression! Solve by least squares:

min_{A,B,C,D} ‖ [ X_{t+1} ; Y_t ] − [ A  B ; C  D ] [ X_t ; U_t ] ‖²

[ Â  B̂ ; Ĉ  D̂ ] := ( (1/N) [ X_{t+1} ; Y_t ] [ X_t ; U_t ]⊤ ) ( (1/N) [ X_t ; U_t ] [ X_t ; U_t ]⊤ )⁻¹
10
[ Â  B̂ ; Ĉ  D̂ ] := ( (1/N) [ X_{t+1} ; Y_t ] [ X_t ; U_t ]⊤ ) ( (1/N) [ X_t ; U_t ] [ X_t ; U_t ]⊤ )⁻¹

If the data are second order ergodic and the inverse exists:

lim_{N→∞} [ Â  B̂ ; Ĉ  D̂ ] = [ A  B ; C  D ]
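A direct tail-matrix implementation of this regression, as a sketch (it assumes the hypothetical arrays xs, u, y from the simulation sketch above, shaped (N, n), (N, p), (N, m)):

```python
import numpy as np

def ls_state_space(xs, u, y):
    """LS estimate of [A B; C D] from observed state/input/output data,
    following the tail-matrix regression of slides 9-11."""
    Xt, Xt1 = xs[:-1].T, xs[1:].T       # n x (N-1) tail matrices
    Ut, Yt = u[:-1].T, y[:-1].T
    lhs = np.vstack([Xt1, Yt])          # [X_{t+1}; Y_t]
    rhs = np.vstack([Xt, Ut])           # [X_t; U_t]
    # Theta = ((1/N) lhs rhs^T)((1/N) rhs rhs^T)^{-1}; the 1/N cancels
    Theta = (lhs @ rhs.T) @ np.linalg.inv(rhs @ rhs.T)
    n = Xt.shape[0]
    return Theta[:n, :n], Theta[:n, n:], Theta[n:, :n], Theta[n:, n:]

# e.g.: A_hat, B_hat, C_hat, D_hat = ls_state_space(xs, u, y)
```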
11
For N → ∞ sample covariances converge to true covariances, e.g.

(1/N) Σ_k y_{t+k} u_{s+k}⊤ = (1/N) Y_t U_s⊤ → E{ y(t) u(s)⊤ },   N → ∞.

For N → ∞ the average Euclidean inner product of tail sequences converges to the inner product of the corresponding random variables (asymptotic isometry). As N → ∞, the Hilbert space geometry of (semi-infinite) tail sequences is the same as the Hilbert space geometry of the random variables: y(t) ≡ Yt, u(t) ≡ Ut (isometry).
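A sample version of this limit, as a small sketch (assumes data arrays y, u shaped (N, m), (N, p) as in the earlier illustration):

```python
import numpy as np

def tail_covariance(y, u, t, s, n_tail):
    """Sample version of E{y(t) u(s)^T} from tail matrices (slide 12)."""
    Yt = y[t:t + n_tail].T            # m x n_tail tail matrix at time t
    Us = u[s:s + n_tail].T            # p x n_tail tail matrix at time s
    return (Yt @ Us.T) / n_tail       # -> E{y(t) u(s)^T} as n_tail -> inf
```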
12
Sample fluctuations (i.e. finite data length) play no role in the analysis: can assume N → ∞ and work as in the stochastic setting. Stacked tail matrices correspond to stacked random vectors:

U[t,T] := [ Ut ; Ut+1 ; ... ; UT ]   ↔   u_t⁺ := [ u(t) ; u(t+1) ; ... ; u(T) ]

U[t0,t) := [ Ut0 ; Ut0+1 ; ... ; Ut−1 ]   ↔   u_t⁻ := [ u(t0) ; u(t0+1) ; ... ; u(t−1) ]

Same notation for Y[t0,t), etc.
13
STATE SEQUENCE IS NOT AVAILABLE: NEED TO CONSTRUCT THE STATE FROM INPUT-OUTPUT DATA (STOCHASTIC REALIZATION)

Fundamental step: stochastic realization to construct the state from I/O data. Easy if infinite past data were available at time t:

U_t⁻ := span{ Us | s < t },   Y_t⁻ := span{ Ys | s < t }

Hilbert spaces generated by all past inputs/outputs on (−∞, t). Generalize the procedure of Akaike and L.P.: construct the oblique predictor space

X_t^{+/−} := E_{‖U_t⁺}[ Y_t⁺ | Y_t⁻ ∨ U_t⁻ ]   ⇒ innovation model!
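At the level of tail matrices, the oblique projection reduces to one joint least-squares fit. A sketch (my own illustration; Yf, P, Uf are hypothetical tail matrices of future outputs, joint past, and future inputs):

```python
import numpy as np

def oblique_projection(Yf, P, Uf):
    """E_{||Uf}[ Yf | P ]: project the rows of Yf onto the row space of P
    along the row space of Uf."""
    R = np.vstack([P, Uf])                 # joint regressor [P; Uf]
    L = Yf @ np.linalg.pinv(R)             # solve Yf ~ [Lp Lu] [P; Uf] in LS
    Lp = L[:, :P.shape[0]]
    return Lp @ P                          # keep only the part explained by P
```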
14
In practice one can regress only on finite past data at time t: work with U[t0,t), Y[t0,t) from (small) finite past/future intervals [t0, t), [t, T]. LS estimates depend on sample covariances. Finite-interval approximation of the infinite-past regression leads to errors (bias) in the estimate which do not → 0 as N → ∞. If the system has zeros arbitrarily close to the unit circle, the bias can be made arbitrarily large. Want consistency with finite regression data: NEED FINITE-INTERVAL (NON-STATIONARY) STOCHASTIC REALIZATION
15
PROBLEM: Construct the state space of a stochastic realization of y using ONLY the r.v.’s of input and output processes from a finite interval [t0,T]. Try to mimic the infinite past construction: Future output predictor + oblique projection
16
Output predictor based on joint input-output data (the wedge ∨ denotes vector sum):

Ŷ[t,T] := E[ Y[t,T] | joint input-output data ] = Γ X̂_t + H U[t,T];  set ν := T − t (future horizon),

Γ := [ C ; CA ; ... ; CA^{ν−1} ]

H := [ D                                  ]
     [ CB         D                       ]
     [ ...              ...               ]
     [ CA^{ν−1}B  CA^{ν−2}B  ...  CB   D  ]

X̂_t: transient conditional Kalman filter on [t0, T]:

X̂_{t+1} = A X̂_t + B U_t + K(t) Ê_t
Y_t = C X̂_t + D U_t + Ê_t
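The matrices Γ and H are mechanical to build from (A, B, C, D); a sketch:

```python
import numpy as np

def gamma_and_H(A, B, C, D, nu):
    """Extended observability matrix Gamma and the block lower-triangular
    Toeplitz matrix H of slide 17, for future horizon nu."""
    m, p = C.shape[0], B.shape[1]
    Gamma = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(nu)])
    H = np.zeros((nu * m, nu * p))
    for i in range(nu):
        H[i*m:(i+1)*m, i*p:(i+1)*p] = D
        for j in range(i):                 # block C A^{i-j-1} B below diagonal
            H[i*m:(i+1)*m, j*p:(j+1)*p] = \
                C @ np.linalg.matrix_power(A, i - j - 1) @ B
    return Gamma, H
```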
17
With finite data need to "factor out" the dynamics of u: estimate x(t) (and the initial condition x̂(t0)) by oblique projection along the future u's, since part of the state is in u[t,T]! This leads to complications (a plethora of algorithms: MOESP, N4SID, CCA, etc.). Some people don't care and use the infinite-past approximation.
18
Ŷ[t,T] := E[ Y[t,T] | joint input-output data ]: obtain Γ from its SVD factorization.

X̄_t := Γ† Ŷ[t,T] obeys the recursion

[ X̄_{t+1} ]   [ A ]          [ K1 ]
[   Y_t    ] = [ C ] X̄_t  +  [ K2 ] U[t,T]

K1, K2 known linear functions of (B, D).
19
Pseudostate: x̄(t) := Γ† ŷ_t⁺ = x̂(t) + Γ† H u_t⁺

[ x̄(t+1) ]   [ A ]           [ K1 ]
[  y(t)   ] = [ C ] x̄(t)  +  [ K2 ] u_t⁺  ⊕  w_t⊥        (∗)

K1, K2 known linear functions of (B, D). Solve the regression for the parameters (W-H equations):

[ A ; C ] Σ_{x̄x̄|u⁺} = [ Σ_{x̄₁x̄|u⁺} ; Σ_{yx̄|u⁺} ],
[ K1 ; K2 ] Σ_{u⁺u⁺|x̄} = [ Σ_{x̄₁u⁺|x̄} ; Σ_{yu⁺|x̄} ],   where x̄₁ := x̄(t+1).
Introduce the complementary state: x̂c(t) := x̄(t) − E[ x̄(t) | u_t⁺ ] = x̂(t) − E[ x̂(t) | u_t⁺ ]; then

Σ_{x̄x̄|u⁺} = Σ_{x̂x̂|u⁺} = Σ_{x̂c x̂c}

FACT: the parameters are obtained from the (W-H) equations

[ A ; C ] Σ_{x̂c x̂c} = [ Σ_{x̄₁x̄|u⁺} ; Σ_{yx̄|u⁺} ],
[ K1 ; K2 ] Σ_{u⁺u⁺|x̄} = [ Σ_{x̄₁u⁺|x̄} ; Σ_{yu⁺|x̄} ]

where the conditional covariances are

Σ_{x̄x̄|u⁺} := E{ ( x̄(t) − E(x̄(t) | u_t⁺) ) ( x̄(t) − E(x̄(t) | u_t⁺) )⊤ }
Σ_{u⁺u⁺|x̄} := E{ ( u_t⁺ − E(u_t⁺ | x̄(t)) ) ( u_t⁺ − E(u_t⁺ | x̄(t)) )⊤ }
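A sample (tail-matrix) version of these normal equations, as a sketch in my own notation (Xbar, Xbar1, Yt, Uf are hypothetical tail matrices of the pseudostate at t and t+1, the outputs, and the future inputs U[t,T]):

```python
import numpy as np

def wh_estimates(Xbar, Xbar1, Yt, Uf):
    """Sample analogue of the W-H equations of slides 20-21."""
    Pu = np.linalg.pinv(Uf) @ Uf          # projector onto row span of Uf
    Xc, Xc1 = Xbar - Xbar @ Pu, Xbar1 - Xbar1 @ Pu   # complementary tails
    Yc = Yt - Yt @ Pu
    Sxx = Xc @ Xc.T                        # ~ N * Sigma_{xbar xbar | u+}
    A_hat = (Xc1 @ Xc.T) @ np.linalg.inv(Sxx)
    C_hat = (Yc @ Xc.T) @ np.linalg.inv(Sxx)
    Px = np.linalg.pinv(Xbar) @ Xbar       # projector onto row span of Xbar
    Uc = Uf - Uf @ Px
    Suu = Uc @ Uc.T                        # ~ N * Sigma_{u+ u+ | xbar}
    K1_hat = ((Xbar1 - Xbar1 @ Px) @ Uc.T) @ np.linalg.inv(Suu)
    K2_hat = ((Yt - Yt @ Px) @ Uc.T) @ np.linalg.inv(Suu)
    return A_hat, C_hat, K1_hat, K2_hat
```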
21
[ X̄_{t+1} ; Y_t ] = [ A ; C ] X̄_t + [ K1 ; K2 ] U[t,T]

Orthogonalize the regressors: X̂c_t := X̄_t − E_N{ X̄_t | U[t,T] }:

[ X̂c_{t+1} ]   [ A ]              [ Kc1 ]
[    Y_t    ] = [ C ] X̂c_t  ⊕  [ Kc2 ] U[t,T]         (N → ∞)

Complementary state: x̂c(t) := x̄(t) − E[ x̄(t) | u_t⁺ ]:

[ x̂c(t+1) ]   [ A ]               [ Kc1 ]
[   y(t)   ] = [ C ] x̂c(t)  ⊕  [ Kc2 ] u_t⁺  ⊕  w_t⊥
22
W-H equations in the orthogonalized variables (x̂c₁ := x̂c(t+1)):

[ A ; C ] Σ_{x̂c x̂c} = [ Σ_{x̂c₁ x̂c} ; Σ_{y x̂c} ],
[ Kc1 ; Kc2 ] Σ_{u⁺u⁺|x̂c} = [ Σ_{x̂c₁ u⁺|x̂c} ; Σ_{y u⁺|x̂c} ]

with x̂c(t) = x̄(t) − E[ x̄(t) | u_t⁺ ] = x̂(t) − E[ x̂(t) | u_t⁺ ] and

Σ_{x̄x̄|u⁺} = Σ_{x̂x̂|u⁺} = Σ_{x̂c x̂c}

SAME FORMULAS AS N4SID!!!
23
Jansson-Wahlberg consistency condition: Σ_{x̂x̂|u⁺} = Σ_{x̂c x̂c} MUST BE NONSINGULAR!

Σ_{x̂c x̂c} (= Σ_{x̂x̂|u⁺}) may be ILL-CONDITIONED ⇒ the computation of the parameters (A, C) from the regression will be ill-conditioned: random fluctuation errors in the data will be amplified.

Σ_{x̂x̂|u⁺} ILL-CONDITIONED ⇔ row spaces of X̂_t and U[t,T] are "NEARLY PARALLEL". A similar analysis holds for (K1, K2) and Σ_{u⁺u⁺|x̄}.
24
Ill-conditioning occurs when the PRINCIPAL ANGLES between the state space and the space of future inputs are small (canonical correlations near 1):

σ_MAX{ X̂_t, U[t,T] } ≃ 1  ⇔  Σ_{x̂c x̂c} nearly singular!
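These canonical correlations are easy to compute as a diagnostic; a sketch (Xhat and Uf are hypothetical tail matrices of the state estimates and of the future inputs):

```python
import numpy as np
from scipy.linalg import subspace_angles

def canonical_correlations(Xhat, Uf):
    """Cosines of the principal angles between the row space of Xhat
    (n x N) and that of Uf (nu*p x N); values near 1 signal
    ill-conditioning."""
    # scipy's subspace_angles works on column spaces, hence the transposes
    return np.cos(subspace_angles(Xhat.T, Uf.T))
```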
25
How to deal with ill-conditioning? Sometimes decoupling + orthogonalization helps.
26
Theorem 2. Under standard assumptions on the true innovation noise, the estimation errors Ã_N := Â_N − A, C̃_N := Ĉ_N − C are asymptotically normal, with

lim_{N→∞} N E{ vec(Ã_N) vec(Ã_N)⊤ } =
  ( Σ⁻¹_{x̂c x̂c} ⊗ [M Hs] ) · [ Σ_τ Σ_{x̂c x̂c}(τ) ⊗ Σ_{ē⁺ē⁺}(τ) ] · ( Σ⁻¹_{x̂c x̂c} ⊗ [M Hs] )⊤

lim_{N→∞} N E{ vec(C̃_N) vec(C̃_N)⊤ } =
  ( Σ⁻¹_{x̂c x̂c} ⊗ [R Hs] ) · [ Σ_τ Σ_{x̂c x̂c}(τ) ⊗ Σ_{e⁺e⁺}(τ) ] · ( Σ⁻¹_{x̂c x̂c} ⊗ [R Hs] )⊤

where

M := [ (K Γ†) − A (Γ† 0_{n×m}) ],   R := [ (I_m 0_{m×m(ν−1)}) − C Γ† ],

Γ the observability matrix in a certain basis, and

Hs := [ I                              ]
      [ CK          I                  ]
      [ ...               ...          ]
      [ CA^{ν−1}K  CA^{ν−2}K ... CK  I ]

e_t⁺ := [ e(t) ; e(t+1) ; ... ; e(T−1) ],   ē_t⁺ := [ e_t⁺ ; e(T) ]

Σ_{e⁺e⁺}(τ) := E{ e⁺_{t+τ} (e⁺_t)⊤ },   Σ_{ē⁺ē⁺}(τ) := E{ ē⁺_{t+τ} (ē⁺_t)⊤ },
Σ_{x̂c x̂c}(τ) := E{ x̂c(t+τ) x̂c(t)⊤ }.
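Asymptotic-variance statements of this kind can be sanity-checked by Monte Carlo; a purely illustrative sketch (simulate and identify are user-supplied placeholders, not functions from the talk):

```python
import numpy as np

def mc_asvar(simulate, identify, A_true, N, runs=200, seed=0):
    """Estimate N * E{vec(A_hat - A) vec(A_hat - A)^T} by Monte Carlo."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(runs):
        data = simulate(N, rng)            # generate one data record
        A_hat = identify(data)             # any subspace estimate of A
        errs.append((A_hat - A_true).ravel(order="F"))   # vec(A_hat - A)
    V = np.stack(errs)                     # runs x n^2
    return N * (V.T @ V) / runs            # sample asymptotic covariance
```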
28
These formulas are valid for N4SID, MOESP, and also CCA.

Σ⁻¹_{x̂c x̂c} = Σ⁻¹_{x̂x̂|u⁺}: very "large" for ill-conditioned problems, so the variance of the estimation errors will also be large. (If x̂(t) ⊥ u_t⁺, e.g. for white noise input and no feedback, Σ_{x̂x̂|u⁺} ≡ Σ_{x̂x̂}.)
29
Complementary output predictors (future output predictors with the effect of the future inputs U[t,T] removed):

ŷc(t+k) := ŷ(t+k) − E[ ŷ(t+k) | U[t,T] ],   k = 0, ..., ν,

stacked into ŷ_t⁺. Note:

row-span{ ŷ_t⁺ } = span{ x̂c_k(t); k = 1, ..., n } = X̂c_t

Weighted SVD:  W E{ ŷ_t⁺ (ŷ_t⁺)⊤ } W⊤ = U Σ²_n U⊤,   Σ²_n = diag{σ²₁, ..., σ²_n}

Choose the canonical basis x̂c(t) := Σ_n^{−1/2} U⊤ W ŷ_t⁺ (here N = ∞). For W = square root of the "future conditional Toeplitz" matrix, x̂c(t) is the canonical state of CCA.
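The basis choice translates directly into a few lines of linear algebra; a sketch (Yc_plus is a hypothetical tail matrix of the stacked complementary predictors; W is a user-chosen square weighting, related for CCA to the future conditional Toeplitz matrix, not constructed here):

```python
import numpy as np

def canonical_state_tails(Yc_plus, W, n):
    """Canonical complementary-state tails (n x N) via weighted SVD."""
    N = Yc_plus.shape[1]
    Sigma = (Yc_plus @ Yc_plus.T) / N          # sample E{yc+ (yc+)^T}
    U, sig2, _ = np.linalg.svd(W @ Sigma @ W.T)
    # Sigma_n = diag(sigma_1..sigma_n) with sig2 = sigma^2,
    # so Sigma_n^{-1/2} = diag(sig2 ** -0.25)
    Sn_inv_half = np.diag(sig2[:n] ** -0.25)
    return Sn_inv_half @ U[:, :n].T @ W @ Yc_plus
```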
30
[Bauer, Bauer-Ljung, Bauer-Jansson]: asymptotic formulas valid for N → ∞ AND p := t − t0 (past data horizon) tending to infinity with N at a certain rate. These estimates neglect the transient due to FINITE-INTERVAL DATA; consistency requires p → ∞.

Different asymptotic formulas for different methods: CCA, MOESP, N4SID.

The asymptotic formulas above are valid for FINITE p and "transient" estimates (in practice one can only regress on the finite past). Stationary approximations are biased for finite p.
31
[Block diagram (closed loop): the transfer functions F(z) and G(z) are connected in feedback, with y at the output of one channel and u at the output of the other; the noise e and an external signal enter additively at the two summing junctions. F(∞) = 0, i.e. no direct feedthrough around the loop, so the feedback interconnection is well posed.]
32
y(t+k) = C A^k x(t) + "terms in U_t⁺" + "terms in E_t⁺",   k = 0, 1, ...

Classical methods (N4SID, CVA, MOESP) construct the state space via the oblique projection

E_{‖U_t⁺}[ Y_t⁺ | Y_t⁻ ∨ U_t⁻ ]

which requires E_t⁺ ⊥ U_t⁺, equivalent to absence of feedback from y to u (Granger). Open problem for quite some time; see the discussion in [Ljung-McKelvey 1996].
33
FACT: x(t) is also the state of the predictor model

x(t+1) = (A − KC) x(t) + B u(t) + K y(t)
ŷ(t | t−1) = C x(t)

ŷ(t+k | t+k−1) = C (A − KC)^k x(t) + "terms in U_t⁺ ∨ Y_t⁺"

X_t^{+/−} = E_{‖U_t⁺ ∨ Y_t⁺}[ Y_t⁺ | U_t⁻ ∨ Y_t⁻ ]

Handle the extra terms by pre-estimating the Markov parameters of the predictor using an ARX model.
34
ŷ(t+k | t−1) := E_{‖U[t,t+k) ∨ Y[t,t+k)}[ y(t+k) | U_t⁻ ∨ Y_t⁻ ]

Construct X̂_t^{+/−} as the "best" n-dimensional approximation of the space spanned by ŷ(t+k | t−1), k = 0, ..., ν; repeat for X̂_{t+1}^{+/−}; then solve the regression for Â, B̂, Ĉ, K̂.
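The ARX pre-estimation step mentioned above is itself a plain least-squares fit; a sketch (my own minimal version: regress y(t) on p past values of (u, y)):

```python
import numpy as np

def arx_markov_parameters(u, y, p):
    """LS estimate of the predictor Markov parameters from an ARX model.
    u: (N, p_in) input data, y: (N, m) output data, p: past horizon."""
    N = y.shape[0]
    Phi = np.array([
        np.concatenate([np.concatenate([u[t - k], y[t - k]])
                        for k in range(1, p + 1)])
        for t in range(p, N)
    ])                                         # (N-p) x p*(p_in+m)
    Theta, *_ = np.linalg.lstsq(Phi, y[p:], rcond=None)
    return Theta.T                             # m x p*(p_in+m) coefficients
```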
35
The predictor-based construction works also with feedback!

(The steady-state predictor is always stable ⇒ |λ(A − KC)| < 1.)

No model is required for the feedback channel.
36
With finite past data, consistency is not guaranteed.

No need to "factor out" the dynamics of u!
37
[1] A. Chiuso, G. Picci (2004), "On the ill-conditioning of subspace identification with inputs", Automatica, 40(4), pp. 575-589.
[2] A. Chiuso, G. Picci (2004), "Numerical conditioning and asymptotic variance of subspace estimates", Automatica, 40(4), pp. 677-683.
[3] A. Chiuso, G. Picci (2004), "Subspace identification by data orthogonalization and model decoupling", Automatica, 40(10), pp. 1689-1703.
38
[4] A. Chiuso, G. Picci (2003), "Subspace identification of random processes with feedback", in Proc. of the IFAC Int. Symposium on System Identification (SYSID), Rotterdam, 2003.
[5] A. Chiuso, G. Picci (2005), "Consistency analysis of some closed-loop subspace identification methods", Automatica, 41, pp. 377-391.
[6] A. Chiuso, G. Picci (2005), "Prediction error vs. subspace methods in closed loop identification", Proc. 16th IFAC World Congress, Prague.
CLASSICAL METHODS (N4SID, MOESP, ...): NEARLY PARALLEL REGRESSORS ⇒ ILL-CONDITIONING, "WORST-CASE" ANALYSIS.

A REMEDY (DECOUPLING + ORTHOGONALIZATION) TO ILL-CONDITIONING.
39
Assume for simplicity that A has simple eigenvalues. Then for each eigenvalue λi of A, the difference between the i-th eigenvalue λ̂i_N of Â_N and λi satisfies

λ̂i_N − λi ≃ (vi⊤ Ã_N ui) / (vi⊤ ui) + O(‖Ã_N‖²)

where vi and ui are the normalized left and right eigenvectors of A corresponding to λi, so that

N E(λ̂i_N − λi)² = (1 / (vi⊤ ui)²) (ui⊤ ⊗ vi⊤) · N E{ vec(Ã_N) vec(Ã_N)⊤ } · (ui ⊗ vi)

Note that (vi⊤ ui)² is the square of the cosine of the angle between the two eigenvectors and is equal to one if the matrix A is symmetric (in which case vi = ui).
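The first-order formula is easy to verify numerically; a sketch with a random A and a small perturbation E standing in for Ã_N:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
E = 1e-6 * rng.standard_normal((4, 4))     # plays the role of A_N - A

lam, U = np.linalg.eig(A)                  # right eigenvectors (columns of U)
lamL, V = np.linalg.eig(A.T)               # left eigenvectors of A
V = V[:, [np.argmin(np.abs(lamL - l)) for l in lam]]   # match orderings

lam_pert = np.linalg.eig(A + E)[0]
for i in range(4):
    predicted = (V[:, i] @ E @ U[:, i]) / (V[:, i] @ U[:, i])
    actual = lam_pert[np.argmin(np.abs(lam_pert - lam[i]))] - lam[i]
    print(abs(predicted - actual))         # discrepancy is O(||E||^2)
```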
40
The vectorized parameter estimates vec(K̂1,N), vec(K̂2,N) form an asymptotically Gaussian sequence:

AsVar √N vec(K̂1,N) = Ḡ [ Σ_τ Σ_{ū⁺ū⁺|x̄}(τ) ⊗ Σ_{ē⁺ē⁺}(τ) ] Ḡ⊤
AsVar √N vec(K̂2,N) = G [ Σ_τ Σ_{u⁺u⁺|x̄}(τ) ⊗ Σ_{e⁺e⁺}(τ) ] G⊤

G := Σ⁻¹_{u⁺u⁺|x̄} ⊗ [R Hs],   Ḡ := Σ⁻¹_{u⁺u⁺|x̄} ⊗ [M H̄s]

with R and M as before, and

Σ_{ū⁺ū⁺|x̄}(τ) := E{ ũ̄⁺_{t+τ} (ũ̄⁺_t)⊤ },   Σ_{e⁺e⁺}(τ) = E{ e⁺_{t+τ} (e⁺_t)⊤ }

41

where ũ⁺_{t+τ} is the τ-steps-ahead stationary shift of the random vector ũ⁺_t := u⁺_t − E[ u⁺_t | x̄(t) ], the bars denoting the extended versions (as ē⁺_t extends e⁺_t).