RECENT ADVANCES IN SUBSPACE IDENTIFICATION
GIORGIO PICCI
Dept. of Information Engineering, Università di Padova, Italy

MTNS 2006, KYOTO, July 2006

OUTLINE OF THE TALK

BASIC IDEA OF SUBSPACE IDENTIFICATION: STOCHASTIC REALIZATION + REGRESSION

CLASSICAL METHODS (N4SID, MOESP, ...): ILL-CONDITIONING, "WORST-CASE" INPUTS, CONSISTENCY CONDITIONS

A REMEDY (DECOUPLING + ORTHOGONALIZATION) TO ILL-CONDITIONING

A PREDICTOR-BASED STATE SPACE CONSTRUCTION THAT WORKS WITH CLOSED LOOP DATA
1
I/O data: sample paths of second order stationary processes y, u, zero mean, with rational spectrum ⇒ described by the (innovation) state space model

[ x(t+1) ]   [ A  B  K ] [ x(t) ]
[  y(t)  ] = [ C  D  I ] [ u(t) ]
                         [ e(t) ]

y(t) = yd(t) + ys(t), with

yd(t) := C(zI − A)⁻¹B u(t) + D u(t)   (deterministic component)
ys(t) := C(zI − A)⁻¹K e(t) + e(t)     (stochastic component)
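A quick way to make the model concrete is to simulate it. A minimal sketch (my own illustration; the matrices A, B, C, D, K below are made-up placeholders, not the talk's system):

```python
import numpy as np

# Simulate x(t+1) = A x(t) + B u(t) + K e(t),  y(t) = C x(t) + D u(t) + e(t)
rng = np.random.default_rng(0)
A = np.array([[0.8, 0.1], [0.0, 0.5]])   # stable: |lambda(A)| < 1
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
K = np.array([[0.3], [0.1]])

N = 2000
u = rng.standard_normal((N, 1))   # input (white, just for the sketch)
e = rng.standard_normal((N, 1))   # innovation white noise
xs = np.zeros((N, 2))             # state trajectory
y = np.zeros((N, 1))
for t in range(N - 1):
    y[t] = C @ xs[t] + D @ u[t] + e[t]
    xs[t + 1] = A @ xs[t] + B @ u[t] + K @ e[t]
```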
2
[Block diagram: u(t) → C(zI − A)⁻¹B + D → yd(t); e(t) → C(zI − A)⁻¹K + I → ys(t); the two branch outputs are summed to give y(t) = yd(t) + ys(t).]

E{e(t) u(τ)⊤} = 0 ∀ t, τ (e orthogonal to the input: absence of feedback).
3
e(t): innovation white noise, the one-step-ahead prediction error of y(t).

Stationarity and no feedback ⇒ A stable (|λ(A)| < 1). With feedback A may be unstable.

x(t): steady-state Kalman predictor based on the joint infinite past of y, u.
4
y, u second order stationary processes described by

[ x(t+1) ]   [ A  B  K ] [ x(t) ]
[  y(t)  ] = [ C  D  I ] [ u(t) ]          (1)
                         [ e(t) ]

with e(t) white noise: E{e(t) e(τ)⊤} = 0 ∀ t, τ, t ≠ τ. The parameters are covariance functionals of the joint process:

[ A  B ]
[ C  D ] = E{ [x(t+1); y(t)] [x(t); u(t)]⊤ } ( E{ [x(t); u(t)] [x(t); u(t)]⊤ } )⁻¹

Parameters are uniquely determined by choosing a basis x(t) in the state space!
5
Inner product: ⟨ξ, η⟩ := E{ξ η}. For −∞ ≤ t0 ≤ t ≤ T ≤ +∞ define the Hilbert spaces of scalar zero-mean random variables

past spaces up to time t:
U[t0,t) := span{u_k(s); k = 1, ..., p, t0 ≤ s < t}
Y[t0,t) := span{y_k(s); k = 1, ..., m, t0 ≤ s < t}

future spaces up to time T:
U[t,T] := span{u_k(s); k = 1, ..., p, t ≤ s ≤ T}
Y[t,T] := span{y_k(s); k = 1, ..., m, t ≤ s ≤ T}

When t0 = −∞ write U_t⁻, Y_t⁻ for U[−∞,t), Y[−∞,t).
6
E[ z | X ]: orthogonal projection (conditional expectation in the Gaussian case) of the components of z onto the subspace X.

Oblique projection: let A ∩ B = {0}, i.e. A + B a direct sum; then E_{‖B}[ z | A ] denotes the projection of z onto A along B.
7
From observed input-output time series

{y0, y1, y2, ..., yN}, yt ∈ R^m,   {u0, u1, u2, ..., uN}, ut ∈ R^p

find estimates (in a certain basis)

[ Â  B̂ ]
[ Ĉ  D̂ ]

such that (consistency)

lim_{N→∞} [ Â  B̂ ; Ĉ  D̂ ] = [ A  B ; C  D ]
Assume we can also observe the state trajectory {x0, x1, x2, ..., xN} corresponding to the I/O data {y0, ..., yN}, yt ∈ R^m, {u0, ..., uN}, ut ∈ R^p. Form "tail" matrices

Yt := [ yt, yt+1, yt+2, ... ],  Xt := [ xt, xt+1, xt+2, ... ],  Ut := [ ut, ut+1, ut+2, ... ]

Every sample trajectory {yt}, {xt}, {ut} of the system must satisfy the "true" model equations, so

[ X_{t+1} ]   [ A  B ] [ X_t ]   [ K ]
[   Y_t   ] = [ C  D ] [ U_t ] + [ I ] E_t
9
[ X_{t+1} ]   [ A  B ] [ X_t ]   [ K ]
[   Y_t   ] = [ C  D ] [ U_t ] + [ I ] E_t

Linear regression! Solve by least squares:

min_{A,B,C,D} ‖ [ X_{t+1} ; Y_t ] − [ A  B ; C  D ] [ X_t ; U_t ] ‖²

[ Â  B̂ ; Ĉ  D̂ ] := ( (1/N) [ X_{t+1} ; Y_t ] [ X_t ; U_t ]⊤ ) ( (1/N) [ X_t ; U_t ] [ X_t ; U_t ]⊤ )⁻¹
10
[ Â  B̂ ; Ĉ  D̂ ] := ( (1/N) [ X_{t+1} ; Y_t ] [ X_t ; U_t ]⊤ ) ( (1/N) [ X_t ; U_t ] [ X_t ; U_t ]⊤ )⁻¹

If the data are second order ergodic and the inverse exists:

lim_{N→∞} [ Â  B̂ ; Ĉ  D̂ ] = [ A  B ; C  D ]
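A direct tail-matrix implementation of this regression, as a sketch (it assumes the hypothetical arrays xs, u, y from the simulation sketch above, shaped (N, n), (N, p), (N, m)):

```python
import numpy as np

def ls_state_space(xs, u, y):
    """LS estimate of [A B; C D] from observed state/input/output data,
    following the tail-matrix regression of slides 9-11."""
    Xt, Xt1 = xs[:-1].T, xs[1:].T       # n x (N-1) tail matrices
    Ut, Yt = u[:-1].T, y[:-1].T
    lhs = np.vstack([Xt1, Yt])          # [X_{t+1}; Y_t]
    rhs = np.vstack([Xt, Ut])           # [X_t; U_t]
    # Theta = ((1/N) lhs rhs^T)((1/N) rhs rhs^T)^{-1}; the 1/N cancels
    Theta = (lhs @ rhs.T) @ np.linalg.inv(rhs @ rhs.T)
    n = Xt.shape[0]
    return Theta[:n, :n], Theta[:n, n:], Theta[n:, :n], Theta[n:, n:]

# e.g.: A_hat, B_hat, C_hat, D_hat = ls_state_space(xs, u, y)
```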
11
For N → ∞ sample covariances converge to true covariances, e.g.

(1/N) Σ_k y_{t+k} u_{s+k}⊤ = (1/N) Y_t U_s⊤ → E{ y(t) u(s)⊤ },   N → ∞.

For N → ∞ the average Euclidean inner product of tail sequences converges to the inner product of the corresponding random variables (asymptotic isometry). As N → ∞, the Hilbert space geometry of (semi-infinite) tail sequences is the same as the Hilbert space geometry of the random variables: y(t) ≡ Yt, u(t) ≡ Ut (isometry).
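A sample version of this limit, as a small sketch (assumes data arrays y, u shaped (N, m), (N, p) as in the earlier illustration):

```python
import numpy as np

def tail_covariance(y, u, t, s, n_tail):
    """Sample version of E{y(t) u(s)^T} from tail matrices (slide 12)."""
    Yt = y[t:t + n_tail].T            # m x n_tail tail matrix at time t
    Us = u[s:s + n_tail].T            # p x n_tail tail matrix at time s
    return (Yt @ Us.T) / n_tail       # -> E{y(t) u(s)^T} as n_tail -> inf
```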
12
Sample fluctuations (i.e. finite data length) play no role in the analysis: can assume N → ∞ and work as in the stochastic setting. Stacked tail matrices correspond to stacked random vectors:

U[t,T] := [ Ut ; Ut+1 ; ... ; UT ]   ↔   u_t⁺ := [ u(t) ; u(t+1) ; ... ; u(T) ]

U[t0,t) := [ Ut0 ; Ut0+1 ; ... ; Ut−1 ]   ↔   u_t⁻ := [ u(t0) ; u(t0+1) ; ... ; u(t−1) ]

Same notation for Y[t0,t), etc.
13
STATE SEQUENCE IS NOT AVAILABLE: NEED TO CONSTRUCT THE STATE FROM INPUT-OUTPUT DATA (STOCHASTIC REALIZATION)

Fundamental step: stochastic realization to construct the state from I/O data. Easy if infinite past data were available at time t:

U_t⁻ := span{ Us | s < t },   Y_t⁻ := span{ Ys | s < t }

Hilbert spaces generated by all past inputs/outputs on (−∞, t). Generalize the procedure of Akaike and L.P.: construct the oblique predictor space

X_t^{+/−} := E_{‖U_t⁺}[ Y_t⁺ | Y_t⁻ ∨ U_t⁻ ]   ⇒ innovation model!
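At the level of tail matrices, the oblique projection reduces to one joint least-squares fit. A sketch (my own illustration; Yf, P, Uf are hypothetical tail matrices of future outputs, joint past, and future inputs):

```python
import numpy as np

def oblique_projection(Yf, P, Uf):
    """E_{||Uf}[ Yf | P ]: project the rows of Yf onto the row space of P
    along the row space of Uf."""
    R = np.vstack([P, Uf])                 # joint regressor [P; Uf]
    L = Yf @ np.linalg.pinv(R)             # solve Yf ~ [Lp Lu] [P; Uf] in LS
    Lp = L[:, :P.shape[0]]
    return Lp @ P                          # keep only the part explained by P
```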
14
In practice one can regress only on finite past data at time t: work with U[t0,t), Y[t0,t) from (small) finite past/future intervals [t0, t), [t, T]. LS estimates depend on sample covariances. Finite-interval approximation of the infinite-past regression leads to errors (bias) in the estimate which do not → 0 as N → ∞. If the system has zeros arbitrarily close to the unit circle, the bias can be made arbitrarily large. Want consistency with finite regression data: NEED FINITE-INTERVAL (NON-STATIONARY) STOCHASTIC REALIZATION
15
PROBLEM: Construct the state space of a stochastic realization of y using ONLY the r.v.’s of input and output processes from a finite interval [t0,T]. Try to mimic the infinite past construction: Future output predictor + oblique projection
16
Output predictor based on joint input-output data (the wedge ∨ denotes vector sum):

Ŷ[t,T] := E[ Y[t,T] | joint input-output data ] = Γ X̂_t + H U[t,T];  set ν := T − t (future horizon),

Γ := [ C ; CA ; ... ; CA^{ν−1} ]

H := [ D                                  ]
     [ CB         D                       ]
     [ ...              ...               ]
     [ CA^{ν−1}B  CA^{ν−2}B  ...  CB   D  ]

X̂_t: transient conditional Kalman filter on [t0, T]:

X̂_{t+1} = A X̂_t + B U_t + K(t) Ê_t
Y_t = C X̂_t + D U_t + Ê_t
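The matrices Γ and H are mechanical to build from (A, B, C, D); a sketch:

```python
import numpy as np

def gamma_and_H(A, B, C, D, nu):
    """Extended observability matrix Gamma and the block lower-triangular
    Toeplitz matrix H of slide 17, for future horizon nu."""
    m, p = C.shape[0], B.shape[1]
    Gamma = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(nu)])
    H = np.zeros((nu * m, nu * p))
    for i in range(nu):
        H[i*m:(i+1)*m, i*p:(i+1)*p] = D
        for j in range(i):                 # block C A^{i-j-1} B below diagonal
            H[i*m:(i+1)*m, j*p:(j+1)*p] = \
                C @ np.linalg.matrix_power(A, i - j - 1) @ B
    return Gamma, H
```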
17
With finite data need to "factor out" the dynamics of u: estimate x(t) (and the initial condition x̂(t0)) by oblique projection along the future u's, since part of the state is in u[t,T]! This leads to complications (a plethora of algorithms: MOESP, N4SID, CCA, etc.). Some people don't care and use the infinite-past approximation.
18
Ŷ[t,T] := E[ Y[t,T] | joint input-output data ]: obtain Γ from its SVD factorization.

X̄_t := Γ† Ŷ[t,T] obeys the recursion

[ X̄_{t+1} ]   [ A ]          [ K1 ]
[   Y_t    ] = [ C ] X̄_t  +  [ K2 ] U[t,T]

K1, K2 known linear functions of (B, D).
19
Pseudostate: x̄(t) := Γ† ŷ_t⁺ = x̂(t) + Γ† H u_t⁺

[ x̄(t+1) ]   [ A ]           [ K1 ]
[  y(t)   ] = [ C ] x̄(t)  +  [ K2 ] u_t⁺  ⊕  w_t⊥        (∗)

K1, K2 known linear functions of (B, D). Solve the regression for the parameters (W-H equations):

[ A ; C ] Σ_{x̄x̄|u⁺} = [ Σ_{x̄₁x̄|u⁺} ; Σ_{yx̄|u⁺} ],
[ K1 ; K2 ] Σ_{u⁺u⁺|x̄} = [ Σ_{x̄₁u⁺|x̄} ; Σ_{yu⁺|x̄} ],   where x̄₁ := x̄(t+1).
Introduce the complementary state: x̂c(t) := x̄(t) − E[ x̄(t) | u_t⁺ ] = x̂(t) − E[ x̂(t) | u_t⁺ ]; then

Σ_{x̄x̄|u⁺} = Σ_{x̂x̂|u⁺} = Σ_{x̂c x̂c}

FACT: the parameters are obtained from the (W-H) equations

[ A ; C ] Σ_{x̂c x̂c} = [ Σ_{x̄₁x̄|u⁺} ; Σ_{yx̄|u⁺} ],
[ K1 ; K2 ] Σ_{u⁺u⁺|x̄} = [ Σ_{x̄₁u⁺|x̄} ; Σ_{yu⁺|x̄} ]

where the conditional covariances are

Σ_{x̄x̄|u⁺} := E{ ( x̄(t) − E(x̄(t) | u_t⁺) ) ( x̄(t) − E(x̄(t) | u_t⁺) )⊤ }
Σ_{u⁺u⁺|x̄} := E{ ( u_t⁺ − E(u_t⁺ | x̄(t)) ) ( u_t⁺ − E(u_t⁺ | x̄(t)) )⊤ }
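A sample (tail-matrix) version of these normal equations, as a sketch in my own notation (Xbar, Xbar1, Yt, Uf are hypothetical tail matrices of the pseudostate at t and t+1, the outputs, and the future inputs U[t,T]):

```python
import numpy as np

def wh_estimates(Xbar, Xbar1, Yt, Uf):
    """Sample analogue of the W-H equations of slides 20-21."""
    Pu = np.linalg.pinv(Uf) @ Uf          # projector onto row span of Uf
    Xc, Xc1 = Xbar - Xbar @ Pu, Xbar1 - Xbar1 @ Pu   # complementary tails
    Yc = Yt - Yt @ Pu
    Sxx = Xc @ Xc.T                        # ~ N * Sigma_{xbar xbar | u+}
    A_hat = (Xc1 @ Xc.T) @ np.linalg.inv(Sxx)
    C_hat = (Yc @ Xc.T) @ np.linalg.inv(Sxx)
    Px = np.linalg.pinv(Xbar) @ Xbar       # projector onto row span of Xbar
    Uc = Uf - Uf @ Px
    Suu = Uc @ Uc.T                        # ~ N * Sigma_{u+ u+ | xbar}
    K1_hat = ((Xbar1 - Xbar1 @ Px) @ Uc.T) @ np.linalg.inv(Suu)
    K2_hat = ((Yt - Yt @ Px) @ Uc.T) @ np.linalg.inv(Suu)
    return A_hat, C_hat, K1_hat, K2_hat
```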
21
[ X̄_{t+1} ; Y_t ] = [ A ; C ] X̄_t + [ K1 ; K2 ] U[t,T]

Orthogonalize the regressors: X̂c_t := X̄_t − E_N{ X̄_t | U[t,T] }:

[ X̂c_{t+1} ]   [ A ]              [ Kc1 ]
[    Y_t    ] = [ C ] X̂c_t  ⊕  [ Kc2 ] U[t,T]         (N → ∞)

Complementary state: x̂c(t) := x̄(t) − E[ x̄(t) | u_t⁺ ]:

[ x̂c(t+1) ]   [ A ]               [ Kc1 ]
[   y(t)   ] = [ C ] x̂c(t)  ⊕  [ Kc2 ] u_t⁺  ⊕  w_t⊥
22
W-H equations in the orthogonalized variables (x̂c₁ := x̂c(t+1)):

[ A ; C ] Σ_{x̂c x̂c} = [ Σ_{x̂c₁ x̂c} ; Σ_{y x̂c} ],
[ Kc1 ; Kc2 ] Σ_{u⁺u⁺|x̂c} = [ Σ_{x̂c₁ u⁺|x̂c} ; Σ_{y u⁺|x̂c} ]

with x̂c(t) = x̄(t) − E[ x̄(t) | u_t⁺ ] = x̂(t) − E[ x̂(t) | u_t⁺ ] and

Σ_{x̄x̄|u⁺} = Σ_{x̂x̂|u⁺} = Σ_{x̂c x̂c}

SAME FORMULAS AS N4SID!!!
23
Jansson-Wahlberg consistency condition: Σ_{x̂x̂|u⁺} = Σ_{x̂c x̂c} MUST BE NONSINGULAR!

Σ_{x̂c x̂c} (= Σ_{x̂x̂|u⁺}) may be ILL-CONDITIONED ⇒ the computation of the parameters (A, C) from the regression will be ill-conditioned: random fluctuation errors in the data will be amplified.

Σ_{x̂x̂|u⁺} ILL-CONDITIONED ⇔ row spaces of X̂_t and U[t,T] are "NEARLY PARALLEL". A similar analysis holds for (K1, K2) and Σ_{u⁺u⁺|x̄}.
24
Ill-conditioning occurs when the PRINCIPAL ANGLES between the state space and the space of future inputs are small (canonical correlations near 1):

σ_MAX{ X̂_t, U[t,T] } ≃ 1  ⇔  Σ_{x̂c x̂c} nearly singular!
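These canonical correlations are easy to compute as a diagnostic; a sketch (Xhat and Uf are hypothetical tail matrices of the state estimates and of the future inputs):

```python
import numpy as np
from scipy.linalg import subspace_angles

def canonical_correlations(Xhat, Uf):
    """Cosines of the principal angles between the row space of Xhat
    (n x N) and that of Uf (nu*p x N); values near 1 signal
    ill-conditioning."""
    # scipy's subspace_angles works on column spaces, hence the transposes
    return np.cos(subspace_angles(Xhat.T, Uf.T))
```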
25
How to deal with ill-conditioning? Sometimes decoupling + orthogonalization helps.
26
Theorem 2. Under standard assumptions on the true innovation noise, the estimation errors Ã_N := Â_N − A, C̃_N := Ĉ_N − C are asymptotically normal, with

lim_{N→∞} N E{ vec(Ã_N) vec(Ã_N)⊤ } =
  ( Σ⁻¹_{x̂c x̂c} ⊗ [M Hs] ) · [ Σ_τ Σ_{x̂c x̂c}(τ) ⊗ Σ_{ē⁺ē⁺}(τ) ] · ( Σ⁻¹_{x̂c x̂c} ⊗ [M Hs] )⊤

lim_{N→∞} N E{ vec(C̃_N) vec(C̃_N)⊤ } =
  ( Σ⁻¹_{x̂c x̂c} ⊗ [R Hs] ) · [ Σ_τ Σ_{x̂c x̂c}(τ) ⊗ Σ_{e⁺e⁺}(τ) ] · ( Σ⁻¹_{x̂c x̂c} ⊗ [R Hs] )⊤

where

M := [ (K Γ†) − A (Γ† 0_{n×m}) ],   R := [ (I_m 0_{m×m(ν−1)}) − C Γ† ],

Γ the observability matrix in a certain basis, and

Hs := [ I                              ]
      [ CK          I                  ]
      [ ...               ...          ]
      [ CA^{ν−1}K  CA^{ν−2}K ... CK  I ]

e_t⁺ := [ e(t) ; e(t+1) ; ... ; e(T−1) ],   ē_t⁺ := [ e_t⁺ ; e(T) ]

Σ_{e⁺e⁺}(τ) := E{ e⁺_{t+τ} (e⁺_t)⊤ },   Σ_{ē⁺ē⁺}(τ) := E{ ē⁺_{t+τ} (ē⁺_t)⊤ },
Σ_{x̂c x̂c}(τ) := E{ x̂c(t+τ) x̂c(t)⊤ }.
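Asymptotic-variance statements of this kind can be sanity-checked by Monte Carlo; a purely illustrative sketch (simulate and identify are user-supplied placeholders, not functions from the talk):

```python
import numpy as np

def mc_asvar(simulate, identify, A_true, N, runs=200, seed=0):
    """Estimate N * E{vec(A_hat - A) vec(A_hat - A)^T} by Monte Carlo."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(runs):
        data = simulate(N, rng)            # generate one data record
        A_hat = identify(data)             # any subspace estimate of A
        errs.append((A_hat - A_true).ravel(order="F"))   # vec(A_hat - A)
    V = np.stack(errs)                     # runs x n^2
    return N * (V.T @ V) / runs            # sample asymptotic covariance
```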
28
These formulas are valid for N4SID, MOESP, and also CCA.

Σ⁻¹_{x̂c x̂c} = Σ⁻¹_{x̂x̂|u⁺}: very "large" for ill-conditioned problems, so the variance of the estimation errors will also be large. (If x̂(t) ⊥ u_t⁺, e.g. for white noise input and no feedback, Σ_{x̂x̂|u⁺} ≡ Σ_{x̂x̂}.)
29
Complementary output predictors (future output predictors with the effect of the future inputs U[t,T] removed):

ŷc(t+k) := ŷ(t+k) − E[ ŷ(t+k) | U[t,T] ],   k = 0, ..., ν,

stacked into ŷ_t⁺. Note:

row-span{ ŷ_t⁺ } = span{ x̂c_k(t); k = 1, ..., n } = X̂c_t

Weighted SVD:  W E{ ŷ_t⁺ (ŷ_t⁺)⊤ } W⊤ = U Σ²_n U⊤,   Σ²_n = diag{σ²₁, ..., σ²_n}

Choose the canonical basis x̂c(t) := Σ_n^{−1/2} U⊤ W ŷ_t⁺ (here N = ∞). For W = square root of the "future conditional Toeplitz" matrix, x̂c(t) is the canonical state of CCA.
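The basis choice translates directly into a few lines of linear algebra; a sketch (Yc_plus is a hypothetical tail matrix of the stacked complementary predictors; W is a user-chosen square weighting, related for CCA to the future conditional Toeplitz matrix, not constructed here):

```python
import numpy as np

def canonical_state_tails(Yc_plus, W, n):
    """Canonical complementary-state tails (n x N) via weighted SVD."""
    N = Yc_plus.shape[1]
    Sigma = (Yc_plus @ Yc_plus.T) / N          # sample E{yc+ (yc+)^T}
    U, sig2, _ = np.linalg.svd(W @ Sigma @ W.T)
    # Sigma_n = diag(sigma_1..sigma_n) with sig2 = sigma^2,
    # so Sigma_n^{-1/2} = diag(sig2 ** -0.25)
    Sn_inv_half = np.diag(sig2[:n] ** -0.25)
    return Sn_inv_half @ U[:, :n].T @ W @ Yc_plus
```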
30
[Bauer, Bauer-Ljung, Bauer-Jansson]: asymptotic formulas valid for N → ∞ AND p := t − t0 (past data horizon) tending to infinity with N at a certain rate. These estimates neglect the transient due to FINITE-INTERVAL DATA; consistency requires p → ∞.

Different asymptotic formulas for different methods: CCA, MOESP, N4SID.

The asymptotic formulas above are valid for FINITE p and "transient" estimates (in practice one can only regress on the finite past). Stationary approximations are biased for finite p.
31
[Block diagram (closed loop): the transfer functions F(z) and G(z) are connected in feedback, with y at the output of one channel and u at the output of the other; the noise e and an external signal enter additively at the two summing junctions. F(∞) = 0, i.e. no direct feedthrough around the loop, so the feedback interconnection is well posed.]
32
y(t+k) = C A^k x(t) + "terms in U_t⁺" + "terms in E_t⁺",   k = 0, 1, ...

Classical methods (N4SID, CVA, MOESP) construct the state space via the oblique projection

E_{‖U_t⁺}[ Y_t⁺ | Y_t⁻ ∨ U_t⁻ ]

which requires E_t⁺ ⊥ U_t⁺, equivalent to absence of feedback from y to u (Granger). Open problem for quite some time; see the discussion in [Ljung-McKelvey 1996].
33
FACT: x(t) is also the state of the predictor model

x(t+1) = (A − KC) x(t) + B u(t) + K y(t)
ŷ(t | t−1) = C x(t)

ŷ(t+k | t+k−1) = C (A − KC)^k x(t) + "terms in U_t⁺ ∨ Y_t⁺"

X_t^{+/−} = E_{‖U_t⁺ ∨ Y_t⁺}[ Y_t⁺ | U_t⁻ ∨ Y_t⁻ ]

Handle the extra terms by pre-estimating the Markov parameters of the predictor using an ARX model.
34
ŷ(t+k | t−1) := E_{‖U[t,t+k) ∨ Y[t,t+k)}[ y(t+k) | U_t⁻ ∨ Y_t⁻ ]

Construct X̂_t^{+/−} as the "best" n-dimensional approximation of the space spanned by ŷ(t+k | t−1), k = 0, ..., ν; repeat for X̂_{t+1}^{+/−}; then solve the regression for Â, B̂, Ĉ, K̂.
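The ARX pre-estimation step mentioned above is itself a plain least-squares fit; a sketch (my own minimal version: regress y(t) on p past values of (u, y)):

```python
import numpy as np

def arx_markov_parameters(u, y, p):
    """LS estimate of the predictor Markov parameters from an ARX model.
    u: (N, p_in) input data, y: (N, m) output data, p: past horizon."""
    N = y.shape[0]
    Phi = np.array([
        np.concatenate([np.concatenate([u[t - k], y[t - k]])
                        for k in range(1, p + 1)])
        for t in range(p, N)
    ])                                         # (N-p) x p*(p_in+m)
    Theta, *_ = np.linalg.lstsq(Phi, y[p:], rcond=None)
    return Theta.T                             # m x p*(p_in+m) coefficients
```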
35
The predictor-based construction works also with feedback!

(The steady-state predictor is always stable ⇒ |λ(A − KC)| < 1.)

No model is required for the feedback channel.
36
With finite past data, consistency is not guaranteed.

No need to "factor out" the dynamics of u!
37
[1] A. Chiuso, G. Picci (2004), "On the ill-conditioning of subspace identification with inputs", Automatica, 40(4), pp. 575-589.
[2] A. Chiuso, G. Picci (2004), "Numerical conditioning and asymptotic variance of subspace estimates", Automatica, 40(4), pp. 677-683.
[3] A. Chiuso, G. Picci (2004), "Subspace identification by data orthogonalization and model decoupling", Automatica, 40(10), pp. 1689-1703.
38
[4] A. Chiuso, G. Picci (2003), "Subspace identification of random processes with feedback", in Proc. of the IFAC Int. Symposium on System Identification (SYSID), Rotterdam, 2003.
[5] A. Chiuso, G. Picci (2005), "Consistency analysis of some closed-loop subspace identification methods", Automatica, 41, pp. 377-391.
[6] A. Chiuso, G. Picci (2005), "Prediction error vs. subspace methods in closed loop identification", Proc. 16th IFAC World Congress, Prague.
CLASSICAL METHODS (N4SID, MOESP, ...): NEARLY PARALLEL REGRESSORS ⇒ ILL-CONDITIONING, "WORST-CASE" ANALYSIS.

A REMEDY (DECOUPLING + ORTHOGONALIZATION) TO ILL-CONDITIONING.
39
Assume for simplicity that A has simple eigenvalues. Then for each eigenvalue λi of A, the difference between the i-th eigenvalue λ̂i_N of Â_N and λi satisfies

λ̂i_N − λi ≃ (vi⊤ Ã_N ui) / (vi⊤ ui) + O(‖Ã_N‖²)

where vi and ui are the normalized left and right eigenvectors of A corresponding to λi, so that

N E(λ̂i_N − λi)² = (1 / (vi⊤ ui)²) (ui⊤ ⊗ vi⊤) · N E{ vec(Ã_N) vec(Ã_N)⊤ } · (ui ⊗ vi)

Note that (vi⊤ ui)² is the square of the cosine of the angle between the two eigenvectors and is equal to one if the matrix A is symmetric (in which case vi = ui).
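The first-order formula is easy to verify numerically; a sketch with a random A and a small perturbation E standing in for Ã_N:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
E = 1e-6 * rng.standard_normal((4, 4))     # plays the role of A_N - A

lam, U = np.linalg.eig(A)                  # right eigenvectors (columns of U)
lamL, V = np.linalg.eig(A.T)               # left eigenvectors of A
V = V[:, [np.argmin(np.abs(lamL - l)) for l in lam]]   # match orderings

lam_pert = np.linalg.eig(A + E)[0]
for i in range(4):
    predicted = (V[:, i] @ E @ U[:, i]) / (V[:, i] @ U[:, i])
    actual = lam_pert[np.argmin(np.abs(lam_pert - lam[i]))] - lam[i]
    print(abs(predicted - actual))         # discrepancy is O(||E||^2)
```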
40
The vectorized parameter estimates vec(K̂1,N), vec(K̂2,N) form an asymptotically Gaussian sequence:

AsVar √N vec(K̂1,N) = Ḡ [ Σ_τ Σ_{ū⁺ū⁺|x̄}(τ) ⊗ Σ_{ē⁺ē⁺}(τ) ] Ḡ⊤
AsVar √N vec(K̂2,N) = G [ Σ_τ Σ_{u⁺u⁺|x̄}(τ) ⊗ Σ_{e⁺e⁺}(τ) ] G⊤

G := Σ⁻¹_{u⁺u⁺|x̄} ⊗ [R Hs],   Ḡ := Σ⁻¹_{u⁺u⁺|x̄} ⊗ [M H̄s]

with R and M as before, and

Σ_{ū⁺ū⁺|x̄}(τ) := E{ ũ̄⁺_{t+τ} (ũ̄⁺_t)⊤ },   Σ_{e⁺e⁺}(τ) = E{ e⁺_{t+τ} (e⁺_t)⊤ }

41

where ũ⁺_{t+τ} is the τ-steps-ahead stationary shift of the random vector ũ⁺_t := u⁺_t − E[ u⁺_t | x̄(t) ], the bars denoting the extended versions (as ē⁺_t extends e⁺_t).