Linear least squares — Lecture 2: Least Squares


SLIDE 1

Linear least squares

Non-consistent systems Ax = b, b ∉ R(A). This means b, or a part of it, belongs to the orthogonal complement of R(A).

Figure: the four fundamental subspaces. The domain R^n splits into R(A^t) (dim r_A) and N(A) (dim n − r_A), with x_t = x_r + x_h; the codomain R^m splits into R(A) (dim r_A) and N(A^t) (dim m − r_A). The row-space part maps as b̂ = A x_r, while A x_h = 0.

SLIDE 2

Linear least squares

Linear least squares looks for a deviation e such that b + e ∈ R(A). Such a substitution creates a consistent system of equations (but e is unknown!):

Ax = b + e

Vector of residuals: e = Ax − b. How do we choose the vector of residuals in order to make the modified right-hand side of the system of equations consistent? Solve

min_{x ∈ R^q} (Ax − b)^t (Ax − b)

SLIDE 3

Linear least squares optimization problem

Least squares derivation:

x^t A^t A x − 2 b^t A x + b^t b = Σ_{i=1}^{n} Σ_{j=1}^{n} γ_{ij} ξ_i ξ_j − 2 Σ_{i=1}^{n} Σ_{j=1}^{n} α_{ij} β_i ξ_j + b^t b

where γ_{ij} are the entries of A^t A, α_{ij} the entries of A, β_i the ith component of b and ξ_i the ith component of x. Setting the derivative with respect to ξ_k to zero amounts to solving

2 Σ_{i=1}^{n} γ_{ki} ξ_i − 2 Σ_{j=1}^{n} α_{jk} β_j = 0

These n equations can be recast as

(A^t A) x = A^t b

These are called the normal equations.
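The derivation can be checked numerically. A minimal MATLAB sketch (A and b are assumed random example data with full column rank) compares the normal equations with MATLAB's backslash solver, and also verifies the residual orthogonality discussed on the next slide:

rng(1);
A = randn(7,3); b = randn(7,1); % assumed example data, full column rank
xn = (A'*A)\(A'*b);             % normal equations
xb = A\b;                       % QR-based least squares solve
norm(xn - xb)                   % ~1e-15: same minimizer
e = A*xn - b;                   % residual vector
norm(A'*e)                      % ~1e-15: e is orthogonal to R(A)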

SLIDE 4

Normal equations

A^t A x = A^t b

◮ Consistency:

rank(A^t A) = rank([A^t A  A^t b])

◮ Let x_part be a particular solution; then so is

x = x_part + X z

(with the columns of X spanning N(A) and z arbitrary)

◮ What does x map to?

A^t e = A^t (A x_part − b) = 0, so the residuals are orthogonal to the column space of A, i.e. e ∈ N(A^t), and A x_part is the orthogonal projection of b onto R(A).

SLIDE 5

Least squares solution

We can split b as b = b_1 + b_2 with b_1 ∈ R(A), b_2 ⊥ R(A), so at the minimum we obtain e = (Ax − b_1) − b_2 = −b_2. The minimal value for the objective function is

e^t e = (Ax − b)^t (Ax − b) = b^t (I_p − A (A^t A)^{-1} A^t) b

SLIDE 6

Least squares solution

Figure: the orthogonal projection p = Ax of b onto R(A), spanned by the columns a_1 and a_2.

SLIDE 7

Computing the least squares solution

matlab demo

Avoid solving the normal equations!

>> [A b]
ans =
   2.999999999980000e+00   3.000000000010000e+00   5.999999999990000e+00
   1.999999999990000e+00   2.000000000000000e+00   3.999999999990000e+00
   2.000000000020000e+00   2.000000000010000e+00   4.000000000030000e+00
  -9.999999999900000e-01  -1.000000000010000e+00  -2.000000000000000e+00
   1.999999999970000e+00   1.999999999960000e+00   3.999999999930000e+00
   2.000000000000000e+00   2.000000000000000e+00   4.000000000000000e+00
   9.999999999800000e-01   1.000000000030000e+00   2.000000000010000e+00

Whilst the linear system Ax = b is consistent mathematically (b = a_1 + a_2), as evidenced by the singular values

>> svd([A b])
ans =
   1.272792206132957e+01
   4.107364844986984e-11
   3.982911029842893e-16

the square of the singular values in the matrix A^t A produces a relative gap larger than the machine precision ε_m¹, or

σ_2²(A) < σ_1²(A) ε_m

As a result, A^t A cannot be inverted, and no solution is obtained.

¹This assumes we are working in double precision, where ε_m ≈ 1 × 10⁻¹⁵.
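The failure mode is easy to reproduce. A hedged sketch (the slide's exact matrix is not available, so an A with a similar singular value gap is constructed):

[U,~] = qr(randn(7,2),0);       % assumed orthonormal factors
[V,~] = qr(randn(2));
A = U*diag([1e1 1e-10])*V';     % sigma1 = 10, sigma2 = 1e-10
svd(A)'                         % both singular values are resolved
svd(A'*A)'                      % sigma2^2 = 1e-20 falls below eps*sigma1^2
rank(A), rank(A'*A)             % 2 versus 1: numerical rank is lost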

SLIDE 8

Least squares method and the SVD

We involve the singular value decomposition of A, with U = [U_1 U_2], V = [V_1 V_2] and S = [S_1 0; 0 0]:

A = U S V^t = U_1 S_1 V_1^t

Applied to the normal equations, A^t A x = A^t b gives

(V_1 S_1^t S_1 V_1^t) x = V_1 S_1 U_1^t b

such that

x = V_1 S_1^{-1} U_1^t b + V_2 z

◮ Advantage over the normal equations due to the power of one of S_1 (square-root-free method), compared to S^t S in the normal equations
◮ Operations reduced to orthogonal projection and scaling
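A minimal sketch of this route (full column rank and some example data assumed):

A = randn(7,3); b = randn(7,1); % assumed example data
[U1,S1,V1] = svd(A,'econ');     % economy SVD: A = U1*S1*V1'
x = V1*(S1\(U1'*b));            % x = V1*S1^{-1}*U1'*b: projection + scaling
e = b - U1*(U1'*b);             % residual e = (I - U1*U1')*b (cf. slide 11)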

SLIDE 9

Least squares method and the SVD

matlab demo

Finite precision acts as a nonlinearity, which may result in a different numerical rank for A than for A^t A. Example: A = U S V^t, where the singular values of A are σ_1(A) = 1, σ_2(A) = 1 × 10⁻⁹, σ_3(A) = 0 and

U = [1/√2   1/√2   0
     1/√2  −1/√2   0
     0      0     −1]

V = [1/√3   1/√3   1/√3
    −1/√2   0      1/√2
     1/√6  −2/√6   1/√6]

In matlab:

>> A
A =
   4.082482908721113e-01  -4.999999999999999e-01   2.886751340174626e-01
   4.082482900556147e-01  -4.999999999999999e-01   2.886751351721632e-01
                       0                       0                       0

SLIDE 10

Least squares method and the SVD

matlab demo

The SVD of A is in agreement with the original decomposition, whereas the SVD of A^t A truncates the dynamical range of the spectrum relative to σ_1²:

>> [U1,S1,V1] = svd(A); U1, V1, diag(S1)'
U1 =
  -7.071067811865475e-01  -7.071067811865476e-01                       0
  -7.071067811865476e-01   7.071067811865475e-01                       0
                       0                       0   1.000000000000000e+00
V1 =
  -5.773502691896258e-01  -5.773502315590063e-01  -5.773503068202428e-01
   7.071067811865475e-01   4.608790676874364e-08  -7.071067811865464e-01
  -4.082482904638632e-01   8.164966075365897e-01  -4.082482372461314e-01
ans =
   9.999999999999999e-01   9.999999462588933e-10                       0

>> [U2,S2,V2] = svd(A'*A); U2, V2, diag(S2)'
U2 =
  -5.773502691896260e-01  -7.993060164940310e-01   1.666630092825368e-01
   7.071067811865475e-01  -3.874131392520148e-01   5.915328051214906e-01
  -4.082482904638631e-01   4.593701683080254e-01   7.888677847408841e-01
V2 =
  -5.773502691896257e-01   7.242703261997147e-01   3.769604239880175e-01
   7.071067811865475e-01   6.743633567555686e-01  -2.126830107586453e-01
  -4.082482904638631e-01   1.437586785273193e-01  -9.014803246224582e-01
ans =
   1.000000000000000e+00   5.019856335253205e-17   1.772060549260711e-17

SLIDE 11

Least squares method and the SVD

Assume a unique least squares solution through rank(A) = q; then

x = V_1 S_1^{-1} U_1^t b = Σ_{i=1}^{r} v_i (1/σ_i) u_i^t b = v_1 (u_1^t b / σ_1) + . . . + v_r (u_r^t b / σ_r)

The residuals follow from

e = b − A x = b − U_1 S_1 V_1^t (V_1 S_1^{-1} U_1^t b) = (I − U_1 U_1^t) b

SLIDE 12

Least squares method sensitivity

Relative accuracy and sensitivity of the least squares solution. Perturb b:

x + δx = A†(b + δb)

Take δb proportional to the left singular vector u_i; then

||δx||² / ||δb||² = δb^t (A†)^t A† δb / ||δb||² = 1/σ_i²

The worst case occurs when δb is proportional to u_q, the left singular vector belonging to the smallest singular value σ_q:

||δx||² = (1/σ_q²) ||δb||²

Intuitive idea: we are working in the reverse direction; the inverses of the singular values of the mapping from the domain of A to R(A) are the singular values of the inverse mapping from R(A) to R(A^t).

SLIDE 13

Least squares method sensitivity

Worst relative error magnification:

max_{b,δb} (||δx||² / ||x||²) / (||δb||² / ||b||²) = max_{b,δb} [δb^t (A†)^t A† δb / ||δb||²] · [||b||² / (b^t (A†)^t A† b)] = σ_1² / σ_q²

Intuitive idea: a vector b with only a component in the direction of u_1, resulting from a small x blown up by σ_1, is perturbed by a vector δb resulting from a large δx made insignificant by σ_n. This number is called the condition number κ_A = σ_1(A)/σ_n(A).
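A small experiment (assumed data) showing the worst-case magnification; note that with plain norms instead of squared norms the factor is σ_1/σ_q:

[U,~] = qr(randn(50,3),0); [V,~] = qr(randn(3)); % assumed factors
A = U*diag([10 1 0.01])*V';     % sigma1 = 10, sigma_q = 0.01
b = 10*U(:,1);                  % b along u1: x = v1, unit norm
x = A\b;
db = 1e-6*U(:,3);               % perturbation along u_q
dx = A\(b + db) - x;
(norm(dx)/norm(x))/(norm(db)/norm(b)) % ~1e3 = sigma1/sigma_q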

SLIDE 14

Least squares solution

Exact solution:

x̂ = (A^t A)^{-1} A^t a,   a ∈ R(A)

Noise distorts the exact vector a into the given data b = a + f:

x = (A^t A)^{-1} A^t b = (A^t A)^{-1} A^t a + (A^t A)^{-1} A^t f

Error on the solution:

x − x̂ = (A^t A)^{-1} A^t f

Important to understand: the vector e relates to the matrix A and its orthogonal complement; the vector f, however, relates to the noise added to the exact vector a. Generally, f contains components in R(A), and thus f ≠ e.

SLIDE 15

Least squares solution

Figure: noisy realizations a + f_1, . . . , a + f_6 of the exact vector a relative to the plane R(A).

SLIDE 16

Least squares solution

Figure: geometry of the fit relative to the plane R(A): Ax = a, b = a + f_1, Ax̂ = b + e, b̂ = a − e.

SLIDE 17

Least squares statistical properties

First order and second order assumptions:

E[x − x̂] = (A^t A)^{-1} A^t E[f] = 0   (1)
E[(x − x̂)(x − x̂)^t] = (A^t A)^{-1} σ²   (2)

◮ unbiasedness: repeating a large number of experiments using the same exact data a and A but different noise realizations of f (white noise), their average will be equal to the exact solution x̂
◮ error covariance: while the estimate is unbiased, certain directions in R^n are estimated more accurately than others (see sensitivity)

SLIDE 18

White noise

Example: A ∈ R^{100×2}, a ∈ R^{100}; find the least squares solution to Ax = a + f with f ∈ R^{100} and E[f f^t] = (0.1)² I. If δx = x − x̂, we find

E[δx δx^t] = V S^{-2} V^t σ²   (3)
E[(S V^t δx)(S V^t δx)^t] = σ² I   (4)

SLIDE 19

White noise

matlab demo

Case 1:

S = [1.2247  0
     0       1],   V = [ 0.6731  −0.7396
                        −0.7396  −0.6731]

Take x_exact = v_1 + v_2 and compute the eigenvalues of E[(x − x_exact)(x − x_exact)^t] for 500 samples of b:

Λ([0.0087  0.0021; 0.0021  0.0079]) = {0.0062, 0.0105}

giving a relative accuracy of 1.2987 ≈ 1.2247
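A hedged re-creation of this experiment (U and the noise realizations are assumed; S and V as given above):

S = diag([1.2247 1]); Vm = [0.6731 -0.7396; -0.7396 -0.6731];
[U,~] = qr(randn(100,2),0);     % assumed left singular vectors
A = U*S*Vm'; xex = Vm(:,1) + Vm(:,2); a = A*xex;
dX = zeros(2,500);
for k = 1:500
    b = a + 0.1*randn(100,1);   % white noise, sigma = 0.1
    dX(:,k) = A\b - xex;        % deviation from the exact solution
end
eig(dX*dX'/500)                 % ~{0.0067, 0.01} = sigma^2./diag(S).^2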

SLIDE 20

White noise

matlab demo

Figure: solutions (x_1, x_2) to the least squares problem for 200 samples, perturbed by white noise, centered around the true solution. The ellipse contains 95% of the solutions.

SLIDE 21

White noise

matlab demo

Case 2:

S = [10  0
     0   1],   V = [ 0.6731  −0.7396
                    −0.7396  −0.6731]

Take x_exact = v_1 + v_2 and compute the eigenvalues of E[(x − x_exact)(x − x_exact)^t] for 500 samples of b:

Λ([0.0051  0.0046; 0.0046  0.0043]) = {0.0001, 0.0093}

giving a relative accuracy of 9.5999 ≈ 10

SLIDE 22

White noise

matlab demo

Figure: solutions (x_1, x_2) to the least squares problem for 200 samples, perturbed by white noise, centered around the true solution. The ellipse contains 95% of the solutions.

SLIDE 23

Colored noise

The statistical properties of the additive noise change: we solve Ax = a + f with E[f f^t] = W, where W is a positive definite covariance matrix.

SLIDE 24

Colored noise

How do we obtain better estimates for x by incorporating W? Left-preconditioning with some P ∈ R^{p×m} and rank(PA) = n:

P A x = P (a + f)   (5)
A^t V A x = A^t V (a + f)   (6)
x = x̂ + (A^t V A)^{-1} A^t V f   (7)

with V = P^t P, and statistical properties

E[x] = x̂   (8)
E[(x − x̂)(x − x̂)^t] = (A^t V A)^{-1} A^t V W V A (A^t V A)^{-1}   (9)

The error covariance matrix is a function of our free-to-choose V:

Σ(V) = (A^t V A)^{-1} A^t V W V A (A^t V A)^{-1}

SLIDE 25

Colored noise

Optimal choice of V to reduce the variance? Σ(V_1) is better than Σ(V_2) if

z^t Σ(V_1) z ≤ z^t Σ(V_2) z, ∀z ∈ R^n,   or   Σ(V_2) − Σ(V_1) ⪰ 0

For every nonnegative definite matrix V we have that Σ(V) ⪰ Σ(W^{-1}), such that the optimal weighting is P = √(W^{-1}). Intuitive idea: least squares makes the assumptions of uncorrelated errors and equal variance; the weighting matrix P is a means to correct the least squares method for colored noise so that these assumptions are fulfilled.
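A minimal weighted least squares sketch along these lines (A, the coloring filter and the exact x are made up for illustration):

n = 50; A = randn(n,2); xex = [1; 2];  % assumed example problem
H = toeplitz([1 0.9 0.81 zeros(1,n-3)]',[1 zeros(1,n-1)]); % assumed coloring
W = H*H';                        % noise covariance, cf. slide 33
b = A*xex + H*randn(n,1);        % one colored-noise realization
P = chol(inv(W));                % P'*P = inv(W), i.e. V = W^{-1}
xw = (P*A)\(P*b);                % weighted least squares estimate
xu = A\b;                        % unweighted estimate, larger variance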

SLIDE 26

Colored noise

matlab demo

Figure: solutions (x_1, x_2) to the unweighted least squares problem for 200 samples, perturbed by colored noise (notch filter), centered around the true solution. The ellipse contains 95% of the solutions.

SLIDE 27

Colored noise

matlab demo

Figure: solutions (x_1, x_2) to the weighted least squares problem for 200 samples, perturbed by colored noise (notch filter), centered around the true solution. The ellipse contains 95% of the solutions.

SLIDE 28

Colored noise

Figure: solutions (x_1, x_2) to the weighted least squares problem for 200 samples, perturbed by colored noise (notch filter), centered around the true solution. The ellipse contains 95% of the solutions. Close-up.

SLIDE 29

Intermezzo (1): covariance matrix to noise samples

matlab demo

Let C ∈ R^{n×n}, C ≻ 0; then there exists a Cholesky factorization C = R^t R. We can distort a matrix X ∈ R^{l×n} of l white-noise samples (σ = 1) using R to obtain the desired colored-noise samples:

(X R)^t (X R) = R^t X^t X R = R^t I R = C

Steps (a sketch follows below):
1. choose a positive definite covariance matrix C for f
2. compute the Cholesky factor using "R = chol(C)"
3. right-multiply X (the white-noise samples) by R
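A minimal sketch of the three steps (the covariance matrix is an assumed example):

C = toeplitz([1 0.5 0.25 0.125]); % step 1: positive definite covariance
R = chol(C);                      % step 2: C = R'*R
X = randn(100000,4);              % white-noise samples, sigma = 1
F = X*R;                          % step 3: colored samples
cov(F)                            % ~C for a large number of samples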

SLIDE 30

Intermezzo (1): covariance matrix to noise samples

matlab demo

Example: bandpass 0.2–0.22

Figure: generated colored-noise realizations, their average power spectral density in dB/Hz (N = 1024, L = 1000), and φ over n = 1, . . . , N (N = 500).

SLIDE 31

Intermezzo (1): covariance matrix to noise samples

matlab demo

Example: lowpass 0–0.05

Figure: generated colored-noise realizations, their average power spectral density in dB/Hz (N = 1024, L = 1000), and φ over n = 1, . . . , N (N = 500).

SLIDE 32

Intermezzo (2): filter to covariance matrix

matlab demo

The impulse response can be computed from the filter polynomials (numerator and denominator), e.g. a notch filter over 0.2–0.3:

[c,d] = butter(6,[0.4 0.6],'stop'); % Important: Matlab uses norm. freq. of 1; 'stop' gives the band-stop (notch) design
freqz(c,d,1000,1) % Ts = 1

Figure: magnitude response of the notch filter and its impulse response (amplitude vs. n (samples)).

SLIDE 33

Intermezzo (2): filter to covariance matrix

matlab demo

Colored noise f as a filtering of white noise e:

f_i = Σ_{k=0}^{i−1} h_{i−k} e_k

The convolution can be expressed as a matrix-vector product f = H e with

H = [h_0
     h_1  h_0
     h_2  h_1  h_0
     .    .    .    .
     h_N  h_{N−1}  h_{N−2}  h_{N−3}  . . .  h_0]

such that W = E[f f^t] becomes

W = H E[e e^t] H^t = σ² H H^t

SLIDE 34

Intermezzo (2): filter to covariance matrix

matlab demo

% First plot
zfilter = filter(c,d,randn(1000,1024)')'; % 1000 samples, length 1024
Zfilter = (1/sqrt(1024))*fft(zfilter,[],2);
Pzfilter = mean(Zfilter.*conj(Zfilter));
Pzfilter = fftshift(Pzfilter);
% Second plot
[h,t] = impz(c,d);
hn = length(h);
h = [h; zeros(N-hn,1)];
t = [t; (N-hn+1:N)'];
H = toeplitz(h,[h(1) zeros(1,length(h)-1)]);
W = H*H';
% Compute samples using W...
Zcov = (1/sqrt(1024))*fft(zcov,[],2);
Pzcov = mean(Zcov.*conj(Zcov));
Pzcov = fftshift(Pzcov);

Estimated frequency spectra:

Figure: estimated power spectra (dB/Hz vs. normalized frequency) for the filtered white noise and for the samples generated from W.

SLIDE 35

Least squares method in system identification

A simple autoregressive (AR) model predicts the future output y_k based upon the past outputs y_{k−1}, y_{k−2}, . . . , y_{k−na}, where na is the AR model order. The system of equations

[y_{na−1}  y_{na−2}  . . .  y_1         y_0
 y_{na}    y_{na−1}  . . .  y_2         y_1
 .         .                .           .
 y_{N−1}   y_{N−2}   . . .  y_{N−na+1}  y_{N−na}] · [−a_1; −a_2; . . . ; −a_{na−1}; −a_{na}]
    = [y_{na}; y_{na+1}; . . . ; y_{N−1}; y_N] − [e_{na}; e_{na+1}; . . . ; e_{N−1}; e_N]

gives rise to the least squares problem Ax = b + e. Noise variance estimation:

E[e^t e] = (N − na + 1) σ²

The vector of residuals e may not be interpreted as measurement noise, but as a perturbation to b in the least squares sense.
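A hedged sketch of the construction (the signal and the model order are assumed):

y = filter(1,[1 -1.5 0.7],randn(500,1)); % assumed AR(2) example signal
na = 2; N = length(y);
A = toeplitz(y(na:N-1), y(na:-1:1));     % rows [y_{k-1} ... y_{k-na}]
b = y(na+1:N);
x = A\b;                                 % estimates [-a_1; ...; -a_na]
ahat = [1; -x]'                          % ~[1 -1.5 0.7]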

SLIDE 36

Least squares method in system identification

matlab demo

Example². Step 1: generate an AR signal using white noise driven through an all-pole filter of order m

% true model d of order m
b = fir1(1024,.5);
[d,p0] = lpc(b,m);
% Generate AR time series x from white noise u
rng(0,'twister');
u = randn(ns,1);
x = filter(1,d,u);

Step 2: estimate an AR model of order n using least squares

model = ar(x,n,'ls')
>> d, model.a
d =
    1.0000   -3.5258    6.9530   -9.3074    8.9473   -6.1572    2.8428   -0.7059
ans =
    1.0000   -3.5395    6.9962   -9.3778    9.0262   -6.2192    2.8749   -0.7127

²http://www.mathworks.nl/help/signal/examples/linear-prediction-and-autoregressive-modeling.html

SLIDE 37

Least squares method in system identification

matlab demo

Step 3: power spectral density comparison for the time signal and the estimated model.

Figure: one-sided PSD (dB/Hz) vs. frequency — PSD estimate of x and PSD of the model output.

SLIDE 38

Least squares method in system identification

matlab demo

Step 4: prediction errors with a horizon³ of 1.

Figure: time response comparison (amplitude vs. time in seconds) — the original autoregressive signal and the signal estimate from the linear predictor.

In practice a validation set, independent of the training set used to estimate the model, is used for the comparison.

³Measured output values in the time series up to time k − M, with M the horizon, are used to predict the output y_k using the estimated model. The larger the horizon, the larger the error becomes due to the propagation of prediction errors in y_{k−M}, . . . , y_{k−1}.

SLIDE 39

Recursive least squares

Why recursive least squares?

◮ time series
◮ real-life systems are not time invariant: θ(t)
◮ big data prefers incremental updates

Classic example of recursive updating: computing the mean

x̄_N = (Σ_{k=1}^{N} x_k)/N,   x̄_{N+1} = (Σ_{k=1}^{N+1} x_k)/(N + 1)   ⇒   x̄_{N+1} = (N x̄_N + x_{N+1})/(N + 1)

N = 10000; x = randn(N,1);
oldmean = mean(x) % N-1 additions
xnew = randn; x = [x; xnew];
newmean = mean(x) % N additions
newmean = (N*oldmean + xnew)/(N+1) % 1 addition, 2 multiplications

SLIDE 40

Recursive least squares

Recursion in the least squares method framework:

x_N = (A_N^t A_N)^{-1} A_N^t b_N = P_N A_N^t b_N

Example: AR model

A_N = [φ_{na}^t; φ_{na+1}^t; . . . ; φ_N^t],   b_N = [y_{na} − e_{na}; y_{na+1} − e_{na+1}; . . . ; y_N − e_N]

with φ_k^t = [y_{k−1}  y_{k−2}  . . .  y_{k−na}]

SLIDE 41

Recursive least squares

How do we express x_{N+1} in terms of x_N and y_{N+1}?

x_{N+1} = (A_{N+1}^t A_{N+1})^{-1} A_{N+1}^t b_{N+1} = P_{N+1} A_{N+1}^t b_{N+1}

where

A_{N+1} = [A_N; φ_{N+1}^t]   (10)

A_{N+1}^t A_{N+1} = A_N^t A_N + φ_{N+1} φ_{N+1}^t   (11)
                  = P_N^{-1} + φ_{N+1} φ_{N+1}^t   (12)

P_{N+1} = (P_N^{-1} + φ_{N+1} φ_{N+1}^t)^{-1}   (13)
        = P_N − P_N φ_{N+1} (1 + φ_{N+1}^t P_N φ_{N+1})^{-1} φ_{N+1}^t P_N⁴   (14)

where 1 + φ_{N+1}^t P_N φ_{N+1} is a scalar.

⁴matrix inversion lemma

SLIDE 42

Recursive least squares

A_{N+1}^t b_{N+1} = A_N^t b_N + φ_{N+1} b_{N+1}

By substitution we obtain

x_{N+1} = x_N + [P_N φ_{N+1} / (1 + φ_{N+1}^t P_N φ_{N+1})] (b_{N+1} − φ_{N+1}^t x_N)   (15)

P_{N+1} = P_N − [P_N φ_{N+1} φ_{N+1}^t P_N / (1 + φ_{N+1}^t P_N φ_{N+1})]   (16)

Remarks

◮ the recursion requires initial conditions x_0 and a positive definite P_0, which can be obtained from least squares on already available data
◮ not well-conditioned from a numerical point of view
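A compact sketch of equations (15)-(16) on an assumed data stream (Phi holds the rows φ_k^t, bvec the outputs):

Phi = randn(200,3); bvec = Phi*[1;2;3] + 0.01*randn(200,1); % assumed data
x = zeros(3,1); P = 1e3*eye(3);  % common large-P0 initialization
for k = 1:size(Phi,1)
    phi = Phi(k,:)';
    g = P*phi/(1 + phi'*P*phi);  % gain vector; the denominator is a scalar
    x = x + g*(bvec(k) - phi'*x);% eq. (15)
    P = P - g*(phi'*P);          % eq. (16)
end
x'                               % ~[1 2 3]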

SLIDE 43

Recursive least squares

Coping with time-varying systems⁵:

1. A forgetting parameter λ diminishes the influence of past measurements:

x_{N+1} = x_N + [P_N φ_{N+1} / (λ + φ_{N+1}^t P_N φ_{N+1})] (b_{N+1} − φ_{N+1}^t x_N)
P_{N+1} = (1/λ) [P_N − P_N φ_{N+1} φ_{N+1}^t P_N / (λ + φ_{N+1}^t P_N φ_{N+1})]

2. Kalman filter approach: assume the true parameter vector is not constant but varies like a random walk

θ(k + 1) = θ(k) + w(k),   E[w(k) w^t(k)] = R_1(k)

The recursive least squares formulation becomes

x_{N+1} = x_N + [P_N φ_{N+1} / (R_2(N) + φ_{N+1}^t P_N φ_{N+1})] (b_{N+1} − φ_{N+1}^t x_N)
P_{N+1} = P_N − [P_N φ_{N+1} φ_{N+1}^t P_N / (R_2(N) + φ_{N+1}^t P_N φ_{N+1})] + R_1(N)

where R_2(k) is the noise covariance matrix E[f(k) f^t(k)].

⁵function rarx in matlab
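The forgetting-factor variant changes the loop of the previous sketch in two places (λ = 0.9 as in the demos below; Phi, bvec, x and P as before):

lam = 0.9;                       % forgetting parameter
for k = 1:size(Phi,1)
    phi = Phi(k,:)';
    g = P*phi/(lam + phi'*P*phi);
    x = x + g*(bvec(k) - phi'*x);
    P = (P - g*(phi'*P))/lam;    % scaled by 1/lambda
end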

SLIDE 44

AutoRegressive modeling with eXogenous input (ARX)

An ARX model predicts the future output y_k based upon the past inputs u_{k−1}, . . . , u_{k−nb+1}, the past outputs y_{k−1}, . . . , y_{k−na} and the current input u_k:

y_k + a_1 y_{k−1} + . . . + a_{na} y_{k−na} = b_1 u_k + . . . + b_{nb} u_{k−nb+1} + e_k

Finding the parameters can be done using least squares:

[y_{na−1}  y_{na−2}  . . .  y_1         y_0       u_{na}   . . .  u_{na−nb+2}  u_{na−nb+1}
 y_{na}    y_{na−1}  . . .  y_2         y_1       u_{na+1} . . .  u_{na−nb+3}  u_{na−nb+2}
 .         .                .           .         .               .            .
 y_{N−1}   y_{N−2}   . . .  y_{N−na+1}  y_{N−na}  u_N      . . .  u_{N−nb+2}   u_{N−nb+1}]
    · [−a_1; −a_2; . . . ; −a_{na−1}; −a_{na}; b_1; . . . ; b_{nb−1}; b_{nb}]
    = [y_{na}; y_{na+1}; . . . ; y_{N−1}; y_N] − [e_{na}; e_{na+1}; . . . ; e_{N−1}; e_N]

SLIDE 45

Recursive least squares in system identification

matlab demo

Example:

1. first ARX model: lowpass filter

y_k − 1.5 y_{k−1} + 0.7 y_{k−2} = u_{k−1} + 0.5 u_{k−2} + e_k

2. second ARX model: bandpass filter

y_k − 0.8 y_{k−1} + 0.9 y_{k−2} = 0.5 u_{k−1} + 0.2 u_{k−2} + e_k

Figure: magnitude responses (dB/Hz) of the two models vs. normalized frequency (Ts = 1).

SLIDE 46

Recursive least squares in system identification

matlab demo

Case 1: no additive noise (e_k = 0), forgetting parameter λ = 0.9, parameters change at the midpoint.

Figure: error plots θ_1 − θ_est (blue) and θ_2 − θ_est (red) on a logarithmic scale, where θ_1 = [−1.5  0.7  1.0  0.5] and θ_2 = [−0.8  0.9  0.5  0.2].

SLIDE 47

Recursive least squares in system identification

matlab demo

Case 2: additive noise (σ = 0.01), forgetting parameter λ = 0.9, parameters change at the midpoint.

Figure: error plots θ_1 − θ_est (blue) and θ_2 − θ_est (red) on a logarithmic scale, where θ_1 = [−1.5  0.7  1.0  0.5] and θ_2 = [−0.8  0.9  0.5  0.2].