

SLIDE 1

Undermodelling Detection with Sign-Perturbed Sums

Algo Carè 1,2, Marco Campi 3, Balázs Csáji 2, Erik Weyer 4

1 Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands
2 Institute for Computer Science and Control (SZTAKI), Hungarian Academy of Sciences (MTA), Hungary
3 Department of Information Engineering (DII), University of Brescia, Italy
4 Department of Electrical and Electronic Engineering, University of Melbourne, Australia

IFAC World Congress, Toulouse, France, July 10, 2017

SLIDE 2

Table of contents

  • I. Introduction
  • II. Standard SPS for Linear Regression
  • III. SPS with Undermodelling Detection
  • IV. Numerical Experiments
  • V. Summary and Conclusions

Carè, Campi, Csáji, Weyer: Undermodelling Detection with SPS | 2

SLIDE 3

Motivations

  • SPS (Sign-Perturbed Sums) builds confidence regions around the LS (least squares) estimate of linear regression problems.
  • Only mild statistical assumptions are needed, e.g., symmetry.
  • Not needed: stationarity, moments, particular distributions.
  • SPS has many nice properties (as we will see later); most importantly, its confidence regions are exact.
  • Regarding the models, the assumption of SPS is that the true system generating the observations is in the model class.
  • However, if the model class is wrong, SPS cannot detect it.
  • Here, we suggest an extension of SPS, UD-SPS, that still builds exact confidence sets if the model is correct, but can also detect, in the long run, if the system is undermodelled.


SLIDE 4

Linear Regression

Consider a standard linear regression problem:

Linear Regression

$$y_t = \varphi_t^\top \theta^* + w_t$$

where
  • $y_t$ — output (for time $t = 1, \dots, n$)
  • $\varphi_t$ — regressor (exogenous, $d$-dimensional)
  • $w_t$ — noise (independent, symmetric)
  • $\theta^*$ — true parameter (deterministic, $d$-dimensional)
  • $\Phi_n = [\varphi_1, \dots, \varphi_n]^\top$ — skinny and full rank


SLIDE 5

Least Squares

Given a sample $Z$ of size $n$ of outputs $\{y_t\}$ and regressors $\{\varphi_t\}$, a classical approach is to minimize the least squares criterion

$$V(\theta \mid Z) \doteq \frac{1}{2} \sum_{t=1}^{n} (y_t - \varphi_t^\top \theta)^2.$$

The least squares estimate (LSE) can be found by solving the

Normal Equation

$$\nabla_\theta V(\hat\theta_n \mid Z) = 0, \quad \text{i.e.,} \quad \sum_{t=1}^{n} \varphi_t \big(y_t - \varphi_t^\top \hat\theta_n\big) = 0.$$
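For the scalar case ($d = 1$) the normal equation can be solved directly; a minimal pure-Python sketch (the data-generating value of $\theta^*$ and the noise distribution below are illustrative assumptions, not from the slides):

```python
import random

random.seed(0)

# Illustrative scalar (d = 1) linear regression: y_t = phi_t * theta_star + w_t
n, theta_star = 1000, 2.0
phi = [random.gauss(0.0, 1.0) for _ in range(n)]   # exogenous regressors
w = [random.uniform(-0.5, 0.5) for _ in range(n)]  # independent, symmetric noise
y = [p * theta_star + e for p, e in zip(phi, w)]

# Normal equation in 1D: sum(phi_t * (y_t - phi_t * theta)) = 0
# => theta_hat = sum(phi_t * y_t) / sum(phi_t ** 2)
theta_hat = sum(p * v for p, v in zip(phi, y)) / sum(p * p for p in phi)
print(theta_hat)  # close to theta_star = 2.0
```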


SLIDE 6

Confidence Ellipsoids

The LSE is asymptotically normal (under some technical conditions):

$$\sqrt{n}\,\big(\hat\theta_n - \theta^*\big) \xrightarrow{d} \mathcal{N}\big(0, \sigma^2 R^{-1}\big) \quad \text{as } n \to \infty,$$

where $R$ is the limit (if it exists) of $R_n \doteq \frac{1}{n} \sum_{t=1}^{n} \varphi_t \varphi_t^\top$ as $n \to \infty$.

Confidence Ellipsoid

$$\tilde\Theta_{n,\mu} \doteq \Big\{\, \theta \in \mathbb{R}^d : (\theta - \hat\theta_n)^\top R_n\, (\theta - \hat\theta_n) \le \frac{\mu\, \hat\sigma_n^2}{n} \,\Big\},$$

where $P(\theta^* \in \tilde\Theta_{n,\mu}) \approx F_{\chi^2(d)}(\mu)$, $F_{\chi^2(d)}$ is the CDF of the $\chi^2(d)$ distribution, and

$$\hat\sigma_n^2 \doteq \frac{1}{n-d} \sum_{t=1}^{n} \big(y_t - \varphi_t^\top \hat\theta_n\big)^2$$

is an estimate of $\sigma^2$.
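In the scalar case the ellipsoid reduces to an interval; a sketch using the 95% quantile of $\chi^2(1)$, $\mu \approx 3.84$ (the data-generating values below are illustrative assumptions):

```python
import random

random.seed(1)

# Illustrative scalar regression data: y_t = phi_t * theta_star + w_t
n, theta_star, d = 500, 2.0, 1
phi = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [p * theta_star + random.gauss(0.0, 0.3) for p in phi]

# LSE, R_n, and the noise-variance estimate sigma_hat^2
theta_hat = sum(p * v for p, v in zip(phi, y)) / sum(p * p for p in phi)
R_n = sum(p * p for p in phi) / n
sigma2_hat = sum((v - p * theta_hat) ** 2 for p, v in zip(phi, y)) / (n - d)

# (theta - theta_hat)^2 * R_n <= mu * sigma2_hat / n, with mu = 3.84 for 95%
mu = 3.84
half_width = (mu * sigma2_hat / (n * R_n)) ** 0.5
interval = (theta_hat - half_width, theta_hat + half_width)
print(interval)
```

Note that this interval is only approximate for finite samples, which is exactly the gap that SPS closes.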


SLIDE 7

Reference and Sign-Perturbed Sums

Let us introduce a reference sum and m − 1 sign-perturbed sums.

Reference Sum

$$S_0(\theta) \doteq R_n^{-1/2}\, \frac{1}{n} \sum_{t=1}^{n} \varphi_t \big(y_t - \varphi_t^\top \theta\big)$$

Sign-Perturbed Sums

$$S_i(\theta) \doteq R_n^{-1/2}\, \frac{1}{n} \sum_{t=1}^{n} \varphi_t\, \alpha_{i,t} \big(y_t - \varphi_t^\top \theta\big)$$

for $i = 1, \dots, m-1$, where the $\alpha_{i,t}$ ($t = 1, \dots, n$) are i.i.d. random signs, that is, $\alpha_{i,t} = \pm 1$ with probability $1/2$ each (Rademacher).
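A scalar ($d = 1$) sketch of these sums; by the normal equation, the reference sum vanishes at the LSE (the data-generating values below are illustrative assumptions):

```python
import random

random.seed(2)

n, m = 200, 20
theta_star = 1.5
phi = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [p * theta_star + random.uniform(-1.0, 1.0) for p in phi]
R_n = sum(p * p for p in phi) / n

def S(theta, signs):
    """Scalar S_i(theta) = R_n^{-1/2} * (1/n) * sum phi_t * a_t * (y_t - phi_t * theta)."""
    acc = sum(a * p * (v - p * theta) for a, p, v in zip(signs, phi, y))
    return R_n ** -0.5 * acc / n

ref_signs = [1] * n                                               # reference sum S_0
perturbed = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(m - 1)]

theta_hat = sum(p * v for p, v in zip(phi, y)) / sum(p * p for p in phi)
print(S(theta_hat, ref_signs))     # ~0: S_0 vanishes at the LSE
print(S(theta_hat, perturbed[0]))  # generally nonzero
```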


SLIDE 8

Intuitive Idea: Distributional Invariance

Recall: $\{w_t\}$ are independent and each $w_t$ is symmetric about zero. Observe that, if $\theta = \theta^*$, we have (for $i = 1, \dots, m-1$)

Distributional Invariance

$$S_0(\theta^*) = R_n^{-1/2}\, \frac{1}{n} \sum_{t=1}^{n} \varphi_t w_t, \qquad S_i(\theta^*) = R_n^{-1/2}\, \frac{1}{n} \sum_{t=1}^{n} \varphi_t\, \alpha_{i,t} w_t.$$

Consider the ordering $\|S_{(0)}(\theta^*)\|^2 \prec \cdots \prec \|S_{(m-1)}(\theta^*)\|^2$.

Note: the relation "$\prec$" is the canonical "$<$" with random tie-breaking.

All orderings are equally probable! (The sums are conditionally i.i.d.)


SLIDE 9

Intuitive Idea: Reference Dominance

What if $\theta \ne \theta^*$? Then the reference paraboloid $\|S_0(\theta)\|^2$ increases faster than the $\|S_i(\theta)\|^2$, and thus will eventually dominate the ordering. Intuitively, for "large enough" $\tilde\theta$, where $\tilde\theta \doteq \theta^* - \theta$:

Eventual Dominance of the Reference Paraboloid

$$\bigg\| \sum_{t=1}^{n} \varphi_t \varphi_t^\top \tilde\theta + \sum_{t=1}^{n} \varphi_t w_t \bigg\|^2_{R_n^{-1}} \;>\; \bigg\| \sum_{t=1}^{n} \pm\varphi_t \varphi_t^\top \tilde\theta + \sum_{t=1}^{n} \pm\varphi_t w_t \bigg\|^2_{R_n^{-1}}$$

with "high probability" (for simplicity, $\pm$ is written instead of the $\alpha_{i,t}$).


SLIDE 10

Non-Asymptotic Confidence Regions

The rank of $\|S_0(\theta)\|^2$ in the $\prec$-ordering of $\{\|S_i(\theta)\|^2\}$ is

$$\mathcal{R}(\theta) = 1 + \sum_{i=1}^{m-1} \mathbb{I}\big(\|S_i(\theta)\|^2 \prec \|S_0(\theta)\|^2\big),$$

where $\mathbb{I}(\cdot)$ is an indicator function.

Sign-Perturbed Sums (SPS) Confidence Regions

$$\hat\Theta_n \doteq \big\{\, \theta \in \mathbb{R}^d : \mathcal{R}(\theta) \le m - q \,\big\},$$

where $m > q > 0$ are user-chosen integers (design parameters).


SLIDE 11

Exact Confidence

(A1) $\{w_t\}$ is a sequence of independent random variables, each with a probability distribution symmetric about zero.
(A2) The outer product of the regressors is invertible, $\det(R_n) \ne 0$.

Exact Confidence of SPS

$$P\big(\theta^* \in \hat\Theta_n\big) = 1 - \frac{q}{m}$$

for finite samples. Parameters $m$ and $q$ are under our control. Note that $\|S_0(\hat\theta_n)\|^2 = 0$, thus $\hat\theta_n \in \hat\Theta_n$, assuming it is non-empty.
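The exact $1 - q/m$ coverage can be checked empirically; a scalar Monte Carlo sketch (the distributions, sample size, and trial count are illustrative assumptions):

```python
import random

random.seed(3)

n, m, q = 25, 20, 1            # target coverage: 1 - q/m = 0.95
theta_star, trials, hits = 1.0, 2000, 0

for _ in range(trials):
    phi = [random.gauss(0.0, 1.0) for _ in range(n)]
    y = [p * theta_star + random.uniform(-1.0, 1.0) for p in phi]
    # scalar ||S_i(theta*)||^2; the common factor R_n^{-1}/n^2 does not affect the ordering
    sums = []
    for i in range(m):
        signs = [1] * n if i == 0 else [random.choice([-1, 1]) for _ in range(n)]
        s = sum(a * p * (v - p * theta_star) for a, p, v in zip(signs, phi, y))
        sums.append((s * s, random.random()))   # second entry: random tie-breaking
    rank = 1 + sum(1 for z in sums[1:] if z < sums[0])
    if rank <= m - q:
        hits += 1

print(hits / trials)  # close to 0.95, as the theorem predicts
```

The region test at $\theta^*$ only asks whether the reference sum avoids the top $q$ positions of the ordering; by exchangeability of the $m$ sums, this happens with probability exactly $1 - q/m$.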


SLIDE 12

Star Convexity

A set $X \subseteq \mathbb{R}^d$ is star convex if there is a star center $c \in \mathbb{R}^d$ with $\forall x \in X$, $\forall \beta \in [0,1]$: $\beta x + (1-\beta) c \in X$.

Star Convexity of SPS

$\hat\Theta_n$ is star convex with the LSE, $\hat\theta_n$, as a star center.

Hint: $\hat\Theta_n$ is built from unions and intersections of ellipsoids, each containing the LSE.

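In one dimension, star convexity around the LSE means the SPS region is an interval containing $\hat\theta_n$; a grid-scan sketch illustrating this (data, grid, and design parameters are illustrative assumptions):

```python
import random

random.seed(4)

n, m, q = 50, 40, 2
theta_star = 1.0
phi = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [p * theta_star + random.uniform(-0.8, 0.8) for p in phi]
theta_hat = sum(p * v for p, v in zip(phi, y)) / sum(p * p for p in phi)

# one fixed draw of the m-1 sign sequences and of the tie-breaking values
signs = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(m - 1)]
ties = [random.random() for _ in range(m)]

def in_region(theta):
    """Membership test: rank of ||S_0||^2 among the m sums (common scaling dropped)."""
    s0 = sum(p * (v - p * theta) for p, v in zip(phi, y))
    vals = [(s0 * s0, ties[0])]
    for i, a in enumerate(signs, start=1):
        s = sum(ai * p * (v - p * theta) for ai, p, v in zip(a, phi, y))
        vals.append((s * s, ties[i]))
    rank = 1 + sum(1 for z in vals[1:] if z < vals[0])
    return rank <= m - q

# scan a grid centred at the LSE: members should form an interval around theta_hat
grid = [theta_hat + (g - 200) * 0.005 for g in range(401)]
inside = [t for t in grid if in_region(t)]
print(min(inside), max(inside))
```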

SLIDE 13

Strong Consistency

(A1) independence, symmetry: $\{w_t\}$ are independent and symmetric about zero
(A2) invertibility: $R_n \doteq \frac{1}{n} \sum_{t=1}^{n} \varphi_t \varphi_t^\top$ is invertible
(A3) regressor growth rate: $\sum_{t=1}^{\infty} \|\varphi_t\|^4 / t^2 < \infty$
(A4) noise moment growth rate: $\sum_{t=1}^{\infty} \big(E[w_t^2]\big)^2 / t^2 < \infty$
(A5) Cesàro summability: $\lim_{n\to\infty} R_n = R$, which is positive definite

Strong Consistency of SPS

$$P\Big( \bigcup_{k=1}^{\infty} \bigcap_{n=k}^{\infty} \big\{ \hat\Theta_n \subseteq B_\varepsilon(\theta^*) \big\} \Big) = 1,$$

where $B_\varepsilon(\theta^*) \doteq \{\, \theta \in \mathbb{R}^d : \|\theta - \theta^*\| \le \varepsilon \,\}$ is a norm ball.

SLIDE 14

Ellipsoidal Outer Approximation

The reference paraboloid can be rewritten as

$$\|S_0(\theta)\|^2 = (\theta - \hat\theta_n)^\top R_n\, (\theta - \hat\theta_n),$$

from which an alternative description of the confidence region is

$$\hat\Theta_n \subseteq \big\{\, \theta \in \mathbb{R}^d : (\theta - \hat\theta_n)^\top R_n\, (\theta - \hat\theta_n) \le r(\theta) \,\big\},$$

where $r(\theta)$ is the $q$th largest value of $\{\|S_i(\theta)\|^2\}_{i \ne 0}$.

Ellipsoidal Outer Approximation

$$\hat\Theta_n \subseteq \big\{\, \theta \in \mathbb{R}^d : (\theta - \hat\theta_n)^\top R_n\, (\theta - \hat\theta_n) \le r^* \,\big\},$$

where $r^*$ can be efficiently computed by a semi-definite program.


SLIDE 15

Undermodelling

Assume we are given a (finite) sample of input and output data, $\{u_t\}$, $\{y_t\}$, which we model with an FIR system

$$y_t(\theta) \doteq \varphi_t^\top \theta + w_t, \qquad \text{where } \varphi_t \doteq [\, u_{t-1}, \dots, u_{t-d} \,]^\top.$$

The true data-generating system is

$$y_t = \varphi_t^\top \theta^* + e_t + n_t,$$

where $e_t$ is an extra component that can depend on all past inputs $u_{t-d-1}, u_{t-d-2}, \dots$ and on all past noises $n_{t-1}, n_{t-2}, \dots$. If the $\{e_t\}$ are nonzero, then the SPS confidence regions will still (almost surely) shrink, but around a wrong parameter value.


SLIDE 16

SPS with Undermodelling Detection

UD-SPS is obtained from SPS by replacing $\{S_i(\theta)\}$ with

$$Q_0(\theta) \doteq \begin{bmatrix} R_n & B_n \\ B_n^\top & D_n \end{bmatrix}^{-\frac{1}{2}} \frac{1}{n} \sum_{t=1}^{n} \begin{bmatrix} \varphi_t \\ \psi_t \end{bmatrix} \big(y_t - \varphi_t^\top \theta\big),$$

$$Q_i(\theta) \doteq \begin{bmatrix} R_n & B_n \\ B_n^\top & D_n \end{bmatrix}^{-\frac{1}{2}} \frac{1}{n} \sum_{t=1}^{n} \alpha_{i,t} \begin{bmatrix} \varphi_t \\ \psi_t \end{bmatrix} \big(y_t - \varphi_t^\top \theta\big),$$

where $\psi_t$ is a vector of $s$ extra input values preceding those included in $\varphi_t$, $\psi_t \doteq [\, u_{t-d-1}, \dots, u_{t-d-s} \,]^\top$, and

$$B_n \doteq \frac{1}{n} \sum_{t=1}^{n} \varphi_t \psi_t^\top, \qquad D_n \doteq \frac{1}{n} \sum_{t=1}^{n} \psi_t \psi_t^\top.$$

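A minimal sketch of the UD-SPS statistics for $d = s = 1$; instead of a matrix square root, the weighted squared norm $z^\top M^{-1} z = \|M^{-1/2} z\|^2$ is used, which needs only a $2 \times 2$ inverse (the FIR data below are illustrative assumptions):

```python
import random

random.seed(5)

# Illustrative correctly specified FIR(1): y_t = b_star * u_{t-1} + n_t
N, b_star = 300, 1.0
u = [random.gauss(0.0, 1.0) for _ in range(N)]
y = [0.0] * N
for t in range(1, N):
    y[t] = b_star * u[t - 1] + random.uniform(-0.5, 0.5)

# regressor phi_t = u_{t-1}, extra regressor psi_t = u_{t-2}  (d = s = 1)
T = range(2, N)
n = len(T)
Rn = sum(u[t - 1] ** 2 for t in T) / n
Bn = sum(u[t - 1] * u[t - 2] for t in T) / n
Dn = sum(u[t - 2] ** 2 for t in T) / n
det = Rn * Dn - Bn * Bn                       # determinant of M = [[Rn, Bn], [Bn, Dn]]

def q_norm2(theta, signs):
    """||Q_i(theta)||^2 as z^T M^{-1} z, with z = (1/n) sum a_t [phi_t; psi_t] eps_t."""
    z0 = sum(a * u[t - 1] * (y[t] - u[t - 1] * theta) for a, t in zip(signs, T)) / n
    z1 = sum(a * u[t - 2] * (y[t] - u[t - 1] * theta) for a, t in zip(signs, T)) / n
    return (Dn * z0 * z0 - 2 * Bn * z0 * z1 + Rn * z1 * z1) / det

m, q = 20, 1
vals = []
for i in range(m):
    signs = [1] * n if i == 0 else [random.choice([-1, 1]) for _ in range(n)]
    vals.append((q_norm2(b_star, signs), random.random()))  # random tie-breaking
rank = 1 + sum(1 for v in vals[1:] if v < vals[0])
print("theta* accepted:", rank <= m - q)
```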

SLIDE 17

The Connection of UD-SPS and SPS

The connection of UD-SPS and SPS can be stated as follows.

Reducing UD-SPS to SPS

The UD-SPS region, $\Theta^o_n$, for estimating $\theta^* \in \mathbb{R}^d$ can be interpreted as the restriction to a $d$-dimensional subspace of a standard SPS region, $\Theta'_n$, that lives in the domain $\{\theta' \in \mathbb{R}^{d+s}\}$.

$\mathbb{R}^{d+s}$ is the $d$-dimensional identification space augmented with $s$ extra components: $\Theta^o_n$ can be identified with the first $d$ components of the set $\Theta'_n \cap (\mathbb{R}^d \times \{0\}^s)$.


SLIDE 18

UD-SPS with Correct System Specifications

Theorem (Exact Confidence of UD-SPS)

If the FIR system is correctly specified, then $P\{\theta^* \in \Theta^o_n\} = 1 - q/m$.

Theorem (Strong Consistency of UD-SPS)

If the FIR system is correctly specified, then (under some technical conditions) for all $\varepsilon > 0$,

$$P\Big( \bigcup_{\bar n = 1}^{\infty} \bigcap_{n = \bar n}^{\infty} \big\{ \Theta^o_n \subseteq B_\varepsilon(\theta^*) \big\} \Big) = 1,$$

where $B_\varepsilon(\theta^*)$ denotes an $\varepsilon$-ball centred around $\theta^*$.


SLIDE 19

UD-SPS in the Presence of Undermodelling

Theorem (Undermodelling Detection)

Assume that the system is undermodelled, that is, the $\{e_t\}$ are nonzero (and some technical conditions hold). With the notations

$$\bar R' \doteq \lim_{n\to\infty} \begin{bmatrix} R_n & B_n \\ B_n^\top & D_n \end{bmatrix}, \qquad \bar E' \doteq \lim_{n\to\infty} \frac{1}{n} \sum_{t=1}^{n} \begin{bmatrix} \varphi_t \\ \psi_t \end{bmatrix} E[e_t],$$

if the following detectability condition holds,

$$\bar R'^{-1} \bar E' \notin \mathbb{R}^d \times \{0\}^s,$$

then

$$P\Big( \bigcup_{\bar n = 1}^{\infty} \bigcap_{n = \bar n}^{\infty} \big\{ \Theta^o_n = \emptyset \big\} \Big) = 1.$$


SLIDE 20

Numerical Experiments

Consider the following ARX(1,1) data-generating system

$$y_t = a^* y_{t-1} + b^* u_{t-1} + n_t,$$

with zero initial conditions, where $a^* = 0.5$, $0.15$, or $0$ (see later), $b^* = 1$, and $\{n_t\}$ are i.i.d. Laplacian with mean 0 and variance 0.1. The input signal is generated as

$$u_t = 0.75\, u_{t-1} + v_t,$$

where $\{v_t\}$ are i.i.d. standard normal random variables. The user-chosen predictor is an FIR(1) model

$$\hat y_t(\theta) = \varphi_t^\top \theta = b\, u_{t-1},$$

that is, the autoregressive part is missing, $\theta = [\, b \,]$ is the model parameter, and $\varphi_t = [\, u_{t-1} \,]$ is the regressor at time $t$.
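This setup is easy to reproduce; the sketch below simulates the ARX(1,1) system with $a^* = 0.5$ and fits the FIR(1) model by least squares, illustrating that $\hat b$ concentrates away from $b^* = 1$ under undermodelling (the sample size is an illustrative assumption; the Laplacian is sampled as a random-sign exponential):

```python
import random

random.seed(6)

# ARX(1,1): y_t = a* y_{t-1} + b* u_{t-1} + n_t, undermodelled by an FIR(1) predictor
a_star, b_star, N = 0.5, 1.0, 20000
lam = 0.05 ** 0.5                  # Laplace scale so that the variance 2*lam^2 = 0.1

def laplace():
    return random.choice([-1, 1]) * random.expovariate(1.0 / lam)

u, y = [0.0], [0.0]
for t in range(1, N):
    u.append(0.75 * u[-1] + random.gauss(0.0, 1.0))   # coloured input
    y.append(a_star * y[-1] + b_star * u[t - 1] + laplace())

# FIR(1) least squares: b_hat = sum(u_{t-1} y_t) / sum(u_{t-1}^2)
num = sum(u[t - 1] * y[t] for t in range(1, N))
den = sum(u[t - 1] ** 2 for t in range(1, N))
b_hat = num / den
print(b_hat)  # biased away from b* = 1, since the y_{t-1} term is ignored
```

Because the input is coloured, $u_{t-1}$ correlates with the omitted $y_{t-1}$ term, so the FIR(1) estimate converges to a wrong value; this is exactly the situation UD-SPS is designed to flag.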


SLIDE 21

95% UD-SPS Confidence Intervals, a∗ = 0

[Figure: UD-SPS confidence intervals in the $(b, \tilde b)$ plane for $n = 20, 50, 100$, together with the LSE of $b^*$ for each $n$ and the reference points $(b^*, 0)$ and $(b^*, b^*)$.]

SLIDE 22

95% UD-SPS Confidence Intervals, a∗ = 0.15

[Figure: UD-SPS confidence intervals in the $(b, \tilde b)$ plane; same setup as before, with $a^* = 0.15$.]

SLIDE 23

95% UD-SPS Confidence Intervals, a∗ = 0.5

[Figure: UD-SPS confidence intervals in the $(b, \tilde b)$ plane; same setup as before, with $a^* = 0.5$.]

SLIDE 24

Summary and Conclusions

  • SPS (Sign-Perturbed Sums) is a powerful finite-sample system identification method that builds exact, star convex, strongly consistent confidence regions for linear regression problems.
  • SPS also has efficient ellipsoidal outer approximations.
  • However, SPS cannot detect if the model class is wrong.
  • Here, we suggested an extension of SPS, called UD-SPS, that still guarantees exact and strongly consistent confidence regions if the model order is correctly specified.
  • Furthermore, it can detect, in the long run, if the system is undermodelled (detection = empty confidence region).
  • There is a strong connection between SPS and UD-SPS.
  • The theoretical results were also confirmed by numerical experiments: FIR models of ARX systems were studied.


SLIDE 25

Thank you for your attention!

balazs.csaji@sztaki.mta.hu