

SLIDE 1

Undermodelling Detection with Sign-Perturbed Sums

Algo Carè 1,2, Marco Campi 3, Balázs Csáji 2, Erik Weyer 4

1 Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands
2 Institute for Computer Science and Control (SZTAKI), Hungarian Academy of Sciences (MTA), Hungary
3 Department of Information Engineering (DII), University of Brescia, Italy
4 Department of Electrical and Electronic Engineering, University of Melbourne, Australia

IFAC World Congress, Toulouse, France, July 10, 2017

SLIDE 2

Table of contents

  • I. Introduction
  • II. Standard SPS for Linear Regression
  • III. SPS with Undermodelling Detection
  • IV. Numerical Experiments
  • V. Summary and Conclusions

Carè, Campi, Csáji, Weyer: Undermodelling Detection with SPS | 2

SLIDE 3

Motivations

  • SPS (Sign-Perturbed Sums) builds confidence regions around the LS (least squares) estimate of linear regression problems.
  • Only mild statistical assumptions are needed, e.g., symmetry.
  • Not needed: stationarity, moments, particular distributions.
  • SPS has many nice properties (as we will see later); most importantly, its confidence regions are exact.
  • Regarding the models, the assumption of SPS is that the true system generating the observations is in the model class.
  • However, if the model class is wrong, SPS cannot detect it.
  • Here, we suggest an extension of SPS, UD-SPS, that still builds exact confidence sets if the model is correct, but can also detect, in the long run, if the system is undermodelled.


SLIDE 4

Linear Regression

Consider a standard linear regression problem:

Linear Regression

$$y_t = \varphi_t^\top \theta^* + w_t$$

where
  • $y_t$ — output (for time $t = 1, \dots, n$)
  • $\varphi_t$ — regressor (exogenous, $d$-dimensional)
  • $w_t$ — noise (independent, symmetric)
  • $\theta^*$ — true parameter (deterministic, $d$-dimensional)
  • $\Phi_n = [\varphi_1, \dots, \varphi_n]^\top$ — skinny and full rank


SLIDE 5

Least Squares

Given a sample $Z$ of size $n$ of outputs $\{y_t\}$ and regressors $\{\varphi_t\}$, a classical approach is to minimize the least squares criterion

$$V(\theta \mid Z) \doteq \frac{1}{2} \sum_{t=1}^{n} (y_t - \varphi_t^\top \theta)^2.$$

The least squares estimate (LSE) can be found by solving the

Normal Equation

$$\nabla_\theta V(\hat\theta_n \mid Z) = 0, \quad \text{i.e.,} \quad \sum_{t=1}^{n} \varphi_t \big(y_t - \varphi_t^\top \hat\theta_n\big) = 0.$$
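For the scalar case ($d = 1$) the normal equation can be solved directly; a minimal pure-Python sketch (the data-generating value of $\theta^*$ and the noise distribution below are illustrative assumptions, not from the slides):

```python
import random

random.seed(0)

# Illustrative scalar (d = 1) linear regression: y_t = phi_t * theta_star + w_t
n, theta_star = 1000, 2.0
phi = [random.gauss(0.0, 1.0) for _ in range(n)]   # exogenous regressors
w = [random.uniform(-0.5, 0.5) for _ in range(n)]  # independent, symmetric noise
y = [p * theta_star + e for p, e in zip(phi, w)]

# Normal equation in 1D: sum(phi_t * (y_t - phi_t * theta)) = 0
# => theta_hat = sum(phi_t * y_t) / sum(phi_t ** 2)
theta_hat = sum(p * v for p, v in zip(phi, y)) / sum(p * p for p in phi)
print(theta_hat)  # close to theta_star = 2.0
```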


SLIDE 6

Confidence Ellipsoids

The LSE is asymptotically normal (under some technical conditions):

$$\sqrt{n}\,\big(\hat\theta_n - \theta^*\big) \xrightarrow{d} \mathcal{N}\big(0, \sigma^2 R^{-1}\big) \quad \text{as } n \to \infty,$$

where $R$ is the limit (if it exists) of $R_n \doteq \frac{1}{n} \sum_{t=1}^{n} \varphi_t \varphi_t^\top$ as $n \to \infty$.

Confidence Ellipsoid

$$\tilde\Theta_{n,\mu} \doteq \Big\{\, \theta \in \mathbb{R}^d : (\theta - \hat\theta_n)^\top R_n\, (\theta - \hat\theta_n) \le \frac{\mu\, \hat\sigma_n^2}{n} \,\Big\},$$

where $P(\theta^* \in \tilde\Theta_{n,\mu}) \approx F_{\chi^2(d)}(\mu)$, $F_{\chi^2(d)}$ is the CDF of the $\chi^2(d)$ distribution, and

$$\hat\sigma_n^2 \doteq \frac{1}{n-d} \sum_{t=1}^{n} \big(y_t - \varphi_t^\top \hat\theta_n\big)^2$$

is an estimate of $\sigma^2$.
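In the scalar case the ellipsoid reduces to an interval; a sketch using the 95% quantile of $\chi^2(1)$, $\mu \approx 3.84$ (the data-generating values below are illustrative assumptions):

```python
import random

random.seed(1)

# Illustrative scalar regression data: y_t = phi_t * theta_star + w_t
n, theta_star, d = 500, 2.0, 1
phi = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [p * theta_star + random.gauss(0.0, 0.3) for p in phi]

# LSE, R_n, and the noise-variance estimate sigma_hat^2
theta_hat = sum(p * v for p, v in zip(phi, y)) / sum(p * p for p in phi)
R_n = sum(p * p for p in phi) / n
sigma2_hat = sum((v - p * theta_hat) ** 2 for p, v in zip(phi, y)) / (n - d)

# (theta - theta_hat)^2 * R_n <= mu * sigma2_hat / n, with mu = 3.84 for 95%
mu = 3.84
half_width = (mu * sigma2_hat / (n * R_n)) ** 0.5
interval = (theta_hat - half_width, theta_hat + half_width)
print(interval)
```

Note that this interval is only approximate for finite samples, which is exactly the gap that SPS closes.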


SLIDE 7

Reference and Sign-Perturbed Sums

Let us introduce a reference sum and m − 1 sign-perturbed sums.

Reference Sum

$$S_0(\theta) \doteq R_n^{-1/2}\, \frac{1}{n} \sum_{t=1}^{n} \varphi_t \big(y_t - \varphi_t^\top \theta\big)$$

Sign-Perturbed Sums

$$S_i(\theta) \doteq R_n^{-1/2}\, \frac{1}{n} \sum_{t=1}^{n} \varphi_t\, \alpha_{i,t} \big(y_t - \varphi_t^\top \theta\big)$$

for $i = 1, \dots, m-1$, where the $\alpha_{i,t}$ ($t = 1, \dots, n$) are i.i.d. random signs, that is, $\alpha_{i,t} = \pm 1$ with probability $1/2$ each (Rademacher).
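A scalar ($d = 1$) sketch of these sums; by the normal equation, the reference sum vanishes at the LSE (the data-generating values below are illustrative assumptions):

```python
import random

random.seed(2)

n, m = 200, 20
theta_star = 1.5
phi = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [p * theta_star + random.uniform(-1.0, 1.0) for p in phi]
R_n = sum(p * p for p in phi) / n

def S(theta, signs):
    """Scalar S_i(theta) = R_n^{-1/2} * (1/n) * sum phi_t * a_t * (y_t - phi_t * theta)."""
    acc = sum(a * p * (v - p * theta) for a, p, v in zip(signs, phi, y))
    return R_n ** -0.5 * acc / n

ref_signs = [1] * n                                               # reference sum S_0
perturbed = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(m - 1)]

theta_hat = sum(p * v for p, v in zip(phi, y)) / sum(p * p for p in phi)
print(S(theta_hat, ref_signs))     # ~0: S_0 vanishes at the LSE
print(S(theta_hat, perturbed[0]))  # generally nonzero
```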


SLIDE 8

Intuitive Idea: Distributional Invariance

Recall: $\{w_t\}$ are independent and each $w_t$ is symmetric about zero. Observe that, if $\theta = \theta^*$, we have (for $i = 1, \dots, m-1$)

Distributional Invariance

$$S_0(\theta^*) = R_n^{-1/2}\, \frac{1}{n} \sum_{t=1}^{n} \varphi_t w_t, \qquad S_i(\theta^*) = R_n^{-1/2}\, \frac{1}{n} \sum_{t=1}^{n} \varphi_t\, \alpha_{i,t} w_t.$$

Consider the ordering $\|S_{(0)}(\theta^*)\|^2 \prec \cdots \prec \|S_{(m-1)}(\theta^*)\|^2$.

Note: the relation "$\prec$" is the canonical "$<$" with random tie-breaking.

All orderings are equally probable! (The sums are conditionally i.i.d.)


SLIDE 9

Intuitive Idea: Reference Dominance

What if $\theta \ne \theta^*$? Then the reference paraboloid $\|S_0(\theta)\|^2$ increases faster than the $\|S_i(\theta)\|^2$, and thus will eventually dominate the ordering. Intuitively, for "large enough" $\tilde\theta$, where $\tilde\theta \doteq \theta^* - \theta$:

Eventual Dominance of the Reference Paraboloid

$$\bigg\| \sum_{t=1}^{n} \varphi_t \varphi_t^\top \tilde\theta + \sum_{t=1}^{n} \varphi_t w_t \bigg\|^2_{R_n^{-1}} \;>\; \bigg\| \sum_{t=1}^{n} \pm\varphi_t \varphi_t^\top \tilde\theta + \sum_{t=1}^{n} \pm\varphi_t w_t \bigg\|^2_{R_n^{-1}}$$

with "high probability" (for simplicity, $\pm$ is written instead of the $\alpha_{i,t}$).


SLIDE 10

Non-Asymptotic Confidence Regions

The rank of $\|S_0(\theta)\|^2$ in the $\prec$-ordering of $\{\|S_i(\theta)\|^2\}$ is

$$\mathcal{R}(\theta) = 1 + \sum_{i=1}^{m-1} \mathbb{I}\big(\|S_i(\theta)\|^2 \prec \|S_0(\theta)\|^2\big),$$

where $\mathbb{I}(\cdot)$ is an indicator function.

Sign-Perturbed Sums (SPS) Confidence Regions

$$\hat\Theta_n \doteq \big\{\, \theta \in \mathbb{R}^d : \mathcal{R}(\theta) \le m - q \,\big\},$$

where $m > q > 0$ are user-chosen integers (design parameters).


SLIDE 11

Exact Confidence

(A1) $\{w_t\}$ is a sequence of independent random variables, each with a probability distribution symmetric about zero.
(A2) The outer product of the regressors is invertible, $\det(R_n) \ne 0$.

Exact Confidence of SPS

$$P\big(\theta^* \in \hat\Theta_n\big) = 1 - \frac{q}{m}$$

for finite samples. Parameters $m$ and $q$ are under our control. Note that $\|S_0(\hat\theta_n)\|^2 = 0$, thus $\hat\theta_n \in \hat\Theta_n$, assuming it is non-empty.
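The exact $1 - q/m$ coverage can be checked empirically; a scalar Monte Carlo sketch (the distributions, sample size, and trial count are illustrative assumptions):

```python
import random

random.seed(3)

n, m, q = 25, 20, 1            # target coverage: 1 - q/m = 0.95
theta_star, trials, hits = 1.0, 2000, 0

for _ in range(trials):
    phi = [random.gauss(0.0, 1.0) for _ in range(n)]
    y = [p * theta_star + random.uniform(-1.0, 1.0) for p in phi]
    # scalar ||S_i(theta*)||^2; the common factor R_n^{-1}/n^2 does not affect the ordering
    sums = []
    for i in range(m):
        signs = [1] * n if i == 0 else [random.choice([-1, 1]) for _ in range(n)]
        s = sum(a * p * (v - p * theta_star) for a, p, v in zip(signs, phi, y))
        sums.append((s * s, random.random()))   # second entry: random tie-breaking
    rank = 1 + sum(1 for z in sums[1:] if z < sums[0])
    if rank <= m - q:
        hits += 1

print(hits / trials)  # close to 0.95, as the theorem predicts
```

The region test at $\theta^*$ only asks whether the reference sum avoids the top $q$ positions of the ordering; by exchangeability of the $m$ sums, this happens with probability exactly $1 - q/m$.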


SLIDE 12

Star Convexity

A set $X \subseteq \mathbb{R}^d$ is star convex if there is a star center $c \in \mathbb{R}^d$ with $\forall x \in X$, $\forall \beta \in [0,1]$: $\beta x + (1-\beta) c \in X$.

Star Convexity of SPS

$\hat\Theta_n$ is star convex with the LSE, $\hat\theta_n$, as a star center.

Hint: $\hat\Theta_n$ is built from unions and intersections of ellipsoids, each containing the LSE.

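In one dimension, star convexity around the LSE means the SPS region is an interval containing $\hat\theta_n$; a grid-scan sketch illustrating this (data, grid, and design parameters are illustrative assumptions):

```python
import random

random.seed(4)

n, m, q = 50, 40, 2
theta_star = 1.0
phi = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [p * theta_star + random.uniform(-0.8, 0.8) for p in phi]
theta_hat = sum(p * v for p, v in zip(phi, y)) / sum(p * p for p in phi)

# one fixed draw of the m-1 sign sequences and of the tie-breaking values
signs = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(m - 1)]
ties = [random.random() for _ in range(m)]

def in_region(theta):
    """Membership test: rank of ||S_0||^2 among the m sums (common scaling dropped)."""
    s0 = sum(p * (v - p * theta) for p, v in zip(phi, y))
    vals = [(s0 * s0, ties[0])]
    for i, a in enumerate(signs, start=1):
        s = sum(ai * p * (v - p * theta) for ai, p, v in zip(a, phi, y))
        vals.append((s * s, ties[i]))
    rank = 1 + sum(1 for z in vals[1:] if z < vals[0])
    return rank <= m - q

# scan a grid centred at the LSE: members should form an interval around theta_hat
grid = [theta_hat + (g - 200) * 0.005 for g in range(401)]
inside = [t for t in grid if in_region(t)]
print(min(inside), max(inside))
```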

SLIDE 13

Strong Consistency

(A1) independence, symmetry: $\{w_t\}$ are independent and symmetric about zero
(A2) invertibility: $R_n \doteq \frac{1}{n} \sum_{t=1}^{n} \varphi_t \varphi_t^\top$ is invertible
(A3) regressor growth rate: $\sum_{t=1}^{\infty} \|\varphi_t\|^4 / t^2 < \infty$
(A4) noise moment growth rate: $\sum_{t=1}^{\infty} \big(E[w_t^2]\big)^2 / t^2 < \infty$
(A5) Cesàro summability: $\lim_{n\to\infty} R_n = R$, which is positive definite

Strong Consistency of SPS

$$P\Big( \bigcup_{k=1}^{\infty} \bigcap_{n=k}^{\infty} \big\{ \hat\Theta_n \subseteq B_\varepsilon(\theta^*) \big\} \Big) = 1,$$

where $B_\varepsilon(\theta^*) \doteq \{\, \theta \in \mathbb{R}^d : \|\theta - \theta^*\| \le \varepsilon \,\}$ is a norm ball.

SLIDE 14

Ellipsoidal Outer Approximation

The reference paraboloid can be rewritten as

$$\|S_0(\theta)\|^2 = (\theta - \hat\theta_n)^\top R_n\, (\theta - \hat\theta_n),$$

from which an alternative description of the confidence region is

$$\hat\Theta_n \subseteq \big\{\, \theta \in \mathbb{R}^d : (\theta - \hat\theta_n)^\top R_n\, (\theta - \hat\theta_n) \le r(\theta) \,\big\},$$

where $r(\theta)$ is the $q$th largest value of $\{\|S_i(\theta)\|^2\}_{i \ne 0}$.

Ellipsoidal Outer Approximation

$$\hat\Theta_n \subseteq \big\{\, \theta \in \mathbb{R}^d : (\theta - \hat\theta_n)^\top R_n\, (\theta - \hat\theta_n) \le r^* \,\big\},$$

where $r^*$ can be efficiently computed by a semi-definite program.


SLIDE 15

Undermodelling

Assume we are given a (finite) sample of input and output data, $\{u_t\}$, $\{y_t\}$, which we model with an FIR system

$$y_t(\theta) \doteq \varphi_t^\top \theta + w_t, \qquad \text{where } \varphi_t \doteq [\, u_{t-1}, \dots, u_{t-d} \,]^\top.$$

The true data-generating system is

$$y_t = \varphi_t^\top \theta^* + e_t + n_t,$$

where $e_t$ is an extra component that can depend on all past inputs $u_{t-d-1}, u_{t-d-2}, \dots$ and on all past noises $n_{t-1}, n_{t-2}, \dots$. If the $\{e_t\}$ are nonzero, then the SPS confidence regions will still (almost surely) shrink, but around a wrong parameter value.


SLIDE 16

SPS with Undermodelling Detection

UD-SPS is obtained from SPS by replacing $\{S_i(\theta)\}$ with

$$Q_0(\theta) \doteq \begin{bmatrix} R_n & B_n \\ B_n^\top & D_n \end{bmatrix}^{-\frac{1}{2}} \frac{1}{n} \sum_{t=1}^{n} \begin{bmatrix} \varphi_t \\ \psi_t \end{bmatrix} \big(y_t - \varphi_t^\top \theta\big),$$

$$Q_i(\theta) \doteq \begin{bmatrix} R_n & B_n \\ B_n^\top & D_n \end{bmatrix}^{-\frac{1}{2}} \frac{1}{n} \sum_{t=1}^{n} \alpha_{i,t} \begin{bmatrix} \varphi_t \\ \psi_t \end{bmatrix} \big(y_t - \varphi_t^\top \theta\big),$$

where $\psi_t$ is a vector of $s$ extra input values preceding those included in $\varphi_t$, $\psi_t \doteq [\, u_{t-d-1}, \dots, u_{t-d-s} \,]^\top$, and

$$B_n \doteq \frac{1}{n} \sum_{t=1}^{n} \varphi_t \psi_t^\top, \qquad D_n \doteq \frac{1}{n} \sum_{t=1}^{n} \psi_t \psi_t^\top.$$

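A minimal sketch of the UD-SPS statistics for $d = s = 1$; instead of a matrix square root, the weighted squared norm $z^\top M^{-1} z = \|M^{-1/2} z\|^2$ is used, which needs only a $2 \times 2$ inverse (the FIR data below are illustrative assumptions):

```python
import random

random.seed(5)

# Illustrative correctly specified FIR(1): y_t = b_star * u_{t-1} + n_t
N, b_star = 300, 1.0
u = [random.gauss(0.0, 1.0) for _ in range(N)]
y = [0.0] * N
for t in range(1, N):
    y[t] = b_star * u[t - 1] + random.uniform(-0.5, 0.5)

# regressor phi_t = u_{t-1}, extra regressor psi_t = u_{t-2}  (d = s = 1)
T = range(2, N)
n = len(T)
Rn = sum(u[t - 1] ** 2 for t in T) / n
Bn = sum(u[t - 1] * u[t - 2] for t in T) / n
Dn = sum(u[t - 2] ** 2 for t in T) / n
det = Rn * Dn - Bn * Bn                       # determinant of M = [[Rn, Bn], [Bn, Dn]]

def q_norm2(theta, signs):
    """||Q_i(theta)||^2 as z^T M^{-1} z, with z = (1/n) sum a_t [phi_t; psi_t] eps_t."""
    z0 = sum(a * u[t - 1] * (y[t] - u[t - 1] * theta) for a, t in zip(signs, T)) / n
    z1 = sum(a * u[t - 2] * (y[t] - u[t - 1] * theta) for a, t in zip(signs, T)) / n
    return (Dn * z0 * z0 - 2 * Bn * z0 * z1 + Rn * z1 * z1) / det

m, q = 20, 1
vals = []
for i in range(m):
    signs = [1] * n if i == 0 else [random.choice([-1, 1]) for _ in range(n)]
    vals.append((q_norm2(b_star, signs), random.random()))  # random tie-breaking
rank = 1 + sum(1 for v in vals[1:] if v < vals[0])
print("theta* accepted:", rank <= m - q)
```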

SLIDE 17

The Connection of UD-SPS and SPS

The connection of UD-SPS and SPS can be stated as follows.

Reducing UD-SPS to SPS

The UD-SPS region, $\Theta^o_n$, for estimating $\theta^* \in \mathbb{R}^d$ can be interpreted as the restriction to a $d$-dimensional subspace of a standard SPS region, $\Theta'_n$, that lives in the domain $\{\theta' \in \mathbb{R}^{d+s}\}$.

$\mathbb{R}^{d+s}$ is the $d$-dimensional identification space augmented with $s$ extra components: $\Theta^o_n$ can be identified with the first $d$ components of the set $\Theta'_n \cap (\mathbb{R}^d \times \{0\}^s)$.


SLIDE 18

UD-SPS with Correct System Specifications

Theorem (Exact Confidence of UD-SPS)

If the FIR system is correctly specified, then $P\{\theta^* \in \Theta^o_n\} = 1 - q/m$.

Theorem (Strong Consistency of UD-SPS)

If the FIR system is correctly specified, then (under some technical conditions) for all $\varepsilon > 0$,

$$P\Big( \bigcup_{\bar n = 1}^{\infty} \bigcap_{n = \bar n}^{\infty} \big\{ \Theta^o_n \subseteq B_\varepsilon(\theta^*) \big\} \Big) = 1,$$

where $B_\varepsilon(\theta^*)$ denotes an $\varepsilon$-ball centred around $\theta^*$.


SLIDE 19

UD-SPS in the Presence of Undermodelling

Theorem (Undermodelling Detection)

Assume that the system is undermodelled, that is, the $\{e_t\}$ are nonzero (and some technical conditions hold). With the notations

$$\bar R' \doteq \lim_{n\to\infty} \begin{bmatrix} R_n & B_n \\ B_n^\top & D_n \end{bmatrix}, \qquad \bar E' \doteq \lim_{n\to\infty} \frac{1}{n} \sum_{t=1}^{n} \begin{bmatrix} \varphi_t \\ \psi_t \end{bmatrix} E[e_t],$$

if the following detectability condition holds,

$$\bar R'^{-1} \bar E' \notin \mathbb{R}^d \times \{0\}^s,$$

then

$$P\Big( \bigcup_{\bar n = 1}^{\infty} \bigcap_{n = \bar n}^{\infty} \big\{ \Theta^o_n = \emptyset \big\} \Big) = 1.$$


SLIDE 20

Numerical Experiments

Consider the following ARX(1,1) data-generating system

$$y_t = a^* y_{t-1} + b^* u_{t-1} + n_t,$$

with zero initial conditions, where $a^* = 0.5$, $0.15$, or $0$ (see later), $b^* = 1$, and $\{n_t\}$ are i.i.d. Laplacian with mean 0 and variance 0.1. The input signal is generated as

$$u_t = 0.75\, u_{t-1} + v_t,$$

where $\{v_t\}$ are i.i.d. standard normal random variables. The user-chosen predictor is an FIR(1) model

$$\hat y_t(\theta) = \varphi_t^\top \theta = b\, u_{t-1},$$

that is, the autoregressive part is missing, $\theta = [\, b \,]$ is the model parameter, and $\varphi_t = [\, u_{t-1} \,]$ is the regressor at time $t$.
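This setup is easy to reproduce; the sketch below simulates the ARX(1,1) system with $a^* = 0.5$ and fits the FIR(1) model by least squares, illustrating that $\hat b$ concentrates away from $b^* = 1$ under undermodelling (the sample size is an illustrative assumption; the Laplacian is sampled as a random-sign exponential):

```python
import random

random.seed(6)

# ARX(1,1): y_t = a* y_{t-1} + b* u_{t-1} + n_t, undermodelled by an FIR(1) predictor
a_star, b_star, N = 0.5, 1.0, 20000
lam = 0.05 ** 0.5                  # Laplace scale so that the variance 2*lam^2 = 0.1

def laplace():
    return random.choice([-1, 1]) * random.expovariate(1.0 / lam)

u, y = [0.0], [0.0]
for t in range(1, N):
    u.append(0.75 * u[-1] + random.gauss(0.0, 1.0))   # coloured input
    y.append(a_star * y[-1] + b_star * u[t - 1] + laplace())

# FIR(1) least squares: b_hat = sum(u_{t-1} y_t) / sum(u_{t-1}^2)
num = sum(u[t - 1] * y[t] for t in range(1, N))
den = sum(u[t - 1] ** 2 for t in range(1, N))
b_hat = num / den
print(b_hat)  # biased away from b* = 1, since the y_{t-1} term is ignored
```

Because the input is coloured, $u_{t-1}$ correlates with the omitted $y_{t-1}$ term, so the FIR(1) estimate converges to a wrong value; this is exactly the situation UD-SPS is designed to flag.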


SLIDE 21

95% UD-SPS Confidence Intervals, a∗ = 0

[Figure: UD-SPS confidence intervals in the $(b, \tilde b)$ plane for $n = 20, 50, 100$, together with the LSE of $b^*$ for each $n$ and the reference points $(b^*, 0)$ and $(b^*, b^*)$.]

SLIDE 22

95% UD-SPS Confidence Intervals, a∗ = 0.15

[Figure: UD-SPS confidence intervals in the $(b, \tilde b)$ plane; same setup as before, with $a^* = 0.15$.]

SLIDE 23

95% UD-SPS Confidence Intervals, a∗ = 0.5

[Figure: UD-SPS confidence intervals in the $(b, \tilde b)$ plane; same setup as before, with $a^* = 0.5$.]

SLIDE 24

Summary and Conclusions

  • SPS (Sign-Perturbed Sums) is a powerful finite-sample system identification method that builds exact, star convex, strongly consistent confidence regions for linear regression problems.
  • SPS also has efficient ellipsoidal outer approximations.
  • However, SPS cannot detect if the model class is wrong.
  • Here, we suggested an extension of SPS, called UD-SPS, that still guarantees exact and strongly consistent confidence regions if the model order is correctly specified.
  • Furthermore, it can detect, in the long run, if the system is undermodelled (detection = empty confidence region).
  • There is a strong connection between SPS and UD-SPS.
  • The theoretical results were also confirmed by numerical experiments: FIR models of ARX systems were studied.


SLIDE 25

Thank you for your attention!

balazs.csaji@sztaki.mta.hu