SLIDE 1

The multiresolution criterion and nonparametric regression

Thoralf Mildenberger and Henrike Weinert

joint work with P. L. Davies and U. Gather

SFB 475 Fakultät Statistik Technische Universität Dortmund

Workshop on current trends and challenges in model selection and related areas Vienna, July 2008

SLIDE 2

Outline

◮ Nonparametric regression
◮ Choosing the smoothing parameter
◮ Simulation study
◮ The multiresolution norm
◮ Geometric interpretation
◮ The MR-norm and ℓp-norms

SLIDE 4

Nonparametric Regression

Model: y(t_i) = f(t_i) + ε(t_i), with 0 ≤ t_1 < · · · < t_N ≤ 1 and ε(t_1), . . . , ε(t_N) iid ∼ N(0, σ²)

Goal: find an estimate f̂ of f.

Problem: f̂ is usually chosen from a family (f̂_h) indexed by a smoothing parameter h (bandwidth, size of a partition, penalty, etc.)

Interpretation: h often measures the 'complexity' of f̂_h.

SLIDE 5

Choosing the smoothing parameter

Risk-based choice: choose h such that f̂_h minimizes a risk (e.g. MSE, MISE). The risk has to be estimated from the data, e.g. by asymptotic considerations, plug-in methods, penalized criteria, cross-validation, or risk bounds.

Residual-based choice: given the data, find the simplest model that 'could have generated' the data, i.e. whose residuals 'look like noise', e.g. the taut-string algorithm (Davies and Kovac 2001).

SLIDE 7

The Multiresolution Criterion

Given some estimate f̂, consider the residuals r_i := r(t_i) := y(t_i) − f̂(t_i).

Accept the residuals as noise iff

max_{I ∈ I} (1/√|I|) |∑_{i ∈ I} r_i| ≤ σC,   (∗)

where I is the system of all intervals in {1, . . . , N}.

Choose the estimate of smallest complexity such that (∗) is fulfilled.
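The criterion (∗) can be checked directly. A minimal Python sketch (not from the talk): `mr_statistic` evaluates the left-hand side by brute force over all intervals using prefix sums, and `mr_criterion` compares it with σC. The concrete choice of C below is only an illustrative assumption; the slides leave C unspecified.

```python
import numpy as np

def mr_statistic(r):
    """Multiresolution statistic: max over all intervals I of
    |sum_{i in I} r_i| / sqrt(|I|).  O(N^2) via prefix sums."""
    r = np.asarray(r, dtype=float)
    s = np.concatenate(([0.0], np.cumsum(r)))  # s[j] = r_0 + ... + r_{j-1}
    n = len(r)
    return max(abs(s[j] - s[i]) / np.sqrt(j - i)
               for i in range(n) for j in range(i + 1, n + 1))

def mr_criterion(residuals, sigma, C):
    """Accept the residuals as noise iff (*) holds."""
    return mr_statistic(residuals) <= sigma * C

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, size=256)
# Illustrative threshold of the form sqrt(tau * log N) (an assumption here).
C = np.sqrt(2.5 * np.log(len(noise)))
print(mr_criterion(noise, sigma=1.0, C=C))
```

Brute force over all O(N²) intervals is fine for moderate N; the taut-string literature uses faster multiresolution schemes for large samples.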

SLIDE 8

Residual based methods

The MR criterion has been combined with different measures of complexity:

◮ Number of local extrema or total variation (taut-string algorithm, Davies and Kovac 2001)
◮ Number of changes between convexity and concavity (Davies, Kovac and Meise 2008)
◮ Smoothness quantified by derivatives (weighted smoothing splines, Davies and Meise 2008)
◮ Number of jumps (Potts smoother, Boysen et al. 2008)

SLIDE 9

Taut String Method

Summed process: y°_n(t) = (1/n) ∑_{t_i ≤ t} y(t_i)

Tube T(y°_n, C/√n): all functions g with y°_n(t) − C/√n ≤ g(t) ≤ y°_n(t) + C/√n

String S_n: the function in the tube with smallest length(S_n) = ∫₀¹ √(1 + s_n(t)²) dt, where s_n is the derivative of S_n

Derivative of S_n: candidate for f̂.
Check whether the MR criterion is fulfilled; if not: local squeezing of the tube.

SLIDE 10

Simulation Study (Davies, Gather, Weinert, 2008)

◮ Wavelet thresholding (Donoho and Johnstone, 1994) → hard and soft thresholding [H, S]
◮ Unbalanced Haar (Fryzlewicz, 2006) [U]
◮ Minimum Description Length (Rissanen, 2000) [M]
◮ Adaptive weights smoothing (Polzehl and Spokoiny, 2003) [A]
◮ Local plug-in kernel method (Herrmann, 1997) [P]
◮ Taut string (Davies and Kovac, 2001) [T, V]

SLIDE 11

Simulation Study

[Plots of the six test-bed functions on [0, 1]: Doppler, Bumps, Heavisine, Blocks, Sine, Constant Signal]
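The first four test-bed functions are the standard Donoho–Johnstone examples. As an illustration, here is a Python sketch of the Doppler function in its textbook form and of the observation model y(t_i) = f(t_i) + ε(t_i); the rescaling used for the plots in the talk (range roughly ±10) and the noise level are assumptions, not taken from the slides.

```python
import numpy as np

def doppler(t, eps=0.05):
    """Donoho-Johnstone Doppler test function on [0, 1] (textbook scaling)."""
    t = np.asarray(t, dtype=float)
    return np.sqrt(t * (1.0 - t)) * np.sin(2.0 * np.pi * (1.0 + eps) / (t + eps))

n = 1024
t = np.arange(1, n + 1) / n
rng = np.random.default_rng(1)
# Noisy observations y(t_i) = f(t_i) + eps_i; sigma = 0.1 is an arbitrary choice.
y = doppler(t) + rng.normal(0.0, 0.1, size=n)
```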

SLIDE 14

Simulation Study

6 test-bed functions, 4 σ-values, 5 sample sizes n; 1000 simulations at each combination of test-bed function, σ-level and n-level.

Mean for 3 performance criteria:

L∞-norm: ℓ(f, f̂) = max_{1 ≤ i ≤ n} |f(i/n) − f̂(i/n)|

L2-norm: ℓ(f, f̂) = (1/n) ∑_{i=1}^n (f(i/n) − f̂(i/n))²

Peak-identification loss: ℓ(f, f̂) = number of unidentified extremes of f + number of superfluous extremes of f̂ → overall error in identifying the extremes of the true f with extremes of f̂
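The first two criteria are straightforward to compute. The sketch below (mine, not from the talk) implements them, plus a deliberately crude stand-in for the peak-identification loss that only compares the *counts* of local extrema; the actual criterion matches individual extremes of f with extremes of f̂, which requires a matching rule the slides do not spell out.

```python
import numpy as np

def linf_loss(f, fhat):
    """Maximum absolute deviation over the grid points."""
    return float(np.max(np.abs(np.asarray(f) - np.asarray(fhat))))

def l2_loss(f, fhat):
    """Mean squared deviation over the grid points."""
    d = np.asarray(f, dtype=float) - np.asarray(fhat, dtype=float)
    return float(np.mean(d * d))

def count_local_extrema(x):
    """Number of local extrema of a discretized function, counted as
    sign changes of consecutive differences (flat stretches ignored)."""
    d = np.sign(np.diff(np.asarray(x, dtype=float)))
    d = d[d != 0]
    return int(np.sum(d[1:] != d[:-1]))

def pid_loss_simplified(f, fhat):
    """Crude stand-in (assumption): absolute difference in the number
    of local extrema instead of a one-to-one matching of extremes."""
    return abs(count_local_extrema(f) - count_local_extrema(fhat))
```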

SLIDE 15

Approximations of Doppler-data

[Six reconstructions of the Doppler data (n = 1024): Taut String, Unbalanced Haar, MDL, Wavelet (hard), AWS, Kernel Plug-in]

SLIDE 16

Approximations of Blocks-data

[Six reconstructions of the Blocks data (n = 1024): Taut String, Unbalanced Haar, MDL, Wavelet (hard), AWS, Kernel Plug-in]

SLIDE 17

Approximations of a Constant

[Six reconstructions of pure-noise data (constant signal, n = 1024): Taut String, Unbalanced Haar, MDL, Wavelet (hard), AWS, Kernel Plug-in]

SLIDE 19

Average Ranks

[Bar plots of the average ranks of the methods H, S, U, M, P, A, T and V under the L2-norm, the L∞-norm and the PID loss]

The MR-based taut-string algorithm performs well.

SLIDE 21

MR criterion and Nadaraya-Watson kernel regression

r_{t,h} := ∑_{i=1}^n K_h(t_i − t) r_i / √(∑_{i=1}^n K_h(t_i − t)²)   if ∑_{i=1}^n K_h(t_i − t)² ≠ 0,
r_{t,h} := 0   otherwise,

for all t ∈ [0, 1], h > 0, with K_h(·) := h⁻¹ K(h⁻¹ ·) for the uniform kernel K := I_[−0.5, 0.5].

Then:

◮ r_1, . . . , r_N iid ∼ N(0, σ²) ⟹ r_{t,h} ∼ N(0, σ²).
◮ MR criterion: sup_{t,h} |r_{t,h}| = max_{I∈I} (1/√|I|) |∑_{i∈I} r_i|
SLIDE 23

The Multiresolution Norm (Mildenberger 2008)

Consider the data (y_1, . . . , y_N), the estimate (f̂_1, . . . , f̂_N) and the residuals (r_1, . . . , r_N) as vectors in R^N, equipped with the multiresolution norm

‖(x_1, . . . , x_N)‖_MR := max_{I∈I} (1/√|I|) |∑_{t∈I} x_t|

Then: the multiresolution criterion is fulfilled ⟺ ‖y − f̂‖_MR ≤ σC, i.e. f̂ is contained in the MR-ball of radius σC centered at y, or (equivalently) the residuals r = y − f̂ lie in the ball of that radius around zero.

SLIDE 24

Multiresolution Norm Unit Ball in R²

[Plot of the unit ball of the multiresolution norm in R², shown on [−1.5, 1.5] × [−1.5, 1.5]]

SLIDE 28

ℓp-Norms

‖(x_1, . . . , x_N)‖_p = (∑_{t=1}^N |x_t|^p)^{1/p}   (1 ≤ p < ∞)
‖(x_1, . . . , x_N)‖_∞ = max{|x_1|, . . . , |x_N|}

These norms are invariant w.r.t.:

1. Sign changes in one or several components
2. Permutation of components
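Both invariances are easy to confirm numerically; a short Python check (mine, using NumPy's built-in vector norms):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=8)

signs = rng.choice([-1.0, 1.0], size=8)  # arbitrary componentwise sign changes
perm = rng.permutation(8)                # arbitrary permutation of components

for p in (1, 2, np.inf):
    a = np.linalg.norm(x, ord=p)
    assert np.isclose(a, np.linalg.norm(signs * x, ord=p))  # sign-invariant
    assert np.isclose(a, np.linalg.norm(x[perm], ord=p))    # permutation-invariant
print("all lp invariances hold")
```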
SLIDE 37

Lack of Invariance

The MR-norm is not invariant w.r.t. these transformations. Consider

‖(1, −1, 1)‖_MR = max{1, 1, 1, 0/√2, 0/√2, 1/√3} = 1

but

‖(1, 1, −1)‖_MR = max{1, 1, 1, 2/√2, 0/√2, 1/√3} = √2.

With |x| := (|x_1|, . . . , |x_N|), we have: ‖x‖_MR ≤ ‖|x|‖_MR.
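The slide's example can be reproduced with a direct implementation of the MR-norm (the function name `mr_norm` is mine):

```python
import numpy as np

def mr_norm(x):
    """Multiresolution norm: max over all intervals I of
    |sum_{t in I} x_t| / sqrt(|I|)."""
    x = np.asarray(x, dtype=float)
    s = np.concatenate(([0.0], np.cumsum(x)))  # prefix sums
    n = len(x)
    return max(abs(s[j] - s[i]) / np.sqrt(j - i)
               for i in range(n) for j in range(i + 1, n + 1))

print(mr_norm([1, -1, 1]))                 # 1.0
print(mr_norm([1, 1, -1]))                 # sqrt(2): a permutation changed the norm
print(mr_norm(np.abs([1.0, 1.0, -1.0])))   # sqrt(3): taking |x| can only increase it
```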

SLIDE 38

Lack of Invariance

Furthermore:

◮ Identity and reverse ordering are the only permutations that do not affect the MR-norm of any x ∈ R^N.
◮ Identity and changing all signs simultaneously are the only sign changes that do not affect the MR-norm of any x ∈ R^N.

SLIDE 40

Sign Patterns

For x ∈ R^N with |x_1| = · · · = |x_N| =: m > 0:

◮ ‖x‖_MR attains its maximum ⟺ all components have the same sign
◮ ‖x‖_MR attains its minimum ⟺ the signs are alternating
◮ ‖x‖_MR ≥ m · √(length of longest run)

→ The dependence of the MR-norm on sign patterns allows for residual diagnostics!
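These sign-pattern facts are easy to probe numerically. A Python sketch (mine): for a run of k equal signs on an interval I, |∑_{t∈I} x_t|/√|I| = km/√k = m√k, which gives the lower bound in the last bullet.

```python
import numpy as np

def mr_norm(x):
    """Multiresolution norm: max over all intervals of |interval sum| / sqrt(length)."""
    x = np.asarray(x, dtype=float)
    s = np.concatenate(([0.0], np.cumsum(x)))
    n = len(x)
    return max(abs(s[j] - s[i]) / np.sqrt(j - i)
               for i in range(n) for j in range(i + 1, n + 1))

m, n = 1.0, 8
same = m * np.ones(n)                                  # all components the same sign
alt = m * np.array([(-1.0) ** k for k in range(n)])    # alternating signs
runs = m * np.array([1, 1, 1, 1, -1, 1, -1, 1])        # longest run has length 4

print(mr_norm(same))  # sqrt(8): maximal over sign patterns with |x_t| = m
print(mr_norm(alt))   # 1.0: minimal over such sign patterns
assert mr_norm(runs) >= m * np.sqrt(4)  # >= m * sqrt(longest run length)
```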

SLIDE 41

Summary

◮ Residual-based smoothing parameter selection performs quite well
◮ The multiresolution criterion corresponds to a ball in the multiresolution norm
◮ Detection of structure in the residuals is possible because of the lack of invariance properties

SLIDE 42

References 1

BOYSEN, L., KEMPE, A., MUNK, A., LIEBSCHER, V. and WITTICH, O. (2008). Consistencies and rates of convergence of jump penalized least squares estimators. The Annals of Statistics, to appear.
CHAUDHURI, P. and MARRON, J. S. (2000). Scale space view of curve estimation. The Annals of Statistics 28, 408-428.
DAVIES, P. L. and KOVAC, A. (2001). Local Extremes, Runs, Strings and Multiresolution. The Annals of Statistics 29, 1-65.
DAVIES, P. L., KOVAC, A. and MEISE, M. (2008). Nonparametric regression, confidence regions and regularization. The Annals of Statistics, to appear.
DAVIES, P. L. and MEISE, M. (2008). Approximating Data with Weighted Smoothing Splines. Journal of Nonparametric Statistics 20, 207-228.
DAVIES, P. L., GATHER, U. and WEINERT, H. (2008). Nonparametric Regression as an Example of Model Choice. Communications in Statistics - Simulation and Computation 37, 274-289.
DONOHO, D. L. and JOHNSTONE, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425-455.

SLIDE 43

References 2

DÜMBGEN, L. and SPOKOINY, V. G. (2001). Multiscale testing of qualitative hypotheses. The Annals of Statistics 29, 124-152.
FRYZLEWICZ, P. (2007). Unbalanced Haar Technique for Nonparametric Function Estimation. Journal of the American Statistical Association 102, 1318-1327.
HERRMANN, E. (1997). Local Bandwidth Choice in Kernel Regression Estimation. Journal of Computational and Graphical Statistics 6, 35-54.
MILDENBERGER, T. (2008). A geometric interpretation of the multiresolution criterion in nonparametric regression. Journal of Nonparametric Statistics, to appear.
POLZEHL, J. and SPOKOINY, V. (2003). Varying coefficient regression modeling. Preprint.
RISSANEN, J. (2000). MDL-Denoising. IEEE Transactions on Information Theory 42, 40-47.