SLIDE 1

The multiresolution criterion and nonparametric regression

Thoralf Mildenberger and Henrike Weinert

joint work with P. L. Davies and U. Gather

SFB 475 Fakultät Statistik Technische Universität Dortmund

Workshop on current trends and challenges in model selection and related areas Vienna, July 2008

SLIDE 2

Outline

◮ Nonparametric regression
◮ Choosing the smoothing parameter
◮ Simulation study
◮ The multiresolution norm
◮ Geometric interpretation
◮ The MR-norm and ℓp-norms

SLIDE 4

Nonparametric Regression

Model: y(t_i) = f(t_i) + ε(t_i), with 0 ≤ t_1 < · · · < t_N ≤ 1 and ε(t_1), . . . , ε(t_N) iid ∼ N(0, σ²)

Goal: find an estimate f̂ of f.

Problem: f̂ is usually chosen from a family (f̂_h) indexed by a smoothing parameter h (bandwidth, size of a partition, penalty, etc.)

Interpretation: h often measures the 'complexity' of f̂_h.

SLIDE 5

Choosing the smoothing parameter

Risk-based choice: choose h such that f̂_h minimizes a risk (e.g. MSE, MISE). The risk has to be estimated from the data, e.g. by asymptotic considerations, plug-in methods, penalized criteria, cross-validation, or risk bounds.

Residual-based choice: given the data, find the simplest model that 'could have generated' the data, i.e. whose residuals 'look like noise', e.g. the taut-string algorithm (Davies and Kovac 2001).

SLIDE 7

The Multiresolution Criterion

Given some estimate f̂, consider the residuals r_i := r(t_i) := y(t_i) − f̂(t_i).

Accept the residuals as noise iff

max_{I ∈ I} (1/√|I|) |∑_{i ∈ I} r_i| ≤ σC,   (∗)

where I is the system of all intervals in {1, . . . , N}.

Choose the estimate of smallest complexity such that (∗) is fulfilled.
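The criterion (∗) can be checked directly. A minimal Python sketch (not from the talk): `mr_statistic` evaluates the left-hand side by brute force over all intervals using prefix sums, and `mr_criterion` compares it with σC. The concrete choice of C below is only an illustrative assumption; the slides leave C unspecified.

```python
import numpy as np

def mr_statistic(r):
    """Multiresolution statistic: max over all intervals I of
    |sum_{i in I} r_i| / sqrt(|I|).  O(N^2) via prefix sums."""
    r = np.asarray(r, dtype=float)
    s = np.concatenate(([0.0], np.cumsum(r)))  # s[j] = r_0 + ... + r_{j-1}
    n = len(r)
    return max(abs(s[j] - s[i]) / np.sqrt(j - i)
               for i in range(n) for j in range(i + 1, n + 1))

def mr_criterion(residuals, sigma, C):
    """Accept the residuals as noise iff (*) holds."""
    return mr_statistic(residuals) <= sigma * C

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, size=256)
# Illustrative threshold of the form sqrt(tau * log N) (an assumption here).
C = np.sqrt(2.5 * np.log(len(noise)))
print(mr_criterion(noise, sigma=1.0, C=C))
```

Brute force over all O(N²) intervals is fine for moderate N; the taut-string literature uses faster multiresolution schemes for large samples.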

SLIDE 8

Residual based methods

The MR criterion has been combined with different measures of complexity:

◮ Number of local extrema or total variation (taut-string algorithm, Davies and Kovac 2001)
◮ Number of changes between convexity and concavity (Davies, Kovac and Meise 2008)
◮ Smoothness quantified by derivatives (weighted smoothing splines, Davies and Meise 2008)
◮ Number of jumps (Potts smoother, Boysen et al. 2008)

SLIDE 9

Taut String Method

Summed process: y°_n(t) = (1/n) ∑_{t_i ≤ t} y(t_i)

Tube T(y°_n, C/√n): all functions g with y°_n(t) − C/√n ≤ g(t) ≤ y°_n(t) + C/√n

String S_n: the function in the tube with smallest length(S_n) = ∫₀¹ √(1 + s_n(t)²) dt, where s_n is the derivative of S_n

Derivative of S_n: candidate for f̂.
Check whether the MR criterion is fulfilled; if not: local squeezing of the tube.

SLIDE 10

Simulation Study (Davies, Gather, Weinert, 2008)

◮ Wavelet thresholding (Donoho and Johnstone, 1994) → hard and soft thresholding [H, S]
◮ Unbalanced Haar (Fryzlewicz, 2006) [U]
◮ Minimum Description Length (Rissanen, 2000) [M]
◮ Adaptive weights smoothing (Polzehl and Spokoiny, 2003) [A]
◮ Local plug-in kernel method (Herrmann, 1997) [P]
◮ Taut string (Davies and Kovac, 2001) [T, V]

SLIDE 11

Simulation Study

[Plots of the six test-bed functions on [0, 1]: Doppler, Bumps, Heavisine, Blocks, Sine, Constant Signal]
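The first four test-bed functions are the standard Donoho–Johnstone examples. As an illustration, here is a Python sketch of the Doppler function in its textbook form and of the observation model y(t_i) = f(t_i) + ε(t_i); the rescaling used for the plots in the talk (range roughly ±10) and the noise level are assumptions, not taken from the slides.

```python
import numpy as np

def doppler(t, eps=0.05):
    """Donoho-Johnstone Doppler test function on [0, 1] (textbook scaling)."""
    t = np.asarray(t, dtype=float)
    return np.sqrt(t * (1.0 - t)) * np.sin(2.0 * np.pi * (1.0 + eps) / (t + eps))

n = 1024
t = np.arange(1, n + 1) / n
rng = np.random.default_rng(1)
# Noisy observations y(t_i) = f(t_i) + eps_i; sigma = 0.1 is an arbitrary choice.
y = doppler(t) + rng.normal(0.0, 0.1, size=n)
```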

SLIDE 14

Simulation Study

6 test-bed functions, 4 σ-values, 5 sample sizes n; 1000 simulations at each combination of test-bed function, σ-level and n-level.

Mean for 3 performance criteria:

L∞-norm: ℓ(f, f̂) = max_{1 ≤ i ≤ n} |f(i/n) − f̂(i/n)|

L2-norm: ℓ(f, f̂) = (1/n) ∑_{i=1}^n (f(i/n) − f̂(i/n))²

Peak-identification loss: ℓ(f, f̂) = number of unidentified extremes of f + number of superfluous extremes of f̂ → overall error in identifying the extremes of the true f with extremes of f̂
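The first two criteria are straightforward to compute. The sketch below (mine, not from the talk) implements them, plus a deliberately crude stand-in for the peak-identification loss that only compares the *counts* of local extrema; the actual criterion matches individual extremes of f with extremes of f̂, which requires a matching rule the slides do not spell out.

```python
import numpy as np

def linf_loss(f, fhat):
    """Maximum absolute deviation over the grid points."""
    return float(np.max(np.abs(np.asarray(f) - np.asarray(fhat))))

def l2_loss(f, fhat):
    """Mean squared deviation over the grid points."""
    d = np.asarray(f, dtype=float) - np.asarray(fhat, dtype=float)
    return float(np.mean(d * d))

def count_local_extrema(x):
    """Number of local extrema of a discretized function, counted as
    sign changes of consecutive differences (flat stretches ignored)."""
    d = np.sign(np.diff(np.asarray(x, dtype=float)))
    d = d[d != 0]
    return int(np.sum(d[1:] != d[:-1]))

def pid_loss_simplified(f, fhat):
    """Crude stand-in (assumption): absolute difference in the number
    of local extrema instead of a one-to-one matching of extremes."""
    return abs(count_local_extrema(f) - count_local_extrema(fhat))
```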

SLIDE 15

Approximations of Doppler-data

[Six reconstructions of the Doppler data (n = 1024): Taut String, Unbalanced Haar, MDL, Wavelet (hard), AWS, Kernel Plug-in]

SLIDE 16

Approximations of Blocks-data

[Six reconstructions of the Blocks data (n = 1024): Taut String, Unbalanced Haar, MDL, Wavelet (hard), AWS, Kernel Plug-in]

SLIDE 17

Approximations of a Constant

[Six reconstructions of pure-noise data (constant signal, n = 1024): Taut String, Unbalanced Haar, MDL, Wavelet (hard), AWS, Kernel Plug-in]

SLIDE 19

Average Ranks

[Bar plots of the average ranks of the methods H, S, U, M, P, A, T and V under the L2-norm, the L∞-norm and the PID loss]

The MR-based taut-string algorithm performs well.

SLIDE 21

MR criterion and Nadaraya-Watson kernel regression

r_{t,h} := ∑_{i=1}^n K_h(t_i − t) r_i / √(∑_{i=1}^n K_h(t_i − t)²)   if ∑_{i=1}^n K_h(t_i − t)² ≠ 0,
r_{t,h} := 0   otherwise,

for all t ∈ [0, 1], h > 0, with K_h(·) := h⁻¹ K(h⁻¹ ·) for the uniform kernel K := I_[−0.5, 0.5].

Then:

◮ r_1, . . . , r_N iid ∼ N(0, σ²) ⟹ r_{t,h} ∼ N(0, σ²).
◮ MR criterion: sup_{t,h} |r_{t,h}| = max_{I∈I} (1/√|I|) |∑_{i∈I} r_i|
SLIDE 23

The Multiresolution Norm (Mildenberger 2008)

Consider the data (y_1, . . . , y_N), the estimate (f̂_1, . . . , f̂_N) and the residuals (r_1, . . . , r_N) as vectors in R^N, equipped with the multiresolution norm

‖(x_1, . . . , x_N)‖_MR := max_{I∈I} (1/√|I|) |∑_{t∈I} x_t|

Then: the multiresolution criterion is fulfilled ⟺ ‖y − f̂‖_MR ≤ σC, i.e. f̂ is contained in the MR-ball of radius σC centered at y, or (equivalently) the residuals r = y − f̂ lie in the ball of that radius around zero.

SLIDE 24

Multiresolution Norm Unit Ball in R²

[Plot of the unit ball of the multiresolution norm in R², shown on [−1.5, 1.5] × [−1.5, 1.5]]

SLIDE 28

ℓp-Norms

‖(x_1, . . . , x_N)‖_p = (∑_{t=1}^N |x_t|^p)^{1/p}   (1 ≤ p < ∞)
‖(x_1, . . . , x_N)‖_∞ = max{|x_1|, . . . , |x_N|}

These norms are invariant w.r.t.:

1. Sign changes in one or several components
2. Permutation of components
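Both invariances are easy to confirm numerically; a short Python check (mine, using NumPy's built-in vector norms):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=8)

signs = rng.choice([-1.0, 1.0], size=8)  # arbitrary componentwise sign changes
perm = rng.permutation(8)                # arbitrary permutation of components

for p in (1, 2, np.inf):
    a = np.linalg.norm(x, ord=p)
    assert np.isclose(a, np.linalg.norm(signs * x, ord=p))  # sign-invariant
    assert np.isclose(a, np.linalg.norm(x[perm], ord=p))    # permutation-invariant
print("all lp invariances hold")
```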
SLIDE 37

Lack of Invariance

The MR-norm is not invariant w.r.t. these transformations. Consider

‖(1, −1, 1)‖_MR = max{1, 1, 1, 0/√2, 0/√2, 1/√3} = 1

but

‖(1, 1, −1)‖_MR = max{1, 1, 1, 2/√2, 0/√2, 1/√3} = √2.

With |x| := (|x_1|, . . . , |x_N|), we have: ‖x‖_MR ≤ ‖|x|‖_MR.
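The slide's example can be reproduced with a direct implementation of the MR-norm (the function name `mr_norm` is mine):

```python
import numpy as np

def mr_norm(x):
    """Multiresolution norm: max over all intervals I of
    |sum_{t in I} x_t| / sqrt(|I|)."""
    x = np.asarray(x, dtype=float)
    s = np.concatenate(([0.0], np.cumsum(x)))  # prefix sums
    n = len(x)
    return max(abs(s[j] - s[i]) / np.sqrt(j - i)
               for i in range(n) for j in range(i + 1, n + 1))

print(mr_norm([1, -1, 1]))                 # 1.0
print(mr_norm([1, 1, -1]))                 # sqrt(2): a permutation changed the norm
print(mr_norm(np.abs([1.0, 1.0, -1.0])))   # sqrt(3): taking |x| can only increase it
```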

SLIDE 38

Lack of Invariance

Furthermore:

◮ Identity and reverse ordering are the only permutations that do not affect the MR-norm of any x ∈ R^N.
◮ Identity and changing all signs simultaneously are the only sign changes that do not affect the MR-norm of any x ∈ R^N.

SLIDE 40

Sign Patterns

For x ∈ R^N with |x_1| = · · · = |x_N| =: m > 0:

◮ ‖x‖_MR attains its maximum ⟺ all components have the same sign
◮ ‖x‖_MR attains its minimum ⟺ the signs are alternating
◮ ‖x‖_MR ≥ m · √(length of longest run)

→ The dependence of the MR-norm on sign patterns allows for residual diagnostics!
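These sign-pattern facts are easy to probe numerically. A Python sketch (mine): for a run of k equal signs on an interval I, |∑_{t∈I} x_t|/√|I| = km/√k = m√k, which gives the lower bound in the last bullet.

```python
import numpy as np

def mr_norm(x):
    """Multiresolution norm: max over all intervals of |interval sum| / sqrt(length)."""
    x = np.asarray(x, dtype=float)
    s = np.concatenate(([0.0], np.cumsum(x)))
    n = len(x)
    return max(abs(s[j] - s[i]) / np.sqrt(j - i)
               for i in range(n) for j in range(i + 1, n + 1))

m, n = 1.0, 8
same = m * np.ones(n)                                  # all components the same sign
alt = m * np.array([(-1.0) ** k for k in range(n)])    # alternating signs
runs = m * np.array([1, 1, 1, 1, -1, 1, -1, 1])        # longest run has length 4

print(mr_norm(same))  # sqrt(8): maximal over sign patterns with |x_t| = m
print(mr_norm(alt))   # 1.0: minimal over such sign patterns
assert mr_norm(runs) >= m * np.sqrt(4)  # >= m * sqrt(longest run length)
```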

SLIDE 41

Summary

◮ Residual-based smoothing parameter selection performs quite well
◮ The multiresolution criterion corresponds to a ball in the multiresolution norm
◮ Detection of structure in the residuals is possible because of the lack of invariance properties

SLIDE 42

References 1

BOYSEN, L., KEMPE, A., MUNK, A., LIEBSCHER, V. and WITTICH, O. (2008). Consistencies and rates of convergence of jump penalized least squares estimators. The Annals of Statistics, to appear.
CHAUDHURI, P. and MARRON, J. S. (2000). Scale space view of curve estimation. The Annals of Statistics 28, 408-428.
DAVIES, P. L. and KOVAC, A. (2001). Local Extremes, Runs, Strings and Multiresolution. The Annals of Statistics 29, 1-65.
DAVIES, P. L., KOVAC, A. and MEISE, M. (2008). Nonparametric regression, confidence regions and regularization. The Annals of Statistics, to appear.
DAVIES, P. L. and MEISE, M. (2008). Approximating Data with Weighted Smoothing Splines. Journal of Nonparametric Statistics 20, 207-228.
DAVIES, P. L., GATHER, U. and WEINERT, H. (2008). Nonparametric Regression as an Example of Model Choice. Communications in Statistics - Simulation and Computation 37, 274-289.
DONOHO, D. L. and JOHNSTONE, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425-455.

SLIDE 43

References 2

DÜMBGEN, L. and SPOKOINY, V. G. (2001). Multiscale testing of qualitative hypotheses. The Annals of Statistics 29, 124-152.
FRYZLEWICZ, P. (2007). Unbalanced Haar Technique for Nonparametric Function Estimation. Journal of the American Statistical Association 102, 1318-1327.
HERRMANN, E. (1997). Local Bandwidth Choice in Kernel Regression Estimation. Journal of Computational and Graphical Statistics 6, 35-54.
MILDENBERGER, T. (2008). A geometric interpretation of the multiresolution criterion in nonparametric regression. Journal of Nonparametric Statistics, to appear.
POLZEHL, J. and SPOKOINY, V. (2003). Varying coefficient regression modeling. Preprint.
RISSANEN, J. (2000). MDL-Denoising. IEEE Transactions on Information Theory 42, 40-47.