Lecture 11: Nonparametric Regression (3) Confidence Bands

SLIDE 1

Estimation of Variance Pointwise Confidence Intervals Simultaneous Confidence Band Assignment

Lecture 11: Nonparametric Regression (3) Confidence Bands

Applied Statistics 2015

SLIDE 2

An example from Lecture 9: Pick-It Lottery

[Figure: Payoff plotted against Number, with the Nadaraya-Watson estimate.]

As suggested by the Nadaraya-Watson estimate, numbers less than 100 have larger payoffs and numbers in [200, 300] have smaller payoffs. Question: Are these patterns real or just due to chance? How much random variability is there in the curve? A confidence band can be used to answer questions of this type.

SLIDE 3

It is customary to consider a fixed design when constructing variance estimators and confidence bands. Let
$$\hat r(x) = \sum_{i=1}^n l_i(x) Y_i$$
be a linear smoother. Take $\hat r$ to be the Nadaraya-Watson estimator or the local linear estimator. Recall that $Y_i = r(x_i) + \epsilon_i$, and let $\sigma^2 = \mathrm{Var}(\epsilon_i)$. We aim to derive an estimator of $\sigma^2$, which will be needed to build pointwise and simultaneous confidence bands for $r$.

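To make the weights $l_i(x)$ concrete, here is a minimal base-R sketch of the local linear weights at a single point. The Gaussian kernel, the function name `loclin_weights`, and the toy data are assumptions made for illustration; they are not part of the lecture.

```r
# Local linear weights l_i(x0) for a fixed design x and bandwidth h.
# Assumption: Gaussian kernel; any other kernel could be substituted.
loclin_weights <- function(x0, x, h) {
  k  <- dnorm((x - x0) / h)          # kernel weights K((x_i - x0) / h)
  s1 <- sum(k * (x - x0))            # S_1(x0)
  s2 <- sum(k * (x - x0)^2)          # S_2(x0)
  b  <- k * (s2 - (x - x0) * s1)     # unnormalised weights b_i(x0)
  b / sum(b)                         # l_i(x0); these sum to one
}

set.seed(1)
n <- 50
x <- seq(0, 1, length.out = n)
Y <- sin(2 * pi * x) + rnorm(n, sd = 0.5)   # hypothetical toy data

l0   <- loclin_weights(0.3, x, h = 0.1)
rhat <- sum(l0 * Y)                  # rhat(0.3) = sum_i l_i(0.3) * Y_i
```

The weights depend only on the design points and the bandwidth, not on the responses, which is what makes the smoother linear in $Y$.
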
SLIDE 4

An Estimator of $\sigma^2$

Observe that
$$\sigma^2 = E(\epsilon_i^2) = E\big(Y_i - r(x_i)\big)^2.$$
The sample mean $\frac{1}{n}\sum_{i=1}^n (Y_i - r(x_i))^2$ would serve as an estimator of $\sigma^2$. However, $r$, and hence $r(x_i)$, is unknown. We plug in $\hat r(x_i)$ and change the normalizing factor $1/n$. The estimator is given by
$$\hat\sigma^2 = \frac{\sum_{i=1}^n (Y_i - \hat r(x_i))^2}{n - 2\nu + \tilde\nu},$$
where $\nu = \sum_{i=1}^n l_i(x_i)$ and $\tilde\nu = \sum_{i=1}^n \sum_{j=1}^n l_i^2(x_j)$. Equivalently, let $L$ be the $n \times n$ matrix with entry $l_j(x_i)$ in the $i$-th row and $j$-th column. Then $\nu = \mathrm{trace}(L)$ and $\tilde\nu = \mathrm{trace}(L^T L)$.

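The estimator can be computed directly from the smoothing matrix. The following base-R sketch assumes a local linear smoother with a Gaussian kernel and a fixed bandwidth; the helper `smoother_matrix` and the simulated data are hypothetical.

```r
# Smoothing matrix L with L[i, j] = l_j(x_i), so the fitted values are L %*% Y.
# Assumption: local linear smoother with a Gaussian kernel.
smoother_matrix <- function(x, h) {
  t(sapply(x, function(x0) {
    k  <- dnorm((x - x0) / h)
    s1 <- sum(k * (x - x0))
    s2 <- sum(k * (x - x0)^2)
    b  <- k * (s2 - (x - x0) * s1)
    b / sum(b)                       # row of weights l_1(x0), ..., l_n(x0)
  }))
}

set.seed(1)
n <- 200
x <- sort(runif(n))
Y <- sin(2 * pi * x) + rnorm(n, sd = 0.5)   # true sigma^2 = 0.25

L          <- smoother_matrix(x, h = 0.05)
nu         <- sum(diag(L))                  # trace(L)
nu_tilde   <- sum(L^2)                      # trace(t(L) %*% L)
resid      <- Y - as.vector(L %*% Y)
sigma2_hat <- sum(resid^2) / (n - 2 * nu + nu_tilde)
```

Note that `sum(L^2)` equals $\mathrm{trace}(L^T L)$ because the trace of $L^T L$ is the sum of all squared entries of $L$.
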
SLIDE 5

An Estimator of $\sigma^2$

Theorem 5.85 in Wasserman (2005)

Let $\hat r_n(x)$ be a linear smoother. If $r$ is sufficiently smooth, $\nu = o(n)$ and $\tilde\nu = o(n)$, then
$$\hat\sigma^2 = \frac{\sum_{i=1}^n (Y_i - \hat r(x_i))^2}{n - 2\nu + \tilde\nu}$$
is a consistent estimator of $\sigma^2$.

It can be shown that
$$E(\hat\sigma^2) = \sigma^2 + O\left(\frac{1}{n - 2\nu + \tilde\nu}\right) \to \sigma^2$$
as $n \to \infty$; see page 86 in Wasserman (2005).

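The normalizing factor $n - 2\nu + \tilde\nu$ can be motivated by a short calculation, sketched here in matrix notation. Write $Y = (Y_1, \dots, Y_n)^T$, $r = (r(x_1), \dots, r(x_n))^T$ and $\Lambda = I - L$ (the symbol $\Lambda$ is introduced here only for brevity), and assume the errors are uncorrelated with mean zero and common variance $\sigma^2$.

```latex
\begin{aligned}
E\,\|Y - LY\|^2
  &= E\,\|\Lambda r + \Lambda \epsilon\|^2
   = \|\Lambda r\|^2 + \sigma^2\,\mathrm{trace}(\Lambda^T \Lambda), \\
\mathrm{trace}(\Lambda^T \Lambda)
  &= \mathrm{trace}(I) - \mathrm{trace}(L) - \mathrm{trace}(L^T) + \mathrm{trace}(L^T L)
   = n - 2\nu + \tilde\nu, \\
E(\hat\sigma^2)
  &= \sigma^2 + \frac{\|\Lambda r\|^2}{n - 2\nu + \tilde\nu}.
\end{aligned}
```

The remainder term is the contribution of the squared bias of the fitted values, which is what the smoothness condition on $r$ is there to control.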
SLIDE 6

Another Estimator of $\sigma^2$

Another estimator of $\sigma^2$ is due to Rice (1984). Suppose that the $x_i$'s are ordered: $x_1 \le x_2 \le \dots \le x_n$. Define
$$\hat\sigma_r^2 = \frac{1}{2(n-1)} \sum_{i=1}^{n-1} (Y_{i+1} - Y_i)^2.$$
The motivation for this estimator is as follows. Assuming that $r(x)$ is smooth and $x_{i+1} - x_i$ is sufficiently small, we have $r(x_{i+1}) - r(x_i) \approx 0$ and hence $Y_{i+1} - Y_i \approx \epsilon_{i+1} - \epsilon_i$. Further, since the errors are independent with mean zero, the cross term vanishes and
$$E(Y_{i+1} - Y_i)^2 \approx E(\epsilon_{i+1}^2) + E(\epsilon_i^2) = 2\sigma^2.$$
So $\sigma^2 \approx \frac{1}{2} E(Y_{i+1} - Y_i)^2$. The estimator is the corresponding sample counterpart.

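As a small sanity check of the formula, the difference-based estimator can be wrapped in a function and evaluated on numbers that are easy to verify by hand. The function name `rice_variance` is hypothetical.

```r
# Rice (1984) difference-based variance estimator.
rice_variance <- function(x, Y) {
  Y_ord <- Y[order(x)]                       # sort responses by covariate
  sum(diff(Y_ord)^2) / (2 * (length(Y) - 1))
}

# Hand check: successive differences are 2 and -1, so (4 + 1) / (2 * 2) = 1.25.
toy <- rice_variance(x = c(0.1, 0.2, 0.3), Y = c(1, 3, 2))
```

No smoothing parameter is involved, which makes this estimator convenient as a quick cross-check of the residual-based one.
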
SLIDE 7

An example

The red points represent the data. The black curve is the local linear estimator.

[Figure: "local linear estimator"; Y plotted against x.]

SLIDE 8

Rcode

The following code produces the plot and the estimates of the variance (x and Y hold the data):

    library(locfit)
    data <- data.frame(x = x, Y = Y)
    # local linear fit with bandwidth h = 0.2
    fit <- locfit(Y ~ lp(x, deg = 1, h = 0.2), data = data)
    plot(fit, ylim = c(-2, 2), main = 'local linear estimator')
    points(data$x, data$Y, pch = 20, col = 'red')
    # hat_sigma (component 8 of fit$dp)
    fit$dp[8]
    # hat_sigma_r: Rice's difference-based estimate
    ord <- order(x)
    Y.ord <- Y[ord]
    h_sg2 <- sum(diff(Y.ord)^2) / (2 * (length(Y) - 1))

Results: $\hat\sigma^2 = 0.292$ and $\hat\sigma_r^2 = 0.277$; the true value is $\sigma^2 = 0.25$.

SLIDE 9

Pointwise Confidence Intervals

We aim to construct a confidence interval for $r(x_0)$ based on $\hat r(x_0)$. A direct idea is to find $c$ such that
$$P\left( \frac{|\hat r(x_0) - r(x_0)|}{\sqrt{\mathrm{Var}(\hat r(x_0))}} < c \right) \ge 1 - \alpha.$$
If
$$T_n = \frac{\hat r(x_0) - r(x_0)}{\sqrt{\mathrm{Var}(\hat r(x_0))}}$$
were (asymptotically) normal, we could choose $c$ as the corresponding normal quantile. However, this is not the case. Note that we can decompose $T_n$ in the following way:
$$T_n = \frac{\hat r(x_0) - E(\hat r(x_0))}{\sqrt{\mathrm{Var}(\hat r(x_0))}} + \frac{E(\hat r(x_0)) - r(x_0)}{\sqrt{\mathrm{Var}(\hat r(x_0))}} =: T_{1n} + T_{2n}.$$

SLIDE 10

Pointwise Confidence Intervals

Now $\hat r(x_0)$ is a weighted sum of independent random variables. By Lindeberg's central limit theorem, $T_{1n} \xrightarrow{d} N(0, 1)$. However, the second term $T_{2n}$ does not vanish:
$$T_{2n}^2 = \frac{\mathrm{bias}^2}{\mathrm{variance}}.$$
Recall that the optimal bandwidth is chosen to balance the squared bias and the variance. Hence $T_{2n} \not\to 0$. This is also true for bandwidths obtained by common data-driven bandwidth selectors such as cross-validation.

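A rough rate calculation shows why the ratio does not vanish. It assumes the standard expansions at an interior point for a kernel smoother with a second-order kernel and $r''(x_0) \neq 0$; the constants $C_1, C_2 > 0$ are placeholders that depend on the kernel, the design and $r$.

```latex
\mathrm{bias}^2 \approx C_1 h^4,
\qquad
\mathrm{variance} \approx \frac{C_2}{n h}
\quad\Longrightarrow\quad
T_{2n}^2 \approx \frac{C_1}{C_2}\, n h^5 .
```

With a bandwidth of the optimal order $h \asymp n^{-1/5}$, the product $n h^5$ stays bounded away from zero, so $T_{2n}^2$ tends to a positive constant. Only an undersmoothing bandwidth with $n h^5 \to 0$ would make the bias term negligible.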
SLIDE 11

Pointwise Confidence Intervals

We ignore the bias term. We need to estimate $\mathrm{Var}(\hat r(x_0))$. We have
$$\mathrm{Var}(\hat r(x_0)) = \mathrm{Var}\left( \sum_{i=1}^n l_i(x_0) Y_i \right) = \sum_{i=1}^n l_i^2(x_0)\, \mathrm{Var}(Y_i) = \sigma^2 \sum_{i=1}^n l_i^2(x_0).$$
Write $\sum_{i=1}^n l_i^2(x_0) = \|l(x_0)\|^2$. We estimate the variance by
$$\widehat{\mathrm{Var}}(\hat r(x_0)) = \|l(x_0)\|^2 \hat\sigma^2.$$
This is where we need the estimator of $\sigma^2$.

SLIDE 12

Pointwise Confidence Intervals

Via the usual normal-theory approach, we obtain the $(1 - \alpha)$ confidence interval
$$\hat r(x_0) \pm z_{\alpha/2}\, \hat\sigma\, \|l(x_0)\|.$$
This is a confidence interval for $E(\hat r(x_0))$, NOT for $r(x_0)$! Generally, the bias term is ignored and we simply accept that the interval is technically an interval for $E(\hat r(x_0))$. Alternative approaches include estimating the bias and deriving a bias-corrected CI, cf. Eubank and Speckman (1993), or using the bootstrap, cf. Härdle and Marron (1991).

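The interval is straightforward to evaluate on a grid of points. The sketch below is one possible base-R implementation, assuming a local linear smoother with a Gaussian kernel, a fixed bandwidth, and the residual-based variance estimator; the helper name, the bandwidth and the simulated data are all hypothetical.

```r
# Pointwise (1 - alpha) intervals: rhat(x0) +/- z_{alpha/2} * sigma_hat * ||l(x0)||.
# Assumptions: local linear smoother, Gaussian kernel, fixed bandwidth h.
loclin_weights <- function(x0, x, h) {
  k  <- dnorm((x - x0) / h)
  s1 <- sum(k * (x - x0))
  s2 <- sum(k * (x - x0)^2)
  b  <- k * (s2 - (x - x0) * s1)
  b / sum(b)
}

set.seed(1)
n <- 200; h <- 0.05; alpha <- 0.05
x <- sort(runif(n))
Y <- sin(2 * pi * x) + rnorm(n, sd = 0.5)

# Residual-based estimate of sigma from the smoothing matrix L
L <- t(sapply(x, function(x0) loclin_weights(x0, x, h)))
sigma_hat <- sqrt(sum((Y - L %*% Y)^2) / (n - 2 * sum(diag(L)) + sum(L^2)))

grid  <- seq(0.05, 0.95, length.out = 50)
Lg    <- t(sapply(grid, function(x0) loclin_weights(x0, x, h)))  # rows are l(x0)
rhat  <- as.vector(Lg %*% Y)
se    <- sigma_hat * sqrt(rowSums(Lg^2))     # sigma_hat * ||l(x0)||
z     <- qnorm(1 - alpha / 2)
lower <- rhat - z * se
upper <- rhat + z * se
```

Plotting `lower` and `upper` against `grid` gives a shaded region of the kind described in the simulation example; as stated above, it covers $E(\hat r(x_0))$ rather than $r(x_0)$.
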
SLIDE 13

A simulation example

The black points represent the data. The curve is the local linear estimator. The shaded region corresponds to pointwise 95% confidence intervals.

[Figure: "pointwise confidence intervals"; Y plotted against x.]

SLIDE 14

A simulation example

The black points represent the data. The shaded region corresponds to pointwise 95% confidence intervals. The curve is the true regression function, $r(x)$.

[Figure: "pointwise confidence intervals"; Y plotted against x.]

SLIDE 15

Simultaneous Confidence Band

A confidence band is a confidence set for the function as a whole. An ideal confidence band for $r$ satisfies
$$P\big( \hat l(x) \le r(x) \le \hat u(x) \text{ for all } a \le x \le b \big) \ge 1 - \alpha,$$
where $[a, b]$ is the domain of $r$ and $\hat l$, $\hat u$ denote the lower and upper bounds. We shall again estimate the lower and upper bounds based on $\hat r$. Again, due to the presence of the bias, we will get a confidence band for $E(\hat r)$ instead of $r$.

SLIDE 16

Simultaneous Confidence Band

Based on the following result,
$$\left\{ \frac{\hat r(x) - E(\hat r(x))}{\sqrt{\mathrm{Var}(\hat r(x))}},\ a \le x \le b \right\} \xrightarrow{d} \{ W(x),\ a \le x \le b \},$$
where $W$ is a Gaussian process whose distribution is known, the $(1 - \alpha)$ confidence band for $E(\hat r)$ is given by
$$\hat r(x) \pm c\, \hat\sigma\, \|l(x)\|, \quad a \le x \le b,$$
where $c$ is such that
$$P\left( \max_{a \le x \le b} |W(x)| \le c \right) = 1 - \alpha.$$
For the computational formula of $c$, see page 92 in Wasserman (2005).

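One way to compute $c$ numerically is sketched below. It assumes the tube-formula approximation $P(\max_x |W(x)| > c) \approx 2(1 - \Phi(c)) + \frac{\kappa_0}{\pi} e^{-c^2/2}$ with $\kappa_0 = \int_a^b \|T'(x)\|\,dx$ and $T(x) = l(x)/\|l(x)\|$, which is believed to be the form given in the cited reference; please check the reference for the exact statement. The kernel, bandwidth, grid and helper names are hypothetical.

```r
# Critical value c for a simultaneous band via a tube-formula approximation.
# Assumptions: local linear smoother, Gaussian kernel, fixed bandwidth h,
# and kappa_0 approximated by the polyline length of T(x) over a fine grid.
loclin_weights <- function(x0, x, h) {
  k  <- dnorm((x - x0) / h)
  s1 <- sum(k * (x - x0))
  s2 <- sum(k * (x - x0)^2)
  b  <- k * (s2 - (x - x0) * s1)
  b / sum(b)
}

set.seed(1)
n <- 200; h <- 0.05; alpha <- 0.05
x <- sort(runif(n))

grid <- seq(0, 1, length.out = 400)
Lg   <- t(sapply(grid, function(x0) loclin_weights(x0, x, h)))
Tmat <- Lg / sqrt(rowSums(Lg^2))             # rows are T(x) = l(x) / ||l(x)||
kappa0 <- sum(sqrt(rowSums(diff(Tmat)^2)))   # approximates integral of ||T'(x)||

# Solve 2 * (1 - pnorm(crit)) + kappa0 / pi * exp(-crit^2 / 2) = alpha for crit
tube <- function(crit) 2 * (1 - pnorm(crit)) + kappa0 / pi * exp(-crit^2 / 2) - alpha
c_band <- uniroot(tube, lower = qnorm(1 - alpha / 2), upper = 10, tol = 1e-8)$root
```

The band is then $\hat r(x) \pm$ `c_band` $\cdot\, \hat\sigma \|l(x)\|$ over the grid. The resulting constant exceeds the pointwise quantile $z_{\alpha/2}$, so the simultaneous band is wider than the pointwise intervals.
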
SLIDE 17

A simulation example

The black points represent the data. The curve is the local linear estimator. The shaded region corresponds to the 95% confidence band.

[Figure: "simultaneous confidence band"; Y plotted against x.]

SLIDE 18

A simulation example

The black points represent the data. The shaded region corresponds to the 95% confidence band. The curve is the true regression function, $r(x)$.

[Figure: "simultaneous confidence band"; Y plotted against x.]

SLIDE 19

Pick-It Lottery

[Figure, two panels: "95% pointwise confidence intervals" and "95% simultaneous confidence band"; Payoff plotted against Number in each.]

The patterns are statistically significant and not due to chance.

SLIDE 20

Group Presentation (May 18)

Group 21

Download the Motorcycle data set from http://www.stat.cmu.edu/~larry/all-of-nonpar/data.html. The covariate is time and the response is acceleration. Perform a nonparametric regression to fit the model. Apply the Nadaraya-Watson estimator and the local linear estimator. Use cross-validation to choose the smoothing parameters. Construct 95% pointwise confidence intervals and use them to draw a pointwise confidence band. Construct a 95% simultaneous confidence band.

SLIDE 21

Group Presentation (May 18)

Group 22

Simulate data from the following model: $Y = r(x) + N(0, 0.2^2)$, where $r(x) = x^2 - 2x$, $x \in [0, 2]$. Implement the pointwise confidence interval (cf. Page 12) for $x_0 = 0$.

Consider two approaches for estimating $\sigma^2$. Study the coverage fractions of your confidence intervals, i.e. the frequencies with which $r(x_0)$ lies in the CI, via simulation. Give your comments.
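A possible starting point for such a coverage study is sketched below. Everything beyond the model itself is an assumption chosen for illustration: the equally spaced fixed design, the sample size, the bandwidth, the Gaussian kernel, the number of replications and the helper name.

```r
# Coverage of the pointwise interval at x0 = 0 under two variance estimators.
# Assumptions: fixed equispaced design, local linear smoother, Gaussian kernel.
loclin_weights <- function(x0, x, h) {
  k  <- dnorm((x - x0) / h)
  s1 <- sum(k * (x - x0))
  s2 <- sum(k * (x - x0)^2)
  b  <- k * (s2 - (x - x0) * s1)
  b / sum(b)
}

set.seed(1)
n <- 100; h <- 0.3; sigma <- 0.2; alpha <- 0.05; x0 <- 0; B <- 500
x <- seq(0, 2, length.out = n)               # already sorted
r <- function(x) x^2 - 2 * x

L     <- t(sapply(x, function(u) loclin_weights(u, x, h)))
l0    <- loclin_weights(x0, x, h)
denom <- n - 2 * sum(diag(L)) + sum(L^2)
z     <- qnorm(1 - alpha / 2)

covered <- replicate(B, {
  Y   <- r(x) + rnorm(n, sd = sigma)
  s_a <- sqrt(sum((Y - L %*% Y)^2) / denom)        # residual-based estimate
  s_b <- sqrt(sum(diff(Y)^2) / (2 * (n - 1)))      # Rice's estimate
  est <- sum(l0 * Y)
  half_width <- z * sqrt(sum(l0^2)) * c(residual = s_a, rice = s_b)
  abs(est - r(x0)) <= half_width                   # TRUE if r(x0) is covered
})
coverage <- rowMeans(covered)                      # one fraction per estimator
```

Because the interval targets $E(\hat r(x_0))$, the observed fractions need not match the nominal 95%; comparing them across bandwidths is one way to see the effect of the bias.
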