Lecture 10: Nonparametric Regression (2)
Applied Statistics 2015

Outline: Consistency of the Nadaraya-Watson estimator; local linear regression; assignments.


Consistency of Nadaraya-Watson Estimator

Here we consider the random design: there are n pairs of IID observations (X_1, Y_1), ..., (X_n, Y_n) with Y_i = r(X_i) + ε_i, i = 1, ..., n, where the ε_i's and the X_i's are independent, E(ε_i) = 0 and Var(ε_i) = σ². Recall that for a chosen smoothing parameter h_n and kernel K, the Nadaraya-Watson estimator of r is given by

\[
\hat{r}_n(x) = \frac{\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right) Y_i}{\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right)}.
\]
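As a concrete illustration (not part of the original slides), here is a minimal Python sketch of this estimator with a Gaussian kernel; the function names and the choice of kernel are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x, X, Y, h):
    """Nadaraya-Watson estimate of r(x): kernel-weighted average of the Y_i."""
    w = gaussian_kernel((x - X) / h)   # weights K((x - X_i)/h)
    return np.sum(w * Y) / np.sum(w)
```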


Theorem (Consistency of Nadaraya-Watson Estimator)

Let h_n → 0 and nh_n → ∞ as n → ∞. Let f denote the density of X_1, and let E(Y_1²) < ∞. Then, for any x_0 at which r and f are continuous and f(x_0) > 0, the Nadaraya-Watson estimator \hat{r}_n(x_0) is a consistent estimator of r(x_0), that is,

\[
\hat{r}_n(x_0) \xrightarrow{P} r(x_0), \quad \text{as } n \to \infty.
\]


Proof of the theorem

To prove this theorem, we need to use the following result.

Lemma (Theorem 1A in Parzen (1962))

Suppose that w is a bounded, integrable function satisfying lim_{|y| → ∞} |y w(y)| = 0, and let g be an integrable function. Then, for h_n → 0 as n → ∞,

\[
\lim_{n \to \infty} \frac{1}{h_n} \int w\!\left(\frac{u - x}{h_n}\right) g(u)\, du = g(x) \int w(u)\, du
\]

for every continuity point x of g.


Proof of the theorem

In the proof we drop the subscript n in h_n. Denote

\[
\hat{f}_n(x_0) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x_0 - X_i}{h}\right), \qquad
\hat{\psi}_n(x_0) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x_0 - X_i}{h}\right) Y_i.
\]

Then \hat{r}_n(x_0) = \hat{\psi}_n(x_0) / \hat{f}_n(x_0). Note that \hat{f}_n(x_0) is the kernel estimator of f(x_0). It suffices to prove that \hat{f}_n(x_0) →_P f(x_0) and \hat{\psi}_n(x_0) →_P r(x_0) f(x_0). We will prove the latter using the lemma; the proof of the former is similar and simpler.


Proof of the theorem: \hat{\psi}_n(x_0) →_P r(x_0) f(x_0)

First we have

\[
E\big[\hat{\psi}_n(x_0)\big]
\overset{\text{IID}}{=} \frac{1}{h}\, E\!\left[K\!\left(\frac{x_0 - X_1}{h}\right) Y_1\right]
= \frac{1}{h}\, E\!\left[K\!\left(\frac{x_0 - X_1}{h}\right) \big(r(X_1) + \epsilon_1\big)\right]
\overset{E(\epsilon)=0}{=} \frac{1}{h} \int K\!\left(\frac{x_0 - x}{h}\right) r(x) f(x)\, dx
\to r(x_0) f(x_0).
\]

Note that the kernel K satisfies the conditions on w in the lemma; the last convergence follows from the lemma and the symmetry of K. Similarly we can show that

\[
nh\, \mathrm{Var}\big(\hat{\psi}_n(x_0)\big) \to \big(r^2(x_0) + \sigma^2\big) f(x_0) \int K^2(u)\, du.
\]

Hence E\big[(\hat{\psi}_n(x_0) - r(x_0) f(x_0))^2\big] → 0, which implies \hat{\psi}_n(x_0) →_P r(x_0) f(x_0).
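To make the application of the lemma explicit (a step the slides leave implicit): take w = K and g = rf, and use the symmetry K(-u) = K(u) together with ∫ K(u) du = 1, so that at every continuity point x_0 of rf,

\[
\frac{1}{h} \int K\!\left(\frac{x_0 - x}{h}\right) r(x) f(x)\, dx
= \frac{1}{h} \int K\!\left(\frac{x - x_0}{h}\right) g(x)\, dx
\to g(x_0) \int K(u)\, du = r(x_0) f(x_0).
\]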


MISE of the Nadaraya-Watson estimator

Theorem 5.44 in Wasserman (2005)

The mean integrated squared error of the Nadaraya-Watson estimator is

\[
\mathrm{MISE}(\hat{r}_n) = \frac{h_n^4}{4} \left( \int x^2 K(x)\, dx \right)^2 \int \left( r''(x) + 2 r'(x) \frac{f'(x)}{f(x)} \right)^2 dx
+ \frac{\sigma^2}{n h_n} \int K^2(x)\, dx \int \frac{1}{f(x)}\, dx
+ o\big((n h_n)^{-1} + h_n^4\big).
\]

The first term is the squared bias. The term r'(x) f'(x)/f(x) is called the design bias, as it depends on the design, that is, the distribution of the X_i's. It is known that the kernel estimator has high bias near the boundaries of the data. This is known as boundary bias.


Boundary bias

The blue curve is the N-W estimate and the black one is the real r(x).

[Figure: Nadaraya-Watson fit (h = 0.2, Gaussian kernel); x on the horizontal axis, Y on the vertical axis.]

To alleviate the boundary bias, the so-called local linear regression can be used.


Local linear regression

Suppose that we want to estimate r(x), and X_i is an observation close to x. By Taylor expansion,

\[
r(X_i) \approx r(x) + r'(x)(X_i - x) =: a + b(X_i - x).
\]

Thus the problem of estimating r(x) is equivalent to estimating a! Now we replace r(X_i) with Y_i, as we only observe Y_i but not r(X_i). We want to find an a such that (Y_i - (a + b(X_i - x)))² is small. Taking all the observations into account, let \hat{a} and \hat{b} be given by

\[
(\hat{a}, \hat{b}) = \operatorname*{argmin}_{a, b} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right) \big(Y_i - (a + b(X_i - x))\big)^2.
\]

The local linear estimator is defined as \tilde{r}_n(x) := \hat{a}. Compare it with the Nadaraya-Watson estimator

\[
\hat{r}_n(x) = \operatorname*{argmin}_{c \in \mathbb{R}} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right) (Y_i - c)^2.
\]


Local linear regression

Write L(a, b) = Σ_{i=1}^{n} K((x - X_i)/h) (Y_i - (a + b(X_i - x)))². With k_i = K((x - X_i)/h) and z_i = X_i - x, setting the partial derivatives to zero gives (after dividing out a factor of -2) the normal equations

\[
\frac{\partial L(a,b)}{\partial a} = 0: \quad a \sum_{i=1}^{n} k_i + b \sum_{i=1}^{n} k_i z_i - \sum_{i=1}^{n} k_i Y_i = 0,
\]
\[
\frac{\partial L(a,b)}{\partial b} = 0: \quad a \sum_{i=1}^{n} k_i z_i + b \sum_{i=1}^{n} k_i z_i^2 - \sum_{i=1}^{n} k_i Y_i z_i = 0.
\]

Solving these yields \hat{a} = Σ_{i=1}^{n} w_i(x) Y_i / Σ_{i=1}^{n} w_i(x), and thus

\[
\tilde{r}_n(x) = \frac{\sum_{i=1}^{n} w_i(x) Y_i}{\sum_{i=1}^{n} w_i(x)},
\qquad \text{where } w_i(x) = k_i \left( \sum_{j=1}^{n} k_j z_j^2 - z_i \sum_{j=1}^{n} k_j z_j \right).
\]
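A minimal Python sketch of this closed form (an illustration, not code from the slides; the function names are assumptions), reusing the gaussian_kernel defined earlier:

```python
import numpy as np

def local_linear(x, X, Y, h, kernel):
    """Local linear estimate of r(x) via the closed-form weights w_i(x)."""
    k = kernel((x - X) / h)                           # k_i = K((x - X_i)/h)
    z = X - x                                         # z_i = X_i - x
    w = k * (np.sum(k * z**2) - z * np.sum(k * z))    # w_i(x)
    return np.sum(w * Y) / np.sum(w)                  # sum w_i Y_i / sum w_i
```

For example, local_linear(0.5, X, Y, 0.2, gaussian_kernel) evaluates the fit at x = 0.5 with bandwidth h = 0.2.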


Local linear regression

A linear smoother is an estimator of the weighted-average form Σ_{i=1}^{n} l_i(x) Y_i. Clearly the local linear estimator is a linear smoother, and so are the regressogram and the kernel estimator. Like the Nadaraya-Watson estimator, \tilde{r}_n(x) depends on h, so we also need to choose h when using the local linear estimator. Cross validation can be done in the same manner as for the N-W estimator.


Local linear regression: cross validation

Write the local linear estimator \tilde{r}_h = \tilde{r}_{nh}. The CV score is defined as

\[
CV(h) = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \tilde{r}_{nh}^{(i)}(X_i) \right)^2,
\]

where \tilde{r}_{nh}^{(i)}(X_i) is the estimator computed without using the observation (X_i, Y_i). Again, to compute the CV score there is no need to fit the curve n times: with l_i(X_i) = w_i(X_i) / Σ_{j=1}^{n} w_j(X_i), we have the relation

\[
CV(h) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{Y_i - \tilde{r}_{nh}(X_i)}{1 - l_i(X_i)} \right)^2.
\]

Hence

\[
h_{cv} = \operatorname*{argmin}_h \frac{1}{n} \sum_{i=1}^{n} \left( \frac{Y_i - \tilde{r}_{nh}(X_i)}{1 - l_i(X_i)} \right)^2.
\]
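A hedged Python sketch of this shortcut (illustrative, not the lecture's code), using the same weights w_j(X_i) as in the local linear estimator above:

```python
import numpy as np

def cv_score(X, Y, h, kernel):
    """CV(h) via the leave-one-out shortcut: no need to refit n times."""
    n = len(X)
    resid2 = np.empty(n)
    for i in range(n):
        k = kernel((X[i] - X) / h)                       # K((X_i - X_j)/h) for all j
        z = X - X[i]                                     # z_j = X_j - X_i
        w = k * (np.sum(k * z**2) - z * np.sum(k * z))   # w_j(X_i)
        l_ii = w[i] / np.sum(w)                          # l_i(X_i)
        fit = np.sum(w * Y) / np.sum(w)                  # full-sample fit at X_i
        resid2[i] = ((Y[i] - fit) / (1.0 - l_ii))**2
    return resid2.mean()
```

One would then minimize cv_score over a grid of candidate bandwidths to obtain h_cv.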


Comparison: Nadaraya-Watson estimator vs. local linear estimator

Theorem 5.65 in Wasserman (2005); see also Fan (1992)

Let h_n → 0 and nh_n → ∞ as n → ∞. Under some smoothness conditions on f(x) and r(x), both \hat{r}_n(x) and \tilde{r}_n(x) have variance

\[
\frac{\sigma^2}{n h_n f(x)} \int K^2(u)\, du + o\!\left(\frac{1}{n h_n}\right).
\]

The bias of \hat{r}_n(x) is

\[
h_n^2 \left( \frac{1}{2} r''(x) + \frac{r'(x) f'(x)}{f(x)} \right) \int u^2 K(u)\, du + o(h_n^2),
\]

whereas \tilde{r}_n(x) has bias

\[
\frac{1}{2} h_n^2\, r''(x) \int u^2 K(u)\, du + o(h_n^2).
\]

At boundary points, the NW estimator typically has high bias due to the large absolute value of f'(x)/f(x). In this sense, local linear estimation eliminates boundary bias and is free from design bias.


The purple curve is the local linear estimate and the black one is the real r(x).

[Figure: Local linear fit (h = 0.2, Gaussian kernel); x on the horizontal axis, Y on the vertical axis.]


Blue: N-W estimate; purple: local linear; black: real r(x).

[Figure: Both fits overlaid on the same data; x on the horizontal axis, Y on the vertical axis.]


Group Presentation (May 11)

Group 19

Consider the following simulation model: Y = sin(X) + ε, with ε ~ N(0, 0.1²) independent of X ~ N(0, σ²). Take σ = 1. Simulate n = 200 observations, that is, generate (X_1, Y_1), ..., (X_n, Y_n) from the model. Suppose that you do not know the regression function, and estimate it with the Nadaraya-Watson estimator and the local linear estimator. For each method, use cross validation to choose a proper bandwidth. Repeat for σ = 0.4. Give your comments.
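As a possible starting point for the simulation (illustrative only; the estimators and bandwidth selection would use the sketches above, and the seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def simulate(n=200, sigma=1.0, noise_sd=0.1):
    """Generate (X_i, Y_i) from Y = sin(X) + eps, X ~ N(0, sigma^2), eps ~ N(0, noise_sd^2)."""
    X = rng.normal(0.0, sigma, size=n)
    Y = np.sin(X) + rng.normal(0.0, noise_sd, size=n)
    return X, Y

X, Y = simulate(sigma=1.0)  # repeat with sigma=0.4 and compare the chosen bandwidths
```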


Group Presentation (May 11)

Group 20

Download the data set Motorcycle from http://www.stat.cmu.edu/~larry/all-of-nonpar/data.html. The covariate is time and the response is acceleration. Perform a nonparametric regression to fit the model. Use the following estimators: regressogram, kernel, and local linear. Use cross validation to choose the smoothing parameter.
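A possible first step, assuming the file has been downloaded and saved locally; the filename, header line, and column layout below are guesses to be checked against the actual file on the page above:

```python
import numpy as np

# Assumed: a local copy named "motor.dat" with whitespace-separated columns,
# time first and acceleration second, preceded by one header line.
data = np.loadtxt("motor.dat", skiprows=1)
time, accel = data[:, 0], data[:, 1]
# Then fit, e.g., nadaraya_watson(t, time, accel, h) over a grid of t,
# choosing h with the cross-validation score sketched earlier.
```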


Group Presentation (May 11)

Group 24

Consider the following simulation model: Y = -X² - 2X + ε, where X is exponentially distributed with mean λ = 0.25, independent of ε ~ N(0, 0.5²). Consider sample size n = 100. Estimate MSE(\hat{r}(x)) and MSE(\tilde{r}(x)) for x = 0, 0.2, 0.4, 0.6, 0.8 and 1, by simulating many samples from the model.
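A hedged sketch of the Monte Carlo MSE estimation (illustrative; the estimator argument would be the nadaraya_watson sketch above or a local-linear wrapper with a chosen kernel, and a fixed h stands in for whatever bandwidth rule you adopt):

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_mse(estimator, x_grid, n=100, reps=1000, h=0.2):
    """Monte Carlo estimate of MSE(estimator(x)) at each x in x_grid."""
    r = lambda x: -x**2 - 2.0 * x                    # true regression function
    sq_err = np.zeros((reps, len(x_grid)))
    for b in range(reps):
        X = rng.exponential(scale=0.25, size=n)      # "mean lambda = 0.25" as on the slide
        Y = r(X) + rng.normal(0.0, 0.5, size=n)      # eps ~ N(0, 0.5^2)
        for j, x in enumerate(x_grid):
            sq_err[b, j] = (estimator(x, X, Y, h) - r(x)) ** 2
    return sq_err.mean(axis=0)                       # average squared error over samples

x_grid = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
# e.g. mc_mse(nadaraya_watson, x_grid); for the local linear estimator, wrap
# local_linear(x, X, Y, h, gaussian_kernel) to match this four-argument signature.
```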