Lecture 9: Nonparametric Regression (1)
Applied Statistics 2015


SLIDE 1

Introduction Estimations: local modelling Cross Validation Assignments

Lecture 9: Nonparametric Regression (1)

Applied Statistics 2015

SLIDE 2

An example: Pick-It Lottery

The New Jersey Pick-It Lottery is a daily numbers game run by the state of New Jersey. Buying a ticket entitles a player to pick a number between 0 and 999. Half of the money bet each day goes into the prize pool. (The state takes the other half.) The state picks a winning number at random, and the prize pool is shared equally among all winning tickets. We analyze the first 254 drawings after the lottery started in 1975. Figure 1 shows a scatterplot of the winning numbers and their payoffs.

SLIDE 3

An example: Pick-It Lottery

[Figure 1: scatterplot of winning Number (x-axis, 0–1000) vs. Payoff (y-axis, 200–800)]

Although all numbers are equally likely to win, numbers chosen by fewer people have bigger payoffs when they win, because the prize is shared among fewer tickets. Question: can we find any pattern in the data? Are there numbers with larger payoffs?

SLIDE 4

An example: Pick-It Lottery

The question can be answered by regression analysis. Linear regression assumes a linear relation between payoff and winning number. The blue dashed line is the least-squares regression line, which shows a general trend of higher payoffs for larger winning numbers.

[Figure: the Number–Payoff scatterplot with the least-squares line overlaid]

SLIDE 5

Nonparametric regression

Nonparametric regression does not assume any parametric structure. It is also known as "learning a function" in the field of machine learning. There are n pairs of observations (x_1, Y_1), …, (x_n, Y_n). The response variable Y is related to the covariate x by the equations

    Y_i = r(x_i) + ε_i,   i = 1, …, n,

where r is the regression function, E(ε_i) = 0 and Var(ε_i) = σ². Here we want to estimate r under weak assumptions, without assuming a parametric model for r. We treat the covariates x_i as fixed — a fixed design. For a random design, the data are (X_i, Y_i), i = 1, …, n, and r(x) is the conditional expectation of Y given that X = x:

    r(x) = E(Y | X = x).

SLIDE 6

A general idea behind the different estimators

Note that Y_i is the sum of r(x_i) and an error term whose expected value is zero. This motivates estimating r(x) by an average of those Y_i for which x_i is "close" to x. Different ways of averaging and different measures of closeness lead to different estimators.

SLIDE 7

An Example

The data are n = 60 pairs of observations from a certain regression model. How do we construct r̂_n, an estimator of r?

[Figure: scatterplot of the n = 60 observations, x ∈ [0, 1] vs. Y ∈ [−2, 2]]

SLIDE 8

Estimator: Regressogram

A regressogram is constructed in a manner similar to a histogram. Here we consider x_i ∈ [0, 1]. Divide the unit interval into m equally spaced bins denoted B_1, B_2, …, B_m. Define the regressogram

    ĝ_n(x) = (1/k_j) Σ_{i: x_i ∈ B_j} Y_i,   for x ∈ B_j,

where k_j is the number of points in B_j. Here we use the convention 0/0 = 0.
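As a concrete illustration (not part of the lecture), here is a minimal NumPy sketch of the regressogram; the function name and the bin bookkeeping are my own choices:

```python
import numpy as np

def regressogram(x, y, m=10):
    """Regressogram on [0, 1] with m equally spaced bins.

    Returns a function g(t) that is constant on each bin B_j, equal to
    the average of the Y_i with x_i in B_j (with the convention 0/0 = 0).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    edges = np.linspace(0.0, 1.0, m + 1)
    # bin index j for each x_i (clip so that x = 1 falls in the last bin)
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, m - 1)
    sums = np.bincount(idx, weights=y, minlength=m)
    counts = np.bincount(idx, minlength=m)
    # bin means, with empty bins set to 0 (the 0/0 = 0 convention)
    means = np.divide(sums, counts, out=np.zeros(m), where=counts > 0)

    def g(t):
        j = np.clip(np.searchsorted(edges, np.asarray(t, float), side="right") - 1,
                    0, m - 1)
        return means[j]

    return g
```

Since ĝ_n is piecewise constant, evaluating g at any two points in the same bin returns the same bin average.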

SLIDE 9

Estimator: Regressogram

The same regressogram definition as on the previous slide, now with the fitted step function shown.

[Figure: Regressogram (m = 10) overlaid on the x–Y scatterplot]

SLIDE 10

Estimator: Local average

Fix h > 0,

    r̂_n(x) = Σ_{i=1}^n I(x − h < x_i ≤ x + h) Y_i / Σ_{i=1}^n I(x − h < x_i ≤ x + h).

This is also called the naive kernel estimator:

    r̂_n(x) = Σ_{i=1}^n (1/2) 1_{[−1,1)}((x − x_i)/h) Y_i / Σ_{i=1}^n (1/2) 1_{[−1,1)}((x − x_i)/h).
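A small NumPy sketch of the local average (again an illustration, not from the lecture; the vectorization strategy is my own):

```python
import numpy as np

def local_average(x, y, h):
    """Naive kernel (local average) estimator with bandwidth h.

    r_hat(t) is the average of the Y_i with t - h < x_i <= t + h;
    evaluation points with no neighbours return 0 (the 0/0 = 0 convention).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)

    def r_hat(t):
        t = np.atleast_1d(np.asarray(t, float))
        # indicator I(t - h < x_i <= t + h), one row per evaluation point
        w = (x[None, :] > t[:, None] - h) & (x[None, :] <= t[:, None] + h)
        num = (w * y[None, :]).sum(axis=1)
        den = w.sum(axis=1)
        out = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
        return out if out.size > 1 else out[0]

    return r_hat
```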

SLIDE 11

Estimator: Local average

The same estimator, with the fitted curve shown.

[Figure: Local Average fit (h = 0.2) overlaid on the x–Y scatterplot]

SLIDE 12

Nadaraya-Watson Estimator

Replacing the box kernel by a general kernel K in the local average estimator, we obtain the Nadaraya–Watson estimator of r:

    r̂_n(x) = Σ_{i=1}^n K((x − x_i)/h) Y_i / Σ_{i=1}^n K((x − x_i)/h).
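A NumPy sketch of the Nadaraya–Watson estimator (an illustration under my own naming; the Gaussian kernel is one common choice, not prescribed by the formula):

```python
import numpy as np

def nadaraya_watson(x, y, h, kernel=None):
    """Nadaraya-Watson estimator: a kernel-weighted average of the Y_i."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if kernel is None:
        # Gaussian kernel by default; any nonnegative K can be passed in
        kernel = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

    def r_hat(t):
        t = np.atleast_1d(np.asarray(t, float))
        w = kernel((t[:, None] - x[None, :]) / h)  # K((t - x_i) / h)
        out = (w * y[None, :]).sum(axis=1) / w.sum(axis=1)
        return out if out.size > 1 else out[0]

    return r_hat
```

Because the weights are normalized, the estimator reproduces a constant response exactly, whatever the bandwidth.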

SLIDE 13

Nadaraya-Watson Estimator

The same estimator, with the fitted curve shown.

[Figure: Nadaraya–Watson fit (h = 0.2, Gaussian kernel) overlaid on the x–Y scatterplot]

SLIDE 14

The black curve indicates r(x), the real regression function. The underlying model is

    Y_i = sin(8 x_i) − x_i + x_i³ + ε_i,   with ε_i ∼ N(0, 0.5).

[Figure: the n = 60 observations with the true curve r(x) overlaid]
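Data like those in the figures can be reproduced by simulating from this model. A sketch, assuming N(0, 0.5) denotes variance 0.5 and using an arbitrary seed and design (neither is specified on the slide):

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed
n = 60
x = np.sort(rng.uniform(0.0, 1.0, n))     # fixed design points in [0, 1]
r = lambda t: np.sin(8 * t) - t + t**3    # true regression function
eps = rng.normal(0.0, np.sqrt(0.5), n)    # E(eps) = 0, Var(eps) = 0.5
y = r(x) + eps                            # Y_i = r(x_i) + eps_i
```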

SLIDE 15

Some comments

The three estimators can all be written in the same form:

    r̂_n(x) = Σ_{i=1}^n l_i(x) Y_i.

Define the class of piecewise constant functions

    F_m = { m̃ : m̃(t) = Σ_{j=1}^m c_j I(t ∈ B_j), c_j ∈ ℝ }.

Then the regressogram is the least-squares fit over this class:

    ĝ_n = argmin_{m̃ ∈ F_m} Σ_{i=1}^n (Y_i − m̃(x_i))².

The Nadaraya–Watson estimator can be considered as locally fitting a constant to the data:

    r̂_n(x) = argmin_{c ∈ ℝ} Σ_{i=1}^n K((x − x_i)/h) (Y_i − c)².

SLIDE 16

Risk

For fixed x,

    MSE(r̂_n(x)) = E(r̂_n(x) − r(x))² = (E r̂_n(x) − r(x))² + Var(r̂_n(x)).

As a global index, we consider the mean integrated squared error

    MISE(r̂_n) = E ∫ (r̂_n(x) − r(x))² dx = ∫ (E r̂_n(x) − r(x))² dx + ∫ Var(r̂_n(x)) dx,

and the average mean squared error

    AMSE(r̂_n) = (1/n) Σ_{i=1}^n E(r̂_n(x_i) − r(x_i))².

SLIDE 17

Cross Validation: choosing bandwidths

Take the AMSE as the criterion. We would like to choose h to minimize

    AMSE(h) = (1/n) Σ_{i=1}^n E(r̂_{nh}(x_i) − r(x_i))².

Since r is unknown, we need to estimate AMSE(h). As a first guess, one might think of the average residual sum of squares

    (1/n) Σ_{i=1}^n (Y_i − r̂_{nh}(x_i))².

This turns out to be a bad choice. It usually leads to undersmoothing (overfitting): it favors estimates that are too well adapted to the observed data and are not reasonable for new observations.

SLIDE 18

Cross Validation: choosing bandwidths

We estimate the risk using the leave-one-out cross-validation score defined as

    CV(h) = (1/n) Σ_{i=1}^n (Y_i − r̂_{nh}^{(i)}(x_i))²,

where r̂_{nh}^{(i)} is the estimator based on {(x_j, Y_j) : 1 ≤ j ≤ n, j ≠ i}, i.e. omitting the observation (x_i, Y_i).

SLIDE 19

Cross Validation: choosing bandwidths

In order to compute the CV score, there is no need to fit the curve n times. Let r̂_{nh}(x) = Σ_{i=1}^n l_i(x) Y_i. Then CV(h) can be written as

    CV(h) = (1/n) Σ_{i=1}^n ((Y_i − r̂_{nh}(x_i)) / (1 − l_i(x_i)))².

Hence

    h_cv = argmin_h CV(h) = argmin_h (1/n) Σ_{i=1}^n ((Y_i − r̂_{nh}(x_i)) / (1 − l_i(x_i)))².
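For the Nadaraya–Watson estimator the weights l_i(x) are explicit, so this shortcut needs only one fit per candidate h. A NumPy sketch (function names are my own; a Gaussian kernel is assumed):

```python
import numpy as np

def cv_score(x, y, h):
    """Leave-one-out CV(h) for the Nadaraya-Watson estimator with a Gaussian
    kernel, via the shortcut CV(h) = mean(((Y_i - r_hat(x_i)) / (1 - l_i(x_i)))^2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)  # K((x_i - x_j)/h)
    L = K / K.sum(axis=1, keepdims=True)                     # smoothing weights l_j(x_i)
    fitted = L @ y                                           # r_hat_nh(x_i)
    lii = np.diag(L)                                         # l_i(x_i)
    return np.mean(((y - fitted) / (1.0 - lii)) ** 2)

def choose_h(x, y, grid):
    """h_cv = argmin over a grid of candidate bandwidths."""
    return min(grid, key=lambda h: cv_score(x, y, h))
```

The shortcut agrees exactly with refitting n times, because leaving out observation i simply renormalizes the remaining kernel weights.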

SLIDE 20

An example: Pick-It Lottery

h = 1, 10, 20, 50.

[Figure: four panels showing the fitted curve on the Number–Payoff scatterplot, one for each bandwidth]

SLIDE 21

An example: Pick-It Lottery

[Figure: the chosen fit on the Number–Payoff scatterplot]

The curve suggests that there were larger payoffs for numbers in the interval [0, 100]. People tended to pick numbers starting with 2 and 3. This pattern disappeared after 1976: people noticed the pattern and changed their choices.

SLIDE 22

Lectures 10 – 12

SLIDE 23

Group Presentation (April 20)

Group 16

The data are the infant-mortality rates (infant deaths per 1000 live births) and GDP per capita (in U.S. dollars) for 193 countries in 2003. Make a scatter plot of the data. Estimate the regression function with different approaches. Give your comments.

SLIDE 24

Group Presentation (April 20)

Group 17

Download the dataset CMB from http://www.stat.cmu.edu/~larry/all-of-nonpar/data.html. Consider power as the response variable and Multipole as the covariate. Fit a model based on the first 400 observations:

Make a scatter plot of the data. Consider the Nadaraya–Watson estimator. Use the CV(h) score to choose h. Present your estimate.

Repeat the procedure above, but for the whole data set.

SLIDE 25

Group Presentation (April 20)

Group 18

Consider the following model: Y_i = r(x_i) + ε_i, where r(x) = x² − 2x, x ∈ [0, 2], and the ε_i are iid N(0, 0.5²).

1. Simulate one sample {(x_i, Y_i), i = 1, …, 100}. You can choose x_i = i/50.
2. Fit a Nadaraya–Watson estimator to these data; choose your kernel and h. Estimate MSE(r̂_n(1)) by simulation. Hint: simulate many samples from the model and use the sample counterpart as the estimator.
3. Repeat step 2 for a different h. Compare the results and give your comments.
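One way to carry out the Monte Carlo step of the assignment; a sketch only, where the helper names, the Gaussian kernel, the seed, and the number of replications are my own choices:

```python
import numpy as np

def nw_at(x, y, t, h):
    """Gaussian-kernel Nadaraya-Watson estimate at a single point t."""
    w = np.exp(-0.5 * ((t - x) / h) ** 2)
    return (w * y).sum() / w.sum()

def mse_at_1(h, n_sims=500, seed=0):
    """Monte Carlo estimate of MSE(r_hat_n(1)) under the assignment's model:
    r(x) = x^2 - 2x on [0, 2], eps_i iid N(0, 0.5^2), x_i = i/50, n = 100."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, 101) / 50.0           # x_i = i/50, i = 1, ..., 100
    true_r1 = 1.0 ** 2 - 2.0 * 1.0         # r(1) = -1
    sq_errs = np.empty(n_sims)
    for s in range(n_sims):
        y = x**2 - 2.0 * x + rng.normal(0.0, 0.5, 100)
        sq_errs[s] = (nw_at(x, y, 1.0, h) - true_r1) ** 2
    return sq_errs.mean()                  # sample counterpart of E(r_hat(1) - r(1))^2
```

Comparing, say, mse_at_1(0.05) with mse_at_1(0.3) gives the two-bandwidth comparison the assignment asks for.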
