Moving Beyond Linearity



slide-4
SLIDE 4

Moving Beyond Linearity

The truth is never linear! Or almost never! But often the linearity assumption is good enough. When it's not . . .

  • polynomials,
  • step functions,
  • splines,
  • local regression, and
  • generalized additive models

offer a lot of flexibility, without losing the ease and interpretability of linear models.

slide-5
SLIDE 5

Polynomial Regression

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \cdots + \beta_d x_i^d + \epsilon_i$$

[Figure: Degree-4 Polynomial. Panels: Wage vs. Age; Pr(Wage>250 | Age) vs. Age.]

slide-9
SLIDE 9

Details

  • Create new variables $X_1 = X$, $X_2 = X^2$, etc., and then treat as multiple linear regression.

  • Not really interested in the coefficients; more interested in the fitted function values at any value $x_0$:

$$\hat f(x_0) = \hat\beta_0 + \hat\beta_1 x_0 + \hat\beta_2 x_0^2 + \hat\beta_3 x_0^3 + \hat\beta_4 x_0^4.$$

  • Since $\hat f(x_0)$ is a linear function of the $\hat\beta_\ell$, can get a simple expression for pointwise variances $\mathrm{Var}[\hat f(x_0)]$ at any value $x_0$. In the figure we have computed the fit and pointwise standard errors on a grid of values for $x_0$. We show $\hat f(x_0) \pm 2 \cdot \mathrm{se}[\hat f(x_0)]$.

  • We either fix the degree $d$ at some reasonably low value, or else use cross-validation to choose $d$.
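The fit-and-bands computation described above can be sketched in R as follows. This is a minimal sketch on simulated data: the vectors `age` and `wage` are made-up stand-ins, not the data behind the figure. It fits a degree-4 polynomial with `lm()`, then obtains the pointwise standard errors two ways, from `predict(..., se.fit = TRUE)` and directly from the covariance matrix of the coefficient estimates.

```r
# Simulated stand-in data (an assumption; not the Wage data in the figure)
set.seed(1)
n    <- 200
age  <- runif(n, 18, 80)
wage <- 60 + 3 * age - 0.03 * age^2 + rnorm(n, sd = 10)

# Degree-4 polynomial fit: a multiple linear regression on poly(age, 4)
fit <- lm(wage ~ poly(age, 4))

# Fit and pointwise standard errors on a grid of values x0
grid <- data.frame(age = seq(20, 80, by = 1))
pred <- predict(fit, newdata = grid, se.fit = TRUE)
bands <- cbind(lower = pred$fit - 2 * pred$se.fit,
               fit   = pred$fit,
               upper = pred$fit + 2 * pred$se.fit)

# Same standard errors by hand: f-hat(x0) = l0' beta-hat is linear in
# beta-hat, so Var[f-hat(x0)] = l0' Cov(beta-hat) l0
P  <- poly(age, 4)                     # same basis as in the formula
X0 <- cbind(1, predict(P, grid$age))   # each row is l0' for one grid point
se_by_hand <- sqrt(rowSums((X0 %*% vcov(fit)) * X0))
```

Plotting `bands` against `grid$age` would give a curve with a ± 2 standard error envelope of the kind shown in the figure.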

slide-12
SLIDE 12

Details continued

  • Logistic regression follows naturally. For example, in the figure we model

$$\Pr(y_i > 250 \mid x_i) = \frac{\exp(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d)}{1 + \exp(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d)}.$$

  • To get confidence intervals, compute upper and lower bounds on the logit scale, and then invert to get them on the probability scale.

  • Can do separately on several variables: just stack the variables into one matrix, and separate out the pieces afterwards (see GAMs later).

  • Caveat: polynomials have notorious tail behavior, which is very bad for extrapolation.

  • Can fit using y ~ poly(x, degree = 3) in formula.
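The logit-scale interval recipe above might look like this in R. It is a sketch on simulated data: the binary vector `high` is a hypothetical stand-in for the indicator that wage exceeds 250, and the coefficients used to generate it are arbitrary.

```r
# Simulated stand-in data (an assumption; `high` plays the role of wage > 250)
set.seed(2)
n    <- 500
age  <- runif(n, 18, 80)
high <- rbinom(n, 1, plogis(-4 + 0.12 * age - 0.0013 * age^2))

# Polynomial logistic regression
fit <- glm(high ~ poly(age, 3), family = binomial)

# Predict on the logit (link) scale, where symmetric bounds make sense
grid <- data.frame(age = 20:80)
pred <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)

# Invert the logit to move the fit and its bounds to the probability scale
prob  <- plogis(pred$fit)
lower <- plogis(pred$fit - 2 * pred$se.fit)
upper <- plogis(pred$fit + 2 * pred$se.fit)
```

Because the bounds are formed before inverting, they always stay inside [0, 1], which would not be guaranteed if ± 2 standard errors were added on the probability scale.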

slide-13
SLIDE 13

Step Functions

Another way of creating transformations of a variable — cut the variable into distinct regions.

C1(X) = I(X < 35), C2(X) = I(35 ≤ X < 50), . . . , C3(X) = I(X ≥ 65)

[Figure: Piecewise Constant. Panels: Wage vs. Age; Pr(Wage>250 | Age) vs. Age.]

slide-17
SLIDE 17

Step functions continued

  • Easy to work with. Creates a series of dummy variables representing each group.

  • Useful way of creating interactions that are easy to interpret. For example, interaction effect of Year and Age: I(Year < 2005) · Age, I(Year ≥ 2005) · Age would allow for different linear functions of Age in each Year category.

  • In R: I(year < 2005) or cut(age, c(18, 25, 40, 65, 90)).

  • Choice of cutpoints or knots can be problematic. For creating nonlinearities, smoother alternatives such as splines are available.
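A small sketch of the dummy-variable construction, again on simulated data (the `age` and `wage` vectors and the jump sizes are assumptions made for illustration). `cut()` produces a factor with one level per region, and `lm()` expands that factor into dummy variables, so the fitted function is constant within each region.

```r
# Simulated stand-in data (an assumption, not the Wage data)
set.seed(3)
n    <- 300
age  <- runif(n, 18, 80)
wage <- 80 + 20 * (age >= 35) + 10 * (age >= 50) - 15 * (age >= 65) +
  rnorm(n, sd = 8)

# cut() turns age into a factor with one level per region;
# right = FALSE gives intervals of the form [a, b)
age_bin <- cut(age, breaks = c(18, 35, 50, 65, 90), right = FALSE)

# lm() expands the factor into dummy variables
fit <- lm(wage ~ age_bin)

# The fitted value in each region is simply that region's mean wage
region_means <- tapply(wage, age_bin, mean)
```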

slide-18
SLIDE 18

Piecewise Polynomials

  • Instead of a single polynomial in X over its whole domain, we can rather use different polynomials in regions defined by knots. E.g. (see figure)

$$y_i = \begin{cases} \beta_{01} + \beta_{11} x_i + \beta_{21} x_i^2 + \beta_{31} x_i^3 + \epsilon_i & \text{if } x_i < c; \\ \beta_{02} + \beta_{12} x_i + \beta_{22} x_i^2 + \beta_{32} x_i^3 + \epsilon_i & \text{if } x_i \ge c. \end{cases}$$

  • Better to add constraints to the polynomials, e.g. continuity.

  • Splines have the “maximum” amount of continuity.

slide-19
SLIDE 19

[Figure: four panels of Wage vs. Age: Piecewise Cubic, Continuous Piecewise Cubic, Cubic Spline, Linear Spline.]

slide-21
SLIDE 21

Linear Splines

A linear spline with knots at $\xi_k$, $k = 1, \ldots, K$ is a piecewise linear polynomial continuous at each knot. We can represent this model as

$$y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_{K+1} b_{K+1}(x_i) + \epsilon_i,$$

where the $b_k$ are basis functions.

$$b_1(x_i) = x_i, \qquad b_{k+1}(x_i) = (x_i - \xi_k)_+, \quad k = 1, \ldots, K$$

Here the $(\cdot)_+$ means positive part; i.e.

$$(x_i - \xi_k)_+ = \begin{cases} x_i - \xi_k & \text{if } x_i > \xi_k \\ 0 & \text{otherwise} \end{cases}$$
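The basis above is easy to build by hand. The sketch below uses simulated data on [0, 1] and two hypothetical knot locations; it constructs the positive-part basis directly and, as a check, fits the same model with `bs(..., degree = 1)` from the `splines` package, which spans the same space of continuous piecewise linear functions.

```r
library(splines)

# Simulated data on [0, 1] (an assumption, chosen only for illustration)
set.seed(4)
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)
knots <- c(0.3, 0.6)   # hypothetical knot locations xi_1, xi_2

# Positive part (u)_+
pos <- function(u) pmax(u, 0)

# Basis: b_1(x) = x, b_{k+1}(x) = (x - xi_k)_+
B <- cbind(x, sapply(knots, function(xi) pos(x - xi)))
fit_basis <- lm(y ~ B)

# bs() with degree = 1 uses a different basis for the same function space,
# so the fitted values agree even though the coefficients differ
fit_bs <- lm(y ~ bs(x, knots = knots, degree = 1))
```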

slide-22
SLIDE 22

[Figure: two panels, f(x) vs. x and b(x) vs. x.]

slide-23
SLIDE 23

Cubic Splines

A cubic spline with knots at $\xi_k$, $k = 1, \ldots, K$ is a piecewise cubic polynomial with continuous derivatives up to order 2 at each knot. Again we can represent this model with truncated power basis functions

$$y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_{K+3} b_{K+3}(x_i) + \epsilon_i,$$

$$b_1(x_i) = x_i, \quad b_2(x_i) = x_i^2, \quad b_3(x_i) = x_i^3, \quad b_{k+3}(x_i) = (x_i - \xi_k)_+^3, \quad k = 1, \ldots, K$$

where

$$(x_i - \xi_k)_+^3 = \begin{cases} (x_i - \xi_k)^3 & \text{if } x_i > \xi_k \\ 0 & \text{otherwise} \end{cases}$$
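As a sketch of the truncated power basis, the helper `tpb()` below (a hypothetical name, not a library function) builds the K + 3 columns for simulated data and three arbitrary knots. The fit is compared with `bs()`, which uses a B-spline basis for the same space of cubic splines.

```r
library(splines)

# Simulated data on [0, 1] (an assumption, for illustration only)
set.seed(5)
x <- runif(150)
y <- sin(2 * pi * x) + rnorm(150, sd = 0.2)
knots <- c(0.25, 0.5, 0.75)   # hypothetical knots, K = 3

# Truncated power basis: x, x^2, x^3, and (x - xi_k)^3_+ for each knot
tpb <- function(x, knots) {
  cbind(x, x^2, x^3, sapply(knots, function(xi) pmax(x - xi, 0)^3))
}

B <- tpb(x, knots)            # K + 3 columns; lm() adds the intercept
fit_tpb <- lm(y ~ B)

# Same function space, numerically better-behaved basis
fit_bs <- lm(y ~ bs(x, knots = knots))
```

The truncated power basis is convenient for exposition, but its columns are highly correlated; that is one reason to prefer `bs()` in practice.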

slide-24
SLIDE 24

[Figure: two panels, f(x) vs. x and b(x) vs. x.]

slide-25
SLIDE 25

Natural Cubic Splines

A natural cubic spline extrapolates linearly beyond the boundary knots. This adds 4 = 2 × 2 extra constraints, and allows us to put more internal knots for the same degrees of freedom as a regular cubic spline.

[Figure: Wage, with Natural Cubic Spline and Cubic Spline fits.]
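The linear extrapolation can be checked numerically. In this sketch (simulated data, arbitrary `df`), a natural spline is fitted with `ns()` and then evaluated at equally spaced points beyond the right boundary knot; for a function that is linear there, the second differences of the predictions should vanish up to rounding error.

```r
library(splines)

# Simulated data on [0, 1] (an assumption, for illustration only)
set.seed(6)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)

# Natural cubic spline; boundary knots default to the range of x
fit_ns <- lm(y ~ ns(x, df = 4))

# Equally spaced points beyond the right boundary knot (all x are below 1)
x_out <- data.frame(x = seq(1.1, 1.5, by = 0.1))
p_out <- predict(fit_ns, newdata = x_out)

# Second differences are zero where the fitted function is linear
second_diff <- diff(p_out, differences = 2)
```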

slide-26
SLIDE 26

Fitting splines in R is easy: bs(x, ...) for any degree splines, and ns(x, ...) for natural cubic splines, in package splines.

[Figure: Natural Cubic Spline. Panels: Wage vs. Age; Pr(Wage>250 | Age) vs. Age.]

slide-28
SLIDE 28

Knot placement

  • One strategy is to decide K, the number of knots, and then place them at appropriate quantiles of the observed X.

  • A cubic spline with K knots has K + 4 parameters or degrees of freedom.

  • A natural spline with K knots has K degrees of freedom.

[Figure: Wage vs. Age. Comparison of a degree-14 polynomial, poly(age, deg=14), and a natural cubic spline, ns(age, df=14), each with 15df.]
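The parameter counts above can be read off the basis matrices that `bs()` and `ns()` return. The sketch below uses a simulated `age` vector and K = 3 interior knots at the quartiles. One reading of the counts that is consistent with R's behaviour: the cubic-spline count treats K as the number of interior knots, while the natural-spline count also includes the two boundary knots.

```r
library(splines)

# Simulated stand-in for the observed X (an assumption)
set.seed(7)
age <- runif(400, 18, 80)

# K = 3 interior knots at the quartiles of the observed values
knots <- quantile(age, probs = c(0.25, 0.50, 0.75))

# Cubic spline: K + 3 basis columns, plus the intercept = K + 4 parameters
n_bs <- ncol(bs(age, knots = knots))

# Natural cubic spline: K + 1 basis columns, plus the intercept = K + 2,
# i.e. one parameter per knot once the two boundary knots are counted
n_ns <- ncol(ns(age, knots = knots))

# Asking for df instead lets ns() place the interior knots at quantiles
auto_knots <- attr(ns(age, df = 4), "knots")
```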

slide-33
SLIDE 33

Smoothing Splines

This section is a little bit mathematical.

Consider this criterion for fitting a smooth function g(x) to some data:

$$\underset{g \in S}{\text{minimize}} \; \sum_{i=1}^{n} (y_i - g(x_i))^2 + \lambda \int g''(t)^2 \, dt$$

  • The first term is RSS, and tries to make g(x) match the data at each $x_i$.

  • The second term is a roughness penalty and controls how wiggly g(x) is. It is modulated by the tuning parameter λ ≥ 0.

  • The smaller λ, the more wiggly the function, eventually interpolating $y_i$ when λ = 0.

  • As λ → ∞, the function g(x) becomes linear.

slide-37
SLIDE 37

Smoothing Splines continued

The solution is a natural cubic spline, with a knot at every unique value of $x_i$. The roughness penalty still controls the roughness via λ.

Some details

  • Smoothing splines avoid the knot-selection issue, leaving a single λ to be chosen.

  • The algorithmic details are too complex to describe here. In R, the function smooth.spline() will fit a smoothing spline.

  • The vector of n fitted values can be written as $\hat g_\lambda = S_\lambda y$, where $S_\lambda$ is an n × n matrix (determined by the $x_i$ and λ).

  • The effective degrees of freedom are given by

$$df_\lambda = \sum_{i=1}^{n} \{S_\lambda\}_{ii}.$$
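A short sketch of the effective degrees of freedom, on simulated data. `smooth.spline()` returns the diagonal of the smoother matrix in its `lev` component, so the trace formula above can be evaluated without forming $S_\lambda$. The argument `all.knots = TRUE` is used here so that every distinct x value is a knot, as in the description above.

```r
# Simulated data (an assumption, for illustration only)
set.seed(8)
x <- seq(0, 1, length.out = 100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)

# Fit with a target of 10 effective degrees of freedom;
# smooth.spline() searches for a lambda that achieves it
fit <- smooth.spline(x, y, df = 10, all.knots = TRUE)

lambda_hat <- fit$lambda   # the lambda that was selected
df_fit     <- fit$df       # effective degrees of freedom of the fit

# fit$lev holds the diagonal elements {S_lambda}_ii,
# so their sum is the effective degrees of freedom
df_from_trace <- sum(fit$lev)
```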

slide-39
SLIDE 39

Smoothing Splines continued — choosing λ

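  • We can specify df rather than λ! In R: smooth.spline(age, wage, df = 10)

  • The leave-one-out (LOO) cross-validated error is given by

$$RSS_{cv}(\lambda) = \sum_{i=1}^{n} \left(y_i - \hat g_\lambda^{(-i)}(x_i)\right)^2 = \sum_{i=1}^{n} \left[\frac{y_i - \hat g_\lambda(x_i)}{1 - \{S_\lambda\}_{ii}}\right]^2.$$

In R: smooth.spline(age, wage, cv = TRUE). With the default, cv = FALSE, smooth.spline() chooses λ by generalized cross-validation instead.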
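The right-hand side of the LOO formula needs only one fit. The sketch below, on simulated data, lets `smooth.spline()` choose λ by leave-one-out cross-validation and then evaluates the shortcut expression at the selected λ, using the leverages in `fit$lev` as the diagonal of $S_\lambda$. It illustrates the formula and is not a reimplementation of the search that `smooth.spline()` performs internally.

```r
# Simulated data (an assumption, for illustration only)
set.seed(9)
x <- seq(0, 1, length.out = 100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)

# cv = TRUE asks smooth.spline() to choose lambda by leave-one-out CV
fit <- smooth.spline(x, y, cv = TRUE)

# Ordinary residuals from the single fit at the chosen lambda
g_hat <- predict(fit, x)$y
res   <- y - g_hat

# Shortcut: LOO residuals are ordinary residuals inflated by 1 / (1 - S_ii)
loo_res <- res / (1 - fit$lev)
rss_cv  <- sum(loo_res^2)
```

Since each leverage lies between 0 and 1, every LOO residual is at least as large in magnitude as the corresponding ordinary residual, so `rss_cv` is never smaller than the training RSS.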

slide-40
SLIDE 40

[Figure: Smoothing Spline. Wage vs. Age, with fits using 16 Degrees of Freedom and 6.8 Degrees of Freedom (LOOCV).]

slide-41
SLIDE 41

Local Regression

[Figure: Local Regression, two panels.]

With a sliding weight function, we fit separate linear fits over the range of X by weighted least squares. See text for more details, and the loess() function in R.
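To make the sliding-window idea concrete, here is a hand-rolled local linear fit at a single target point, followed by the built-in `loess()`. The helper `local_linear()` is hypothetical, and its tricube weight and nearest-neighbour window are assumptions of this sketch; it is not claimed to reproduce `loess()` exactly.

```r
# Local linear fit at one target point x0 (a hand-rolled sketch; the
# tricube weight and nearest-neighbour window are assumptions)
local_linear <- function(x0, x, y, span = 0.5) {
  k <- ceiling(span * length(x))             # number of neighbours in window
  d <- abs(x - x0)
  h <- sort(d)[k]                            # distance to the k-th neighbour
  w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)   # tricube weights, 0 outside
  fit <- lm(y ~ x, weights = w)              # weighted least squares
  unname(predict(fit, newdata = data.frame(x = x0)))
}

# Simulated data (an assumption, for illustration only)
set.seed(10)
x <- runif(100)
y <- sin(4 * x) + rnorm(100, sd = 0.2)

# Slide the window over a grid of target points
grid  <- seq(0.1, 0.9, by = 0.05)
f_hat <- sapply(grid, local_linear, x = x, y = y, span = 0.5)

# Built-in local regression with local linear fits
fit_lo <- loess(y ~ x, span = 0.5, degree = 1)
f_lo   <- predict(fit_lo, newdata = data.frame(x = grid))
```

A larger `span` uses more neighbours at each target point and gives a smoother, less flexible fit.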

slide-42
SLIDE 42

Generalized Additive Models

Allows for flexible nonlinearities in several variables, but retains the additive structure of linear models.

$$y_i = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \cdots + f_p(x_{ip}) + \epsilon_i.$$

[Figure: three panels showing f1(year) vs. year, f2(age) vs. age, and f3(education) vs. education (<HS, HS, <Coll, Coll, >Coll).]

slide-47
SLIDE 47

GAM details

  • Can fit a GAM simply using, e.g. natural splines:

lm(wage ~ ns(year, df = 5) + ns(age, df = 5) + education)

  • Coefficients not that interesting; fitted functions are. The previous plot was produced using plot.gam.

  • Can mix terms, some linear and some nonlinear, and use anova() to compare models.

  • Can use smoothing splines or local regression as well:

gam(wage ~ s(year, df = 5) + lo(age, span = .5) + education)

  • GAMs are additive, although low-order interactions can be included in a natural way using, e.g. bivariate smoothers or interactions of the form ns(age,df=5):ns(year,df=5).
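The natural-spline version of the GAM needs nothing beyond `lm()` and the `splines` package, so it can be sketched on simulated data (the variables below are made-up stand-ins with the same names as in the slides, and `year` is simulated as continuous). The sketch compares a model that is linear in `year` with one that uses a spline in `year`, then extracts the fitted functions with `predict(..., type = "terms")`. The `gam()` call with `s()` and `lo()` is not reproduced, since it requires a separate package.

```r
library(splines)

# Simulated stand-in data (an assumption; not the Wage data in the figures)
set.seed(11)
n    <- 500
year <- runif(n, 2003, 2009)
age  <- runif(n, 18, 80)
education <- factor(sample(c("<HS", "HS", "Coll"), n, replace = TRUE))
wage <- 50 + 1.5 * (year - 2003) + 40 * sin((age - 18) / 62 * pi) +
  10 * (education == "Coll") + rnorm(n, sd = 10)

# GAM via natural splines: an ordinary linear model on the stacked bases
fit_lin <- lm(wage ~ year + ns(age, df = 5) + education)
fit_ns  <- lm(wage ~ ns(year, df = 5) + ns(age, df = 5) + education)

# Is the nonlinear term in year needed? Compare the nested models
cmp <- anova(fit_lin, fit_ns)

# The fitted functions f_j: one centred column per term
f_terms <- predict(fit_ns, type = "terms")
```

Plotting each column of `f_terms` against its own variable gives one panel per term, in the spirit of the plot on the earlier slide.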

slide-48
SLIDE 48

GAMs for classification

$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + f_1(X_1) + f_2(X_2) + \cdots + f_p(X_p).$$

[Figure: three panels showing f1(year) vs. year, f2(age) vs. age, and f3(education) vs. education (HS, <Coll, Coll, >Coll).]

gam(I(wage > 250) ~ year + s(age, df = 5) + education, family = binomial)
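A logistic additive model of the same shape can be sketched with base R by swapping the smoothing-spline term `s(age, df = 5)` for a natural spline `ns(age, df = 5)` inside `glm()`. This is a substitution made so the example runs without extra packages; it is not the `gam()` fit shown above. The data are simulated, and `high` is a hypothetical stand-in for the indicator I(wage > 250).

```r
library(splines)

# Simulated stand-in data (an assumption; not the Wage data)
set.seed(12)
n    <- 1000
year <- runif(n, 2003, 2009)
age  <- runif(n, 18, 80)
education <- factor(sample(c("HS", "Coll", ">Coll"), n, replace = TRUE))
eta  <- -2 + 0.1 * (year - 2003) + 2 * sin((age - 18) / 62 * pi) +
  1 * (education == ">Coll")
high <- rbinom(n, 1, plogis(eta))   # stand-in for I(wage > 250)

# Additive logistic model: linear in year, natural spline in age
fit <- glm(high ~ year + ns(age, df = 5) + education, family = binomial)

# Additive pieces on the logit scale, and fitted probabilities
f_terms <- predict(fit, type = "terms")
p_hat   <- predict(fit, type = "response")
```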