Univariate Smoothing: Overview, Problem Definition & Interpolation

Lecture notes by J. McNames, Portland State University, ECE 4/557, Ver. 1.25.

Interpolation versus Extrapolation

  • Interpolation is technically defined only for inputs that lie within the range of the data set: $\min_i x_i \le x \le \max_i x_i$
  • If an input is outside of this range, the model is said to be extrapolating
  • A good model should do reasonable things in both cases
  • Extrapolation is a much harder problem

Univariate Smoothing Overview

  • Problem definition
  • Interpolation
  • Polynomial smoothing
  • Cubic splines
  • Basis splines
  • Smoothing splines
  • Bayes’ rule
  • Density estimation
  • Kernel smoothing
  • Local averaging
  • Weighted least squares
  • Local linear models
  • Prediction error estimates

Example 1: Linear Interpolation

[Figure: noisy chirp data (dots) with the linear interpolation fit; Input x vs. Output y.]

Problem Definition & Interpolation

  • Smoothing Problem: Given a data set with a single input variable x, find the best function $\hat{g}(x)$ that minimizes the prediction error on new inputs (probably not in the data set)
  • Interpolation Problem: Same as the smoothing problem except the model is subject to the constraint $\hat{g}(x_i) = y_i$ for every input-output pair $(x_i, y_i)$ in the data set
    – Linear Interpolation: Use a line between each pair of points
    – Nearest Neighbor Interpolation: Find the nearest input in the data set and use the corresponding output as an approximate fit
    – Polynomial Interpolation: Fit a polynomial of order n − 1 to the input-output data: $\hat{g}(x) = \sum_{i=1}^{n} w_i x^{i-1}$
    – Cubic Spline Interpolation: Fit a cubic polynomial with continuous second derivatives between each pair of points (more on this later)

yh = At*w;
figure;
FigureSet(1,'LTX');
h = plot(xt,yh,'b',x,y,'r.');
set(h,'MarkerSize',8);
set(h,'LineWidth',1.2);
xlabel('Input x');
ylabel('Output y');
title('Chirp Polynomial Interpolation');
set(gca,'Box','Off');
grid on;
axis([0 1 -2 2]);
AxisSet(8);
print -depsc InterpolationPolynomial;
% ================================================
% Cubic Spline Interpolation
% ================================================
figure;
FigureSet(1,'LTX');
yh = spline(x,y,xt);
h = plot(xt,yh,'b',x,y,'r.');
set(h,'MarkerSize',8);
set(h,'LineWidth',1.2);
xlabel('Input x');
ylabel('Output y');
title('Chirp Cubic Spline Interpolation');
set(gca,'Box','Off');

Example 1: MATLAB Code

%function [] = Interpolation();
close all;
N = 15;
rand('state',2);
x = rand(N,1);
y = sin(2*pi*2*x.^2) + 0.2*randn(N,1);
xt = (0:0.0001:1)';              % Test inputs
% ================================================
% Linear Interpolation
% ================================================
figure;
FigureSet(1,'LTX');
yh = interp1(x,y,xt,'linear');
h = plot(xt,yh,'b',x,y,'r.');
set(h,'MarkerSize',8);
set(h,'LineWidth',1.2);
xlabel('Input x');
ylabel('Output y');
title('Chirp Linear Interpolation');
set(gca,'Box','Off');
grid on;
axis([0 1 -2 2]);
AxisSet(8);
print -depsc InterpolationLinear;

grid on;
axis([0 1 -2 2]);
AxisSet(8);
print -depsc InterpolationCubicSpline;
% ================================================
% Optimal Model
% ================================================
figure;
FigureSet(1,'LTX');
yt = sin(2*pi*2*xt.^2);
h = plot(xt,yt,'b',x,y,'r.');
set(h,'MarkerSize',8);
set(h,'LineWidth',1.2);
xlabel('Input x');
ylabel('Output y');
title('Chirp Optimal Model');
set(gca,'Box','Off');
grid on;
axis([0 1 -2 2]);
AxisSet(8);
print -depsc InterpolationOptimalModel;

% ================================================
% Nearest Neighbor Interpolation
% ================================================
figure;
FigureSet(1,'LTX');
yh = interp1(x,y,xt,'nearest');
h = plot(xt,yh,'b',x,y,'r.');
set(h,'MarkerSize',8);
set(h,'LineWidth',1.2);
xlabel('Input x');
ylabel('Output y');
title('Chirp Nearest Neighbor Interpolation');
set(gca,'Box','Off');
grid on;
axis([0 1 -2 2]);
AxisSet(8);
print -depsc InterpolationNearestNeighbor;
% ================================================
% Polynomial Interpolation
% ================================================
A = zeros(N,N);
for cnt = 1:size(A,2),
  A(:,cnt) = x.^(cnt-1);
end;
w = pinv(A)*y;
At = zeros(length(xt),N);
for cnt = 1:size(A,2),
  At(:,cnt) = xt.^(cnt-1);
end;

Example 3: Polynomial Interpolation

[Figure: chirp data with the order n − 1 polynomial interpolation fit; Input x vs. Output y.]

Example 2: Nearest Neighbor Interpolation

[Figure: chirp data with the nearest neighbor interpolation fit; Input x vs. Output y.]

Example 3: MATLAB Code

Same data set and test inputs as the linear interpolation example.

Example 2: MATLAB Code

Same data set and test inputs as the linear interpolation example.

Interpolation Comments

  • There are an infinite number of functions that satisfy the interpolation constraint: $\hat{g}(x_i) = y_i \ \forall i$
  • Of course, we would like to choose the model that minimizes the prediction error
  • Given only data, there is no way to do this exactly
  • Our data set only specifies what $\hat{g}(x)$ should be at specific points
  • What should it be in between these points?
  • In practice, the method of interpolation is usually chosen by the user

Example 4: Cubic Spline Interpolation

[Figure: chirp data with the cubic spline interpolation fit; Input x vs. Output y.]

Smoothing

  • For the smoothing problem, even this constraint is relaxed: $\hat{g}(x_i) \approx y_i \ \forall i$
  • The data set merely suggests, approximately, what the model output should be at some specified points
  • We need another constraint or assumption about the relationship between x and y to have enough constraints to uniquely specify the model

Example 4: MATLAB Code

Same data set and test inputs as the linear interpolation example.

Smoothing

$y_i = g(x_i) + \varepsilon_i$

  • When we add noise, we can drop the interpolation constraint: $\hat{g}(x_i) = y_i \ \forall i$
  • But we still want $\hat{g}(\cdot)$ to be consistent with (i.e., close to) the data: $\hat{g}(x_i) \approx y_i$
  • The methods we will discuss are biased in favor of models that are smooth

Smoothing Assumptions and Statistical Model

$y = g(x) + \varepsilon$

  • Generally we assume that the data was generated from the statistical model above
  • $\varepsilon_i$ is a random variable with the following assumed properties
    – Zero mean: $\mathrm{E}[\varepsilon] = 0$
    – $\varepsilon_i$ and $\varepsilon_j$ are independently distributed for $i \ne j$
    – $\varepsilon_i$ is identically distributed
  • Two additional assumptions are usually made for the smoothing problem
    – g(x) is continuous
    – g(x) is smooth

Bias-Variance Tradeoff

Recall that

$\mathrm{MSE}(x) = \mathrm{E}\!\left[(g(x) - \hat{g}(x))^2\right] = (g(x) - \mathrm{E}[\hat{g}(x)])^2 + \mathrm{E}\!\left[(\hat{g}(x) - \mathrm{E}[\hat{g}(x)])^2\right] = \mathrm{Bias}^2 + \mathrm{Variance}$

  • Fundamental smoother tradeoff:
    – Smoothness of the estimate $\hat{g}(x)$
    – Fit to the data
  • This can also be framed as a bias-variance tradeoff
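The decomposition above can be checked numerically. The following sketch, not part of the original notes, simulates many data sets from the chirp model of Example 1 and verifies that the Monte Carlo MSE of a simple 5-nearest-neighbor smoother at a fixed point equals its squared bias plus its variance; the smoother, the evaluation point, and the sample sizes are all arbitrary assumptions.

% Sketch: Monte Carlo check of MSE = Bias^2 + Variance at one point
g  = @(x) sin(2*pi*2*x.^2);        % True chirp function (Example 1)
x0 = 0.5;                          % Evaluation point (assumed)
M  = 1000;                         % Number of simulated data sets
gh = zeros(M,1);
for m = 1:M,
  x = rand(15,1);
  y = g(x) + 0.2*randn(15,1);
  [ds,is] = sort(abs(x - x0));     % Distances to the evaluation point
  gh(m) = mean(y(is(1:5)));        % 5-nearest-neighbor local average
end;
biasSq   = (g(x0) - mean(gh))^2;
variance = mean((gh - mean(gh)).^2);
mse      = mean((g(x0) - gh).^2);  % Matches biasSq + variance exactly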

Example 5: Interpolation Optimal Model

[Figure: chirp data with the optimal model sin(2π·2x²); Input x vs. Output y.]

Example 6: Smoothing Problem MATLAB Code

function [] = SmoothingProblem();
A = load('MotorCycle.txt');
x = A(:,1);
y = A(:,2);
figure;
FigureSet(1,'LTX');
h = plot(x,y,'r.');
set(h,'MarkerSize',6);
xlabel('Input x');
ylabel('Output y');
title('Motorcycle Data Set');
set(gca,'Box','Off');
grid on;
ymin = min(y); ymax = max(y);
yrng = ymax - ymin;
ymin = ymin - 0.05*yrng;
ymax = ymax + 0.05*yrng;
axis([min(x) max(x) ymin ymax]);
AxisSet(8);
%print -depsc Test;
print -depsc SmoothingProblem;

Bias-Variance Tradeoff Continued

$\mathrm{MSE}(x) = (g(x) - \mathrm{E}[\hat{g}(x)])^2 + \mathrm{E}\!\left[(\hat{g}(x) - \mathrm{E}[\hat{g}(x)])^2\right]$

  • Smooth models
    – Less sensitive to the data
    – Less variance
    – Potentially high bias since they don't fit the data well
  • Flexible models
    – Sensitive to the data
    – In the most extreme case, they interpolate the data
    – High variance since they are sensitive to the data
    – Low bias

return;
% ================================================
% Linear
% ================================================
figure;
FigureSet(1,4.5,2.8);
A = [ones(N,1) x];
w = pinv(A)*y;
yh = [ones(size(xt)) xt]*w;
h = plot(xt,yh,'b',x,y,'r.');
set(h,'MarkerSize',8);
set(h,'LineWidth',1.2);
xlabel('Input x');
ylabel('Output y');
title('Chirp Linear Least Squares');
set(gca,'Box','Off');
grid on;
ymin = min(y); ymax = max(y);
yrng = ymax - ymin;
ymin = ymin - 0.05*yrng;
ymax = ymax + 0.05*yrng;
axis([min(x) max(x) ymin ymax]);
AxisSet(8);
print -depsc LinearLeastSquares;

Example 6: Univariate Smoothing Data

[Figure: motorcycle data set; Input x vs. Output y.]

Example 7: MATLAB Code

See Matlab/PolynomialSmoothing.m

Polynomial Smoothing

  • We can fit a polynomial $\hat{g}(x) = \sum_{i=0}^{p-1} w_i x^i$ to the data using the linear modeling methods
  • Note that linear models are linear in the parameters $w_i$
  • They need not be linear in the inputs
  • Alternatively, you can think of this as a linear model with p different inputs where the ith input is given by $x_i = x^i$
  • This model is smooth in the sense that all derivatives of $\hat{g}(x)$ are continuous
  • This is one measure of model smoothness
  • In general, this is a terrible smoother
    – Terrible at extrapolation
    – The matrix inverse is often poorly conditioned and regularization is necessary
    – The user has to pick the order of the polynomial p − 1
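Since Matlab/PolynomialSmoothing.m is referenced but not listed in these notes, here is a minimal sketch of how such a polynomial least squares fit might look, written in the style of the other examples; the polynomial orders and the plotting details are assumptions.

% Sketch: polynomial least squares fits of order 1-5 to the motorcycle data
A = load('MotorCycle.txt');
x = A(:,1);
y = A(:,2);
xt = (-10:0.1:70)';                % Test inputs
figure;
hold on;
for p = 2:6,                       % p coefficients = polynomial order p-1
  V  = zeros(length(x),p);         % Vandermonde design matrix
  Vt = zeros(length(xt),p);
  for cnt = 1:p,
    V(:,cnt)  = x.^(cnt-1);
    Vt(:,cnt) = xt.^(cnt-1);
  end;
  w = pinv(V)*y;                   % Least squares weights
  plot(xt,Vt*w);                   % Model output at the test inputs
end;
plot(x,y,'k.');
hold off;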

Cubic Splines

  • Cubic splines are modeled after the flexible rods that ship designers once used to draw smooth curves
  • The rod would be rigidly constrained to go through specific points (interpolation)
  • The rod smoothly bent from one point to the next
  • The rod naturally minimized its bending energy (i.e., curvature)
  • This can be approximated by a piecewise cubic polynomial

Example 7: Polynomial Smoothing

[Figure: motorcycle data with linear, quadratic, cubic, 4th, and 5th order polynomial fits; Input x vs. Output y.]

Cubic Spline Constraints

[Figure: cubic spline passing through the data points; x vs. y.]

  • Cubic splines are piecewise cubic
  • This means $\hat{g}(x) = \sum_{i=0}^{3} w_i(x)\, x^i$ has different weights between each pair of points
  • For the entire region between each pair of points, the weights are fixed
  • Each polynomial is defined by 4 parameters $w_i(x)$
  • We have n + 1 regions, where n is the number of points in the data set
  • Thus, we need at least 4 × (n + 1) constraints to uniquely specify the weights

Cubic Splines Functional Form

$\hat{g}(x) = \sum_{i=0}^{3} w_i(x)\, x^i$

  • Unlike polynomial regression, here the parameters $w_i(x)$ are also a function of x
  • Consider a class of functions $\hat{g}(x)$ that have the following properties
    – Continuous
    – Continuous 1st derivative
    – Continuous 2nd derivative
    – Interpolates the data: $\hat{g}(x_i) = y_i$

Cubic Spline Constraints Continued

[Figure: cubic spline passing through the data points; x vs. y.]

Let $p_k(x)$ be the polynomial between the points $x_k$ and $x_{k+1}$. We need 4 × (n + 1) constraints for the problem to be well defined.

  Property                    Expression                           Constraints
  Interpolation               $\hat{g}(x_i) = y_i$                 n
  Continuous                  $p_k(x_{k+1}) = p_{k+1}(x_{k+1})$    n
  Continuous derivative       $p'_k(x_{k+1}) = p'_{k+1}(x_{k+1})$  n
  Continuous 2nd derivative   $p''_k(x_{k+1}) = p''_{k+1}(x_{k+1})$  n

Natural splines have 4 additional constraints:

$p_0'''(x_1) = 0 \qquad p_0''(x_1) = 0 \qquad p_n'''(x_n) = 0 \qquad p_n''(x_n) = 0$

Cubic Splines Smoothness Definition

  • Out of all the functions that meet the above criteria, consider those that also minimize the approximate "curvature" of $\hat{g}(x)$:
    $C \equiv \int_{x_{\min}}^{x_{\max}} \left( \frac{d^2 \hat{g}(x)}{dx^2} \right)^2 dx$
  • These are piecewise cubics and are called cubic splines
  • In the sense of satisfying the criteria listed above and minimizing the curvature C, cubic splines are optimal
  • Even with all of these constraints, $\hat{g}(x)$ is not uniquely specified
  • There are several cubic splines that meet the strict criteria and have the same curvature
  • The most popular additional constraints are $\hat{g}''(x_{\min}) = 0$ and $\hat{g}''(x_{\max}) = 0$
  • These are called natural cubic splines

Example 8: Basis Function 0

[Figure: basis function B0(x); Input x vs. Output y.]

Basis Splines

  • You could solve for the 4(n + 1) model coefficients by solving a set of 4(n + 1) linear equations
  • This is cumbersome and very inefficient mathematically
  • An easier way is to use basis functions
  • Mathematically, each basis function is defined recursively:
    $b_{i,j}(x) = \frac{x - k_j}{k_{i+j} - k_j}\, b_{i-1,j}(x) + \frac{k_{i+j+1} - x}{k_{i+j+1} - k_{j+1}}\, b_{i-1,j+1}(x)$
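A direct MATLAB transcription of this recursion is sketched below; it is not from the original notes. The degree-0 base case (an indicator of the knot interval) is a standard assumption that the slide leaves implicit.

function b = BSplineBasis(i,j,k,x)
% Evaluate the basis function b_{i,j}(x) by the recursion above.
% i : degree, j : basis index, k : knot vector, x : evaluation points.
% Requires j >= 1 and i+j+1 <= length(k).
if i == 0,
  b = double((x >= k(j)) & (x < k(j+1)));  % Degree-0 base case (assumed)
else
  b = (x - k(j))./(k(i+j) - k(j)).*BSplineBasis(i-1,j,k,x) ...
    + (k(i+j+1) - x)./(k(i+j+1) - k(j+1)).*BSplineBasis(i-1,j+1,k,x);
end;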

Example 8: Basis Function 1

[Figure: basis function B1(x); Input x vs. Output y.]

Basis Splines Continued

  • The output of our model can then be written as
    $\hat{g}(x) = \sum_{i=-2}^{n-1} w_i\, b_{3,i}(x)$
  • Numerically, this can be solved much more quickly (the order is proportional to n)
  • Since the basis functions have finite support (i.e., finite span), the equivalent A matrix is banded
  • Basis splines also have the nice property that they sum to unity:
    $\sum_{j=1-i}^{n-1} b_{i,j}(x) = 1 \quad \forall x \in [k_1, k_n]$
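As a rough illustration of the expansion above, the weights $w_i$ can be found by ordinary least squares on a basis matrix built from the recursion in the previous sketch. This is only a sketch: the number of knots, their uniform placement, and the MATLAB-friendly index range (rather than i = −2, …, n − 1) are all assumptions, and it ignores the banded structure that makes the real computation fast.

% Sketch: least squares fit of a cubic basis spline expansion
D = load('MotorCycle.txt');
x = D(:,1);
y = D(:,2);
m = 12;                             % Number of knots (assumed)
k = linspace(min(x),max(x)+eps,m);  % Uniform knot vector (assumed)
A = zeros(length(x),m-4);
for j = 1:m-4,
  A(:,j) = BSplineBasis(3,j,k,x);   % jth cubic basis function
end;
w  = pinv(A)*y;                     % Basis spline weights
yh = A*w;                           % Fitted values at the data points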

Example 8: MATLAB Code

See Matlab/BasisFunctions.m

Example 8: Basis Function 2

[Figure: basis function B2(x); Input x vs. Output y.]

Smoothing Splines

  • For smoothing, we do not require $\hat{g}(x_i) = y_i$
  • But we would like it to be close: $\hat{g}(x_i) \approx y_i$
  • How do we trade off smoothness (low variance) for a good fit to the data (low bias)?
  • One way is to find the $\hat{g}(x)$ that minimizes the following performance criterion:
    $E_\lambda = \sum_{i=1}^{n} (y_i - \hat{g}(x_i))^2 + \lambda \int_{-\infty}^{+\infty} (\hat{g}''(x))^2\, dx$
  • Contrast with cubic splines, in which we required the first term to be zero
  • The second term is a roughness penalty

Example 8: Basis Function 3

[Figure: basis function B3(x); Input x vs. Output y.]

Example 9: Smoothing Spline

[Figure: motorcycle data with smoothing spline fits for α = 1.0, 0.5, and 0.0; Input x vs. Output y.]

Smoothing Splines Continued

$E_\lambda = \sum_{i=1}^{n} (y_i - \hat{g}(x_i))^2 + \lambda \int_{-\infty}^{+\infty} (\hat{g}''(x))^2\, dx$

  • λ is a user-specified parameter that controls the tradeoff
  • It turns out the optimal solution (in the sense of minimizing $E_\lambda$) is a smoothing spline
  • A smoothing spline is identical to a cubic spline in form
    – There is a cubic polynomial between each pair of points
    – Same number of knots
    – Same number of different sets of polynomials
  • Unlike the cubic spline, we now drop the constraint that $\hat{g}(x_i) = y_i$
  • Instead, $\hat{g}(x_i) = \tilde{y}_i$ for some set of $\tilde{y}_i$

Example 10: Smoothing Spline

[Figure: motorcycle data with the smoothing spline fit, α = 0.0001; Input x vs. Output y.]

Smoothing Splines Comments

$E_\lambda = \sum_{i=1}^{n} (y_i - \hat{g}(x_i))^2 + \lambda \int_{-\infty}^{+\infty} (\hat{g}''(x))^2\, dx$

  • Smoothing splines are smooth in the same sense as cubic splines
    – If cubic, the second derivative is continuous
    – If quadratic, the first derivative is continuous
    – If linear, the function is continuous
  • For a cubic smoothing spline,
    – As λ → ∞, $\hat{g}(x)$ approaches a linear least squares fit to the data (i.e., $\hat{g}''(x) \to 0$)
    – As λ → 0, $\hat{g}(x)$ becomes an interpolating cubic spline
  • This is implemented in MATLAB as csaps
  • Instead of λ, it takes an equivalent parameter scaled between 0 (linear least squares fit) and 1 (cubic spline interpolation)

Example 10: Smoothing Spline

[Figure: motorcycle data with the smoothing spline fit, α = 0.2000; Input x vs. Output y.]

Example 10: Smoothing Spline

[Figure: motorcycle data with the smoothing spline fit, α = 0.0010; Input x vs. Output y.]

Example 10: Smoothing Spline

[Figure: motorcycle data with the smoothing spline fit, α = 0.5000; Input x vs. Output y.]

Example 10: Smoothing Spline

[Figure: motorcycle data with the smoothing spline fit, α = 0.0100; Input x vs. Output y.]

Example 10: MATLAB Code

function [] = SmoothingSplineEx();
close all;
A = load('MotorCycle.txt');
xr = A(:,1);                  % Raw values
yr = A(:,2);                  % Raw values
x = unique(xr);
y = zeros(size(x));
for cnt = 1:length(x),
  y(cnt) = mean(yr(xr==x(cnt)));
end;
N = size(A,1);                % No. data set points
xt = (-10:0.2:70)';
NT = length(xt);              % No. test points
NS = 3;                       % No. of different splines
yh = zeros(NT,NS);
yh(:,3) = csaps(x',y',0,xt')';
yh(:,2) = csaps(x',y',0.5,xt')';
yh(:,1) = csaps(x',y',1.0,xt')';
FigureSet(1,'LTX');
h = plot(xt,yh,x,y,'k.');
set(h,'MarkerSize',8);
set(h,'LineWidth',1.2);

Example 10: Smoothing Spline

[Figure: motorcycle data with the smoothing spline fit, α = 0.9000; Input x vs. Output y.]

xlabel('Input x');
ylabel('Output y');
title('Motorcycle Data Smoothing Spline Regression');
set(gca,'Box','Off');
grid on;
axis([-10 70 -150 90]);
AxisSet(8);
legend('\alpha = 1.0','\alpha = 0.5','\alpha = 0.0',4);
print -depsc SmoothingSplineEx;
L = [0.0001 0.001 0.01 0.2 0.5 0.9 0.99];
for cnt = 1:length(L),
  alpha = L(cnt);
  figure;
  FigureSet(1,'LTX');
  yh = csaps(x',y',alpha,xt')';
  h = plot(xt,yh,x,y,'k.');
  set(h,'MarkerSize',8);
  set(h,'LineWidth',1.2);
  xlabel('Input x');
  ylabel('Output y');
  st = sprintf('Motorcycle Data Smoothing Spline Regression \\alpha=%6.4f',alpha);
  title(st);
  set(gca,'Box','Off');
  grid on;
  axis([-10 70 -150 90]);
  AxisSet(8);
  st = sprintf('print -depsc SmoothingSplineEx%04d;',round(alpha*10000));
  eval(st);
end;

Example 10: Smoothing Spline

[Figure: motorcycle data with the smoothing spline fit, α = 0.9900; Input x vs. Output y.]

Density Estimation

  • Then a kernel density estimator is simply expressed as
    $\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} b_\sigma(x - x_i)$
  • The width of the kernel is specified by σ. Typically
    $b_\sigma(u) = \frac{1}{\sigma}\, b\!\left(\frac{u}{\sigma}\right)$
    where it is easy to show that $\int b_\sigma(u)\, du = 1$ for any value of σ
  • Bumps shaped like a Gaussian are popular:
    $b(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$
  • Typically the bumps have even symmetry: $b(u) = b(-u) = b(|u|)$
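The normalization claim above follows from a one-line change of variables:

$\int_{-\infty}^{\infty} b_\sigma(u)\, du = \int_{-\infty}^{\infty} \frac{1}{\sigma}\, b\!\left(\frac{u}{\sigma}\right) du = \int_{-\infty}^{\infty} b(v)\, dv = 1 \qquad \left(v = \frac{u}{\sigma},\ dv = \frac{1}{\sigma}\, du\right)$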

Review of Bayes' Rule

  • Bayes' rule says that two discrete-valued random variables A and B have the following relationship:
    $\Pr\{B|A\} = \frac{\Pr\{A, B\}}{\Pr\{A\}} = \frac{\Pr\{A|B\}\,\Pr\{B\}}{\Pr\{A\}}$
  • Recall that earlier we found that the $\hat{g}(x)$ that minimizes the MSE is given by $\hat{Y} = g^*(x) = \mathrm{E}[Y|X = x]$
  • For smoothing, we can use the continuous analog of Bayes' rule to estimate $\mathrm{E}[Y|X = x]$:
    $f(y|X = x) = \frac{f(x, y)}{f(x)} = \frac{f(x|Y = y)\, f(y)}{f(x)}$

Example 11: Density Estimation

[Figure: motorcycle data kernel density estimate, w = 0.1; Input x vs. density p(x).]

Continuous Bayes' Rule

$f(y|X = x) = \frac{f(x, y)}{f(x)} = \frac{f(x|Y = y)\, f(y)}{f(x)}$

  • $\mathrm{E}[Y|X = x]$ is given by
    $\mathrm{E}[Y|X = x] = \int_{-\infty}^{+\infty} y\, f(y|X = x)\, dy$
  • In order to estimate these equations we need a means of estimating the densities f(x) and f(x, y)
  • A popular method of estimating a density is to add a series of "bumps" together
  • The bumps are called kernels and should have the following property:
    $\int_{-\infty}^{+\infty} b_\sigma(u)\, du = 1$

Example 11: Density Estimation

[Figure: motorcycle data kernel density estimate, w = 1.0; Input x vs. density p(x).]

Example 11: Density Estimation

[Figure: motorcycle data kernel density estimate, w = 0.2; Input x vs. density p(x).]

Example 11: Density Estimation

[Figure: motorcycle data kernel density estimate, w = 5.0; Input x vs. density p(x).]

Example 11: Density Estimation

[Figure: motorcycle data kernel density estimate, w = 0.5; Input x vs. density p(x).]

Density Estimation in Higher Dimensions

  • Density estimation can be extended to higher dimensions in the obvious way:
    $\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{p} b_\sigma(x_j - x_{i,j})$
    where $x_j$ is the jth element of the input vector x and $x_{i,j}$ is the jth element of the ith input vector in the data set
  • Although you can use this for large values of p, it is not recommended
  • The estimate becomes inaccurate very quickly as the number of dimensions grows
  • For one or two dimensions this is a pretty good technique

Example 11: MATLAB Code

function [] = DensityEx();
close all;
A = load('MotorCycle.txt');
x = A(:,1);                    % Raw values
y = A(:,2);                    % Raw values
W = [0.1 0.2 0.5 1.0 5.0];
xt = (-10:0.05:70)';
for c1 = 1:length(W),
  w = W(c1);
  bs = zeros(size(xt));        % Bump sum
  for c2 = 1:length(x),
    bs = bs + exp(-(xt-x(c2)).^2/(2*w.^2))/sqrt(2*pi*w^2);
  end;
  bs = bs/length(x);
  figure;
  FigureSet(1,'LTX');
  h = plot(x,zeros(size(x)),'k.',xt,bs);
  set(h,'LineWidth',1.5);
  xlabel('Input x');
  ylabel('Density p(x)');
  st = sprintf('Motorcycle Data Density Estimation w=%5.1f',w);

Example 12: 2D Density Estimation

[Figure: input-output density estimate of the scaled motorcycle data, w = 0.05; Input x vs. Output y.]

  title(st);
  set(gca,'Box','Off');
  grid on;
  axis([-10 70 0 0.1]);
  AxisSet(8);
  st = sprintf('print -depsc DensityEx%02d;',round(w*10));
  eval(st);
end;

Example 12: 2D Density Estimation

[Figure: input-output density estimate of the scaled motorcycle data, w = 0.50; Input x vs. Output y.]

Example 12: 2D Density Estimation

[Figure: input-output density estimate of the scaled motorcycle data, w = 0.10; Input x vs. Output y.]

Example 12: 2D Density Estimation

[Figure: input-output density estimate of the scaled motorcycle data, w = 1.00; Input x vs. Output y.]

Example 12: 2D Density Estimation

[Figure: input-output density estimate of the scaled motorcycle data, w = 0.20; Input x vs. Output y.]

Density Estimation and Scaling

  • In higher dimensions it is important to scale each input to have the same variance
  • The following example shows the same data set without scaling
  • Notice the oval-shaped bumps

Example 12: MATLAB Code

function [] = DensityEx2D();
close all;
A = load('MotorCycle.txt');
xr = A(:,1);                   % Raw values
yr = A(:,2);                   % Raw values
xm = mean(xr); ym = mean(yr);
xs = std(xr);  ys = std(yr);
x = (xr-xm)/xs;
y = (yr-ym)/ys;
W = [0.05 0.1 0.2 0.5 1.0];
xst = -2.0:0.02:2.5;           % X-test points
yst = -2.5:0.02:2.5;           % Y-test points
[xmt,ymt] = meshgrid(xst,yst); % Grids of scaled test points
xt = xst*xs + xm;              % Unscaled x-test values
yt = yst*ys + ym;              % Unscaled y-test values
for c1 = 1:length(W),
  w = W(c1);
  bs = zeros(size(xmt));       % Bump sum

Example 13: 2D Density Estimation

[Figure: input-output density estimate of the motorcycle data without scaling, w = 0.50; Input x vs. Output y.]

  for c2 = 1:length(x),
    bx = exp(-(xmt-x(c2)).^2/(2*w.^2))/sqrt(2*pi*w^2);
    by = exp(-(ymt-y(c2)).^2/(2*w.^2))/sqrt(2*pi*w^2);
    bs = bs + bx.*by;
  end;
  bs = bs/length(x);
  figure;
  FigureSet(1,'LTX');
  h = imagesc(xt,yt,bs);
  hold on;
  h = plot(xr,yr,'k.',xr,yr,'w.');
  set(h(1),'MarkerSize',4);
  set(h(2),'MarkerSize',2);
  hold off;
  set(gca,'YDir','Normal');
  xlabel('Input x');
  ylabel('Output y');
  st = sprintf('Motorcycle Data Input-Output Density Estimation w=%5.2f',w);
  title(st);
  set(gca,'Box','Off');
  colorbar;
  AxisSet(8);
  st = sprintf('print -depsc DensityEx2D%03d;',round(w*100));
  eval(st);
end;

Example 13: 2D Density Estimation

[Figure: input-output density estimate of the motorcycle data without scaling, w = 10.00; Input x vs. Output y.]

Example 13: 2D Density Estimation

[Figure: input-output density estimate of the motorcycle data without scaling, w = 1.00; Input x vs. Output y.]

Example 13: 2D Density Estimation

[Figure: input-output density estimate of the motorcycle data without scaling, w = 20.00; Input x vs. Output y.]

Example 13: 2D Density Estimation

[Figure: input-output density estimate of the motorcycle data without scaling, w = 5.00; Input x vs. Output y.]

  h = plot(x,y,'k.',x,y,'w.');
  set(h(1),'MarkerSize',4);
  set(h(2),'MarkerSize',2);
  hold off;
  set(gca,'YDir','Normal');
  xlabel('Input x');
  ylabel('Output y');
  st = sprintf('Motorcycle Data No Scaling Density Estimation w=%6.2f',w);
  title(st);
  set(gca,'Box','Off');
  colorbar;
  AxisSet(8);
  st = sprintf('print -depsc DensityEx2Db%03d;',round(w*10));
  eval(st);
end;

Example 13: 2D Density Estimation

[Figure: input-output density estimate of the motorcycle data without scaling, w = 50.00; Input x vs. Output y.]

Kernel Smoothing Derivation

The following equations compose the Nadaraya-Watson estimator of E[y|x]:

$\mathrm{E}[y|x] = \int_{-\infty}^{\infty} y\, f(y|x)\, dy = \int_{-\infty}^{\infty} y\, \frac{f(x, y)}{f(x)}\, dy = \frac{\int_{-\infty}^{\infty} y\, f(x, y)\, dy}{f(x)}$

The two densities can be estimated as follows:

$\hat{f}(x, y) = \frac{1}{n} \sum_{i=1}^{n} b_\sigma(|x - x_i|) \cdot b_\sigma(|y - y_i|)$

$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} b_\sigma(|x - x_i|)$

Example 13: MATLAB Code

function [] = DensityEx2D();
% This is the same as DensityEx2D, except no scaling is used.
close all;
A = load('MotorCycle.txt');
x = A(:,1);                    % Raw values
y = A(:,2);                    % Raw values
W = [0.5 1.0 2.0 5.0 10.0 20.0 50.0];
xt = 0:0.5:60;                 % X-test points
yt = -150:90;                  % Y-test points
[xmt,ymt] = meshgrid(xt,yt);   % Grids of test points (no scaling)
for c1 = 1:length(W),
  w = W(c1);
  bs = zeros(size(xmt));       % Bump sum
  for c2 = 1:length(x),
    bx = exp(-(xmt-x(c2)).^2/(2*w.^2))/sqrt(2*pi*w^2);
    by = exp(-(ymt-y(c2)).^2/(2*w.^2))/sqrt(2*pi*w^2);
    bs = bs + bx.*by;
  end;
  bs = bs/length(x);
  figure;
  FigureSet(1,'LTX');
  h = imagesc(xt,yt,bs);
  hold on;

Kernel Smoothing Derivation Continued

Thus, by combining the equations on the previous slides we obtain

$\mathrm{E}[y|x] \approx \hat{g}(x) = \frac{\sum_{i=1}^{n} y_i\, b_\sigma(|x - x_i|)}{\sum_{i=1}^{n} b_\sigma(|x - x_i|)}$

  • Popular kernels include
    – Epanechnikov: $b(u) = c\,(1 - u^2)\, p(u)$
    – Biweight: $b(u) = c\,(1 - u^2)^2\, p(u)$
    – Triweight: $b(u) = c\,(1 - u^2)^3\, p(u)$
    – Triangular: $b(u) = c\,(1 - |u|)\, p(u)$
    – Gaussian: $b(u) = c\, e^{-u^2}$
    – Sinc: $b(u) = c\, \mathrm{sinc}(u)$
  • Here c is a constant chosen to meet the constraint $\int_{-\infty}^{\infty} b(u)\, du = 1$
  • p(u) is the unit pulse: $p(u) = 1$ for $|u| \le 1$ and 0 otherwise
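The helper function Kernel(x,y,xt,w,type) called in Example 15 below is not listed in these notes. Here is a minimal sketch of a Nadaraya-Watson smoother consistent with that call signature; the encoding of the final argument (1 = Gaussian, 2 = Epanechnikov) is an assumption inferred from the example code.

function yt = Kernel(x,y,xt,w,type)
% Nadaraya-Watson kernel smoother (sketch).
% x,y : data set; xt : test inputs; w : kernel width;
% type: kernel selector (1 = Gaussian, 2 = Epanechnikov) -- assumed
yt = zeros(size(xt));
for cnt = 1:length(xt),
  u = abs(xt(cnt) - x)/w;          % Scaled distances to the test input
  if type == 2,
    b = (1 - u.^2).*(u <= 1);      % Epanechnikov (constant c cancels)
  else
    b = exp(-u.^2/2);              % Gaussian (constant c cancels)
  end;
  if sum(b) > 0,
    yt(cnt) = sum(b.*y)/sum(b);    % Weighted average of the outputs
  else
    yt(cnt) = NaN;                 % No points in the kernel support
  end;
end;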

Kernel Smoothing Derivation Continued (1)

$\mathrm{E}[y|x] \approx \frac{\int_{-\infty}^{\infty} y\, \hat{f}(x, y)\, dy}{\hat{f}(x)}$

$\hat{f}(x)\, \mathrm{E}[y|x] \approx \int_{-\infty}^{\infty} y \left[ \frac{1}{n} \sum_{i=1}^{n} b_\sigma(|x - x_i|) \cdot b_\sigma(|y - y_i|) \right] dy$

$= \frac{1}{n} \sum_{i=1}^{n} b_\sigma(|x - x_i|) \int_{-\infty}^{\infty} y\, b_\sigma(|y - y_i|)\, dy$

$= \frac{1}{n} \sum_{i=1}^{n} b_\sigma(|x - x_i|) \int_{-\infty}^{\infty} (y - y_i + y_i)\, b_\sigma(|y - y_i|)\, dy$

$= \frac{1}{n} \sum_{i=1}^{n} b_\sigma(|x - x_i|) \times \left[ y_i \int_{-\infty}^{\infty} b_\sigma(|y - y_i|)\, dy + \int_{-\infty}^{\infty} (y - y_i)\, b_\sigma(|y - y_i|)\, dy \right]$

Example 14: Kernels

[Figure: the six kernels (Epanechnikov, Biweight, Triweight, Triangular, Gaussian, Sinc) plotted on u ∈ [−2.2, 2.2].]

Kernel Smoothing Derivation Continued (2)

$\hat{f}(x)\, \mathrm{E}[y|x] = \frac{1}{n} \sum_{i=1}^{n} b_\sigma(|x - x_i|) \times \left[ y_i \int_{-\infty}^{\infty} b_\sigma(|y - y_i|)\, dy + \int_{-\infty}^{\infty} (y - y_i)\, b_\sigma(|y - y_i|)\, dy \right]$

$= \frac{1}{n} \sum_{i=1}^{n} b_\sigma(|x - x_i|) \times \left[ y_i + \int_{-\infty}^{\infty} u\, b_\sigma(|u|)\, du \right]$

$= \frac{1}{n} \sum_{i=1}^{n} y_i\, b_\sigma(|x - x_i|)$

where the last step uses $\int b_\sigma(|u|)\, du = 1$ and, by the even symmetry of the kernel, $\int u\, b_\sigma(|u|)\, du = 0$. Thus

$\mathrm{E}[y|x] = \frac{\frac{1}{n} \sum_{i=1}^{n} y_i\, b_\sigma(|x - x_i|)}{\frac{1}{n} \sum_{i=1}^{n} b_\sigma(|x - x_i|)} = \frac{\sum_{i=1}^{n} y_i\, b_\sigma(|x - x_i|)}{\sum_{i=1}^{n} b_\sigma(|x - x_i|)}$

Kernel Smoothing Comments

$\mathrm{E}[y|x] \approx \hat{g}(x) = \frac{\sum_{i=1}^{n} y_i\, b_\sigma(|x - x_i|)}{\sum_{i=1}^{n} b_\sigma(|x - x_i|)} = \sum_{i=1}^{n} y_i\, \frac{b_\sigma(|x - x_i|)}{\sum_{j=1}^{n} b_\sigma(|x - x_j|)}$

  • Kernel smoothing can be written as a weighted average
    $\hat{g}(x) = \sum_{i=1}^{n} y_i\, w_i(x)$ where $w_i(x) = \frac{b_\sigma(|x - x_i|)}{\sum_{j=1}^{n} b_\sigma(|x - x_j|)}$
  • Note that by definition $\sum_{i=1}^{n} w_i(x) = 1$
  • If all the weights were equal, $w_i(x) = \frac{1}{n}$, then $\hat{g}(x) = \bar{y}$
  • This occurs as σ → ∞

Example 14: MATLAB Code

function [] = Kernels();
ST = 0.01;
x = (-2.2:ST:2.2)';
u = abs(x);
I = (u <= 1);
kep = (1-u.^2).*I;       % Epanechnikov
kbw = (1-u.^2).^2.*I;    % Biweight
ktw = (1-u.^2).^3.*I;    % Triweight
ktr = (1-u).*I;          % Triangular
kga = exp(-u.^2);        % Gaussian
ksn = sinc(u);           % Sinc
kep = kep/(sum(kep)*ST); % Normalize
kbw = kbw/(sum(kbw)*ST); % Normalize
ktw = ktw/(sum(ktw)*ST); % Normalize
ktr = ktr/(sum(ktr)*ST); % Normalize
kga = kga/(sum(kga)*ST); % Normalize
ksn = ksn/(sum(ksn)*ST); % Normalize
K = [kep kbw ktw ktr kga ksn];
L = {'Epanechnikov','Biweight','Triweight','Triangular','Gaussian','Sinc'};
FigureSet(1,4.5,2.8);
for cnt = 1:6,
  subplot(2,3,cnt);

Kernel Smoothing Effect of Support

$\mathrm{E}[y|x] \approx \hat{g}(x) = \frac{\sum_{i=1}^{n} y_i\, b_\sigma(|x - x_i|)}{\sum_{i=1}^{n} b_\sigma(|x - x_i|)}$

As the width decreases (σ ↓) one of two things happens:

  • If b(u) has infinite support,
    – All of the equivalent weights become nearly equal to zero
    – The weight from the nearest neighbor dominates
    – Thus $\hat{g}(x)$ does nearest neighbor interpolation as σ → 0
  • If b(u) has finite support,
    – At some values of x all of the weights may be 0
    – If this happens, $\hat{g}(x)$ at these points is not defined (depends on the implementation)

  h = plot([-5 5],[0 0],'k:',x,K(:,cnt));
  set(h(2),'LineWidth',1.5);
  title(char(L(cnt)));
  box off;
  axis([min(x) max(x) -0.3 1.2]);
end;
AxisSet(8);
print -depsc Kernels;

Example 15: Kernel Smoothing

[Figure: motorcycle data, kernel smoothing with the Epanechnikov kernel, w = 1.0000; Input x vs. Output y.]

Kernel Smoothing Bias-Variance Tradeoff

$\mathrm{E}[y|x] \approx \hat{g}(x) = \frac{\frac{1}{n}\sum_{i=1}^{n} y_i\, b_\sigma(|x - x_i|)}{\frac{1}{n}\sum_{i=1}^{n} b_\sigma(|x - x_i|)}$

  • Thus, as with smoothing splines, there is a single parameter that controls the tradeoff of smoothness (high bias) against the ability of the model to fit the data (high variance)
  • Kernel smoothers have bounded outputs:
    $\min_i y_i \le \min_x \hat{g}(x) \le \hat{g}(x) \le \max_x \hat{g}(x) \le \max_i y_i$
  • In this sense, they are more stable than smoothing splines
  • Recall that smoothing splines diverge outside of the data range
  • However, kernel smoothers are more likely to round off sharp edges, peaks, and troughs

Example 15: Kernel Smoothing

[Figure: motorcycle data, kernel smoothing with the Epanechnikov kernel, w = 2.0000; Input x vs. Output y.]

Example 15: Kernel Smoothing

[Figure: motorcycle data, kernel smoothing with the Epanechnikov kernel, w = 0.1000; Input x vs. Output y.]

Example 15: Kernel Smoothing

[Figure: motorcycle data, kernel smoothing with the Gaussian kernel, w = 0.1000; Input x vs. Output y.]

Example 15: Kernel Smoothing

[Figure: motorcycle data, kernel smoothing with the Epanechnikov kernel, w = 5.0000; Input x vs. Output y.]

Example 15: Kernel Smoothing

[Figure: motorcycle data, kernel smoothing with the Gaussian kernel, w = 1.0000; Input x vs. Output y.]

Example 15: Kernel Smoothing

[Figure: motorcycle data, kernel smoothing with the Epanechnikov kernel, w = 10.0000; Input x vs. Output y.]

Example 15: Kernel Smoothing

[Figure: motorcycle data, kernel smoothing with the Gaussian kernel, w = 10.0000; Input x vs. Output y.]

Example 15: Kernel Smoothing

[Figure: motorcycle data, kernel smoothing with the Gaussian kernel, w = 2.0000; Input x vs. Output y.]

Example 15: MATLAB Code

function [] = KernelSmoothingEx();
close all;
A = load('MotorCycle.txt');
x = A(:,1);                    % Raw values
y = A(:,2);                    % Raw values
xt = (-10:0.05:70)';
W = [0.1 1.0 2.0 3.0 5.0 10.0];
% Epanechnikov Kernel
for cnt = 1:length(W),
  w = W(cnt);
  figure;
  FigureSet(1,'LTX');
  yh = Kernel(x,y,xt,w,2);
  h = plot(xt,yh,'b',x,y,'k.');
  set(h,'MarkerSize',8);
  set(h,'LineWidth',1.2);
  xlabel('Input x');
  ylabel('Output y');
  st = sprintf('Motorcycle Data Kernel Smoothing Epanechnikov Kernel w=%6.4f',w);
  title(st);
  set(gca,'Box','Off');
  grid on;
  axis([-10 70 -150 90]);

Example 15: Kernel Smoothing

[Figure: motorcycle data, kernel smoothing with the Gaussian kernel, w = 5.0000; Input x vs. Output y.]

Local Averaging Concept

[Figure: motorcycle data set with a query point (dashed line) and the shaded band containing its k nearest neighbors; Time (ms) vs. Head Acceleration (g).]

  AxisSet(8);
  st = sprintf('print -depsc EKernelSmoothingEx%03d;',round(w*10));
  eval(st);
end;
% Gaussian Kernel
for cnt = 1:length(W),
  w = W(cnt);
  figure;
  FigureSet(1,'LTX');
  yh = Kernel(x,y,xt,w,1);
  h = plot(xt,yh,'b',x,y,'k.');
  set(h,'MarkerSize',8);
  set(h,'LineWidth',1.2);
  xlabel('Input x');
  ylabel('Output y');
  st = sprintf('Motorcycle Data Kernel Smoothing Gaussian Kernel w=%6.4f',w);
  title(st);
  set(gca,'Box','Off');
  grid on;
  axis([-10 70 -150 90]);
  AxisSet(8);
  st = sprintf('print -depsc GKernelSmoothingEx%03d;',round(w*10));
  eval(st);
end;

MATLAB Code

function [] = LocalAverageConcept();
D = load('Motorcycle.txt');
x = D(:,1);
y = D(:,2);
[x,is] = sort(x);
y = y(is);
q = 30;                        % Query point (test input)
k = 10;                        % No. of neighbors
d = (x-q).^2;
[ds,is] = sort(d);
xs = x(is);
ys = y(is);
xn = xs(1:k);
yn = ys(1:k);
[xsmin,imin] = min(xs(1:k));
[xsmax,imax] = max(xs(1:k));
imin = is(imin);
imax = is(imax);
xll = (x(imin) + x(imin-1))/2; % Lower limit
xul = (x(imax) + x(imax+1))/2; % Upper limit

Local Averaging

$\hat{g}(x) = \sum_{i=1}^{n} w_i(x)\, y_i$

  • We saw that kernel smoothers can be viewed as a weighted average
  • Instead, we could take a local average of the k nearest neighbors of x:
    $\hat{g}(x) = \frac{1}{k} \sum_{i=1}^{k} y_{c(i)}$
    where c(i) is the data set index of the ith nearest point
  • For this type of model, k controls the smoothness

Local Averaging Discussion

$\hat{g}(x) = \frac{1}{k} \sum_{i=1}^{k} y_{c(i)}$

  • Local averaging has a number of disadvantages
    – The data set must be stored in memory (this is essentially true for kernel smoothers and smoothing splines also)
    – The output $\hat{g}(x)$ is discontinuous
    – Finding the k nearest neighbors can be computationally expensive

rg = max(y)-min(y);
ymin = min(y)-0.1*rg;
ymax = max(y)+0.1*rg;
xbox = [xll xul xul xll];
ybox = [ymin ymin ymax ymax];
yav = mean(yn)*[1 1];
xav = [xll xul];
A = [xn ones(k,1)];
b = yn;
v = pinv(A)*b;
xl1 = 0;   yl1 = [xl1 1]*v;
xl2 = 1.5; yl2 = [xl2 1]*v;
xll = [xl1 xl2];
yll = [yl1 yl2];
figure;
FigureSet(1,'LTX');
h = patch(xbox,ybox,'g');
set(h,'FaceColor',.8*[1 1 1]);
set(h,'EdgeColor',.8*[1 1 1]);
hold on;
h = plot(x,y,'k.');
set(h,'MarkerSize',8);

Example 16: Local Averaging

[Figure: motorcycle data with the local averaging fit, k = 2; Time (ms) vs. Head Acceleration (g).]

h = plot(xav,yav,'r-');
set(h,'LineWidth',1.5);
%h = plot(xll,yll,'b:');
h = plot(q*[1 1],[ymin ymax],'b--');
set(h,'LineWidth',1.5);
hold off;
rg = max(y)-min(y);
axis([min(x) max(x) ymin ymax]);
xlabel('Time (ms)');
ylabel('Head Acceleration (g)');
title('Motorcycle Data Set');
set(gca,'Layer','top');
set(gca,'Box','off');
AxisSet(8);
print -depsc LocalAverageConcept;

Example 16: Local Averaging

[Figure: motorcycle data with the local averaging fit, k = 20; Time (ms) vs. Head Acceleration (g).]

Example 16: Local Averaging

[Figure: motorcycle data with the local averaging fit, k = 5; Time (ms) vs. Head Acceleration (g).]

Example 16: Local Averaging

[Figure: motorcycle data with the local averaging fit, k = 50; Time (ms) vs. Head Acceleration (g).]

Example 16: Local Averaging

[Figure: motorcycle data with the local averaging fit, k = 10; Time (ms) vs. Head Acceleration (g).]

Weighted Local Averaging

$\hat{g}(x) = \frac{\sum_{i=1}^{k} b_k(|x - x_{c(i)}|)\, y_{c(i)}}{\sum_{i=1}^{k} b_k(|x - x_{c(i)}|)} = \frac{\sum_{i=1}^{k} b_i\, y_{c(i)}}{\sum_{i=1}^{k} b_i}$

  • Local averaging can be tweaked to produce a continuous $\hat{g}(x)$
  • We simply take a weighted average where b(u) is a smoothly decreasing function of the distance
  • We can use our familiar (non-negative) kernels to achieve this
  • My favorite is the biweight function
    $b_i = \left(1 - \frac{d_i^2}{d_{k+1}^2}\right)^2$
    where $d_i = |x - x_{c(i)}|$ is the distance between the input and the ith nearest neighbor

Example 16: MATLAB Code

function [] = LocalAverageFit();
close all;
D = load('Motorcycle.txt');
x = D(:,1);
y = D(:,2);
[x,is] = sort(x);
y = y(is);
xt = (-10:0.1:70)';
yh = zeros(size(xt));
K = [2 5 10 20 50];
for c = 1:length(K),
  k = K(c);
  for cnt = 1:length(xt),
    d = (x-xt(cnt)).^2;
    [ds,is] = sort(d);
    xs = x(is);
    ys = y(is);
    xn = xs(1:k);
    yn = ys(1:k);

Example 17: Local Averaging Weighting Functions

[Figure: Epanechnikov, biweight, triweight, and triangular weighting functions vs. distance u.]

    yh(cnt) = mean(yn);
  end;
  figure;
  FigureSet(1,'LTX');
  h = plot(x,y,'k.');
  set(h,'MarkerSize',8);
  hold on;
  h = stairs(xt,yh,'b');
  set(h,'LineWidth',1.2);
  hold off;
  axis([-10 70 -150 90]);
  xlabel('Time (ms)');
  ylabel('Head Acceleration (g)');
  st = sprintf('Motorcycle Data Set k=%d',k);
  title(st);
  set(gca,'Layer','top');
  set(gca,'Box','off');
  AxisSet(8);
  st = sprintf('print -depsc LocalAverageEx%02d;',k);
  eval(st);
end;

Example 17: Weighted Averaging

[Figure: motorcycle data with the weighted averaging fit, k = 10; Time (ms) vs. Head Acceleration (g).]

Example 17: Weighted Averaging

[Figure: motorcycle data with the weighted averaging fit, k = 2; Time (ms) vs. Head Acceleration (g).]

Example 17: Weighted Averaging

[Figure: motorcycle data with the weighted averaging fit, k = 20; Time (ms) vs. Head Acceleration (g).]

Example 17: Weighted Averaging

[Figure: motorcycle data with the weighted averaging fit, k = 5; Time (ms) vs. Head Acceleration (g).]

  w = (1-(dn/dmax)).^2;
  yt(cnt) = sum(w.*yn)/sum(w);
end;

Example 17: Weighted Averaging

[Figure: motorcycle data with the weighted averaging fit, k = 50; Time (ms) vs. Head Acceleration (g).]

Weighted Local Averaging Comments

$\hat{g}(x) = \frac{\sum_{i=1}^{k} b_k(|x - x_{c(i)}|)\, y_{c(i)}}{\sum_{i=1}^{k} b_k(|x - x_{c(i)}|)}$

  • Like kernel smoothers, weighted local averaging models are stable (bounded):
    $\min_{i \in \{1,\dots,k\}} y_{c(i)} \le \hat{g}(x) \le \max_{i \in \{1,\dots,k\}} y_{c(i)}$
  • The key difference here is that the kernel width is determined by the distance to the (k + 1)th nearest neighbor
  • This is advantageous
    – In regions of dense data, the equivalent kernel width shrinks
    – In regions of sparse data, the equivalent kernel width expands

Example 17: MATLAB Code

function [yt] = WeightedAverage(x,y,xt,k);
xarg = x;
yarg = y;
x = unique(xarg);
y = zeros(size(x));
for cnt = 1:length(x),
  y(cnt) = mean(yarg(xarg==x(cnt)));
end;
yt = zeros(length(xt),1);
[Np,Ni] = size(x);
for cnt = 1:length(xt),
  d = zeros(Np,1);
  for cnt2 = 1:Ni,
    d = d + (x(:,cnt2)-xt(cnt,cnt2)).^2;
  end;
  [ds,is] = sort(d);
  xs = x(is);
  ys = y(is);
  dn = ds(1:k);
  dmax = ds(k+1);
  xn = xs(1:k);
  yn = ys(1:k);

Local Model Consistency

  • Under general assumptions, it can be shown that smoothing splines, kernel smoothers, and local models are consistent
  • Consistency means that as n → ∞, if the following conditions are satisfied
    – $\int |b_\sigma(u)|\, du < \infty$
    – $\lim_{u \to \infty} u\, b_\sigma(u) = 0$
    – $\mathrm{E}[\varepsilon_i^2] < \infty$
    – As n → ∞, σ → 0 and nσ → ∞
    then at every point where g(x) and f(x) are continuous and f(x) > 0, $\hat{g}(x) \to g(x)$ with probability 1

Local Model Optimality

It can be shown that for fixed weighting functions w(x), both kernel smoothers and weighted local averaging models minimize the weighted average squared error

$\mathrm{ASE} \equiv \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{g}(x_i))^2\, b(|x - x_i|)$

Setting the derivative with respect to $\hat{g}(x)$ to zero,

$\frac{d\,\mathrm{ASE}}{d\hat{g}(x)} \propto \sum_{i=1}^{n} (y_i - \hat{g}(x))\, b(|x - x_i|) = \sum_{i=1}^{n} y_i\, b(|x - x_i|) - \hat{g}(x) \sum_{i=1}^{n} b(|x - x_i|) = 0$

$\hat{g}(x) = \frac{\sum_{i=1}^{n} y_i\, b(|x - x_i|)}{\sum_{i=1}^{n} b(|x - x_i|)}$

Bias-Variance Tradeoff

$\mathrm{PE} \equiv \mathrm{E}\!\left[(y - \hat{g}(x))^2\right] = \underbrace{(g(x) - \mathrm{E}[\hat{g}(x)])^2}_{\mathrm{Bias}^2} + \underbrace{\mathrm{E}\!\left[(\hat{g}(x) - \mathrm{E}[\hat{g}(x)])^2\right]}_{\mathrm{Variance}}$

  • For each, we discussed a bias-variance tradeoff
    – Less smooth ⇒ More variance and less bias
    – More smooth ⇒ Less variance and more bias
  • Recall that the prediction error can be written as shown above
  • The expectation is taken over the distribution of data sets used to construct $\hat{g}(x)$
  • Conceptually, this can be plotted

Local Model Optimality Continued

$\mathrm{ASE} \equiv \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{g}(x_i))^2\, b(|x - x_i|) \qquad \hat{g}^*(x) = \frac{\sum_{i=1}^{n} y_i\, b(|x - x_i|)}{\sum_{i=1}^{n} b(|x - x_i|)}$

  • Thus, we have an alternative derivation of kernel smoothers and weighted local averaging models
  • They are the models that minimize the weighted ASE
  • The only difference between kernel smoothers, local averaging models, and weighted local averaging models is the weighting function b(·)

Model Selection Continued

  • The prediction error can then be estimated by
    $\mathrm{PE} \approx r\!\left(\frac{p}{n}\right) \times \mathrm{ASE}$
    where r(·) is a function that adjusts the ASE to be a more accurate estimate of the PE
  • A number of different functions r(·) have been proposed
    – Final Prediction Error: $r(u) = \frac{1+u}{1-u}$
    – Schwartz' Criterion: $r(u) = 1 + \frac{\log_e n}{2} \cdot \frac{u}{1-u}$
    – Generalized CVE: $r(u) = \frac{1}{(1-u)^2}$
    – Shibata's Model Selector: $r(u) = 1 + 2u$
  • In each case u = p/n
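As a small numerical illustration, the four penalizing functions can be applied to an ASE directly; the values of n, p, and the ASE below are placeholders, not taken from the notes, and the Schwartz expression encodes the formula as reconstructed above.

% Sketch: adjusting the ASE by each penalizing function r(u), u = p/n
n   = 94;                          % Number of data points (assumed)
p   = 11;                          % Equivalent degrees of freedom (assumed)
ase = 500;                         % ASE of the fitted smoother (assumed)
u    = p/n;
rFPE = (1+u)/(1-u);                % Final prediction error
rSC  = 1 + (log(n)/2)*u/(1-u);     % Schwartz' criterion
rGCV = 1/(1-u)^2;                  % Generalized CVE
rSMS = 1 + 2*u;                    % Shibata's model selector
peEst = ase*[rFPE rSC rGCV rSMS]   % Four estimates of the PE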

Bias-Variance Tradeoff Continued

[Figure: prediction error versus smoothness, decomposed into bias and variance components.]

  • Our goal is to minimize the prediction error
  • How do we choose the best smoothing parameter?
  • All of the methods we discussed had a single parameter that controlled smoothness
    – Smoothing splines had a smoothness penalty parameter λ
    – Kernel methods had the bump width σ
    – Local averaging models had the number of neighbors k
  • How do we pick the best smoothness?
  • We would like an accurate estimate of the prediction error

Resampling Techniques

  • It is also possible to use resampling techniques
    – N-Fold Cross-Validation: Divide the data set into N different sets. Pick the first set as the test set and build the model using the remaining N − 1 sets of points. Calculate the ASE on the test set. Repeat for all of the sets and average all N estimates of the ASE.
    – Leave-one-out Cross-Validation: Same as above for N = n.
    – Bootstrap: Select n points from the data set with replacement and calculate the ASE.
  • I personally prefer to use leave-one-out CVE
  • A study I conducted last spring with weighted averaging models indicated that CVE and Generalized CVE were the best (for that type of model)
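Here is a minimal sketch of how a leave-one-out CVE curve like the one in the example that follows might be computed for the weighted local averaging model, using the WeightedAverage function listed earlier; the candidate range of k is an assumption.

% Sketch: leave-one-out CVE versus k for weighted local averaging
D = load('Motorcycle.txt');
x = D(:,1);
y = D(:,2);
n = length(x);
K = 2:30;                          % Candidate neighborhood sizes (assumed)
cve = zeros(size(K));
for c = 1:length(K),
  se = zeros(n,1);
  for i = 1:n,
    it = [1:i-1 i+1:n];            % Leave the ith point out
    yh = WeightedAverage(x(it),y(it),x(i),K(c));
    se(i) = (y(i) - yh)^2;         % Squared error on the held-out point
  end;
  cve(c) = mean(se);
end;
[cvemin,imin] = min(cve);
kopt = K(imin);                    % Best k by leave-one-out CVE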

Model Selection

  • How do we estimate the prediction error with only one data set?
  • The ASE won't work: it monotonically decreases as the smoothness decreases
  • All of our smoothers can be written as
    $\hat{g}(x) = \hat{y} = H(x)\, y$
    for a given input vector x
  • This is very similar to the hat matrix of linear models, except now the H matrix is a function of x
  • The equivalent degrees of freedom can be estimated by
    $p \approx \mathrm{trace}\!\left(H H^T\right)$
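For a kernel smoother evaluated at the data points, the smoother matrix H and the equivalent degrees of freedom can be formed directly. A minimal sketch follows; the Gaussian kernel and its width are assumptions.

% Sketch: equivalent degrees of freedom of a Gaussian kernel smoother
D = load('MotorCycle.txt');
x = D(:,1);
y = D(:,2);
w = 2;                             % Kernel width (assumed)
n = length(x);
H = zeros(n,n);
for i = 1:n,
  b = exp(-(x(i)-x).^2/(2*w^2));   % Kernel weights for the ith point
  H(i,:) = b'/sum(b);              % Row of H: yhat(i) = H(i,:)*y
end;
yhat = H*y;                        % Smoothed outputs at the data points
p = trace(H*H');                   % Equivalent degrees of freedom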

Weighted Least Squares

$\mathrm{ASE}_b = \frac{1}{n} \sum_{i=1}^{n} b_i^2 (y_i - \hat{y}_i)^2 = \frac{1}{n} (y - Aw)^T B^T B (y - Aw)$

where

$B = \begin{bmatrix} b_1 & & & \\ & b_2 & & \\ & & \ddots & \\ & & & b_n \end{bmatrix}$

  • When we discussed linear models we found the weights that minimized the average squared error
  • We can generalize this easily to find the best linear model that minimizes the weighted ASE

Example: Weighted Averaging CVE

[Figure: local averaging leave-one-out cross-validation error; Number of Neighbors (k) vs. CVE.]

Weighted Least Squares Continued

$\mathrm{ASE}_b = (y - Aw)^T B^T B (y - Aw)$

  • This can be framed as a typical (unweighted) least squares problem if we add the following definitions:
    $A_b \equiv BA \qquad y_b \equiv By$
  • Then the $\mathrm{ASE}_b$ can be written as
    $\mathrm{ASE}_b = (y_b - A_b w)^T (y_b - A_b w)$
    which has the known optimal least squares solution
    $w = (A_b^T A_b)^{-1} A_b^T y_b$
  • Now we can easily generalize kernel methods and local averaging models to create localized linear models
  • We merely specify the weights so that points near the input have the most influence on the model output
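The LocalLinear function called in Example 18 below is not listed in these notes. Here is a minimal sketch built from the weighted least squares solution above, with biweight weights based on the k nearest neighbors; the weighting choice is an assumption.

function yt = LocalLinear(x,y,xt,k)
% Local linear model (sketch): at each test input, fit a weighted
% least squares line to the k nearest neighbors and evaluate it there.
yt = zeros(length(xt),1);
for cnt = 1:length(xt),
  d = (x - xt(cnt)).^2;            % Squared distances to the test input
  [ds,is] = sort(d);
  xn = x(is(1:k));                 % k nearest neighbors
  yn = y(is(1:k));
  b  = (1 - ds(1:k)/ds(k+1)).^2;   % Biweight weights (assumed choice)
  B  = diag(b);
  A  = [ones(k,1) xn];             % Local linear design matrix
  w  = pinv(B*A)*(B*yn);           % Weighted least squares solution
  yt(cnt) = [1 xt(cnt)]*w;         % Evaluate the local line
end;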

Example: Weighted Averaging CVE

[Figure: motorcycle data with the local averaging fit for k_opt = 11; Time (ms) vs. Head Acceleration (g).]

Example 18: Local Linear Model

[Figure: motorcycle data with the local linear fit, k = 10; Time (ms) vs. Head Acceleration (g).]

Example 18: Local Linear Model

[Figure: motorcycle data with the local linear fit, k = 2; Time (ms) vs. Head Acceleration (g).]

Example 18: Local Linear Model

[Figure: motorcycle data with the local linear fit, k = 20; Time (ms) vs. Head Acceleration (g).]

Example 18: Local Linear Model

[Figure: motorcycle data with the local linear fit, k = 5; Time (ms) vs. Head Acceleration (g).]

Example 18: Local Linear Model

[Figure: motorcycle data with the local linear fit, k = 93; Time (ms) vs. Head Acceleration (g).]

Example 18: Local Linear Model

[Figure: motorcycle data with the local linear fit, k = 30; Time (ms) vs. Head Acceleration (g).]

Example 18: MATLAB Code

function [] = LocalLinearEx();
close all;
D = load('Motorcycle.txt');
x = D(:,1);
y = D(:,2);
[x,is] = sort(x);
y = y(is);
xt = (-10:0.05:70)';
k = [2 5 10 20 30 50 93];
for cnt = 1:length(k),
  yh = LocalLinear(x,y,xt,k(cnt));
  figure;
  FigureSet(1,'LTX');
  h = plot(x,y,'k.',xt,yh,'b');
  set(h(1),'MarkerSize',8);
  set(h(2),'LineWidth',1.2);
  axis([-10 70 -150 90]);
  xlabel('Time (ms)');
  ylabel('Head Acceleration (g)');
  st = sprintf('Motorcycle Data Set k=%d',k(cnt));
  title(st);
  set(gca,'Layer','top');

Example 18: Local Linear Model

[Figure: motorcycle data with the local linear fit, k = 50; Time (ms) vs. Head Acceleration (g).]

Univariate Smoothing Summary Continued

  • All of the smoothing methods had a single parameter that controls the smoothness of the model
  • For each, we discussed a bias-variance tradeoff
    – Less smooth ⇒ More variance and less bias
    – More smooth ⇒ Less variance and more bias
  • We discussed several methods of estimating the "true" prediction error of the model
    – Some of the methods were simple modifications of the ASE
    – Other methods were based on resampling (cross-validation & the bootstrap)
  • Most can be generalized to the multivariate case

  set(gca,'Box','off');
  AxisSet(8);
  st = sprintf('print -depsc LocalLinearEx%02d;',k(cnt));
  eval(st);
end;

Univariate Smoothing Summary

  • We discussed four methods of interpolation
    – Linear Interpolation
    – Nearest Neighbor Interpolation
    – Polynomial Interpolation
    – Cubic Spline Interpolation
  • We discussed six methods of univariate smoothing
    – Polynomial regression (generalization of linear models)
    – Smoothing splines
    – Kernel smoothing
    – Local averaging
    – Weighted local averaging
    – Local linear models (weighted)
  • We discussed one method of density estimation based on kernels