Nonlinear Modeling
J. McNames, Portland State University, ECE 4/557, Ver. 1.07

Working in Higher Dimensions

• It would seem that we can simply generalize our methods for univariate smoothing to higher dimensions
• Why would we ever use a linear model?
  – The smoothing methods impose fewer assumptions
  – Why not just allow any problem to be nonlinear?
• Similarly, why would we ever exclude a possible explanatory (input) variable?
  – Isn’t more information always better?
  – If an input variable might help, why would we ever exclude it?
  – Increases the dimensionality of our input space, but so what?

• This discussion is based on [1, Section 2.5]

Nonlinear Modeling Overview

  • Problem definition
  • Curse of dimensionality
  • Local Models
  • Kernel Methods
  • Weighted Euclidean distance
  • Radial basis functions
  • Thin plate splines
  • Clustering Algorithms

Problems with Higher Dimensions

• Most smoothing methods are essentially based on local weighting of points
• In higher dimensions it becomes difficult to identify “neighborhoods”

  • Called the curse of dimensionality
  • Shows up in many contexts

Problem Definition

[Block diagram — Process: observed variables x1,…,xc, unobserved variables xc+1,…,xp and z1,…,zq, output y. Model: observed variables x1,…,xp, output y]

• Nonlinear Modeling Problem: Given a data set with input vectors x and a single output y, find the best function ĝ(x) that minimizes the prediction error on new input vectors (probably not in the data set)
• This problem has many names
  – Nonlinear modeling problem
  – Nonparametric regression
  – Multivariate smoothing


Euclidean Distance in High Dimensions

• If the inputs are drawn from an i.i.d. distribution, the squared Euclidean distance can be viewed as a scaled estimate of the average squared distance along a single coordinate, $\delta_j^2 = (x_j - x_{i,j})^2$:

  $\hat{\delta}^2 = \frac{1}{p} d_i^2 = \frac{1}{p}\sum_{j=1}^{p} (x_j - x_{i,j})^2 \approx E\left[(x_j - x_{i,j})^2\right]$

• The means and variances are related by

  $\mu_{\hat{\delta}^2} = \mu_{\delta^2} \qquad \sigma^2_{\hat{\delta}^2} = \mathrm{var}[\hat{\delta}^2] = p^{-1}\sigma^2_{\delta^2}$
  $\mu_{d^2} = p\,\mu_{\delta^2} \qquad \sigma^2_{d^2} = \mathrm{var}[p\,\hat{\delta}^2] = p\,\sigma^2_{\delta^2}$

• Thus the coefficient of variation is

  $\gamma \triangleq \frac{\sigma_{d^2}}{\mu_{d^2}} = \frac{1}{\sqrt{p}}\,\frac{\sigma_{\delta^2}}{\mu_{\delta^2}}$

• All neighbors become more equidistant as p increases!
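A quick numeric illustration (my sketch, not from the slides): the coefficient of variation of the squared distances shrinks like $1/\sqrt{p}$.

nPoints = 1000;
for p = [1 10 100]
    x  = rand(nPoints,p);               % i.i.d. uniform inputs
    d2 = sum((x - rand(1,p)).^2, 2);    % squared distances to one random query point
    fprintf('p = %3d: CV of d^2 = %.3f\n', p, std(d2)/mean(d2));
end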

What is “Local”?

• Suppose we have p inputs uniformly distributed in a unit hypercube
• Let us use a smaller hypercube containing only a fraction k/n of the points to build our local model
• An unusual neighborhood, but suitable for making the point
• What is the edge length of our hypercube neighborhood?
  – The volume of our neighborhood is r = k/n = e^p, where e is the edge length
  – Thus e_p(r) = r^{1/p}
  – If we wish to use a neighborhood that captures 1% of the volume/points, and p = 10, then e_10(0.01) = 0.63!
  – Similarly, e_10(0.10) = 0.80
  – The entire range of each input is only 1.0

  • How can such neighborhoods be considered “local”?
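A one-line check of the edge-length formula (a sketch, assuming the uniform-hypercube setup above):

edgeLength = @(r,p) r.^(1./p);   % edge of a sub-cube capturing volume fraction r
edgeLength([0.01 0.10], 10)      % returns roughly [0.63 0.79]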

Sampling Density in High Dimensions

• Suppose we want a uniform sampling density in one dimension consisting of n = 100 points along a grid
• In two dimensions we would need n = 100² points to have the same sampling density (spacing between neighboring points along the axes)
• In p dimensions we would need n = 100^p points!
• If p = 10 we would need n = 100^10 points, which is impractical
• Thus, in high dimensions all data sets sparsely sample the input space

Extrapolation versus Interpolation in High Dimensions

• Suppose our inputs are uniformly distributed within a unit hypersphere centered at the origin
• The median distance from the origin to the nearest data point is given by

  $d(p, n) = \left(1 - 0.5^{1/n}\right)^{1/p}$

• d(10, 500) ≈ 0.52, which is more than halfway to the boundary
• Thus, most of the data points are closer to the boundary than to any other data point
• This means we are always trying to estimate near the edges
• In higher dimensions, we are effectively attempting to extrapolate rather than interpolate or smooth between the data points
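A quick numeric check of the median-distance formula (my sketch; the p = 10, n = 500 case matches the 0.52 quoted above):

medianNearest = @(p,n) (1 - 0.5.^(1./n)).^(1./p);
medianNearest(10, 500)   % approximately 0.52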



Nonlinear Modeling

• In low-dimensional spaces local models continue to work well
  – Also applies in high-dimensional spaces that can be reduced to lower dimensions
• The following slides focus on “local” models as applied to a two-dimensional problem

Coping with The Curse

• If the complexity of the problem grows with dimensionality (e.g., g(x) = e^{−‖x‖²}, a simple bump in p dimensions), we must have a dense sample (n ∝ n₁^p) to estimate g(x) with the same accuracy over all values of p
• In practice we don’t have this luxury
• This also suggests that adding inputs (increasing p) makes matters much worse, a paradoxical result
• The way out of the curse is to impose structure on g(x)
  – For example, a linear model may be reasonable: y = wᵀx + ε
  – Now we only need n ≈ 10p, dramatically fewer points to build our model
  – This model doesn’t suffer from the curse

The RampHill Data Set

• Like the Motorcycle data set for the univariate smoothing problem, I have a favorite data set for the nonlinear modeling problem
• The RampHill data set [2, p. 150]
• It is a synthetic data set
• It has a number of nice properties for testing nonlinear models
  – Two flat regions in which the process is constant
  – A local bump (a function of two input variables)
  – A global ramp that is locally linear
  – Two sharp edges (most models are smoother than this)
• It only has two inputs so that we can plot the output (surface)
• The inputs were drawn from a uniform distribution
• No noise

Imposing Structure

• Another possible mechanism to cope with the curse is to use PCA or similar techniques
  – Appropriate when the inputs are correlated
  – In other words, when the inputs fall close to a lower-dimensional hyperplane in the p-dimensional space
• May also be appropriate to consider weighted distances
  – The idea is that a few inputs may dominate the distance measure
• There are many other ideas for imposing structure
• This is the key difference between different nonlinear modeling strategies


Example 1: Ramp Hill Surface Plot

[Surface plots of g(x1, x2) from two viewing angles]

Example 1: Ramp Hill

  • Number of points per (training/estimation) data set: 250
  • Number of evaluation points (test/evaluation) data set: 2500
  • Noise power: 0.07
  • Signal (g(x)) power: 0.70
  • Signal-to-noise ratio (SNR): 10.01
  • Number of data sets: 250

Example 1: MATLAB Code

function [] = ShowRampHill()
% ==============================================================================
% Author-Specified Parameters
% ==============================================================================
nPointsBuild   = 250;
nDataSetsBuild = 250;
nPointsSide    = 50;
noisePower     = 0.07;
% ==============================================================================
% Preprocessing
% ==============================================================================
x1Side = linspace(-1.5,1.5,nPointsSide);
x2Side = linspace(-1.5,1.5,nPointsSide);
[x1Block,x2Block] = meshgrid(x1Side,x2Side);
x1Test = reshape(x1Block,nPointsSide^2,1);
x2Test = reshape(x2Block,nPointsSide^2,1);
xTest  = [x1Test x2Test];
functionName   = mfilename;
fileIdentifier = fopen([functionName '.tex'],'w');
% ==============================================================================
% Create the Data Sets
% ==============================================================================
DataSetsBuild = repmat(struct('x',[],'y',[]),nDataSetsBuild,1);

Example 1: Ramp Hill

[Contour plot of the RampHill surface with the build data points overlaid]


% Show Side-by-Side Views of the Surface
% ==============================================================================
% yTestSquare = reshape(yTest,nSide,nSide);
% surf(x1Sweep,x2Sweep,yTestSquare);
figure;
FigureSet(1,4,4);
colormap(pink(256));
h = surf(x1Side,x2Side,yTestBlock);
set(h,'LineWidth',0.1);
hold on;
h = plot3(xBuild(:,1),xBuild(:,2),yBuild,'g.');
set(h,'MarkerSize',12);
hold off;
set(gca,'Position',[0.13 0.13 0.85 0.85]);
xlim([-1.5 1.5]);
ylim([-1.5 1.5]);
zlim([-1.0 1.0]);
caxis([-3 1]);
xlabel('x1','Interpreter','LaTeX');
ylabel('x2','Interpreter','LaTeX');
zlabel('g(x1, x2)','Interpreter','LaTeX');
box off;
axis square;
grid on;
AxisSet;
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Ramp Hill Surface Plot}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\begin{center}\n');

for c1 = 1:nDataSetsBuild
    DataSetsBuild(c1).x = 0.6*randn(nPointsBuild,2);
    DataSetsBuild(c1).y = CreateDataSet(DataSetsBuild(c1).x,'function','RampHill','NoisePower',noisePower);
end
yTest = CreateDataSet(xTest,'function','RampHill');
nPointsTest = length(yTest);
save('RampHill','DataSetsBuild','xTest','yTest','nPointsSide','x1Side','x2Side');
yTestBlock = reshape(yTest,nPointsSide,nPointsSide);
xBuild = DataSetsBuild(1).x;
yBuild = DataSetsBuild(1).y;
% ==============================================================================
% Write Table Summarizing Properties
% ==============================================================================
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\stepcounter{exc}\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Ramp Hill}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\begin{bulleted}\n');
fprintf(fileIdentifier,'\t\\item Number of points per (training/estimation) data set: %d\n',nPointsBuild);
fprintf(fileIdentifier,'\t\\item Number of evaluation points (test/evaluation) data set: %d\n',nPointsTest);
fprintf(fileIdentifier,'\t\\item Noise power: %4.2f\n',noisePower);
fprintf(fileIdentifier,'\t\\item Signal (g(x)) power: %4.2f\n',var(yTest));
fprintf(fileIdentifier,'\t\\item Signal-to-noise ratio (SNR): %4.2f\n',var(yTest)/noisePower);
fprintf(fileIdentifier,'\t\\item Number of data sets: %d\n',nDataSetsBuild);
fprintf(fileIdentifier,'\\end{bulleted}\n');
fprintf(fileIdentifier,'\n');

for c1 = 1:2
    switch c1
        case 1, view(46,12);
        case 2, view(-71,12);
    end
    fileName = sprintf('%s-%s%d',functionName,'SurfacePlot',c1);
    print(fileName,'-depsc');
    fprintf(fileIdentifier,'\t\\includegraphics[width=0.48\\columnwidth]{Matlab/%s}',fileName);
    if c1==1, fprintf(fileIdentifier,'\\hfill\n'); else fprintf(fileIdentifier,'\n'); end;
end
fprintf(fileIdentifier,'\\end{center}\n');
fprintf(fileIdentifier,'\n');
% ==============================================================================
% List the MATLAB Code
% ==============================================================================
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: MATLAB Code}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\t\\matlabcode{Matlab/%s.m}\n',functionName);
fprintf(fileIdentifier,'\n');
% ==============================================================================
% Close the File
% ==============================================================================
fclose(fileIdentifier);

% ==============================================================================
% Show A 2D Scatter of a Typical Build Data Set
% ==============================================================================
figure;
FigureSet(1,'Slides');
contour(x1Side,x2Side,yTestBlock,25);
hold on;
h = scatter(xBuild(:,1),xBuild(:,2),5,yBuild,'filled');
set(h,'LineWidth',0.2);
set(h,'MarkerEdgeColor','k');
hold off;
axis([-1.5 1.5 -1.5 1.5]);
axis('square');
FigureLatex;
xlabel('x1');
ylabel('x2');
zlabel(sprintf('No. Build Points: %4.0f',nPointsBuild));
AxisSet(8);
fileName = sprintf('%s-%s',functionName,'Contour');
print(fileName,'-depsc');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Ramp Hill}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\begin{center}\n');
fprintf(fileIdentifier,'\t\\includegraphics[scale=1]{Matlab/%s}\n',fileName);
fprintf(fileIdentifier,'\\end{center}\n');
fprintf(fileIdentifier,'\n');
% ==============================================================================


Assessment and Demonstrations

• For each of the nonlinear models I will show a sequence of 3 × 5 slides
• Each of the 3 sequences of slides shows an example of under-smoothing, estimated optimal smoothing, and over-smoothing
• Each sequence consists of five slides showing the following:
  1. Estimated prediction error versus the smoothing parameter
  2. Estimated surface, from two different viewing angles, for the first data set
  3. Average estimated surface across all of the data sets
  4. Bias: the difference between the true and the average surface
  5. Standard deviation (√variance)

Revisiting MSE

  $\mathrm{MSE} \triangleq E\left[(y - \hat{y})^2\right] \qquad \widehat{\mathrm{MSE}} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

• MSE is a convenient measure for the many reasons we’ve discussed previously
  – Decomposes into the sum of variance and bias²
  – In the linear case, an optimal solution is available in closed form (no iterative optimization necessary)
• However, the units are not intuitive
  – Square of the units of the output
  – If we scale the output by α, the MSE is scaled by α²
• One alternative is to use RMSE, which has the same units as the output

Kernel Methods and Local Models

  $\mathrm{ASE} = \frac{1}{n}\sum_{i=1}^{n} b_i\,(y_i - \hat{g}(x_i))^2 \qquad \hat{g}^*(x) = \frac{\sum_{i=1}^{n} b_i\, y_i}{\sum_{i=1}^{n} b_i}$

• We discussed a number of methods of univariate smoothing
• Most of these methods can be easily generalized to the multivariate case
• For the univariate case, the weights $b_i$ were a function of the absolute distance from the input x to the ith point in the data set: $b(|x - x_i|)$
• In the multivariate case we can simply replace this with the Euclidean distance: $b(\|x - x_i\|)$

Normalized Mean Square Error

  $\mathrm{NMSE} \triangleq \frac{1}{\sigma_y^2}\, E\left[(y - \hat{y})^2\right] \qquad \widehat{\mathrm{NMSE}} = \widehat{\mathrm{NASE}} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$

• Can also use the normalized mean squared error (NMSE)
• Some useful properties
  – Invariant to scaling of the output
  – Can be used across problems
  – Can be understood as a measure of how much more variation the model explains than the sample average (a constant) would
• Natural scale
  – NMSE = 0 means a perfect fit
  – NMSE = 1 means the model is no better than the sample average
  – NMSE > 1 means the model is worse than the sample average
• Similar to the coefficient of determination, but not bounded
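The estimated NMSE above is a one-liner in MATLAB (my sketch, not from the slides):

nmse = @(y,yHat) sum((y - yHat).^2) / sum((y - mean(y)).^2);   % 0 = perfect, 1 = no better than the mean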


Example 2: Weighted Local Average (n = 250)

[Plot: estimated and test-set prediction NMSE versus the number of neighbors]

Distance Measures

  $d_i^2 = \|x - x_i\|^2 = \sum_{j=1}^{p} (x_j - x_{i,j})^2$

• It is not clear that the Euclidean distance is best
• Sensitive to scaling of the inputs
• If the input units are dissimilar, at a minimum a correlation transform should be performed (equalize the variance)

Example 2: Weighted Local Average — Parameter Summary

  Parameter       Value
  weightFunction  Biweight
  nNeighbors      1

Weighted Local Averaging

  $\hat{g}(x) = \frac{\sum_{i=1}^{k} b\left(\|x - x_{c(i)}\|\right) y_{c(i)}}{\sum_{i=1}^{k} b\left(\|x - x_{c(i)}\|\right)} = \frac{\sum_{i=1}^{k} b_i\, y_{c(i)}}{\sum_{i=1}^{k} b_i}$

  $\mathrm{ASE}_b(x) = \frac{1}{n}\sum_{i=1}^{n} b\left(\|x - x_i\|\right) (y_i - \hat{g}(x))^2$

• Same as in the univariate case, except for the distance metric
• $\|x - x_{c(i)}\|$ denotes the Euclidean distance to the ith nearest neighbor
• Smoothing parameter: k, the number of neighbors
• Prediction error estimation: leave-one-out cross-validation, O(n)
• Other parameters
  – Weighting function


Example 2: Weighted Local Average — Bias

[Surface plots of the bias, ḡ(x1, x2) − g(x1, x2), from two viewing angles]

Normalized Average Squared Bias: 0.54%

Example 2: Weighted Local Average — Example

[Surface plots of the estimated surface ĝ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 23.80%

Example 2: Weighted Local Average — Standard Deviation

[Surface plots of the standard deviation, √E[(ĝ(x1, x2) − ḡ(x1, x2))²], from two viewing angles]

Normalized Variance: 14.11%

Example 2: Weighted Local Average — Average

[Surface plots of the average estimated surface ḡ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 0.54%


Example 2: Weighted Local Average — Average

[Surface plots of the average estimated surface ḡ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 1.64%

Example 2: Weighted Local Average — Parameter Summary

  Parameter       Value
  weightFunction  Biweight
  nNeighbors      10

Example 2: Weighted Local Average — Bias

[Surface plots of the bias, ḡ(x1, x2) − g(x1, x2), from two viewing angles]

Normalized Average Squared Bias: 1.64%

Example 2: Weighted Local Average — Example

[Surface plots of the estimated surface ĝ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 8.54%


Example 2: Weighted Local Average — Example

[Surface plots of the estimated surface ĝ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 17.40%

Example 2: Weighted Local Average — Standard Deviation

[Surface plots of the standard deviation, √E[(ĝ(x1, x2) − ḡ(x1, x2))²], from two viewing angles]

Normalized Variance: 3.35%

Example 2: Weighted Local Average — Average

[Surface plots of the average estimated surface ḡ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 10.11%

Example 2: Weighted Local Average — Parameter Summary

  Parameter       Value
  weightFunction  Biweight
  nNeighbors      50


Example 2: MATLAB Code

function [yTest] = WeightedAverage(xBuild,yBuild,xTest,varargin)
% ==============================================================================
% Process Function Arguments
% ==============================================================================
nMandatoryArguments = 3;
nNeighbors     = 5;
weightFunction = 'Biweight';
if nargin > nMandatoryArguments
    if ~isstruct(varargin{1})
        Parameters = struct;
        for c1 = 1:2:length(varargin)-1
            if ~ischar(varargin{c1})
                error(['Error parsing arguments: Expected property name string at argument ' num2str(c1)]);
            end
            Parameters = setfield(Parameters,varargin{c1},varargin{c1+1});
        end
    else
        Parameters = varargin{1};
    end
    parameterNames = fieldnames(Parameters);
    for c1 = 1:length(parameterNames)
        parameterName  = parameterNames{c1};
        parameterValue = Parameters.(parameterName);
        switch lower(parameterName)
            case lower('nNeighbors'),     nNeighbors     = parameterValue;
            case lower('WeightFunction'), weightFunction = parameterValue;

Example 2: Weighted Local Average — Bias

[Surface plots of the bias, ḡ(x1, x2) − g(x1, x2), from two viewing angles]

Normalized Average Squared Bias: 10.11%

            otherwise
                error(['Unrecognized property: ''' varargin{c1} '''']);
        end
    end
end
% ==============================================================================
% Preprocessing
% ==============================================================================
nBuild  = size(xBuild,1);
nTest   = size(xTest,1);
nInputs = size(xTest,2);
bCrossValidation = false;
if nBuild == nTest && max(max(abs(xBuild-xTest)))==0
    bCrossValidation = true;
end
yTest = zeros(length(xTest),1);
% ==============================================================================
% Main Loop
% ==============================================================================
for c1 = 1:nTest
    distances = zeros(nBuild,1);
    for c2 = 1:nInputs
        distances = distances + (xBuild(:,c2)-xTest(c1,c2)).^2; % Named distances, but actually squared distances
    end
    [distancesSorted,iSort] = sort(distances,'ascend');
    if ~bCrossValidation
        iNeighbors = iSort(1:nNeighbors+1);
    else
        iNeighbors = iSort(2:nNeighbors+2);
    end
    distancesNeighbors = distances(iNeighbors(1:nNeighbors+1));

Example 2: Weighted Local Average — Standard Deviation

[Surface plots of the standard deviation, √E[(ĝ(x1, x2) − ḡ(x1, x2))²], from two viewing angles]

Normalized Variance: 1.36%


Other Distance Measures

• Of course, other measures of distance could be used:

  $d_i \triangleq \sqrt{\sum_{j=1}^{p} w_j\,(x_j - x_{i,j})^2}$

  $d_i \triangleq \sqrt{\sum_{j=1}^{p}\sum_{k=1}^{p} w_{jk}\,(x_j - x_{i,j})(x_k - x_{i,k})}$

  $d_i \triangleq \sum_{j=1}^{p} |x_j - x_{i,j}|$

  $d_i \triangleq \max_{j\in[1,p]} |x_j - x_{i,j}|$

• How to pick the best distance?
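A sketch of the four measures for one query point (mine, not from the slides; x is 1 × p, X is n × p, w is an assumed 1 × p weight vector, and W an assumed p × p weight matrix):

% Distances between query x and every row of X
D  = X - x;                    % coordinate differences
dW = sqrt((D.^2) * w(:));      % weighted Euclidean
dQ = sqrt(sum((D*W).*D, 2));   % general quadratic form
dM = sum(abs(D), 2);           % sum of absolute differences (city block)
dC = max(abs(D), [], 2);       % maximum coordinate difference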

    yNeighbors = yBuild(iNeighbors(1:nNeighbors));
    uSquared = distancesNeighbors(1:nNeighbors)/distancesNeighbors(nNeighbors+1);
    switch lower(weightFunction)
        case lower('Gaussian')
            weights = exp(-(uSquared-min(uSquared))/2);
        case lower('Epanechnikov')
            weights = (1-uSquared);
        case lower('Exponential')
            u = sqrt(uSquared);
            weights = exp(-(u-min(u)));
        case lower('Linear')
            u = sqrt(uSquared);
            weights = (1-u).*(abs(u)<1) + 0.000001*exp(-(uSquared-min(uSquared))/2);
        case lower('Squared')
            weights = 1./(5+uSquared).^4;
        case lower('Biweight')
            weights = (1-uSquared).^2;
        otherwise
            error('Invalid kernel type.');
    end
    yTest(c1) = sum(weights.*yNeighbors)/sum(weights);
end
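A hypothetical call, with values taken from the parameter summaries above:

% 10-nearest-neighbor weighted local average with a biweight kernel
yHat = WeightedAverage(xBuild, yBuild, xTest, 'nNeighbors',10, 'WeightFunction','Biweight');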


Weighted Local Linear Models

  $B_{ii} = b\left(\|x - x_i\|\right) \qquad \hat{g}(x) = x^{\mathsf{T}} (A^{\mathsf{T}} B A)^{-1} A^{\mathsf{T}} B\, y$

  $\mathrm{ASE}_b(x) = \frac{1}{n}\sum_{i=1}^{n} b_k\left(\|x - x_i\|\right) \left[y_i - \hat{g}(x)\right]^2$

• Same as in the univariate case, except for the distance metric
• $\|x - x_{c(i)}\|$ denotes the Euclidean distance to the ith nearest neighbor
• Smoothing parameter: k, the number of neighbors
• Prediction error estimation: leave-one-out cross-validation, O(n)
• Other parameters
  – Weighting function (parameters)
  – Regularization (parameters)

Weighted Local Averaging Discussion

• Has almost exactly the same properties as the univariate case
  – Optimal in that it minimizes the (distance-)weighted average squared error
  – Never exceeds the range of the data set
  – As k → ∞, ĝ(x) → ȳ
  – As k → 1, ĝ(x) = y_(i), where y_(i) is the nearest neighbor to the input vector
• In high dimensions, fails because the “nearest neighbor” or “local neighborhood” is no longer well defined
• Works well in low dimensions
• Speed may be an issue; a fast nearest neighbor algorithm will help


Example 3: Weighted Local Linear — Parameter Summary

  Parameter       Value
  weightFunction  Biweight
  nNeighbors      3

Local Linear Models: Ease of Use

  $B_{ii} = b\left(\|x - x_i\|\right) \qquad \hat{g}(x) = x^{\mathsf{T}} (A^{\mathsf{T}} B A)^{-1} A^{\mathsf{T}} B\, y$

  $\mathrm{ASE}_b(x) = \frac{1}{n}\sum_{i=1}^{n} b_k\left(\|x - x_i\|\right) \left[y_i - \hat{g}(x)\right]^2$

• To apply, the user must specify
  – Weighting function (e.g., Gaussian)
  – Distance metric (e.g., Euclidean)
  – Number of neighbors (k)
  – Estimator for the coefficients w_i
  – Regularization parameter for the coefficients
• The most critical parameters are k and the regularization parameter
• Both affect the smoothness (bias-variance tradeoff)

Example 3: Weighted Local Linear — Example

[Surface plots of the estimated surface ĝ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 7178.58%

Example 3: Weighted Local Linear (n = 250)

[Plot: estimated and test-set prediction NMSE versus the number of neighbors]


Example 3: Weighted Local Linear — Standard Deviation

[Surface plots of the standard deviation, √E[(ĝ(x1, x2) − ḡ(x1, x2))²], from two viewing angles]

Normalized Variance: 4560030.59%

Example 3: Weighted Local Linear — Average

[Surface plots of the average estimated surface ḡ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 16952.70%

Example 3: Weighted Local Linear — Parameter Summary

  Parameter       Value
  weightFunction  Biweight
  nNeighbors      14

Example 3: Weighted Local Linear — Bias

[Surface plots of the bias, ḡ(x1, x2) − g(x1, x2), from two viewing angles]

Normalized Average Squared Bias: 16952.70%


Example 3: Weighted Local Linear — Bias

[Surface plots of the bias, ḡ(x1, x2) − g(x1, x2), from two viewing angles]

Normalized Average Squared Bias: 1.32%

Example 3: Weighted Local Linear — Example

[Surface plots of the estimated surface ĝ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 17.48%

Example 3: Weighted Local Linear — Standard Deviation

[Surface plots of the standard deviation, √E[(ĝ(x1, x2) − ḡ(x1, x2))²], from two viewing angles]

Normalized Variance: 7.71%

Example 3: Weighted Local Linear — Average

[Surface plots of the average estimated surface ḡ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 1.32%


Example 3: Weighted Local Linear — Average

[Surface plots of the average estimated surface ḡ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 5.47%

Example 3: Weighted Local Linear — Parameter Summary

  Parameter       Value
  weightFunction  Biweight
  nNeighbors      50

Example 3: Weighted Local Linear — Bias

[Surface plots of the bias, ḡ(x1, x2) − g(x1, x2), from two viewing angles]

Normalized Average Squared Bias: 5.47%

Example 3: Weighted Local Linear — Example

[Surface plots of the estimated surface ĝ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 6.67%

            otherwise
                error(['Unrecognized property: ''' varargin{c1} '''']);
        end
    end
end
% ==============================================================================
% Preprocessing
% ==============================================================================
nBuild  = size(xBuild,1);
nTest   = size(xTest,1);
nInputs = size(xTest,2);
bCrossValidation = false;
if nBuild == nTest && max(max(abs(xBuild-xTest)))==0
    bCrossValidation = true;
end
yTest = zeros(length(xTest),1);
% ==============================================================================
% Main Loop
% ==============================================================================
for c1 = 1:nTest
    distances = zeros(nBuild,1);
    for c2 = 1:nInputs
        distances = distances + (xBuild(:,c2)-xTest(c1,c2)).^2;
    end
    [distancesSorted,iSort] = sort(distances,'ascend');
    if ~bCrossValidation
        iNeighbors = iSort(1:nNeighbors+1);
    else
        iNeighbors = iSort(2:nNeighbors+2);
    end
    distancesNeighbors = distances(iNeighbors(1:nNeighbors+1));

Example 3: Weighted Local Linear — Standard Deviation

[Surface plots of the standard deviation, √E[(ĝ(x1, x2) − ḡ(x1, x2))²], from two viewing angles]

Normalized Variance: 3.34%

    xNeighbors = xBuild(iNeighbors(1:nNeighbors),:);
    yNeighbors = yBuild(iNeighbors(1:nNeighbors));
    uSquared = distancesNeighbors(1:nNeighbors)/distancesNeighbors(nNeighbors+1);
    switch lower(weightFunction)
        case lower('Gaussian')
            weights = exp(-(uSquared-min(uSquared))/2);
        case lower('Epanechnikov')
            weights = (1-uSquared);
        case lower('Exponential')
            u = sqrt(uSquared);
            weights = exp(-(u-min(u)));
        case lower('Linear')
            u = sqrt(uSquared);
            weights = (1-u).*(abs(u)<1) + 0.000001*exp(-(uSquared-min(uSquared))/2);
        case lower('Squared')
            weights = 1./(5+uSquared).^4;
        case lower('Biweight')
            weights = (1-uSquared).^2;
        otherwise
            error('Invalid kernel type.');
    end
    weights = sqrt(weights);                        % Weighting function (square rooted)
    Aw = diag(weights)*[ones(nNeighbors,1) xNeighbors];
    yw = diag(weights)*yNeighbors;
    yTest(c1) = [1 xTest(c1,:)]*pinv(Aw)*yw;        % Performs PCR with a very low threshold
end

Example 3: MATLAB Code

function [yTest] = WeightedAverage(xBuild,yBuild,xTest,varargin)
% ==============================================================================
% Process Function Arguments
% ==============================================================================
nMandatoryArguments = 3;
nNeighbors     = 5;
weightFunction = 'Biweight';
if nargin > nMandatoryArguments
    if ~isstruct(varargin{1})
        Parameters = struct;
        for c1 = 1:2:length(varargin)-1
            if ~ischar(varargin{c1})
                error(['Error parsing arguments: Expected property name string at argument ' num2str(c1)]);
            end
            Parameters = setfield(Parameters,varargin{c1},varargin{c1+1});
        end
    else
        Parameters = varargin{1};
    end
    parameterNames = fieldnames(Parameters);
    for c1 = 1:length(parameterNames)
        parameterName  = parameterNames{c1};
        parameterValue = Parameters.(parameterName);
        switch lower(parameterName)
            case lower('nNeighbors'),     nNeighbors     = parameterValue;
            case lower('WeightFunction'), weightFunction = parameterValue;


Example 4: Kernel Smoother (n = 250)

[Plot: estimated and test-set prediction NMSE versus kernel width]

Weighted Local Linear Model Discussion

• Can exceed the range of the data set
• Some form of regularization is sorely needed
• Optimal in that it minimizes the (distance-)weighted average squared error
  – As k → ∞, ĝ(x) → the linear least squares fit
  – As k → 3, ĝ(x) → a piecewise linear interpolant (discontinuous unless k > 3)
• In high dimensions, fails because the “nearest neighbor” or “local neighborhood” is no longer well defined
• Can work well if suitable regularization is used

Example 4: Kernel Smoother — Parameter Summary

  Parameter    Value
  kernelType   Gaussian
  KernelWidth  0.010

Kernel Smoothing

  $\hat{g}(x) = \frac{\sum_{i=1}^{k} b\left(\|x - x_i\|\right) y_i}{\sum_{i=1}^{k} b\left(\|x - x_i\|\right)} = \frac{\sum_{i=1}^{k} b_i\, y_i}{\sum_{i=1}^{k} b_i}$

  $\mathrm{ASE}_b(x) = \frac{1}{n}\sum_{i=1}^{n} b\left(\|x - x_i\|\right) (y_i - \hat{g}(x))^2$

• Same as in the univariate case, except for the distance metric
• $\|x - x_i\|$ denotes the Euclidean distance to the ith point
• Smoothing parameter: σ, the width of the kernel functions
• Prediction error estimation: leave-one-out cross-validation, O(n)
• Other parameters
  – Weighting function


Example 4: Kernel Smoother — Bias

[Surface plots of the bias, ḡ(x1, x2) − g(x1, x2), from two viewing angles]

Normalized Average Squared Bias: 0.54%

Example 4: Kernel Smoother — Example

[Surface plots of the estimated surface ĝ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 23.61%

Example 4: Kernel Smoother — Standard Deviation

[Surface plots of the standard deviation, √E[(ĝ(x1, x2) − ḡ(x1, x2))²], from two viewing angles]

Normalized Variance: 13.90%

Example 4: Kernel Smoother — Average

[Surface plots of the average estimated surface ḡ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 0.54%


Example 4: Kernel Smoother — Average

[Surface plots of the average estimated surface ḡ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 0.96%

Example 4: Kernel Smoother — Parameter Summary

  Parameter    Value
  kernelType   Gaussian
  KernelWidth  0.100

Example 4: Kernel Smoother — Bias

[Surface plots of the bias, ḡ(x1, x2) − g(x1, x2), from two viewing angles]

Normalized Average Squared Bias: 0.96%

Example 4: Kernel Smoother — Example

[Surface plots of the estimated surface ĝ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 15.92%


Example 4: Kernel Smoother — Example

[Surface plots of the estimated surface ĝ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 70.50%

Example 4: Kernel Smoother — Standard Deviation

[Surface plots of the standard deviation, √E[(ĝ(x1, x2) − ḡ(x1, x2))²], from two viewing angles]

Normalized Variance: 7.22%

Example 4: Kernel Smoother — Average

[Surface plots of the average estimated surface ḡ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 68.17%

Example 4: Kernel Smoother — Parameter Summary

  Parameter    Value
  kernelType   Gaussian
  KernelWidth  1


Example 4: MATLAB Code

function [yTest] = KernelSmooth(xBuild,yBuild,xTest,varargin)
% ==============================================================================
% Process Function Arguments
% ==============================================================================
nMandatoryArguments = 3;
kernelFunction = 'Gaussian';
kernelWidth    = std(xBuild(:,1))/10;
if nargin > nMandatoryArguments
    if ~isstruct(varargin{1})
        Parameters = struct;
        for c1 = 1:2:length(varargin)-1
            if ~ischar(varargin{c1})
                error(['Error parsing arguments: Expected property name string at argument ' num2str(c1)]);
            end
            Parameters = setfield(Parameters,varargin{c1},varargin{c1+1});
        end
    else
        Parameters = varargin{1};
    end
    parameterNames = fieldnames(Parameters);
    for c1 = 1:length(parameterNames)
        parameterName  = parameterNames{c1};
        parameterValue = Parameters.(parameterName);
        switch lower(parameterName)
            case lower('KernelType'),  kernelFunction = parameterValue;
            case lower('KernelWidth'), kernelWidth    = parameterValue;

Example 4: Kernel Smoother — Bias

[Surface plots of the bias, ḡ(x1, x2) − g(x1, x2), from two viewing angles]

Normalized Average Squared Bias: 68.17%

            otherwise
                error(['Unrecognized property: ''' varargin{c1} '''']);
        end
    end
end
% ==============================================================================
% Preprocessing
% ==============================================================================
nBuild  = size(xBuild,1);
nTest   = size(xTest,1);
nInputs = size(xTest,2);
bCrossValidation = false;
if nBuild == nTest && max(max(abs(xBuild-xTest)))==0
    bCrossValidation = true;
end
yTest = zeros(length(xTest),1);
kernelWidthSquared = kernelWidth^2;
% ==============================================================================
% Main Loop
% ==============================================================================
for c1 = 1:nTest
    distancesSquared = zeros(nBuild,1);
    for c2 = 1:nInputs
        distancesSquared = distancesSquared + (xBuild(:,c2)-xTest(c1,c2)).^2;
    end
    if ~bCrossValidation
        k = 1:nBuild;
    else

Example 4: Kernel Smoother — Standard Deviation

[Surface plots of the standard deviation, √E[(ĝ(x1, x2) − ḡ(x1, x2))²], from two viewing angles]

Normalized Variance: 0.44%


Distance Metrics

  $d_i = \sqrt{\sum_{j=1}^{p} (x_j - x_{i,j})^2}$

• On real data sets these methods can perform poorly
• Note that useless inputs affect the distance measure just as much as useful inputs
• These methods work much better if a more flexible metric is used

        [distanceMin,iMin] = min(distancesSquared);
        k = [1:iMin-1 iMin+1:nBuild];
    end
    uSquared = distancesSquared(k)/kernelWidthSquared;
    switch lower(kernelFunction)
        case lower('Gaussian')
            bases = exp(-(uSquared-min(uSquared))/2);
        case lower('Epanechnikov')
            bases = (1-uSquared).*(abs(uSquared)<1);
        case lower('Exponential')
            u = sqrt(uSquared);
            bases = exp(-(u-min(u)));
        case lower('Linear')
            u = sqrt(uSquared);
            bases = (1-u).*(abs(u)<1) + 0.000001*exp(-(uSquared-min(uSquared))/2);
        case lower('Squared')
            bases = 1./(5+uSquared).^4;
        case lower('Biweight')
            bases = (1-uSquared).^2.*(abs(uSquared)<1);
        otherwise
            error('Invalid kernel type.');
    end
    sumBases = sum(bases);
    if sumBases > 0
        yTest(c1) = (bases'*yBuild(k))/sumBases;
    else
        error('Bases summed to zero.');
    end
end
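A hypothetical call, with the parameter names from the listing above:

% Kernel smoother with a Gaussian kernel of width 0.1
yHat = KernelSmooth(xBuild, yBuild, xTest, 'KernelType','Gaussian', 'KernelWidth',0.1);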


Distance Metrics Continued

• The most practical generalization of the Euclidean distance is the weighted Euclidean distance

  $d_i = \sqrt{\sum_{j=1}^{p} \lambda_j^2\,(x_j - x_{i,j})^2}$

  where the weights λ_j control how much influence the jth input has on the model
• But how do we pick the coefficients λ?
• Ideally we would like to pick the coefficients that minimize the prediction error
• The prediction error is not knowable, but we can approximate it
• How do we optimize p parameters to minimize the prediction error estimate?
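One plausible way to attack this (my sketch, not the course's method): scale each input by exp(logLambda_j) and let a generic optimizer minimize a leave-one-out error estimate. Here KernelSmooth runs in its leave-one-out mode because the build and test inputs are identical.

% Tune weighted-Euclidean scalings by minimizing a LOO NMSE estimate
nmse      = @(y,yHat) sum((y-yHat).^2)/sum((y-mean(y)).^2);
scaleX    = @(logLambda) xBuild .* exp(logLambda(:)');   % positive per-input weights
objective = @(logLambda) nmse(yBuild, ...
    KernelSmooth(scaleX(logLambda), yBuild, scaleX(logLambda), 'KernelWidth',0.1));
lambda = exp(fminsearch(objective, zeros(size(xBuild,2),1)));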


Kernel Smoothing Discussion

• Has almost exactly the same properties as the univariate case
  – Optimal in that it minimizes the (distance-)weighted average squared error
  – Never exceeds the range of the data set
  – As σ → ∞, ĝ(x) → ȳ
  – As σ → 0, ĝ(x) = y_(i), where y_(i) is the nearest neighbor to the input vector
• In high dimensions, fails because neighbors and distance are no longer well defined
• Works well in low dimensions
• Speed may be an issue


Radial Basis Functions

  $\hat{g}(x) = w_0 + \sum_{i=1}^{n} w_i\, b\!\left(\frac{\|x - x_i\|}{\sigma}\right)$

• Under certain conditions [3, Chapter 5], they achieve the global minimum of the optimization for certain roughness penalties (s[ĝ(x)])
• Basic idea: approximate functions as a linear combination of bumps
• Originally used in mathematics for function approximation (think interpolation)
• Could use a different measure than the 2-norm (Euclidean distance)
• σ controls the width of the bumps

Roughness Penalty Approach

  $\hat{g}(x) = \operatorname*{argmin}_{\tilde{g}(x)} \sum_{i=1}^{n} (y_i - \tilde{g}(x_i))^2 + \lambda\, s[\tilde{g}(x)]$

• The roughness penalty approach specifies an optimization criterion that consists of two elements
  – The first controls how well the model fits the data
  – The second controls how “rough” the model is
• The regularization parameter λ then controls the tradeoff between these two goals
• The user defines the roughness term s[ĝ(x)]
  – s[ĝ(x)] is large for rough, complex, wiggly functions
  – s[ĝ(x)] is small for smooth, simple, slowly varying functions
• The roughness penalty is a means of imposing structure
• The estimator must be found across the infinite set of all possible functions!

Radial Basis Functions: Solving for the Coefficients

  $\hat{g}(x) = w_0 + \sum_{i=1}^{n} w_i\, b\!\left(\frac{\|x - x_i\|}{\sigma}\right)$

• The coefficients w_i are typically estimated through linear least squares
  – Permits us to use all of the power of linear least squares for a nonlinear model
  – Linear in the coefficients (but not in σ)
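A minimal sketch of the least-squares solve, assuming Gaussian bumps and an assumed width sigma (the RadialBasis listing later in these notes adds ridge regularization and clustering):

% One Gaussian bump per build point, plus a bias column
n  = size(xBuild,1);
D2 = sum(xBuild.^2,2) + sum(xBuild.^2,2)' - 2*(xBuild*xBuild'); % squared pairwise distances
B  = [ones(n,1) exp(-D2/(2*sigma^2))];                          % design matrix
w  = pinv(B)*yBuild;                                            % minimum-norm least-squares weights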


Roughness Penalty Approach

  $\hat{g}(x) = \operatorname*{argmin}_{\tilde{g}(x)} \sum_{i=1}^{n} (y_i - \tilde{g}(x_i))^2 + \lambda\, s[\tilde{g}(x)]$

• Surprisingly, for many penalty functions s[ĝ(x)] the solution can be found and is numerically tractable
• An elegant approach
• The resulting functions are optimal in that they minimize the penalized error function
• Many types of nonlinear modeling can be derived from this framework, including
  – Radial basis functions
  – Thin plate splines


Basis Functions

  $\hat{g}(x) = w_0 + \sum_{i=1}^{n} w_i\, b\!\left(\frac{\|x - x_i\|}{\sigma}\right)$

• Unlike kernel methods and local models, basis functions have fewer restrictions
  – Need not have finite area
  – Need not be non-negative
  – Need not be “local”
• A popular form of neural networks uses sigmoidal basis functions that have “global features” spanning entire sections of the input space

Gaussian Roughness Penalty

The Gaussian is the minimizer when the penalty function is

  $s[\hat{g}(x)] = \int_{x\in\mathbb{R}^p} \sum_{n=0}^{\infty} \frac{\sigma^{2n}}{n!\,2^n} \left[\left(\frac{\partial}{\partial x_1} + \frac{\partial}{\partial x_2} + \cdots + \frac{\partial}{\partial x_p}\right)^{\!n} \hat{g}(x)\right]^2 dx$

• Not clear to me why you would ever use this
• Non-intuitive
• No physical interpretation
• Widely used, nonetheless
• A departure from theory to practice

Radial Basis Functions: Collinearity

  $\hat{g}(x) = w_0 + \sum_{i=1}^{n} w_i\, b\!\left(\frac{\|x - x_i\|}{\sigma}\right)$

• In practice two problems can arise if a basis function is assigned to each input point of the data set
• First problem:
  – Input points that are close to one another will have similar basis functions
  – Causes collinearity and ill-conditioning of the data matrix
• Second problem:
  – Solving for the coefficients by least squares requires O(n³) operations

Basis Functions

  $\hat{g}(x) = w_0 + \sum_{i=1}^{n} w_i\, b\!\left(\frac{\|x - x_i\|}{\sigma}\right)$

• In practice, people use bumps because
  – This gives RBFs some of the nice properties of local models
  – They are easy to code (especially in MATLAB)
  – They work well in low dimensions
• The usual selection of basis functions is possible
  – Multiquadric: $b_\sigma(u) = (d^2 + c^2)^{1/2}$, where c > 0
  – Inverse multiquadric: $b_\sigma(u) = \frac{1}{(d^2 + c^2)^{1/2}}$, where c > 0
  – Gaussian (most popular): $b_\sigma(u) = e^{-u^2/2}$

• The resulting estimate ĝ(x) is sensitive to all of these decisions

Radial Basis Functions: Clustering

  $\hat{g}(x) = w_0 + \sum_{i=1}^{n_b} w_i\, b\!\left(\frac{\|x - c_i\|}{\sigma}\right)$

• Both of these problems can be solved by selecting basis function centers that are
  – Well separated from one another
  – Smaller in number (n_b) than n
• But how do we do this in practice?
• Apply a clustering algorithm (a sketch of max-min clustering follows below)
  – Randomly select the centers
  – Max-min clustering
  – Some other form of clustering (e.g., k-means)
• Requires additional user input and decisions
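The course code calls a ClusterMaxMin helper; here is a minimal sketch of greedy max-min clustering under my assumptions (each new center is the point farthest from all existing centers):

function clusters = ClusterMaxMinSketch(x,nClusters)
% Greedy max-min clustering of the rows of x
iCenters    = zeros(nClusters,1);
iCenters(1) = 1;                            % arbitrary first center
minDist2    = sum((x - x(1,:)).^2, 2);      % squared distance to nearest center
for c1 = 2:nClusters
    [~,iCenters(c1)] = max(minDist2);       % farthest point becomes the next center
    minDist2 = min(minDist2, sum((x - x(iCenters(c1),:)).^2, 2));
end
clusters = x(iCenters,:);
end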

Radial Basis Functions Summary

  $\hat{g}(x) = w_0 + \sum_{i=1}^{n} w_i\, b\!\left(\frac{\|x - x_i\|}{\sigma}\right)$

• $\|x - x_i\|$ denotes the Euclidean distance to the ith point
• Smoothing parameter: σ, the width of the basis functions
• Prediction error estimation: PRESS estimated through the hat matrix, O(n²)
• Other parameters
  – Clustering algorithm and parameters, if not all points are used
  – Basis function
  – Regularization parameters

Radial Basis Functions are Unwieldy

  $\hat{g}(x) = w_0 + \sum_{i=1}^{n} w_i\, b\!\left(\frac{\|x - x_i\|}{\sigma}\right)$

• Practical application is rarely as elegant as the underlying theory
• Theory suggests that the user only needs to select (or estimate)
  – Roughness penalty (s[ĝ(x)])
  – Regularization parameter λ
• In practice, the user instead selects
  – Basis function (e.g., Gaussian)
  – Basis widths (σ)
  – Distance metric (e.g., Euclidean)
  – Estimator for the coefficients w_i
  – Regularization parameter λ
  – Clustering algorithm and parameters (e.g., number of clusters)


Example 4: Radial Basis Function — Average

[Surface plots of the average estimated surface ḡ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 6.20%

Example 4: Radial Basis Function — Parameter Summary

  Parameter       Value
  RidgeParameter  0.001
  Renormalize     false
  nClusters       50
  BasisWidth      0.300

Example 4: Radial Basis Function — Bias

[Surface plots of the bias, ḡ(x1, x2) − g(x1, x2), from two viewing angles]

Normalized Average Squared Bias: 6.20%

Example 4: Radial Basis Function — Example

[Surface plots of the estimated surface ĝ(x1, x2) from two viewing angles]

Normalized Average Squared Error: 79.96%


        switch lower(parameterName)
            case lower('BasisType'),      basisType      = parameterValue;
            case lower('BasisWidth'),     basisWidth     = parameterValue;
            case lower('RidgeParameter'), ridgeParameter = parameterValue;
            case lower('Renormalize'),    renormalize    = parameterValue;
            case lower('nClusters'),      nClusters      = parameterValue;
            otherwise
                error(['Unrecognized property: ''' varargin{c1} '''']);
        end
    end
end
% ==============================================================================
% Preprocessing
% ==============================================================================
nBuild  = size(xBuild,1);
nTest   = size(xTest,1);
nInputs = size(xTest,2);
bCrossValidation = false;
if nBuild == nTest && max(max(abs(xBuild-xTest)))==0
    bCrossValidation = true;
end
yTest = zeros(length(xTest),1);
% ==============================================================================
% Clustering
% ==============================================================================
if nClusters == nBuild
    clusters = xBuild;
else
    clusters = ClusterMaxMin(xBuild,nClusters);
end

Example 4: Radial Basis Function — Standard Deviation

[Surface plots of the standard deviation, √E[(ĝ(x1, x2) − ḡ(x1, x2))²], from two viewing angles]

Normalized Variance: 16.78%

% ==============================================================================
% Estimate the Coefficients
% ==============================================================================
distances = zeros(nBuild,nClusters);
for c1 = 1:nBuild
    for c2 = 1:nClusters
        distance = 0;
        for c3 = 1:nInputs
            distance = sum((xBuild(c1,:) - clusters(c2,:)).^2);
        end
        distances(c1,c2) = distance;
    end
end
B = exp(-distances/(2*basisWidth.^2));
if renormalize
    for c1 = 1:nBuild
        B(c1,:) = B(c1,:)/sum(B(c1,:));
    end
end
B = [ones(nBuild,1) B];
P = eye(nClusters+1);
P(1,1) = 0; % Do not penalize the bias weight
Bi = inv(B'*B + ridgeParameter*P)*B';
weights = Bi*yBuild;
% ==============================================================================
% Main Loop
% ==============================================================================
if bCrossValidation
    hat = B*Bi;

Example 4: MATLAB Code

function [yTest] = RadialBasis(xBuild ,yBuild ,xTest , varargin ); % ============================================================================== % Process Function Arguments % ============================================================================== nMandatoryArguments = 3; basisType = ’Gaussian ’; ridgeParameter = 0.1; basisWidth = 0.25; renormalize = false; nClusters = size(xBuild ,1); if nargin > nMandatoryArguments if ¬isstruct(varargin {1}) Parameters = struct; for c1 =1:2: length(varargin )-1 if ¬ischar(varargin{c1}), error ([ ’Error parsing arguments: Expected property name string at arg Parameters = setfield( Parameters , varargin{c1},varargin{c1 +1}); end else Parameters = varargin {1}; end parameterNames = fieldnames (Parameters ); for c1 = 1: length( parameterNames) parameterName = parameterNames{c1}; parameterValue = Parameters.( parameterName );


SLIDE 29

Example 5: Basis Functions

[Figure: basis functions (scaled) versus input (scaled)]


% RadialBasis, part 4 of 4
    yHat = hat*yBuild;
    for c1 = 1:nTest
        % leave-one-out prediction via the hat-matrix diagonal
        yTest(c1) = yBuild(c1) - (yBuild(c1) - yHat(c1))/(1 - hat(c1,c1));
    end
else
    for c1 = 1:nTest
        distances = zeros(nClusters,1);
        for c2 = 1:nInputs
            distances = distances + (clusters(:,c2) - xTest(c1,c2)).^2;
        end
        bases = exp(-distances/(2*basisWidth^2));
        if renormalize, bases = bases/sum(bases); end
        bases = [1; bases];              % bias term
        yTest(c1) = weights'*bases;
    end
end
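The cross-validation branch above uses the standard leave-one-out identity for linear smoothers: if yHat = H*y, the leave-one-out prediction at point i is y(i) − (y(i) − yHat(i))/(1 − H(i,i)), so no refitting is required. A quick numerical check of that identity (illustrative only, using ordinary least squares in place of the ridge solve):

% Verify the leave-one-out shortcut for a linear smoother yHat = H*y
X = [ones(5,1) (1:5)'];
y = [1.1; 1.9; 3.2; 3.9; 5.1];
H = X*((X'*X)\X');                            % hat matrix
yHat = H*y;
looShortcut = y - (y - yHat)./(1 - diag(H));  % shortcut LOO predictions
w1 = X(2:end,:)\y(2:end);                     % brute force: drop point 1, refit
looBrute1 = X(1,:)*w1;                        % agrees with looShortcut(1) to rounding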


Example 6: Renormalized Basis Functions

[Figure: renormalized basis functions (scaled) versus input (scaled)]


Radial Basis Functions Rescaled

    b̄n(x) = b((x − cn)/σ) / Σ_{i=1}^{nb} b((x − ci)/σ)

  • One limitation of radial basis functions is that they may produce holes between the basis functions [1, p. 187]
  • This can be solved by normalizing the basis functions so that they always sum to unity, as sketched below
  • This makes them more similar to kernel functions
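A minimal sketch of the renormalization (the grid, centers, and width below are hypothetical, chosen only to illustrate the effect):

% Gaussian bases on a 1-D grid, before and after renormalization
x = linspace(0, 1, 200)';            % evaluation grid
c = [0.2 0.5 0.9];                   % hypothetical, unevenly spaced centers
sigma = 0.05;
B = exp(-(x - c).^2/(2*sigma^2));    % one column per basis (implicit expansion)
Bn = B./sum(B, 2);                   % renormalized: each row sums to one
% sum(B,2) dips toward zero between the centers (the "holes");
% sum(Bn,2) is identically one, so the renormalized bases cover the gaps.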

SLIDE 30

Example 6: RBF (Renormalized) — Average

[Figure: two surface plots of the average estimate ḡ(x1, x2) over x1, x2 ∈ [−1, 1]]

Normalized Average Squared Error: 1.08%


Example 6: RBF (Renormalized) — Parameter Summary

Parameter        Value
RidgeParameter   0.001
Renormalize      true
nClusters        50
BasisWidth       0.300
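Assuming the RadialBasis listing from Example 4, the parameter summary above corresponds to a call like:

yTest = RadialBasis(xBuild, yBuild, xTest, ...
                    'RidgeParameter', 0.001, 'Renormalize', true, ...
                    'nClusters', 50, 'BasisWidth', 0.300);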


Example 6: RBF (Renormalized) — Bias

[Figure: two surface plots of the bias g(x1, x2) − ḡ(x1, x2) over x1, x2 ∈ [−1, 1]]

Normalized Average Squared Bias: 1.08%


Example 6: RBF (Renormalized) — Example

[Figure: two surface plots of the estimate g(x1, x2) over x1, x2 ∈ [−1, 1]]

Normalized Average Squared Error: 23.37%


SLIDE 31


Example 6: RBF (Renormalized) — Standard Deviation

[Figure: two surface plots of the standard deviation √E[(g(x1, x2) − ḡ(x1, x2))²] over x1, x2 ∈ [−1, 1]]

Normalized Variance: 9.99%



Example 6: MATLAB Code

Example 6 reuses the RadialBasis function listed under Example 4, called with Renormalize set to true.

SLIDE 32

Thin Plate Splines

    S[ĝ] = ∫_{R²} (∂²ĝ(x)/∂x1²)² + 2 (∂²ĝ(x)/∂x1∂x2)² + (∂²ĝ(x)/∂x2²)² dx

  • Thin plate splines also use a roughness penalty approach
  • In two dimensions the penalty can be written as above
  • The optimal solution across all functions is a thin plate spline
  • Two-dimensional splines literally approximate a rubber-sheet fit that minimizes the bending energy over the surface
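For reference, the full criterion that the smoother minimizes can be written in the standard roughness-penalty form (a restatement, with λ the penalty parameter that appears in the summary that follows):

\[
\hat{g} = \arg\min_{g} \; \sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2 + \lambda \, S[g]
\]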



Thin Plate Splines Summary

    ĝ(x) = Σ_{i=1}^{n} wi b(x − xi) + Σ_{j=1}^{p} aj φj(x)

  • Smoothing parameter: λ, the penalty parameter
  • Prediction error estimation: leave-one-out cross-validation
  • Other parameters
    – Clustering algorithm and parameters, if not all points are used
    – Order
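For instance, the TPS (Cubic) fit reported on the following slides uses Order 3 with zero penalty; assuming the ThinPlateSpline listing shown later in these slides, that corresponds to:

% Thin plate spline with the cubic settings from the parameter summary
yTest = ThinPlateSpline(xBuild, yBuild, xTest, 'Order', 3, 'Penalty', 0);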


Radial Basis Functions Discussion

  • Has an optimality property
    – As σ → ∞, ĝ(x) → ȳ
    – As σ → 0, ĝ(x) becomes a set of sharp bumps that interpolate the data
  • Fails in high dimensions because the basis functions are no longer local
  • Works well in low dimensions
  • Estimated surface is “bumpy”
  • Regularization is needed (multiple parameters to optimize)
  • No obvious best choice for estimating the prediction error or optimizing the smoothing parameter
  • Computation may be an issue

SLIDE 33

Example 6: TPS (Cubic) — Average

[Figure: two surface plots of the average estimate ḡ(x1, x2) over x1, x2 ∈ [−1, 1]]

Normalized Average Squared Error: 2.64%


Example 6: TPS (Cubic) — Parameter Summary

Parameter   Value
Order       3
Penalty     0.000


Example 6: TPS (Cubic) — Bias

[Figure: two surface plots of the bias g(x1, x2) − ḡ(x1, x2) over x1, x2 ∈ [−1, 1]]

Normalized Average Squared Bias: 2.64%


Example 6: TPS (Cubic) — Example

[Figure: two surface plots of the estimate g(x1, x2) over x1, x2 ∈ [−1, 1]]

Normalized Average Squared Error: 42.23%


SLIDE 34

% ThinPlateSpline, part 2 of 5
        Parameters = varargin{1};
    end
    parameterNames = fieldnames(Parameters);
    for c1 = 1:length(parameterNames)
        parameterName  = parameterNames{c1};
        parameterValue = Parameters.(parameterName);
        switch lower(parameterName)
            case lower('Order'),   order   = parameterValue;
            case lower('Penalty'), penalty = parameterValue;
            otherwise
                error(['Unrecognized property: ''' varargin{c1} '''']);
        end
    end
end
% ==========================================================================
% Preprocessing
% ==========================================================================
nInputs = size(xBuild,2);
nTrain  = size(xBuild,1);
nTest   = size(xTest ,1);
if 2*order <= nInputs
    error('Polynomial degree must be greater than 0.5*d.\n');
end
% ==========================================================================
% Build Estimator
% ==========================================================================
if nInputs == 1
    T = zeros(order,nTrain);


Example 6: TPS (Cubic) — Standard Deviation

[Figure: two surface plots of the standard deviation √E[(g(x1, x2) − ḡ(x1, x2))²] over x1, x2 ∈ [−1, 1]]

Normalized Variance: 14.25%


% ThinPlateSpline, part 3 of 5
    for c1 = 1:order
        T(c1,:) = (xBuild.^(c1-1))';
    end
    E = zeros(nTrain,nTrain);
    for c1 = 1:nTrain
        E(c1,:) = TPBasis(abs(xBuild(c1)-xBuild), order, nInputs)';
    end
    A = [(E + penalty*eye(nTrain)) T'; T zeros(order,order)];
    b = [yBuild; zeros(order,1)];
    w = A\b;
    delt = w(1:nTrain);
    a = w(nTrain + (1:order));
    yTest = zeros(nTest,1);
    p = (0:order-1)';
    for c1 = 1:nTest
        yTest(c1) = delt'*TPBasis(abs(xTest(c1)-xBuild), order, nInputs);
        yTest(c1) = yTest(c1) + a'*xTest(c1).^p;
    end
elseif nInputs == 2
    x1 = xBuild(:,1);
    x2 = xBuild(:,2);
    M = nchoosek(order+1,2);                 % number of polynomial terms
    T = zeros(M,nTrain);
    cnt = 1;
    for c1 = 0:order-1                       % power loop
        for c2 = 0:c1
            T(cnt,:) = (x1.^c2.*x2.^(c1-c2))';
            cnt = cnt + 1;
        end


Example 6: MATLAB Code (the ThinPlateSpline listing appears in five parts across the surrounding slides)

% ThinPlateSpline, part 1 of 5
function [yTest] = ThinPlateSpline(xBuild, yBuild, xTest, varargin)
% =============================================================
% ThinPlateSpline: thin plate spline for 1- or 2-dimensional inputs
%
% yTest = ThinPlateSpline(xBuild, yBuild, xTest, order, penalty) returns
% the output of a thin plate spline fit to (xBuild, yBuild), evaluated at
% xTest, using a polynomial of degree order-1 and smoothing penalty
% parameter penalty.
%
% See Green & Silverman, page 160 for details.
% ==========================================================================
% Process Function Arguments
% ==========================================================================
nMandatoryArguments = 3;
order   = 2;
penalty = std(yBuild);
if nargin > nMandatoryArguments
    if ~isstruct(varargin{1})
        Parameters = struct;
        for c1 = 1:2:length(varargin)-1
            if ~ischar(varargin{c1})
                error('Error parsing arguments: expected a property name string.');
            end
            Parameters = setfield(Parameters, varargin{c1}, varargin{c1+1});
        end
    else


SLIDE 35

Example 6: TPS (Quadratic) — Parameter Summary

Parameter   Value
Order       2
Penalty     0.010


% ThinPlateSpline, part 4 of 5
    end
    E = zeros(nTrain,nTrain);
    for cnt = 1:nTrain
        E(cnt,:) = TPBasis(sqrt((x1-x1(cnt)).^2 + (x2-x2(cnt)).^2), order, nInputs)';
    end
    A = [(E + penalty*eye(nTrain)) T'; T zeros(M,M)];
    b = [yBuild; zeros(M,1)];
    w = pinv(A)*b;
    delt = w(1:nTrain);
    a = w(nTrain + (1:M));
    yTest = zeros(nTest,1);
    for cnt = 1:nTest
        xt1 = xTest(cnt,1);
        xt2 = xTest(cnt,2);
        y = delt'*TPBasis(sqrt((xt1-x1).^2 + (xt2-x2).^2), order, nInputs);
        ca = 1;
        for c1 = 0:order-1                   % power loop
            for c2 = 0:c1
                y = y + a(ca)*(xt1^c2*xt2^(c1-c2));
                ca = ca + 1;
            end
        end
        yTest(cnt) = y;
    end
else
    fprintf('This function is not implemented for dimensions > 2.\n');
    yTest = zeros(size(xTest));
end

function [y] = TPBasis(x, order, nInputs)


Example 6: TPS (Quadratic) — Example

[Figure: two surface plots of the estimate g(x1, x2) over x1, x2 ∈ [−1, 1]]

Normalized Average Squared Error: 5.19%


% ThinPlateSpline, part 5 of 5
% =============================================================
% TPBasis: thin plate basis function output
%
% y = TPBasis(x, order, d) returns the output of the thin plate spline
% basis function for input x, polynomial of degree order, and dimension d.
%
% See Green & Silverman, page 160 for details.
% =============================================================
if 2*order <= nInputs
    error('Polynomial degree must be greater than 0.5*d.\n');
end
if rem(nInputs,2) == 0                       % nInputs even
    th = (-1)^(order+1+nInputs/2)*2^(1-2*order)*pi^(-nInputs/2);
    th = th/(factorial(order-1)*factorial(order-nInputs/2));
    y = th*x.^(2*order-nInputs);
    y(x>0) = y(x>0).*log(x(x>0));
else                                         % nInputs odd
    th = gamma(nInputs/2-order)*2^(-2*order)*pi^(-nInputs/2)/factorial(order-1);
    y = th*x.^(2*order-nInputs);
end
end
end
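For the common case order = 2 and nInputs = 2, the constant in the even branch reduces to 1/(8π), recovering the classic thin-plate basis r² log r. A quick check against the listing above (TPBasis is a subfunction, so paste it into its own file first):

% Sanity check of the even-dimension branch for order = 2, d = 2
r = [0.5; 1; 2];
y = TPBasis(r, 2, 2);
yClassic = r.^2.*log(r)/(8*pi);   % classic 2-D thin-plate basis
% max(abs(y - yClassic)) is zero up to rounding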


SLIDE 36

Example 6: TPS (Quadratic) — Standard Deviation

[Figure: two surface plots of the standard deviation √E[(g(x1, x2) − ḡ(x1, x2))²] over x1, x2 ∈ [−1, 1]]

Normalized Variance: 3.58%


Example 6: TPS (Quadratic) — Average

[Figure: two surface plots of the average estimate ḡ(x1, x2) over x1, x2 ∈ [−1, 1]]

Normalized Average Squared Error: 1.85%


Example 6: MATLAB Code

The quadratic fit reuses the ThinPlateSpline function listed above, called with Order 2 and Penalty 0.010.


Example 6: TPS (Quadratic) — Bias

[Figure: two surface plots of the bias g(x1, x2) − ḡ(x1, x2) over x1, x2 ∈ [−1, 1]]

Normalized Average Squared Bias: 1.85%



SLIDE 38

Thin Plate Splines Discussion

  • Has an optimality property
  • Few user-specified parameters
  • Difficult to apply in high dimensions (the order must grow linearly with dimension)
  • Works very well in low dimensions
  • Computation may be an issue, but can be reduced with clustering

References

[1] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer-Verlag, New York, 2001.
[2] Peter Lancaster and Kęstutis Šalkauskas. Curve and Surface Fitting: An Introduction. Academic Press Inc., 1986.
[3] Simon Haykin. Neural Networks: A Comprehensive Foundation. Prentice-Hall, Inc., second edition, 1999.
