
Nonlinear Modeling (PowerPoint presentation transcript)



Overview

• Problem definition
• Curse of dimensionality
• Local Models
• Kernel Methods
• Weighted Euclidean distance
• Radial basis functions
• Thin plate splines
• Clustering Algorithms

[Diagram: the process maps the observed variables x_1, ..., x_p and the unobserved variables z_1, ..., z_q to the output y; the model uses only the observed variables x_1, ..., x_c to produce its output, leaving x_{c+1}, ..., x_p unused.]

J. McNames, Portland State University, ECE 4/557, Nonlinear Modeling, Ver. 1.07

Problem Definition

• Nonlinear Modeling Problem: Given a data set with a single input vector x, find the best function ĝ(x) that minimizes the prediction error on new input vectors (probably not in the data set)
• This problem has many names
  – Nonlinear modeling problem
  – Nonparametric regression
  – Multivariate smoothing

Working in Higher Dimensions

• It would seem that we can simply generalize our methods for univariate smoothing to higher dimensions
• Why would we ever use a linear model?
  – The smoothing methods impose fewer assumptions
  – Why not just allow any problem to be nonlinear?
• Similarly, why would we ever exclude a possible explanatory (input) variable?
  – Isn't more information always better?
  – If an input variable might help, why would we ever exclude it?
  – It increases the dimensionality of our input space, but so what?
• This discussion is based on [1, Section 2.5]

Problems with Higher Dimensions

• Most smoothing methods are essentially based on local weighting of points
• In higher dimensions it becomes difficult to identify "neighborhoods"
• This is called the curse of dimensionality
• It shows up in many contexts
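The problem statement above can be made concrete with a small sketch (in Python rather than the MATLAB used later in the slides): fit a model on one data set, then score it by mean squared prediction error on new inputs it has not seen. The process g and the constant-mean model ĝ here are hypothetical stand-ins, not from the slides.

```python
import random

random.seed(0)

def g(x):
    # Hypothetical true process (illustrative choice)
    return sum(x) ** 2

def make_data(n, p):
    xs = [[random.uniform(-1, 1) for _ in range(p)] for _ in range(n)]
    ys = [g(x) + random.gauss(0, 0.1) for x in xs]
    return xs, ys

x_train, y_train = make_data(100, 2)
x_new, y_new = make_data(100, 2)   # new inputs, "probably not in the data set"

# Simplest possible candidate model: predict the training-set mean everywhere
y_bar = sum(y_train) / len(y_train)
g_hat = lambda x: y_bar

# Prediction error on the new inputs is what we want to minimize
mse = sum((yi - g_hat(xi)) ** 2 for xi, yi in zip(x_new, y_new)) / len(y_new)
print(mse)
```

Any real nonlinear modeling method is then a better choice of ĝ that drives this held-out error down.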

What is "Local"?

• Suppose we have p inputs uniformly distributed in a unit hypercube
• Let us use a smaller hypercube containing only a fraction k/n of the points to build our local model
  – An unusual neighborhood, but suitable for the point
• What is the edge length of our hypercube neighborhood?
  – The volume of our neighborhood is r = k/n = e^p, where e is the edge length
  – Thus e_p(r) = r^(1/p)
  – If we wish to use a neighborhood that captures 1% of the volume/points, and p = 10, then e_10(0.01) = 0.63!
  – Similarly, e_10(0.10) = 0.80
  – The entire range of each input is only 1.0
• How can such neighborhoods be considered "local"?

Extrapolation versus Interpolation in High Dimensions

• Suppose our inputs are uniformly distributed within a unit hypersphere centered at the origin
• The median distance from the origin to the nearest of n points is given by

    d(p, n) = (1 − 0.5^(1/n))^(1/p)

• d(10, 500) ≈ 0.52, which is more than half way to the boundary
• Thus, most of the data points are closer to the boundary than to any other data point
• This means we are always trying to estimate near the edges
• In higher dimensions, we are effectively attempting to extrapolate rather than interpolate or smooth between the data points

Euclidean Distance in High Dimensions

• If the inputs are drawn from an i.i.d. distribution, the squared Euclidean distance can be viewed as a scaled estimate of the average squared distance along a coordinate, δ_j² = (x_j − x_{i,j})²:

    δ̂² = (1/p) Σ_{j=1}^{p} (x_j − x_{i,j})² ≈ E[(x_j − x_{i,j})²],    d² = p δ̂²

• The means and variances of these quantities are

    μ_{δ̂²} = μ_{δ²}      σ²_{δ̂²} = var[δ̂²] = p⁻¹ σ²_{δ²}
    μ_{d²} = p μ_{δ²}     σ²_{d²} = var[p δ̂²] = p σ²_{δ²}

• Thus the coefficient of variation is

    γ ≜ σ_{d²} / μ_{d²} = (1/√p) (σ_{δ²} / μ_{δ²})

• All neighbors become more equidistant as p increases!

Sampling Density in High Dimensions

• Suppose we want a uniform sampling density in one dimension consisting of n = 100 points along a grid
• In two dimensions we would need n = 100² points to have the same sampling density (spacing between neighboring points along the axes)
• In p dimensions we would need n = 100^p points!
• If p = 10 we would need n = 100^10 points, which is impractical
• Thus, in high dimensions all data sets sample the input space sparsely
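The two closed-form results above, the neighborhood edge length e_p(r) = r^(1/p) and the median nearest-point distance d(p, n) = (1 − 0.5^(1/n))^(1/p), are easy to check numerically. A Python sketch (the slides themselves use MATLAB):

```python
def edge_length(r, p):
    # Edge of a sub-hypercube holding a fraction r of a unit hypercube's volume
    return r ** (1.0 / p)

def median_nearest(p, n):
    # Median distance from the origin to the nearest of n uniform points
    # in a unit hypersphere
    return (1.0 - 0.5 ** (1.0 / n)) ** (1.0 / p)

print(round(edge_length(0.01, 10), 2))    # 0.63
print(round(edge_length(0.10, 10), 2))    # 0.79 (0.80 on the slide, one decimal)
print(round(median_nearest(10, 500), 2))  # 0.52
```

A "1% neighborhood" in 10 dimensions really does span 63% of each input's range.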
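The coefficient-of-variation result, γ shrinking like 1/√p, can also be checked by simulation: squared distances between random points concentrate around their mean as p grows. The dimensions and trial count below are illustrative choices, not from the slides.

```python
import random
import math

random.seed(1)

def cv_of_squared_distances(p, trials=2000):
    # Monte Carlo estimate of sigma_{d^2} / mu_{d^2} for pairs of
    # points drawn uniformly from the unit hypercube in p dimensions
    d2 = []
    for _ in range(trials):
        x  = [random.random() for _ in range(p)]
        xi = [random.random() for _ in range(p)]
        d2.append(sum((a - b) ** 2 for a, b in zip(x, xi)))
    mu = sum(d2) / trials
    var = sum((v - mu) ** 2 for v in d2) / trials
    return math.sqrt(var) / mu

for p in (1, 10, 100):
    print(p, round(cv_of_squared_distances(p), 3))
```

The printed coefficient of variation falls by roughly √10 for each tenfold increase in p, so in high dimensions no neighbor is much closer than any other.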

Coping with The Curse

• If the complexity of the problem grows with dimensionality (e.g., g(x) = e^(−||x||²), a simple bump in p dimensions), we must have a dense sample (n ∝ n_1^p) to estimate g(x) with the same accuracy over all values of p
• In practice we don't have this luxury
• This also suggests that adding inputs (increasing p) makes matters much worse, a paradoxical result
• The way out of the curse is to impose structure on g(x)
  – For example, a linear model may be reasonable

    y = wᵀx + ε

  – Now we only need n ≈ 10p, dramatically fewer points to build our model
  – This model doesn't suffer from the curse

Imposing Structure

• Another possible mechanism to cope with the curse is to use PCA or similar techniques
  – Appropriate when the inputs are correlated
  – In other words, when the inputs fall close to a lower-dimensional hyperplane in the p-dimensional space
• It may also be appropriate to consider weighted distances
  – The idea is that a few inputs may dominate the distance measure
• There are many other ideas for imposing structure
• This is the key difference between different nonlinear modeling strategies

Nonlinear Modeling

• In low-dimensional spaces local models continue to work well
  – This also applies in high-dimensional spaces that can be reduced to lower dimensions
• The following slides focus on "local" models as applied to a two-dimensional problem

The RampHill Data Set

• Like the Motorcycle data set for the univariate smoothing problem, I have a favorite data set for the nonlinear modeling problem
• The RampHill data set [2, p. 150]
• It is a synthetic data set
• It has a number of nice properties for testing nonlinear models
  – Two flat regions in which the process is constant
  – A local bump (a function of two input variables)
  – A global ramp that is locally linear
  – Two sharp edges (most models are smoother than this)
• It only has two inputs so that we can plot the output (surface)
• The inputs were drawn from a uniform distribution
• No noise
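The structured escape route above, fitting y = wᵀx by least squares, can be sketched in a few lines (Python rather than MATLAB; the true weights, noise level, and the n = 10p sample size are illustrative choices). With p = 2 the normal equations reduce to a 2×2 solve:

```python
import random

random.seed(2)
w_true = [1.5, -0.5]      # hypothetical true weights
p, n = 2, 20              # n = 10 p points, far fewer than a dense grid

X = [[random.uniform(-1, 1) for _ in range(p)] for _ in range(n)]
y = [w_true[0] * x[0] + w_true[1] * x[1] + random.gauss(0, 0.05) for x in X]

# Normal equations (X'X) w = X'y, solved by Cramer's rule for p = 2
a = sum(x[0] * x[0] for x in X); b = sum(x[0] * x[1] for x in X)
c = b;                           d = sum(x[1] * x[1] for x in X)
u = sum(x[0] * yi for x, yi in zip(X, y))
v = sum(x[1] * yi for x, yi in zip(X, y))
det = a * d - b * c
w_hat = [(u * d - b * v) / det, (a * v - u * c) / det]
print([round(w, 2) for w in w_hat])   # close to w_true
```

Twenty points recover the weights accurately, which is exactly the point: the linear structure removes the exponential sample-size requirement.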

Example 1: Ramp Hill

• Number of points per (training/estimation) data set: 250
• Number of evaluation points in the (test/evaluation) data set: 2500
• Noise power: 0.07
• Signal (g(x)) power: 0.70
• Signal-to-noise ratio (SNR): 10.01
• Number of data sets: 250

[Scatter plot of one training data set: the 250 input points in the (x1, x2) plane, with both axes spanning −1.5 to 1.5]

Example 1: Ramp Hill Surface Plot

[Two surface plots of g(x1, x2) over x1, x2 ∈ [−1.5, 1.5], shown from different viewing angles]

Example 1: MATLAB Code

```matlab
function [] = ShowRampHill()
% ==============================================================================
% Author-Specified Parameters
% ==============================================================================
nPointsBuild   = 250;
nDataSetsBuild = 250;
nPointsSide    = 50;
noisePower     = 0.07;

% ==============================================================================
% Preprocessing
% ==============================================================================
x1Side = linspace(-1.5, 1.5, nPointsSide);
x2Side = linspace(-1.5, 1.5, nPointsSide);
[x1Block, x2Block] = meshgrid(x1Side, x2Side);
x1Test = reshape(x1Block, nPointsSide^2, 1);
x2Test = reshape(x2Block, nPointsSide^2, 1);
xTest  = [x1Test x2Test];
functionName   = mfilename;
fileIdentifier = fopen([functionName '.tex'], 'w');

% ==============================================================================
% Create the Data Sets
% ==============================================================================
DataSetsBuild = repmat(struct('x', [], 'y', []), nDataSetsBuild, 1);
```

(The listing on the slide is truncated at this point.)
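A rough Python analogue of the preprocessing step in the MATLAB listing above: build the 50×50 evaluation grid on [−1.5, 1.5]² and flatten it into the 2500 test inputs (mirroring linspace, meshgrid, and reshape). Variable names are this sketch's own, not the slides'.

```python
# Evenly spaced grid coordinates, like linspace(-1.5, 1.5, 50)
n_points_side = 50
lo, hi = -1.5, 1.5
step = (hi - lo) / (n_points_side - 1)
side = [lo + i * step for i in range(n_points_side)]

# All (x1, x2) grid pairs, like meshgrid followed by reshape into a 2500x2 array
x_test = [(x1, x2) for x2 in side for x1 in side]
print(len(x_test))   # 2500 evaluation points, matching the slide
```

Each model under test is then evaluated at these 2500 fixed grid points so that surfaces from different methods are directly comparable.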
