 
Nonparametric Methods
Steven J Zeil
Old Dominion Univ.
Fall 2010
Outline
1. Density Estimation: Bins, Kernel Estimators, k-Nearest Neighbor, Multivariate Data
2. Classification
3. Regression
Nonparametric Methods
Used when we cannot make assumptions about the distribution of the data, but still want to apply methods similar to the ones we have already learned.
Assumption: similar inputs have similar outputs.
Secondary assumption: key functions (e.g., pdfs, discriminants) change smoothly.
Density Estimation
Given a training set X, can we estimate the sample distribution from the data itself?
The trick will be coming up with useful summaries that do not require us to retain the entire training set after training.
Bins
Divide the data into bins of size h.
Histogram:
$$\hat p(x) = \frac{\#\{x^t \text{ in same bin as } x\}}{Nh}$$
Naive estimator: avoids having to choose the origin and the exact placement of the bin boundaries:
$$\hat p(x) = \frac{\#\{x - h < x^t \le x + h\}}{2Nh}$$
This can be rewritten as
$$\hat p(x) = \frac{1}{Nh}\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right)$$
where w is a weight function
$$w(u) = \begin{cases} 1/2 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}$$
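A minimal sketch of the histogram and naive estimators in plain Python. The function names and the bin `origin` parameter are illustrative assumptions, not part of the slides. The weighted form gives the same result as the counting form, except for instances lying exactly at x - h or x + h.

```python
import math

def histogram_density(x, data, h, origin=0.0):
    """Histogram estimate: #{x^t in same bin as x} / (N h).
    Bins are [origin + m*h, origin + (m+1)*h); `origin` is an assumed parameter."""
    bin_of = lambda v: math.floor((v - origin) / h)
    count = sum(1 for xt in data if bin_of(xt) == bin_of(x))
    return count / (len(data) * h)

def naive_density(x, data, h):
    """Naive estimator: #{x - h < x^t <= x + h} / (2 N h), a bin centered on x."""
    count = sum(1 for xt in data if x - h < xt <= x + h)
    return count / (2 * len(data) * h)

def naive_density_weighted(x, data, h):
    """The same estimator via the weight function w(u) = 1/2 if |u| < 1, else 0."""
    w = lambda u: 0.5 if abs(u) < 1 else 0.0
    return sum(w((x - xt) / h) for xt in data) / (len(data) * h)

data = [1.0, 1.2, 2.5, 3.0]
print(histogram_density(1.6, data, h=1.0))       # 0.5
print(naive_density(1.6, data, h=1.0))           # 0.375
print(naive_density_weighted(1.6, data, h=1.0))  # 0.375
```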
[Figure: Histogram]
[Figure: Naive Estimator]
Kernel Estimators
We can generalize the idea of the weight function to a kernel function, e.g., the Gaussian kernel
$$K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$$
Kernel estimator (a.k.a. Parzen windows):
$$\hat p(x) = \frac{1}{Nh}\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)$$
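A short sketch of the Gaussian kernel estimator (names are illustrative). Because each kernel integrates to 1, the estimate does too. The example checks this numerically with a Riemann sum over a grid chosen to cover the data well beyond the kernel tails.

```python
import math

def gaussian_kernel(u):
    """K(u) = exp(-u^2 / 2) / sqrt(2 pi)"""
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def kernel_density(x, data, h, kernel=gaussian_kernel):
    """Parzen-window estimate: (1 / (N h)) * sum_t K((x - x^t) / h)"""
    return sum(kernel((x - xt) / h) for xt in data) / (len(data) * h)

data = [1.0, 1.2, 2.5, 3.0]
step = 0.01
grid = [i * step for i in range(-500, 1000)]   # covers [-5, 10)
area = sum(kernel_density(g, data, h=0.5) for g in grid) * step
print(area)   # should be very close to 1
```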
[Figure: Kernels]
k-Nearest Neighbor Estimator
Instead of fixing the bin width h and counting the number of neighboring instances, fix the number of neighbors k and compute the bin width:
$$\hat p(x) = \frac{k}{2N\, d_k(x)}$$
where $d_k(x)$ is the distance to the kth closest instance to x.
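A one-dimensional sketch of the k-NN density estimate. The name `knn_density` is illustrative. The sketch raises an error when $d_k(x) = 0$ (k or more instances coincide with x), where the formula is undefined.

```python
def knn_density(x, data, k):
    """k-NN estimate: k / (2 N d_k(x)), with d_k(x) the distance to the k-th closest x^t."""
    distances = sorted(abs(x - xt) for xt in data)
    d_k = distances[k - 1]
    if d_k == 0:
        raise ValueError("d_k(x) is zero; the estimate is undefined at this x")
    return k / (2 * len(data) * d_k)

data = [0.0, 1.0, 2.0, 3.0, 4.0]
print(knn_density(2.0, data, k=3))   # d_3(2.0) = 1.0, so 3 / (2 * 5 * 1.0) = 0.3
```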
[Figure: k-Nearest Neighbor]
Multivariate Data
Kernel estimator:
$$\hat p(\vec x) = \frac{1}{Nh^d}\sum_{t=1}^{N} K\!\left(\frac{\vec x - \vec x^t}{h}\right)$$
Multivariate Gaussian kernel (spherical):
$$K(\vec u) = \left(\frac{1}{\sqrt{2\pi}}\right)^{d} \exp\!\left(-\frac{\|\vec u\|^2}{2}\right)$$
Multivariate Gaussian kernel (ellipsoid):
$$K(\vec u) = \frac{1}{(2\pi)^{d/2}\,|S|^{1/2}} \exp\!\left(-\frac{1}{2}\,\vec u^T S^{-1} \vec u\right)$$
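A sketch of the multivariate kernel estimator using the spherical Gaussian kernel only; the ellipsoid kernel would additionally need S and its inverse. Points are represented as equal-length tuples, and the function names are illustrative.

```python
import math

def spherical_gaussian_kernel(u):
    """K(u) = (2 pi)^(-d/2) * exp(-||u||^2 / 2) for a d-dimensional vector u."""
    d = len(u)
    sq_norm = sum(ui * ui for ui in u)
    return (2 * math.pi) ** (-d / 2) * math.exp(-sq_norm / 2)

def multivariate_kde(x, data, h):
    """(1 / (N h^d)) * sum_t K((x - x^t) / h)"""
    d = len(x)
    total = 0.0
    for xt in data:
        u = [(xi - xti) / h for xi, xti in zip(x, xt)]
        total += spherical_gaussian_kernel(u)
    return total / (len(data) * h ** d)

data = [(0.0, 0.0), (1.0, 1.0), (0.5, -0.5)]
print(multivariate_kde((0.2, 0.1), data, h=0.5))
```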
Potential Problems
As the number of dimensions rises, the number of “bins” explodes.
Data must be similarly scaled if the idea of “distance” is to remain reasonable.
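The slides do not say which scaling to use. A minimal sketch, assuming z-score standardization (one common choice), puts every attribute on zero mean and unit standard deviation before distances are computed. The data below is made up for illustration.

```python
import math

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation.
    Constant columns are mapped to 0."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(d)]
    return [tuple((r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(d))
            for r in rows]

# Height in metres and weight in grams: unscaled, the grams dominate every distance.
rows = [(1.60, 60000.0), (1.75, 72000.0), (1.90, 90000.0)]
print(standardize(rows))
```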
Classification
Estimate $\hat p(\vec x \mid C_i)$ and use Bayes’ rule.
Classification - Kernel Estimator
$$\hat p(\vec x \mid C_i) = \frac{1}{N_i h^d}\sum_{t=1}^{N} K\!\left(\frac{\vec x - \vec x^t}{h}\right) r_i^t$$
where $r_i^t = 1$ if $\vec x^t \in C_i$ and 0 otherwise, and
$$\hat P(C_i) = \frac{N_i}{N}$$
$$g_i(\vec x) = \hat p(\vec x \mid C_i)\,\hat P(C_i) = \frac{1}{Nh^d}\sum_{t=1}^{N} K\!\left(\frac{\vec x - \vec x^t}{h}\right) r_i^t$$
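A sketch of the kernel discriminant $g_i(\vec x)$ with a spherical Gaussian kernel. The input is assigned to the class with the largest $g_i$. The function names and the toy two-cluster data are illustrative assumptions.

```python
import math

def gaussian_kernel(u):
    """Spherical Gaussian kernel for a d-dimensional vector u."""
    d = len(u)
    return (2 * math.pi) ** (-d / 2) * math.exp(-sum(ui * ui for ui in u) / 2)

def kernel_discriminants(x, X, labels, classes, h):
    """g_i(x) = (1 / (N h^d)) * sum_t K((x - x^t) / h) * r_i^t for each class C_i."""
    N, d = len(X), len(x)
    g = {}
    for c in classes:
        total = 0.0
        for xt, lt in zip(X, labels):
            r = 1.0 if lt == c else 0.0   # r_i^t: 1 iff x^t belongs to C_i
            total += gaussian_kernel([(xi - xti) / h for xi, xti in zip(x, xt)]) * r
        g[c] = total / (N * h ** d)
    return g

def kernel_classify(x, X, labels, h):
    """Choose the class with the largest discriminant g_i(x)."""
    g = kernel_discriminants(x, X, labels, sorted(set(labels)), h)
    return max(g, key=g.get)

X = [(0.0, 0.0), (0.3, 0.2), (3.0, 3.0), (3.2, 2.8)]
labels = ["a", "a", "b", "b"]
print(kernel_classify((0.5, 0.4), X, labels, h=1.0))   # a
```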
Classification - k-NN Estimator
$$\hat p(\vec x \mid C_i) = \frac{k_i}{N_i\, V^k(\vec x)}$$
where $V^k(\vec x)$ is the volume of the smallest (hyper)sphere centered at $\vec x$ containing its k nearest neighbors, and $k_i$ is the number of those neighbors that belong to $C_i$.
$$\hat P(C_i) = \frac{N_i}{N}$$
$$\hat P(C_i \mid \vec x) = \frac{\hat p(\vec x \mid C_i)\,\hat P(C_i)}{\hat p(\vec x)} = \frac{k_i}{k}$$
Assign the input to the class having the most instances among the k nearest neighbors of $\vec x$.
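The rule above reduces to a majority vote among the k nearest neighbors. A minimal sketch using Euclidean distance (`math.dist`, Python 3.8+); the function name is illustrative, and ties are not given any special treatment.

```python
import math
from collections import Counter

def knn_classify(x, X, labels, k):
    """Pick the class maximizing P(C_i | x) = k_i / k, i.e. the majority class
    among the k nearest training instances."""
    neighbors = sorted(zip(X, labels), key=lambda pair: math.dist(x, pair[0]))[:k]
    counts = Counter(label for _, label in neighbors)   # k_i for each class
    return counts.most_common(1)[0][0]

X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["a", "a", "a", "b", "b"]
print(knn_classify((0.2, 0.2), X, labels, k=3))   # a
print(knn_classify((5.0, 5.4), X, labels, k=3))   # b
```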
Condensed Nearest Neighbor
1-NN is easy to compute, and its discriminant is piecewise linear, but it requires that we keep the entire training set.
Condensed NN discards “interior” points that cannot affect the discriminant.
Finding a minimal consistent subset is NP-complete, so in practice it requires approximation.
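The slides do not give a condensing algorithm. One common greedy approximation (usually credited to Hart) starts from a single instance and adds any instance that 1-NN on the current subset misclassifies. It repeats passes until nothing is added, so the final subset classifies the whole training set correctly. The sketch visits instances in a fixed order for reproducibility; a random order is often used instead, and the result depends on that order. Function names are illustrative.

```python
import math

def nn_label(x, X, labels, subset):
    """1-NN prediction for x using only the training indices in `subset`."""
    best = min(subset, key=lambda j: math.dist(x, X[j]))
    return labels[best]

def condensed_nn(X, labels):
    """Greedy condensing: keep adding misclassified instances until a full pass adds none."""
    Z = [0]                      # start from the first instance
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i not in Z and nn_label(X[i], X, labels, Z) != labels[i]:
                Z.append(i)
                changed = True
    return Z

X = [(0.0,), (0.5,), (1.0,), (5.0,), (5.5,), (6.0,)]
labels = [0, 0, 0, 1, 1, 1]
print(condensed_nn(X, labels))   # [0, 3]
```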
Nonparametric Regression (Smoothing)
Extending the idea of a histogram to regression:
$$\hat g(x) = \frac{\sum_t b(x, x^t)\, r^t}{\sum_t b(x, x^t)}$$
where
$$b(x, x^t) = \begin{cases} 1 & \text{if } x^t \text{ is in the same bin as } x \\ 0 & \text{otherwise} \end{cases}$$
This is the “regressogram”.
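A sketch of the regressogram: the estimate at x is the mean of the $r^t$ whose $x^t$ share x's bin. As in the histogram sketch, the bin `origin` parameter and the function name are illustrative assumptions. An empty bin has no estimate, so the sketch returns `None` there.

```python
import math

def regressogram(x, xs, rs, h, origin=0.0):
    """Mean of r^t over x^t in the same bin as x; None if that bin is empty."""
    bin_of = lambda v: math.floor((v - origin) / h)
    in_bin = [rt for xt, rt in zip(xs, rs) if bin_of(xt) == bin_of(x)]
    return sum(in_bin) / len(in_bin) if in_bin else None

xs = [0.2, 0.4, 1.3, 1.7]
rs = [1.0, 3.0, 5.0, 7.0]
print(regressogram(0.5, xs, rs, h=1.0))   # 2.0
print(regressogram(1.5, xs, rs, h=1.0))   # 6.0
```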
[Figure: Bin Smoothing]
Running Mean Smoother
Define a “bin” centered on x:
$$\hat g(x) = \frac{\sum_t w\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_t w\!\left(\frac{x - x^t}{h}\right)}$$
where
$$w(u) = \begin{cases} 1 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}$$
This smoother is particularly popular with evenly spaced data.
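A sketch of the running mean smoother. The window is centered on x, so it moves with the query point instead of being fixed in advance as in the regressogram. The name is illustrative, and the sketch returns `None` when no instance falls in the window.

```python
def running_mean(x, xs, rs, h):
    """Running mean smoother: average of r^t over the x^t with |x - x^t| / h < 1."""
    w = lambda u: 1.0 if abs(u) < 1 else 0.0
    weights = [w((x - xt) / h) for xt in xs]
    total = sum(weights)
    return sum(wt * rt for wt, rt in zip(weights, rs)) / total if total > 0 else None

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
rs = [0.0, 2.0, 4.0, 6.0, 8.0]
print(running_mean(2.0, xs, rs, h=1.5))   # mean of 2, 4, 6 = 4.0
```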
[Figure: Running Mean Smoothing]
Kernel Smoother
$$\hat g(x) = \frac{\sum_t K\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_t K\!\left(\frac{x - x^t}{h}\right)}$$
where K is the Gaussian kernel.
In this and the subsequent smoothers, we can also reformulate the estimate in terms of the k closest neighbors instead of a fixed width h.
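A sketch of the Gaussian kernel smoother. The constant factor $1/\sqrt{2\pi}$ appears in both numerator and denominator, so it cancels and is omitted. With a very small h and x far from all the data, every weight can underflow to zero; this sketch does not guard against that.

```python
import math

def kernel_smoother(x, xs, rs, h):
    """sum_t K((x - x^t)/h) r^t / sum_t K((x - x^t)/h) with a Gaussian K
    (normalizing constant dropped because it cancels)."""
    weights = [math.exp(-((x - xt) / h) ** 2 / 2) for xt in xs]
    return sum(w * r for w, r in zip(weights, rs)) / sum(weights)

xs = [-1.0, 0.0, 1.0]
rs = [1.0, 2.0, 3.0]
print(kernel_smoother(0.0, xs, rs, h=1.0))   # symmetric weights, so this is (about) 2.0
```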
[Figure: Kernel Smoothing]
Running Line Smoother
In the running mean, we took an average over all points in a bin.
Instead, we could fit a linear regression line to all points in a bin.
Numerical analysis offers spline techniques that smooth derivatives as well as function values.
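A sketch of the running line smoother. For each query x, it fits an ordinary least-squares line to the instances with $|x - x^t| < h$ (the same window as the running mean) and evaluates that line at x. If the window holds a single distinct $x^t$, it falls back to the window mean. The window choice and fallback are assumptions of this sketch. On exactly linear data it reproduces the line.

```python
def running_line(x, xs, rs, h):
    """Fit a least-squares line to the points with |x - x^t| < h and evaluate it at x.
    Returns None for an empty window."""
    pts = [(xt, rt) for xt, rt in zip(xs, rs) if abs(x - xt) < h]
    if not pts:
        return None
    n = len(pts)
    mx = sum(xt for xt, _ in pts) / n
    mr = sum(rt for _, rt in pts) / n
    sxx = sum((xt - mx) ** 2 for xt, _ in pts)
    if sxx == 0:                 # only one distinct x^t in the window
        return mr
    slope = sum((xt - mx) * (rt - mr) for xt, rt in pts) / sxx
    return mr + slope * (x - mx)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
rs = [2 * xt + 1 for xt in xs]            # exactly linear data: r = 2x + 1
print(running_line(2.0, xs, rs, h=1.5))   # 5.0
print(running_line(0.0, xs, rs, h=1.5))   # 1.0
```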
[Figure: Running Line Smoothing]
Choosing h or k
Small values exaggerate the effects of single instances: high variance.
Larger values increase bias.
Choose h or k by cross-validation.
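The slides only say “cross-validation”. A minimal sketch, assuming leave-one-out cross-validation with squared error for the Gaussian kernel smoother, predicts each $r^t$ from all the other instances for each candidate h and keeps the h with the lowest mean error. The candidate values and the sine-curve data are made up for illustration.

```python
import math

def loo_error(xs, rs, h):
    """Leave-one-out mean squared error of the Gaussian kernel smoother with width h."""
    total = 0.0
    for t in range(len(xs)):
        others_x = xs[:t] + xs[t + 1:]
        others_r = rs[:t] + rs[t + 1:]
        weights = [math.exp(-((xs[t] - xo) / h) ** 2 / 2) for xo in others_x]
        denom = sum(weights)
        if denom == 0:           # every weight underflowed: h is far too small
            return math.inf
        pred = sum(w * r for w, r in zip(weights, others_r)) / denom
        total += (rs[t] - pred) ** 2
    return total / len(xs)

def choose_h(xs, rs, candidates):
    """Return the candidate width with the lowest leave-one-out error."""
    return min(candidates, key=lambda h: loo_error(xs, rs, h))

xs = [0.5 * i for i in range(20)]
rs = [math.sin(x) for x in xs]
print(choose_h(xs, rs, [0.3, 5.0, 50.0]))
```

Here the two large widths oversmooth the sine curve toward its mean (high bias), so the smallest candidate should win.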