
Density Estimation Classification Regression

Nonparametric Methods

Steven J Zeil

Old Dominion Univ.

Fall 2010


Outline

1. Density Estimation
   - Bins
   - Kernel Estimators
   - k-Nearest Neighbor
   - Multivariate Data
2. Classification
3. Regression


Nonparametric Methods

Used when we cannot make assumptions about the distribution of the data, but we still want to apply methods similar to the ones we have already learned.

Primary assumption: similar inputs have similar outputs.

Secondary assumption: the key functions (e.g., pdfs, discriminants) change smoothly.


Density Estimation

Given a training set X, can we estimate the sample distribution from the data itself? The trick will be coming up with useful summaries that do not require us to retain the entire training set after training.


Bins

Divide the data into bins of size $h$. Histogram:
\[ \hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{Nh} \]
Naive estimator: solves the problems of origin and exact placement of bin boundaries:
\[ \hat{p}(x) = \frac{\#\{x - h < x^t \le x + h\}}{2Nh} \]
This can be rewritten as
\[ \hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right) \]
where $w$ is a weight function:
\[ w(u) = \begin{cases} 1/2 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases} \]
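As a minimal sketch of the naive estimator (the bandwidth h, sample size, and standard-normal data below are illustrative choices, not from the slides):

```python
import numpy as np

def naive_estimate(x, data, h):
    """Naive density estimate: fraction of samples within h of x, over the bin width 2h."""
    u = (x - np.asarray(data, dtype=float)) / h
    w = np.where(np.abs(u) < 1, 0.5, 0.0)   # w(u) = 1/2 if |u| < 1, else 0
    return w.sum() / (len(data) * h)

# Illustrative check on standard-normal samples: near 0 the true density is about 0.399
rng = np.random.default_rng(0)
data = rng.standard_normal(5000)
print(naive_estimate(0.0, data, h=0.3))
```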

Histogram

[figure omitted]

Naive Estimator

[figure omitted]

Kernel Estimators

We can generalize the idea of the weighting function to a kernel function, e.g., the Gaussian kernel
\[ K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2} \]
Kernel estimator (a.k.a. Parzen windows):
\[ \hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) \]
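A sketch of the Gaussian-kernel (Parzen window) estimate, again with illustrative data and bandwidth:

```python
import numpy as np

def parzen_estimate(x, data, h):
    """Kernel (Parzen window) density estimate with a Gaussian kernel."""
    u = (x - np.asarray(data, dtype=float)) / h
    K = np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)   # Gaussian kernel K(u)
    return K.sum() / (len(data) * h)

rng = np.random.default_rng(1)
data = rng.standard_normal(5000)
print(parzen_estimate(0.0, data, h=0.3))
```

Unlike the naive estimator, every sample contributes a smooth bump, so the estimate is differentiable in x.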


Kernels

[figure omitted]

k-Nearest Neighbor Estimator

Instead of fixing the bin width $h$ and counting the number of neighboring instances, fix the number of neighbors $k$ and compute the bin width:
\[ \hat{p}(x) = \frac{k}{2N d_k(x)} \]
where $d_k(x)$ is the distance to the $k$th closest instance to $x$.
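A sketch of the k-NN density estimate in one dimension (k and the data are illustrative):

```python
import numpy as np

def knn_density(x, data, k):
    """k-NN density estimate: p(x) = k / (2 * N * d_k(x))."""
    d = np.sort(np.abs(np.asarray(data, dtype=float) - x))
    return k / (2 * len(data) * d[k - 1])   # d[k-1] is the distance to the kth nearest instance

rng = np.random.default_rng(2)
data = rng.standard_normal(5000)
print(knn_density(0.0, data, k=200))
```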


k-Nearest

[figure omitted]

Multivariate Data

Kernel estimator:
\[ \hat{p}(\mathbf{x}) = \frac{1}{Nh^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right) \]
Multivariate Gaussian kernel (spherical):
\[ K(\mathbf{u}) = \left(\frac{1}{\sqrt{2\pi}}\right)^{\!d} \exp\!\left(-\frac{\|\mathbf{u}\|^2}{2}\right) \]
Multivariate Gaussian kernel (ellipsoid):
\[ K(\mathbf{u}) = \frac{1}{(2\pi)^{d/2}\, |\mathbf{S}|^{1/2}} \exp\!\left(-\frac{1}{2}\, \mathbf{u}^T \mathbf{S}^{-1} \mathbf{u}\right) \]
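The spherical case can be sketched as follows (the 2-D standard-normal data and bandwidth are illustrative):

```python
import numpy as np

def multivariate_kde(x, data, h):
    """Spherical Gaussian kernel estimate: (1 / (N h^d)) * sum_t K((x - x_t) / h)."""
    data = np.asarray(data, dtype=float)
    N, d = data.shape
    u = (x - data) / h                                       # one row per instance
    K = (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * (u**2).sum(axis=1))
    return K.sum() / (N * h**d)

# 2-D standard normal: the true density at the origin is 1 / (2*pi), about 0.159
rng = np.random.default_rng(3)
data = rng.standard_normal((5000, 2))
print(multivariate_kde(np.zeros(2), data, h=0.3))
```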


Potential Problems

As the number of dimensions rises, the number of “bins” explodes (the curse of dimensionality). Data must be similarly scaled if the idea of “distance” is to remain reasonable.


Classification

Estimate $p(\mathbf{x} \mid C_i)$ and use Bayes’ rule.


Classification - Kernel Estimator

\[ \hat{p}(\mathbf{x} \mid C_i) = \frac{1}{N_i h^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right) r_i^t \qquad \hat{P}(C_i) = \frac{N_i}{N} \]

\[ g_i(\mathbf{x}) = \hat{p}(\mathbf{x} \mid C_i)\, \hat{P}(C_i) = \frac{1}{N h^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right) r_i^t \]

where $r_i^t = 1$ if $\mathbf{x}^t \in C_i$ and $0$ otherwise.


Classification - k-NN Estimator

\[ \hat{p}(\mathbf{x} \mid C_i) = \frac{k_i}{N_i V^k(\mathbf{x})} \]
where $V^k(\mathbf{x})$ is the volume of the smallest (hyper)sphere containing $\mathbf{x}$ and its nearest $k$ neighbors, and $k_i$ is the number of those neighbors belonging to class $C_i$. With $\hat{P}(C_i) = N_i / N$,
\[ \hat{P}(C_i \mid \mathbf{x}) = \frac{\hat{p}(\mathbf{x} \mid C_i)\, \hat{P}(C_i)}{\hat{p}(\mathbf{x})} = \frac{k_i}{k} \]
Assign the input to the class having the most instances among the $k$ nearest neighbors of $\mathbf{x}$.
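The resulting decision rule can be sketched directly; the toy 2-D clusters below are assumptions for illustration:

```python
import numpy as np

def knn_classify(x, data, labels, k):
    """Assign x to the class with the most instances among its k nearest neighbors."""
    data = np.asarray(data, dtype=float)
    dists = np.linalg.norm(data - x, axis=1)       # Euclidean distance to every instance
    nearest = np.argsort(dists)[:k]                # indices of the k closest
    return np.bincount(np.asarray(labels)[nearest]).argmax()

# Two toy 2-D clusters, classes 0 and 1
data = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                 [5.0, 5.0], [5.1, 4.9], [4.9, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(np.array([0.1, 0.1]), data, labels, k=3))   # lands in the class-0 cluster
```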


Condensed Nearest Neighbor

1-NN is easy to compute

The discriminant is piecewise linear, but 1-NN requires that we keep the entire training set.

Condensed NN discards “interior” points that cannot affect the discriminant. Finding such a minimal consistent subset is NP-hard.

Requires approximation in practice
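One common approximation is Hart's condensing procedure: repeatedly add to the stored subset any point that the subset misclassifies under 1-NN. A sketch on toy data:

```python
import numpy as np

def condense(data, labels):
    """Hart-style condensing: keep any point that the kept subset misclassifies with 1-NN."""
    data = np.asarray(data, dtype=float)
    keep = [0]                                    # start from a single stored instance
    changed = True
    while changed:
        changed = False
        for i in range(len(data)):
            if i in keep:
                continue
            nearest = min(keep, key=lambda j: np.linalg.norm(data[i] - data[j]))
            if labels[nearest] != labels[i]:      # misclassified, so it must be stored
                keep.append(i)
                changed = True
    return keep

# Two tight clusters: one representative per cluster suffices
data = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.1]])
labels = [0, 0, 1, 1]
print(condense(data, labels))
```

The result depends on the scan order, which is why this is an approximation rather than a minimal consistent subset.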


Nonparametric Regression (Smoothing)

Extending the idea of a histogram to regression:
\[ \hat{g}(x) = \frac{\sum_t b(x, x^t)\, r^t}{\sum_t b(x, x^t)} \]
where
\[ b(x, x^t) = \begin{cases} 1 & \text{if } x^t \text{ is in the same bin as } x \\ 0 & \text{otherwise} \end{cases} \]
This estimator is called the “regressogram.”
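A sketch of the regressogram (bin width, origin, and data are illustrative):

```python
import numpy as np

def regressogram(x, xs, rs, h, origin=0.0):
    """Average the responses r^t of the training points in the same width-h bin as x."""
    xs = np.asarray(xs, dtype=float)
    rs = np.asarray(rs, dtype=float)
    same_bin = np.floor((xs - origin) / h) == np.floor((x - origin) / h)
    return rs[same_bin].mean() if same_bin.any() else float("nan")

print(regressogram(0.15, [0.1, 0.2, 0.9], [1.0, 3.0, 10.0], h=0.5))   # averages the first two responses
```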


Bin Smoothing

[figure omitted]

Running Mean Smoother

Define a “bin” centered on $x$:
\[ \hat{g}(x) = \frac{\sum_t w\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_t w\!\left(\frac{x - x^t}{h}\right)} \]
where
\[ w(u) = \begin{cases} 1 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases} \]
This smoother is particularly popular with evenly spaced data.


Running Mean Smoothing

[figure omitted]

Kernel Smoother

\[ \hat{g}(x) = \frac{\sum_t K\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_t K\!\left(\frac{x - x^t}{h}\right)} \]
where $K$ is Gaussian.

In this and subsequent smoothers, we can also reformulate in terms of the closest $k$ neighbors.
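A sketch covering both smoothers: with a box kernel this reduces to the running mean, and with a Gaussian kernel it is the kernel smoother (data and h are illustrative):

```python
import numpy as np

def smooth(x, xs, rs, h, kernel="gaussian"):
    """Kernel-weighted average of the responses; 'box' gives the running mean smoother."""
    u = (x - np.asarray(xs, dtype=float)) / h
    if kernel == "gaussian":
        w = np.exp(-u**2 / 2)
    else:                                   # box kernel: w(u) = 1 if |u| < 1, else 0
        w = (np.abs(u) < 1).astype(float)
    return np.dot(w, np.asarray(rs, dtype=float)) / w.sum()

print(smooth(1.0, [0.0, 1.0, 2.0], [0.0, 1.0, 2.0], h=1.0))              # symmetric weights
print(smooth(0.5, [0.0, 1.0, 2.0], [0.0, 1.0, 2.0], h=1.0, kernel="box"))
```

The normalizing constant of $K$ cancels between numerator and denominator, so only the kernel's shape matters here.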


Kernel Smoothing

[figure omitted]

Running Line Smoother

In the running mean we took an average over all points in a bin. Instead, we could fit a linear regression line to all points in the bin. Numerical analysis offers spline techniques that smooth derivatives as well as function values.
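A sketch of a running line smoother using an ordinary least-squares fit inside the bin (np.polyfit; data and h are illustrative):

```python
import numpy as np

def running_line(x, xs, rs, h):
    """Fit a least-squares line to the points within h of x and evaluate it at x."""
    xs = np.asarray(xs, dtype=float)
    rs = np.asarray(rs, dtype=float)
    mask = np.abs(xs - x) < h                  # the "bin" centered on x
    slope, intercept = np.polyfit(xs[mask], rs[mask], 1)
    return slope * x + intercept

xs = np.linspace(0.0, 1.0, 11)
rs = 2 * xs + 1                                # noiseless line, so the local fit is exact
print(running_line(0.5, xs, rs, h=0.3))
```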


Running Line Smoothing

[figure omitted]

Choosing h or k

Small values exaggerate the effects of single instances (high variance); larger values increase bias. Choose by cross-validation.
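A sketch of choosing h by leave-one-out cross-validation with a Gaussian smoother (data, noise level, and candidate grid are all illustrative; with this low noise the smallest candidate wins, while noisier data would favor a larger h):

```python
import numpy as np

def gauss_smooth(x, xs, rs, h):
    """Gaussian-kernel smoother used inside the cross-validation loop."""
    w = np.exp(-((x - xs) / h) ** 2 / 2)
    return np.dot(w, rs) / w.sum()

def loocv_bandwidth(xs, rs, candidates):
    """Leave-one-out CV: return the h whose held-out squared error is smallest."""
    xs = np.asarray(xs, dtype=float)
    rs = np.asarray(rs, dtype=float)
    errors = []
    for h in candidates:
        err = 0.0
        for i in range(len(xs)):
            mask = np.arange(len(xs)) != i     # hold out instance i
            err += (rs[i] - gauss_smooth(xs[i], xs[mask], rs[mask], h)) ** 2
        errors.append(err)
    return candidates[int(np.argmin(errors))]

# Noisy sine data: too-large h flattens the curve entirely
rng = np.random.default_rng(4)
xs = np.linspace(0.0, 1.0, 30)
rs = np.sin(2 * np.pi * xs) + 0.05 * rng.standard_normal(30)
print(loocv_bandwidth(xs, rs, [0.02, 0.5, 5.0]))
```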
