Nonparametric Methods. Steven J Zeil, Old Dominion Univ., Fall 2010.



SLIDE 1

Density Estimation Classification Regression

Nonparametric Methods

Steven J Zeil

Old Dominion Univ.

Fall 2010

SLIDE 2

Outline

1. Density Estimation: Bins, Kernel Estimators, k-Nearest Neighbor, Multivariate Data
2. Classification
3. Regression

SLIDE 3

Nonparametric Methods

When we cannot make assumptions about the distribution of the data, but still want to apply methods similar to the ones we have already learned.

Assumption: similar inputs have similar outputs.

Secondary assumption: the key functions (e.g., pdfs, discriminants) change smoothly.

SLIDE 7

Density Estimation

Given a training set X, can we estimate the sample distribution from the data itself? The trick will be coming up with useful summaries that do not require us to retain the entire training set after training.

SLIDE 8

Bins

Divide the data into bins of width h.

Histogram:
\[ \hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{Nh} \]

Naive estimator (solves the problems of the origin and exact placement of bin boundaries):
\[ \hat{p}(x) = \frac{\#\{x - h < x^t \le x + h\}}{2Nh} \]

This can be rewritten as
\[ \hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right) \]
where w is a weight function:
\[ w(u) = \begin{cases} 1/2 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases} \]
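The naive estimator above can be sketched in NumPy; the function names here are my own, and both forms of the estimator are shown to illustrate their equivalence:

```python
import numpy as np

def naive_estimator(x, data, h):
    """Naive density estimate: #(x - h < x^t <= x + h) / (2Nh)."""
    data = np.asarray(data, dtype=float)
    count = np.sum((data > x - h) & (data <= x + h))
    return count / (2.0 * len(data) * h)

def naive_estimator_w(x, data, h):
    """Equivalent weight-function form, with w(u) = 1/2 for |u| < 1, else 0."""
    data = np.asarray(data, dtype=float)
    u = (x - data) / h
    w = np.where(np.abs(u) < 1, 0.5, 0.0)
    return w.sum() / (len(data) * h)
```

The two forms can disagree only for samples lying exactly h away from x (the `<=` versus strict `<` boundary); everywhere else they coincide.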

SLIDE 9

Histogram

SLIDE 10

Naive Estimator

SLIDE 11

Kernel Estimators

We can generalize the idea of the weighting function to a kernel function, e.g., the Gaussian kernel
\[ K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2} \]

Kernel estimator (a.k.a. Parzen windows):
\[ \hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) \]
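A minimal Parzen-window sketch with the Gaussian kernel (function names are my own):

```python
import numpy as np

def gaussian_kernel(u):
    # K(u) = (1/sqrt(2*pi)) * exp(-u^2 / 2)
    return np.exp(-0.5 * np.asarray(u, dtype=float) ** 2) / np.sqrt(2 * np.pi)

def parzen_estimate(x, data, h):
    """Kernel (Parzen window) density estimate at x."""
    data = np.asarray(data, dtype=float)
    u = (x - data) / h
    return gaussian_kernel(u).sum() / (len(data) * h)
```

Unlike the naive estimator, every sample contributes to every estimate, so the result is smooth in x.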
SLIDE 12

Kernels

SLIDE 13

k-Nearest Neighbor Estimator

Instead of fixing the bin width h and counting the number of neighboring instances, fix the number of neighbors k and compute the bin width:
\[ \hat{p}(x) = \frac{k}{2N d_k(x)} \]
where d_k(x) is the distance to the kth closest instance to x.
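The estimator above, sketched for one-dimensional data (my own function name):

```python
import numpy as np

def knn_density(x, data, k):
    """k-NN density estimate: p(x) = k / (2 N d_k(x)), one-dimensional."""
    data = np.asarray(data, dtype=float)
    dists = np.sort(np.abs(data - x))
    dk = dists[k - 1]            # distance to the k-th closest instance
    return k / (2.0 * len(data) * dk)
```

Note that the estimate diverges when d_k(x) = 0, i.e., when x coincides with at least k training instances; this is one reason the k-NN estimate is not a proper pdf (it does not integrate to 1).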

SLIDE 14

k-Nearest

SLIDE 15

Multivariate Data

Kernel estimator:
\[ \hat{p}(\mathbf{x}) = \frac{1}{Nh^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right) \]

Multivariate Gaussian kernel (spherical):
\[ K(\mathbf{u}) = \left(\frac{1}{\sqrt{2\pi}}\right)^{d} \exp\!\left(-\frac{\|\mathbf{u}\|^2}{2}\right) \]

Multivariate Gaussian kernel (ellipsoid):
\[ K(\mathbf{u}) = \frac{1}{(2\pi)^{d/2}|\mathbf{S}|^{1/2}} \exp\!\left(-\frac{1}{2}\mathbf{u}^T \mathbf{S}^{-1} \mathbf{u}\right) \]
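A sketch of the multivariate estimator with the spherical Gaussian kernel (function name is my own):

```python
import numpy as np

def mv_kernel_estimate(x, data, h):
    """Multivariate kernel density estimate with the spherical Gaussian kernel.
    x: query point of shape (d,); data: sample matrix of shape (N, d)."""
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    u = (x - data) / h                        # (N, d) scaled differences
    k = (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * np.sum(u ** 2, axis=1))
    return k.sum() / (n * h ** d)
```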
SLIDE 16

Potential Problems

As the number of dimensions rises, the number of “bins” explodes. Data must be similarly scaled if the idea of “distance” is to remain reasonable.

SLIDE 17

Classification

Estimate p(x|C_i) and use Bayes’ rule.

SLIDE 18

Classification - Kernel Estimator

\[ \hat{p}(\mathbf{x}|C_i) = \frac{1}{N_i h^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right) r_i^t \]
\[ \hat{P}(C_i) = \frac{N_i}{N} \]
\[ g_i(\mathbf{x}) = \hat{p}(\mathbf{x}|C_i)\hat{P}(C_i) = \frac{1}{Nh^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right) r_i^t \]
where r_i^t = 1 if x^t belongs to class C_i and 0 otherwise.
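The discriminant g_i above can be sketched for one-dimensional inputs (d = 1); the function names are my own:

```python
import numpy as np

def kernel_discriminant(x, data, labels, cls, h):
    """g_i(x) = (1/(Nh)) sum_t K((x - x^t)/h) r_i^t for class `cls`, 1-D inputs."""
    data = np.asarray(data, dtype=float)
    r = (np.asarray(labels) == cls).astype(float)      # class indicators r_i^t
    u = (x - data) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)     # Gaussian kernel
    return (k * r).sum() / (len(data) * h)

def kernel_classify(x, data, labels, classes, h):
    # Choose the class with the largest discriminant value
    return max(classes, key=lambda c: kernel_discriminant(x, data, labels, c, h))
```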

SLIDE 19

Classification - k-NN Estimator

\[ \hat{p}(\mathbf{x}|C_i) = \frac{k_i}{N_i V^k(\mathbf{x})} \]
where V^k(x) is the volume of the smallest (hyper)sphere containing x and its k nearest neighbors, and k_i is the number of those neighbors that belong to class C_i.
\[ \hat{P}(C_i) = \frac{N_i}{N} \]
\[ \hat{P}(C_i|\mathbf{x}) = \frac{\hat{p}(\mathbf{x}|C_i)\hat{P}(C_i)}{\hat{p}(\mathbf{x})} = \frac{k_i}{k} \]
Assign the input to the class having the most instances among the k nearest neighbors of x.
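The majority-vote rule in the last line reduces to a few lines of code; a 1-D sketch with my own function name:

```python
import numpy as np
from collections import Counter

def knn_classify(x, data, labels, k):
    """Assign x to the class with the most instances among its k nearest neighbors."""
    data = np.asarray(data, dtype=float)
    nearest = np.argsort(np.abs(data - x))[:k]   # indices of the k closest instances
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```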

SLIDE 20

Condensed Nearest Neighbor

1-NN is easy to compute, and its discriminant is piecewise linear. But it requires that we keep the entire training set.

Condensed NN discards “interior” points that cannot affect the discriminant. Finding a minimal such consistent subset is NP-hard, so it requires approximation in practice.
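One common greedy approximation (Hart's procedure) repeatedly stores any instance the current subset misclassifies; a 1-D sketch, with my own function name:

```python
import numpy as np

def condense(data, labels):
    """Greedy approximation of a consistent subset for 1-NN:
    add any instance that the stored subset misclassifies, until stable."""
    data = np.asarray(data, dtype=float)
    stored = [0]                       # start by storing the first instance
    changed = True
    while changed:
        changed = False
        for t in range(len(data)):
            # classify x^t by 1-NN over the stored subset
            nearest = min(stored, key=lambda j: abs(data[j] - data[t]))
            if labels[nearest] != labels[t]:
                stored.append(t)       # misclassified: this point must be kept
                changed = True
    return sorted(stored)
```

On well-separated classes most interior points are discarded, which is the whole point of condensing.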

SLIDE 21

Nonparametric Regression (Smoothing)

Extending the idea of a histogram to regression (the “regressogram”):
\[ \hat{g}(x) = \frac{\sum_t b(x, x^t) r^t}{\sum_t b(x, x^t)} \]
where b(x, x^t) = 1 if x^t is in the same bin as x, and 0 otherwise.
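A regressogram sketch; the function name and the `origin` parameter (bin boundary placement) are my own:

```python
import numpy as np

def regressogram(x, X, r, h, origin=0.0):
    """Regressogram: average r^t over training points in the same bin as x."""
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)
    same_bin = np.floor((X - origin) / h) == np.floor((x - origin) / h)
    if not same_bin.any():
        return float("nan")            # empty bin: the estimate is undefined
    return float(r[same_bin].mean())
```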

SLIDE 22

Bin Smoothing

SLIDE 23

Running Mean Smoother

Define a “bin” centered on x:
\[ \hat{g}(x) = \frac{\sum_t w\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_t w\!\left(\frac{x - x^t}{h}\right)} \]
where
\[ w(u) = \begin{cases} 1 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases} \]
Particularly popular with evenly spaced data.
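Since w is just a window indicator, the running mean reduces to averaging over a sliding window (function name is my own):

```python
import numpy as np

def running_mean(x, X, r, h):
    """Running mean smoother: average r^t over points with |x - x^t| < h."""
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)
    in_window = np.abs((x - X) / h) < 1
    if not in_window.any():
        return float("nan")
    return float(r[in_window].mean())
```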

SLIDE 24

Running Mean Smoothing

SLIDE 25

Kernel Smoother

\[ \hat{g}(x) = \frac{\sum_t K\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_t K\!\left(\frac{x - x^t}{h}\right)} \]
where K is Gaussian.

This, and the subsequent smoothers, can also be reformulated in terms of the k closest neighbors.
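This ratio is the Nadaraya-Watson estimator; a minimal sketch (function name is my own):

```python
import numpy as np

def kernel_smoother(x, X, r, h):
    """Kernel (Nadaraya-Watson) smoother with a Gaussian kernel.
    The 1/sqrt(2*pi) normalization cancels in the ratio, so it is omitted."""
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)
    k = np.exp(-0.5 * ((x - X) / h) ** 2)
    return float((k * r).sum() / k.sum())
```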

SLIDE 26

Kernel Smoothing

SLIDE 27

Running Line Smoother

In the running mean we took an average over all points in a bin. Instead, we could fit a linear regression line to the points in each bin. Numerical analysis offers spline techniques that smooth derivatives as well as function values.
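A running-line sketch using a least-squares fit over the window (function name and the fallback for underfull windows are my own):

```python
import numpy as np

def running_line(x, X, r, h):
    """Running-line smoother: fit a least-squares line to the points
    within h of x, then evaluate the line at x."""
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)
    near = np.abs(X - x) < h
    if near.sum() < 2:                 # a line needs at least two points
        return float(r[near].mean()) if near.any() else float("nan")
    slope, intercept = np.polyfit(X[near], r[near], 1)
    return float(slope * x + intercept)
```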

SLIDE 28

Running Line Smoothing

SLIDE 29

Choosing h or k

Small values exaggerate the effects of single instances (high variance); larger values increase bias. Cross-validation can be used to choose a compromise value.
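Leave-one-out cross-validation for the bandwidth of a kernel smoother can be sketched as follows (all function names are my own; the smoother is the Gaussian kernel smoother from the earlier slide):

```python
import numpy as np

def kernel_smooth(x, X, r, h):
    """Gaussian kernel smoother (the normalization cancels in the ratio)."""
    k = np.exp(-0.5 * ((x - np.asarray(X, dtype=float)) / h) ** 2)
    return float((k * np.asarray(r, dtype=float)).sum() / k.sum())

def loo_cv_error(X, r, h):
    """Leave-one-out cross-validation squared error for bandwidth h."""
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)
    errs = []
    for t in range(len(X)):
        keep = np.arange(len(X)) != t          # hold out instance t
        errs.append((kernel_smooth(X[t], X[keep], r[keep], h) - r[t]) ** 2)
    return float(np.mean(errs))

def choose_h(X, r, candidates):
    # Small h: high variance; large h: high bias. CV picks the compromise.
    return min(candidates, key=lambda h: loo_cv_error(X, r, h))
```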