SLIDE 1

Kernel Density Estimation Nearest-neighbour

Non-parametric Methods

Oliver Schulte - CMPT 726 Bishop PRML Ch. 2.5

SLIDE 2

Outline

  • Kernel Density Estimation
  • Nearest-neighbour

  • These are non-parametric methods
  • Rather than having a fixed set of parameters (e.g. weight vector for regression, µ, Σ for Gaussian) we have a possibly infinite set of parameters based on each data point
  • Fundamental Distinction in Machine Learning:
  • Model-Based, Parametric. What’s the rule, law, pattern?
  • Instance-Based, non-parametric. What have I seen before that’s similar?

SLIDE 3

Histograms

  • Consider the problem of modelling the distribution of brightness values in pictures taken on sunny days versus cloudy days
  • We could build histograms of pixel values for each class
SLIDE 4

Histograms

[Figure: histogram density estimates of the same data for three different bin widths ∆]

  • E.g. for sunny days
  • Count ni, the number of datapoints (pixels) with brightness value falling into each bin: pi = ni / (N∆i)
  • Sensitive to bin width ∆i
  • Discontinuous due to bin edges
  • In D-dim space with M bins per dimension, M^D bins
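The histogram estimate pi = ni / (N∆i) can be sketched in a few lines of NumPy. The data below is hypothetical (made up for illustration); the point is that the resulting bars form a valid density, i.e. they integrate to 1.

```python
import numpy as np

# Hypothetical 1-D "brightness" values in [0, 1] (illustration only).
rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=1000)

def histogram_density(x, num_bins, lo=0.0, hi=1.0):
    """Histogram density estimate: p_i = n_i / (N * delta_i)."""
    edges = np.linspace(lo, hi, num_bins + 1)
    counts, _ = np.histogram(x, bins=edges)
    delta = edges[1] - edges[0]              # uniform bin width delta_i
    return counts / (len(x) * delta), edges

p, edges = histogram_density(x, num_bins=20)
# A valid density: the bars integrate to 1 (sum_i p_i * delta_i = 1).
print(np.sum(p * np.diff(edges)))
```

Changing `num_bins` (equivalently ∆i) visibly reshapes the estimate, which is the sensitivity the slide warns about.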


SLIDE 8

Local Density Estimation

  • In a histogram we use nearby points to estimate density.
  • For a small region around x, estimate density as: p(x) = K / (NV)
  • K is the number of points in the region, V is the volume of the region, N is the total number of datapoints
  • Basic Principle: high probability of x ⇐⇒ x is close to many points.
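A minimal sketch of p(x) = K/(NV) with a fixed region in 1-D, where the "volume" is the interval length. The standard-normal sample is hypothetical; the true density at 0 is 1/√(2π) ≈ 0.399, so the estimate should land near that.

```python
import numpy as np

# Hypothetical data from a standard normal (illustration only).
rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, size=10_000)

def local_density(data, x0, r):
    """p(x0) = K / (N V): count K points in an interval of half-width r."""
    K = np.sum(np.abs(data - x0) < r)  # points inside the region
    V = 2.0 * r                        # "volume" (length) of the interval
    return K / (len(data) * V)

# True N(0, 1) density at 0 is 1/sqrt(2*pi), about 0.399.
print(local_density(data, 0.0, 0.05))
```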

SLIDE 9

Kernel Density Estimation

  • Try to keep the idea of using nearby points to estimate density, but obtain a smoother estimate
  • Estimate density by placing a small bump at each datapoint
  • Kernel function k(·) determines the shape of these bumps
  • Density estimate is p(x) ∝ (1/N) ∑_{n=1}^{N} k((x − xn)/h)

SLIDE 10

Kernel Density Estimation

[Figure: Gaussian kernel density estimates of the same data for three different bandwidths h]

  • Example using Gaussian kernel: p(x) = (1/N) ∑_{n=1}^{N} (2πh²)^{−1/2} exp(−||x − xn||²/(2h²))
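The Gaussian-kernel formula above is directly implementable: one bump of width h per datapoint, averaged. A 1-D NumPy sketch on hypothetical data:

```python
import numpy as np

def gaussian_kde(x, data, h):
    """The slide's 1-D formula: p(x) = (1/N) sum_n N(x | x_n, h^2)."""
    diffs = x[:, None] - data[None, :]                      # (len(x), N)
    bumps = np.exp(-diffs**2 / (2 * h**2)) / np.sqrt(2 * np.pi * h**2)
    return bumps.mean(axis=1)                               # average the bumps

# Hypothetical sample; evaluate the smooth estimate on a grid.
rng = np.random.default_rng(2)
data = rng.normal(0.0, 1.0, size=2_000)
grid = np.linspace(-4.0, 4.0, 201)
p = gaussian_kde(grid, data, h=0.3)

# This is a proper density: it numerically integrates to ~1.
dx = grid[1] - grid[0]
print(np.sum(p * dx))
```

Varying `h` trades off noise (small h) against over-smoothing (large h), the bandwidth sensitivity discussed on the following slides.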

SLIDE 14

Kernel Density Estimation

[Figures: the kernel function shapes; a resulting density estimate on sample data]

  • Other kernels: Rectangle, Triangle, Epanechnikov
  • Fast at training time, slow at test time – keep all datapoints
  • Sensitive to kernel bandwidth h
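The alternative kernels named above are simple functions of the scaled distance u = (x − xn)/h. These are the standard textbook forms (a sketch, not taken from the slides); each integrates to 1 on its support [−1, 1], so each yields a proper density estimate.

```python
import numpy as np

# Standard forms of the kernels named on the slide, as functions of
# the scaled distance u = (x - x_n) / h.
def rectangle(u):
    return 0.5 * (np.abs(u) <= 1)

def triangle(u):
    return np.maximum(1.0 - np.abs(u), 0.0)

def epanechnikov(u):
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

# Check numerically that each kernel integrates to ~1.
u = np.linspace(-2.0, 2.0, 4001)
du = u[1] - u[0]
for k in (rectangle, triangle, epanechnikov):
    print(k.__name__, np.sum(k(u) * du))
```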
SLIDE 15

Nearest-neighbour

[Figure: K-nearest-neighbour density estimates of the same data for three different values of K]

  • Instead of relying on kernel bandwidth to get a proper density estimate, fix the number of nearby points K: p(x) = K / (NV)
  • Note: diverges, not a proper density estimate
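In 1-D this means growing the interval around x until it contains exactly K points, then applying p(x) = K/(NV). A sketch on hypothetical data:

```python
import numpy as np

def knn_density(x0, data, K):
    """Fix K, grow the region: V = 2 * (distance to the K-th neighbour)."""
    dists = np.sort(np.abs(data - x0))
    V = 2.0 * dists[K - 1]            # smallest interval holding K points
    return K / (len(data) * V)

# Hypothetical standard-normal sample; true density at 0 is ~0.399.
rng = np.random.default_rng(3)
data = rng.normal(0.0, 1.0, size=5_000)
print(knn_density(0.0, data, K=50))
```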
slide-18
SLIDE 18

Kernel Density Estimation Nearest-neighbour

Nearest-neighbour for Classification

[Figure: (a) 3-nearest-neighbour and (b) 1-nearest-neighbour decision regions in the (x1, x2) plane]

  • K Nearest neighbour is often used for classification
  • Classification: predict labels ti from xi
  • e.g. xi ∈ R², ti ∈ {0, 1}, 3-nearest neighbour
  • K = 1 is referred to as nearest-neighbour
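A minimal K-nearest-neighbour classifier matching the slide's setup (xi ∈ R², ti ∈ {0, 1}). The tiny training set is made up for illustration: two clusters, one per label.

```python
import numpy as np

def knn_classify(x, X_train, t_train, K=3):
    """Predict the majority label among the K nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:K]               # indices of K closest
    labels, counts = np.unique(t_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical training set: two clusters with labels 0 and 1.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
t_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_classify(np.array([0.15, 0.1]), X_train, t_train, K=3))   # -> 0
print(knn_classify(np.array([1.05, 0.95]), X_train, t_train, K=3))  # -> 1
```

With K = 1 the same function reduces to plain nearest-neighbour classification.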
SLIDE 19

Nearest-neighbour for Classification

  • Good baseline method
  • Slow, but can use fancy datastructures for efficiency (KD-trees, Locality Sensitive Hashing)
  • Nice theoretical properties
  • As we obtain more training data points, space becomes more filled with labelled data
  • As N → ∞, the nearest-neighbour error rate is no more than twice the Bayes error
SLIDE 20

Conclusion

  • Readings: Ch. 2.5
  • Kernel density estimation
  • Model density p(x) using kernels around training datapoints
  • Nearest neighbour
  • Model density or perform classification using nearest training datapoints
  • Multivariate Gaussian
  • Needed for next week’s lectures; if you need a refresher, read pp. 78-81