Non-parametric Methods

Non-parametric Methods. Oliver Schulte - CMPT 726. Bishop PRML Ch. 2.5



1. Non-parametric Methods
Oliver Schulte - CMPT 726
Bishop PRML Ch. 2.5

2. Outline
• Kernel Density Estimation
• Nearest-neighbour
• These are non-parametric methods: rather than having a fixed set of parameters (e.g. a weight vector for regression, or µ, Σ for a Gaussian), we have a possibly infinite set of parameters, based on each data point.
• Fundamental distinction in machine learning:
• Model-based, parametric: what is the rule, law, pattern?
• Instance-based, non-parametric: what have I seen before that is similar?

3. Histograms
• Consider the problem of modelling the distribution of brightness values in pictures taken on sunny days versus cloudy days.
• We could build histograms of pixel values for each class.

4-7. Histograms
[Figure: histogram density estimates of the same data for three different bin widths]
• E.g. for sunny days: count the number n_i of datapoints (pixels) with brightness values falling into each bin, and estimate the density as p_i = n_i / (N Δ_i) (see the sketch below).
• Sensitive to the bin width Δ_i.
• Discontinuous due to bin edges.
• In D-dimensional space with M bins per dimension, there are M^D bins.
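To make the bin formula concrete, here is a minimal Python/NumPy sketch (function and variable names are my own, not from the slides). It computes p_i = n_i / (N Δ) for equal-width bins; the resulting estimate integrates to one.

```python
import numpy as np

def histogram_density(data, num_bins=10, low=0.0, high=1.0):
    """Histogram density estimate: p_i = n_i / (N * delta), equal-width bins."""
    counts, edges = np.histogram(data, bins=num_bins, range=(low, high))
    delta = edges[1] - edges[0]                      # bin width
    return counts / (len(data) * delta), edges

# Toy "brightness" data in [0, 1]; the densities sum to 1 after weighting by delta.
rng = np.random.default_rng(0)
data = rng.beta(2, 5, size=1000)
p, edges = histogram_density(data, num_bins=20)
print(np.sum(p * np.diff(edges)))                    # -> 1.0
```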

8. Local Density Estimation
• In a histogram we use nearby points to estimate density.
• For a small region around x, estimate the density as p(x) = K / (N V).
• K is the number of points in the region, V is the volume of the region, N is the total number of datapoints.
• Basic principle: high probability of x ⇔ x is close to many points.

9. Kernel Density Estimation
• Try to keep the idea of using nearby points to estimate density, but obtain a smoother estimate.
• Estimate the density by placing a small bump at each datapoint.
• The kernel function k(·) determines the shape of these bumps.
• The density estimate is p(x) ∝ (1/N) Σ_{n=1}^{N} k((x − x_n) / h).

10. Kernel Density Estimation
[Figure: Gaussian kernel density estimates of the same data for three different bandwidths h]
• Example using a Gaussian kernel (see the sketch below):
  p(x) = (1/N) Σ_{n=1}^{N} (2πh²)^{−1/2} exp(−||x − x_n||² / (2h²))
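A minimal 1-D sketch of this estimator in Python/NumPy (my own code, not from the slides); it evaluates the Gaussian-kernel formula above at a grid of query points, with the bandwidths chosen only for illustration.

```python
import numpy as np

def gaussian_kde(x_query, data, h):
    """p(x) = (1/N) * sum_n (2*pi*h^2)^(-1/2) * exp(-(x - x_n)^2 / (2*h^2))"""
    diffs = x_query[:, None] - data[None, :]              # (Q, N) differences
    bumps = np.exp(-diffs**2 / (2 * h**2)) / np.sqrt(2 * np.pi * h**2)
    return bumps.mean(axis=1)                             # average the N bumps

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.3, 0.05, 400), rng.normal(0.7, 0.1, 600)])
grid = np.linspace(0, 1, 200)
for h in (0.01, 0.07, 0.2):    # small h: spiky estimate; large h: oversmoothed
    p = gaussian_kde(grid, data, h)
```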

11-14. Kernel Density Estimation
[Figure: kernel function shapes on [−3, 3]; the resulting density estimate on sample data]
• Other kernels: Rectangle, Triangle, Epanechnikov (see the sketch below).
• Fast at training time, slow at test time: we must keep all datapoints.
• Sensitive to the kernel bandwidth h.
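For reference, here are sketches of those kernels (the formulas below are the standard textbook definitions, not given on the slides); each integrates to one and can replace the Gaussian bump in the estimator above.

```python
import numpy as np

def rectangle(u):                       # uniform on [-1, 1]
    return 0.5 * (np.abs(u) <= 1)

def triangle(u):
    return np.maximum(1 - np.abs(u), 0)

def epanechnikov(u):
    return 0.75 * np.maximum(1 - u**2, 0)

def kde(x_query, data, h, kernel):
    """Generic 1-D estimate p(x) = (1/(N*h)) * sum_n k((x - x_n) / h);
    the 1/h factor makes the estimate a proper density."""
    u = (x_query[:, None] - data[None, :]) / h
    return kernel(u).mean(axis=1) / h
```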

15. Nearest-neighbour
[Figure: K-nearest-neighbour density estimates of the same data for three different values of K]
• Instead of relying on the kernel bandwidth to get a proper density estimate, fix the number of nearby points K and grow the volume V around x until it contains K datapoints (see the sketch below): p(x) = K / (N V).
• Note: this estimate diverges when integrated over all space, so it is not a proper density.
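A minimal 1-D sketch (my own code, not from the slides): V is taken as the length of the smallest interval centred at x that contains the K nearest datapoints.

```python
import numpy as np

def knn_density(x_query, data, k):
    """K-NN density estimate p(x) = K / (N * V), with V = 2 * r_k where
    r_k is the distance from x to its k-th nearest datapoint."""
    dists = np.sort(np.abs(x_query[:, None] - data[None, :]), axis=1)
    r_k = dists[:, k - 1]                # distance to the k-th nearest neighbour
    return k / (len(data) * 2 * r_k)     # volume of [x - r_k, x + r_k] is 2 * r_k
```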

16-18. Nearest-neighbour for Classification
[Figure: two-class data in the (x_1, x_2) plane with (a) 3-nearest-neighbour and (b) nearest-neighbour decision regions]
• K-nearest neighbour is often used for classification (see the sketch below).
• Classification: predict labels t_i from x_i.
• E.g. x_i ∈ R², t_i ∈ {0, 1}, 3-nearest neighbour.
• K = 1 is referred to as nearest-neighbour.
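A minimal K-NN classifier sketch (my own code, not from the slides): majority vote among the K nearest training points under Euclidean distance.

```python
import numpy as np

def knn_classify(x_query, train_x, train_t, k=3):
    """Predict a label for x_query by majority vote of its k nearest neighbours."""
    dists = np.linalg.norm(train_x - x_query, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                     # indices of the k closest
    return np.bincount(train_t[nearest]).argmax()       # most common label wins

train_x = np.array([[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]])
train_t = np.array([0, 0, 1, 1])
print(knn_classify(np.array([0.15, 0.15]), train_x, train_t, k=3))  # -> 0
```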

19. Nearest-neighbour for Classification
• A good baseline method.
• Slow, but fancy data structures can be used for efficiency (KD-trees, Locality Sensitive Hashing); see the sketch below.
• Nice theoretical properties: as we obtain more training datapoints, the space becomes more densely filled with labelled data, and as N → ∞ the nearest-neighbour error is no more than twice the Bayes error.
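For the efficiency point, a short sketch using SciPy's KD-tree (scipy.spatial.cKDTree is the real SciPy API; the data here is illustrative): after an O(N log N) build, each query costs roughly O(log N) instead of an O(N) brute-force scan.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
train_x = rng.random((10_000, 2))                 # 10k labelled points in 2-D
train_t = (train_x.sum(axis=1) > 1).astype(int)   # synthetic two-class labels

tree = cKDTree(train_x)                           # build once
dists, idx = tree.query([0.4, 0.7], k=3)          # 3 nearest neighbours
prediction = np.bincount(train_t[idx]).argmax()   # 3-NN majority vote
```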

20. Conclusion
• Readings: Ch. 2.5.
• Kernel density estimation: model the density p(x) using kernels around the training datapoints.
• Nearest neighbour: model the density or perform classification using the nearest training datapoints.
• Multivariate Gaussian: needed for next week's lectures; if you need a refresher, read pp. 78-81.
