
Density Estimation Classification Regression

Nonparametric Methods

Steven J Zeil

Old Dominion Univ.

Fall 2010


Outline

1. Density Estimation
   - Bins
   - Kernel Estimators
   - k-Nearest Neighbor
   - Multivariate Data
2. Classification
3. Regression


Nonparametric Methods

Used when we cannot make assumptions about the distribution of the data, but we still want to apply methods similar to the ones we have already learned.

Primary assumption: similar inputs have similar outputs.

Secondary assumption: the key functions (e.g., pdfs, discriminants) change smoothly.


Density Estimation

Given a training set X, can we estimate the sample distribution from the data itself? The trick will be coming up with useful summaries that do not require us to retain the entire training set after training.


Bins

Divide the data into bins of size $h$. Histogram:
\[ \hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{Nh} \]
Naive estimator: solves the problems of origin and exact placement of bin boundaries:
\[ \hat{p}(x) = \frac{\#\{x - h < x^t \le x + h\}}{2Nh} \]
This can be rewritten as
\[ \hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right) \]
where $w$ is a weight function:
\[ w(u) = \begin{cases} 1/2 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases} \]
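As a minimal sketch of the naive estimator (the bandwidth h, sample size, and standard-normal data below are illustrative choices, not from the slides):

```python
import numpy as np

def naive_estimate(x, data, h):
    """Naive density estimate: fraction of samples within h of x, over the bin width 2h."""
    u = (x - np.asarray(data, dtype=float)) / h
    w = np.where(np.abs(u) < 1, 0.5, 0.0)   # w(u) = 1/2 if |u| < 1, else 0
    return w.sum() / (len(data) * h)

# Illustrative check on standard-normal samples: near 0 the true density is about 0.399
rng = np.random.default_rng(0)
data = rng.standard_normal(5000)
print(naive_estimate(0.0, data, h=0.3))
```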

Histogram

[figure omitted]

Naive Estimator

[figure omitted]

Kernel Estimators

We can generalize the idea of the weighting function to a kernel function, e.g., the Gaussian kernel
\[ K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2} \]
Kernel estimator (a.k.a. Parzen windows):
\[ \hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) \]
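A sketch of the Gaussian-kernel (Parzen window) estimate, again with illustrative data and bandwidth:

```python
import numpy as np

def parzen_estimate(x, data, h):
    """Kernel (Parzen window) density estimate with a Gaussian kernel."""
    u = (x - np.asarray(data, dtype=float)) / h
    K = np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)   # Gaussian kernel K(u)
    return K.sum() / (len(data) * h)

rng = np.random.default_rng(1)
data = rng.standard_normal(5000)
print(parzen_estimate(0.0, data, h=0.3))
```

Unlike the naive estimator, every sample contributes a smooth bump, so the estimate is differentiable in x.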


Kernels

[figure omitted]

k-Nearest Neighbor Estimator

Instead of fixing the bin width $h$ and counting the number of neighboring instances, fix the number of neighbors $k$ and compute the bin width:
\[ \hat{p}(x) = \frac{k}{2N d_k(x)} \]
where $d_k(x)$ is the distance to the $k$th closest instance to $x$.
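A sketch of the k-NN density estimate in one dimension (k and the data are illustrative):

```python
import numpy as np

def knn_density(x, data, k):
    """k-NN density estimate: p(x) = k / (2 * N * d_k(x))."""
    d = np.sort(np.abs(np.asarray(data, dtype=float) - x))
    return k / (2 * len(data) * d[k - 1])   # d[k-1] is the distance to the kth nearest instance

rng = np.random.default_rng(2)
data = rng.standard_normal(5000)
print(knn_density(0.0, data, k=200))
```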


k-Nearest

[figure omitted]

Multivariate Data

Kernel estimator:
\[ \hat{p}(\mathbf{x}) = \frac{1}{Nh^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right) \]
Multivariate Gaussian kernel (spherical):
\[ K(\mathbf{u}) = \left(\frac{1}{\sqrt{2\pi}}\right)^{\!d} \exp\!\left(-\frac{\|\mathbf{u}\|^2}{2}\right) \]
Multivariate Gaussian kernel (ellipsoid):
\[ K(\mathbf{u}) = \frac{1}{(2\pi)^{d/2}\, |\mathbf{S}|^{1/2}} \exp\!\left(-\frac{1}{2}\, \mathbf{u}^T \mathbf{S}^{-1} \mathbf{u}\right) \]
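The spherical case can be sketched as follows (the 2-D standard-normal data and bandwidth are illustrative):

```python
import numpy as np

def multivariate_kde(x, data, h):
    """Spherical Gaussian kernel estimate: (1 / (N h^d)) * sum_t K((x - x_t) / h)."""
    data = np.asarray(data, dtype=float)
    N, d = data.shape
    u = (x - data) / h                                       # one row per instance
    K = (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * (u**2).sum(axis=1))
    return K.sum() / (N * h**d)

# 2-D standard normal: the true density at the origin is 1 / (2*pi), about 0.159
rng = np.random.default_rng(3)
data = rng.standard_normal((5000, 2))
print(multivariate_kde(np.zeros(2), data, h=0.3))
```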


Potential Problems

As the number of dimensions rises, the number of “bins” explodes (the curse of dimensionality). Data must be similarly scaled if the idea of “distance” is to remain reasonable.


Classification

Estimate $p(\mathbf{x} \mid C_i)$ and use Bayes’ rule.


Classification - Kernel Estimator

\[ \hat{p}(\mathbf{x} \mid C_i) = \frac{1}{N_i h^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right) r_i^t \qquad \hat{P}(C_i) = \frac{N_i}{N} \]

\[ g_i(\mathbf{x}) = \hat{p}(\mathbf{x} \mid C_i)\, \hat{P}(C_i) = \frac{1}{N h^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right) r_i^t \]

where $r_i^t = 1$ if $\mathbf{x}^t \in C_i$ and $0$ otherwise.


Classification - k-NN Estimator

\[ \hat{p}(\mathbf{x} \mid C_i) = \frac{k_i}{N_i V^k(\mathbf{x})} \]
where $V^k(\mathbf{x})$ is the volume of the smallest (hyper)sphere containing $\mathbf{x}$ and its nearest $k$ neighbors, and $k_i$ is the number of those neighbors belonging to class $C_i$. With $\hat{P}(C_i) = N_i / N$,
\[ \hat{P}(C_i \mid \mathbf{x}) = \frac{\hat{p}(\mathbf{x} \mid C_i)\, \hat{P}(C_i)}{\hat{p}(\mathbf{x})} = \frac{k_i}{k} \]
Assign the input to the class having the most instances among the $k$ nearest neighbors of $\mathbf{x}$.
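The resulting decision rule can be sketched directly; the toy 2-D clusters below are assumptions for illustration:

```python
import numpy as np

def knn_classify(x, data, labels, k):
    """Assign x to the class with the most instances among its k nearest neighbors."""
    data = np.asarray(data, dtype=float)
    dists = np.linalg.norm(data - x, axis=1)       # Euclidean distance to every instance
    nearest = np.argsort(dists)[:k]                # indices of the k closest
    return np.bincount(np.asarray(labels)[nearest]).argmax()

# Two toy 2-D clusters, classes 0 and 1
data = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                 [5.0, 5.0], [5.1, 4.9], [4.9, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(np.array([0.1, 0.1]), data, labels, k=3))   # lands in the class-0 cluster
```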


Condensed Nearest Neighbor

1-NN is easy to compute

The discriminant is piecewise linear, but 1-NN requires that we keep the entire training set.

Condensed NN discards “interior” points that cannot affect the discriminant. Finding such a minimal consistent subset is NP-hard.

Requires approximation in practice
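One common approximation is Hart's condensing procedure: repeatedly add to the stored subset any point that the subset misclassifies under 1-NN. A sketch on toy data:

```python
import numpy as np

def condense(data, labels):
    """Hart-style condensing: keep any point that the kept subset misclassifies with 1-NN."""
    data = np.asarray(data, dtype=float)
    keep = [0]                                    # start from a single stored instance
    changed = True
    while changed:
        changed = False
        for i in range(len(data)):
            if i in keep:
                continue
            nearest = min(keep, key=lambda j: np.linalg.norm(data[i] - data[j]))
            if labels[nearest] != labels[i]:      # misclassified, so it must be stored
                keep.append(i)
                changed = True
    return keep

# Two tight clusters: one representative per cluster suffices
data = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.1]])
labels = [0, 0, 1, 1]
print(condense(data, labels))
```

The result depends on the scan order, which is why this is an approximation rather than a minimal consistent subset.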


Nonparametric Regression (Smoothing)

Extending the idea of a histogram to regression:
\[ \hat{g}(x) = \frac{\sum_t b(x, x^t)\, r^t}{\sum_t b(x, x^t)} \]
where
\[ b(x, x^t) = \begin{cases} 1 & \text{if } x^t \text{ is in the same bin as } x \\ 0 & \text{otherwise} \end{cases} \]
This estimator is called the “regressogram.”
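A sketch of the regressogram (bin width, origin, and data are illustrative):

```python
import numpy as np

def regressogram(x, xs, rs, h, origin=0.0):
    """Average the responses r^t of the training points in the same width-h bin as x."""
    xs = np.asarray(xs, dtype=float)
    rs = np.asarray(rs, dtype=float)
    same_bin = np.floor((xs - origin) / h) == np.floor((x - origin) / h)
    return rs[same_bin].mean() if same_bin.any() else float("nan")

print(regressogram(0.15, [0.1, 0.2, 0.9], [1.0, 3.0, 10.0], h=0.5))   # averages the first two responses
```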


Bin Smoothing

[figure omitted]

Running Mean Smoother

Define a “bin” centered on $x$:
\[ \hat{g}(x) = \frac{\sum_t w\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_t w\!\left(\frac{x - x^t}{h}\right)} \]
where
\[ w(u) = \begin{cases} 1 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases} \]
This smoother is particularly popular with evenly spaced data.


Running Mean Smoothing

[figure omitted]

Kernel Smoother

\[ \hat{g}(x) = \frac{\sum_t K\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_t K\!\left(\frac{x - x^t}{h}\right)} \]
where $K$ is Gaussian.

In this and subsequent smoothers, we can also reformulate in terms of the closest $k$ neighbors.
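A sketch covering both smoothers: with a box kernel this reduces to the running mean, and with a Gaussian kernel it is the kernel smoother (data and h are illustrative):

```python
import numpy as np

def smooth(x, xs, rs, h, kernel="gaussian"):
    """Kernel-weighted average of the responses; 'box' gives the running mean smoother."""
    u = (x - np.asarray(xs, dtype=float)) / h
    if kernel == "gaussian":
        w = np.exp(-u**2 / 2)
    else:                                   # box kernel: w(u) = 1 if |u| < 1, else 0
        w = (np.abs(u) < 1).astype(float)
    return np.dot(w, np.asarray(rs, dtype=float)) / w.sum()

print(smooth(1.0, [0.0, 1.0, 2.0], [0.0, 1.0, 2.0], h=1.0))              # symmetric weights
print(smooth(0.5, [0.0, 1.0, 2.0], [0.0, 1.0, 2.0], h=1.0, kernel="box"))
```

The normalizing constant of $K$ cancels between numerator and denominator, so only the kernel's shape matters here.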


Kernel Smoothing

[figure omitted]

Running Line Smoother

In the running mean we took an average over all points in a bin. Instead, we could fit a linear regression line to all points in the bin. Numerical analysis offers spline techniques that smooth derivatives as well as function values.
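A sketch of a running line smoother using an ordinary least-squares fit inside the bin (np.polyfit; data and h are illustrative):

```python
import numpy as np

def running_line(x, xs, rs, h):
    """Fit a least-squares line to the points within h of x and evaluate it at x."""
    xs = np.asarray(xs, dtype=float)
    rs = np.asarray(rs, dtype=float)
    mask = np.abs(xs - x) < h                  # the "bin" centered on x
    slope, intercept = np.polyfit(xs[mask], rs[mask], 1)
    return slope * x + intercept

xs = np.linspace(0.0, 1.0, 11)
rs = 2 * xs + 1                                # noiseless line, so the local fit is exact
print(running_line(0.5, xs, rs, h=0.3))
```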


Running Line Smoothing

[figure omitted]

Choosing h or k

Small values exaggerate the effects of single instances (high variance); larger values increase bias. Choose by cross-validation.
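A sketch of choosing h by leave-one-out cross-validation with a Gaussian smoother (data, noise level, and candidate grid are all illustrative; with this low noise the smallest candidate wins, while noisier data would favor a larger h):

```python
import numpy as np

def gauss_smooth(x, xs, rs, h):
    """Gaussian-kernel smoother used inside the cross-validation loop."""
    w = np.exp(-((x - xs) / h) ** 2 / 2)
    return np.dot(w, rs) / w.sum()

def loocv_bandwidth(xs, rs, candidates):
    """Leave-one-out CV: return the h whose held-out squared error is smallest."""
    xs = np.asarray(xs, dtype=float)
    rs = np.asarray(rs, dtype=float)
    errors = []
    for h in candidates:
        err = 0.0
        for i in range(len(xs)):
            mask = np.arange(len(xs)) != i     # hold out instance i
            err += (rs[i] - gauss_smooth(xs[i], xs[mask], rs[mask], h)) ** 2
        errors.append(err)
    return candidates[int(np.argmin(errors))]

# Noisy sine data: too-large h flattens the curve entirely
rng = np.random.default_rng(4)
xs = np.linspace(0.0, 1.0, 30)
rs = np.sin(2 * np.pi * xs) + 0.05 * rng.standard_normal(30)
print(loocv_bandwidth(xs, rs, [0.02, 0.5, 5.0]))
```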
