Lecture 8: Kernel Density Estimation (2)
Applied Statistics 2015


SLIDE 1

Outline:
- Choice of bandwidth by Cross Validation
- Multivariate density estimators
- Assignments

SLIDE 2

Recap

A kernel density estimator is given by
$$\hat f_{n,h}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right).$$
The risk of the estimator is measured locally by the MSE and globally by the integrated MSE (MISE). Both have the following decomposition:
$$\text{Risk} = (\text{Bias})^2 + \text{Variance} = a h^4 + \frac{b}{nh} + \text{remaining term}.$$
Minimizing the risk yields the optimal bandwidth $h_{\mathrm{opt}}$, of order $n^{-1/5}$.
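As an illustration (not part of the slides), here is a minimal Python sketch of this estimator with a Gaussian kernel and Silverman's reference bandwidth of order $n^{-1/5}$; the simulated standard normal sample is an assumption made for the example:

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, a common choice of kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x, data, h):
    """f_hat_{n,h}(x) = (1/(n h)) * sum_i K((x - X_i)/h)."""
    data = np.asarray(data)
    n = data.size
    u = (np.asarray(x)[..., None] - data) / h  # broadcast over evaluation points
    return gaussian_kernel(u).sum(axis=-1) / (n * h)

rng = np.random.default_rng(0)
sample = rng.normal(size=200)
# Silverman's rule-of-thumb bandwidth, of order n^(-1/5)
h = 1.06 * sample.std() * sample.size ** (-1 / 5)
# estimate at 0; the true standard normal density there is 1/sqrt(2*pi) ~ 0.399
print(kde(0.0, sample, h))
```

Evaluating `kde` on a grid and plotting it against the true density is the usual way to inspect the effect of different choices of `h`.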

SLIDE 3

Recap

The trade-off between bias and variance is a common issue in smoothing problems. The bias increases and the variance decreases with the amount of smoothing, which is determined by the bandwidth $h$ in the kernel density estimator.

SLIDE 4

The general concept of cross validation (CV) was introduced in Stone (1974); it was not first suggested for density estimation. The basic idea of CV is very intuitive: select a part of the data to fit the model, then apply the fitted model to the rest of the data to assess goodness of fit.

For choosing the bandwidth of a density estimator, the procedure works as follows. Fix $h$. For each $j$, obtain the estimator based on the $(n-1)$ observations $\{X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_n\}$:
$$\hat f^{(j)}_{n,h}(x) = \frac{1}{(n-1)h} \sum_{i \neq j} K\!\left(\frac{x - X_i}{h}\right).$$
A CV score, as a measure of goodness of fit, is computed from $\{\hat f^{(j)}_{n,h}(X_j),\ j = 1, \ldots, n\}$.

Varying $h$ yields a function $CV(h)$, which is then maximized (or minimized) to obtain a CV bandwidth.

SLIDE 5

Maximum Likelihood CV

Let
$$\hat f^{(j)}_{n,h}(x) = \frac{1}{(n-1)h} \sum_{i \neq j} K\!\left(\frac{x - X_i}{h}\right)$$
be the estimated density based on all sample values except $X_j$. We apply the estimate $\hat f^{(j)}_{n,h}$ to $x = X_j$ to obtain $\hat f^{(j)}_{n,h}(X_j)$. Since $X_j$ was actually observed, a good choice of $h$ should give a large value of $\hat f^{(j)}_{n,h}(X_j)$. The rationale is similar to that of MLE. Define the CV likelihood as
$$\hat L(h) = \prod_{j=1}^{n} \hat f^{(j)}_{n,h}(X_j).$$
The maximum likelihood CV (MLCV) bandwidth is given by $h_{ML} = \arg\max_h \hat L(h)$.
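A short Python sketch (illustrative, not from the lecture) of MLCV: it evaluates the leave-one-out log-likelihood $\log \hat L(h)$, which has the same maximizer as $\hat L(h)$, on a bandwidth grid. The Gaussian kernel, the simulated data, and the grid are all assumptions made for the example:

```python
import numpy as np

def mlcv_score(data, h):
    """Log of the CV likelihood: sum_j log f_hat^{(j)}_{n,h}(X_j)."""
    data = np.asarray(data)
    n = data.size
    u = (data[:, None] - data[None, :]) / h       # u[j, i] = (X_j - X_i)/h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # Gaussian kernel values
    np.fill_diagonal(k, 0.0)                      # exclude the i = j term
    loo = k.sum(axis=1) / ((n - 1) * h)           # f_hat^{(j)}(X_j)
    return np.log(loo).sum()

rng = np.random.default_rng(1)
sample = rng.normal(size=100)
grid = np.linspace(0.05, 1.5, 60)
h_ml = grid[int(np.argmax([mlcv_score(sample, h) for h in grid]))]
print(h_ml)  # the MLCV bandwidth on this grid
```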

SLIDE 6

Maximum Likelihood CV

It can be proven that, under some conditions on $f$ and $K$,
$$\int \big|\hat f_{n,h_{ML}}(x) - f(x)\big|\,dx \xrightarrow{a.s.} 0.$$
Remark. There are known examples where $\hat f_{n,h_{ML}}$ is inconsistent if $f$ has unbounded support.

SLIDE 7

Least squares CV

Consider
$$\mathrm{MISE}(\hat f_{n,h}) = E\!\int (\hat f_{n,h}(x) - f(x))^2\,dx = E\!\int \hat f^2_{n,h}(x)\,dx - 2\,E\!\int \hat f_{n,h}(x) f(x)\,dx + \int f(x)^2\,dx.$$
The last term does not depend on $h$. Thus we aim to find a good $h$ that minimizes
$$M(h) = E\!\int \hat f^2_{n,h}(x)\,dx - 2\,E\!\int \hat f_{n,h}(x) f(x)\,dx.$$
However, $M(h)$ depends on the unknown $f$, so we shall find an unbiased estimator of $M(h)$. Since the first term can be estimated without bias by $\int \hat f^2_{n,h}(x)\,dx$ itself, we only need an unbiased estimator of $E\int \hat f_{n,h}(x) f(x)\,dx$.

SLIDE 8

Least squares CV

It turns out that
$$\frac{1}{n}\sum_{j=1}^{n} \hat f^{(j)}_{n,h}(X_j)$$
is an unbiased estimator of $E\int \hat f_{n,h}(x) f(x)\,dx$:
$$E\left[\frac{1}{n}\sum_{j=1}^{n} \hat f^{(j)}_{n,h}(X_j)\right] = \frac{1}{n}\sum_{j=1}^{n} E\,\hat f^{(j)}_{n,h}(X_j) = E\,\hat f^{(1)}_{n,h}(X_1) = E\left[\frac{1}{(n-1)h}\sum_{i=2}^{n} K\!\left(\frac{X_1 - X_i}{h}\right)\right]$$
$$= \frac{1}{h}\,E\,K\!\left(\frac{X_1 - X_2}{h}\right) = \iint \frac{1}{h} K\!\left(\frac{x - y}{h}\right) f(y) f(x)\,dy\,dx = \int \left[\int \frac{1}{h} K\!\left(\frac{x - y}{h}\right) f(y)\,dy\right] f(x)\,dx$$
$$= \int E\,\hat f_{n,h}(x)\, f(x)\,dx = E\!\int \hat f_{n,h}(x) f(x)\,dx.$$
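This identity can be checked numerically: conditional on the data, $\int \hat f_{n,h}(x) f(x)\,dx$ is the expectation of $\hat f_{n,h}$ at a fresh, independent draw from $f$, so both sides can be averaged over simulated samples. A small Monte Carlo sketch (not from the slides; the normal data, Gaussian kernel, and sample sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, h, reps = 30, 0.5, 2000

def K(u):
    """Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

loo_vals, plug_vals = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    u = (x[:, None] - x[None, :]) / h
    k = K(u)
    np.fill_diagonal(k, 0.0)
    # leave-one-out statistic (1/n) sum_j f_hat^{(j)}(X_j)
    loo_vals.append((k.sum(axis=1) / ((n - 1) * h)).mean())
    # integral of f_hat * f, estimated by f_hat at a fresh draw X ~ f
    x_new = rng.normal()
    plug_vals.append(K((x_new - x) / h).sum() / (n * h))

# both averages estimate E int f_hat f dx; for N(0,1) data and a Gaussian
# kernel this expectation is the N(0, 2 + h^2) density at 0
print(np.mean(loo_vals), np.mean(plug_vals))
```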

SLIDE 9

Least squares CV

Let
$$\mathrm{LSCV}(h) = \int \hat f^2_{n,h}(x)\,dx - \frac{2}{n}\sum_{j=1}^{n} \hat f^{(j)}_{n,h}(X_j).$$
We have shown that for any $h > 0$, $E[\mathrm{LSCV}(h)] = M(h)$. $\mathrm{LSCV}(h)$ is the least squares cross validation score, and the LSCV bandwidth is defined as $h_{ls} = \arg\min_h \mathrm{LSCV}(h)$. For a given $h$, $\mathrm{LSCV}(h)$ can be computed from the sample. A computational formula:
$$\mathrm{LSCV}(h) = \frac{1}{n^2 h}\sum_{i=1}^{n}\sum_{j=1}^{n} \int K(y)\, K\!\left(\frac{X_i - X_j}{h} - y\right) dy \;-\; \frac{2}{n(n-1)h}\sum_{j=1}^{n}\sum_{i \neq j} K\!\left(\frac{X_i - X_j}{h}\right).$$
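For the Gaussian kernel the convolution term has a closed form, $\int K(y)K(u-y)\,dy$ being the $N(0,2)$ density at $u$, which makes the computational formula straightforward to code. A Python sketch (illustrative; the simulated sample and bandwidth grid are assumptions):

```python
import numpy as np

def lscv(data, h):
    """LSCV(h) for the Gaussian kernel, using (K*K)(u) = N(0,2) density at u."""
    x = np.asarray(data)
    n = x.size
    u = (x[:, None] - x[None, :]) / h
    conv = np.exp(-u**2 / 4) / np.sqrt(4 * np.pi)  # integral K(y) K(u - y) dy
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # kernel at u
    term1 = conv.sum() / (n**2 * h)                # = integral of f_hat^2
    # sum over i != j: subtract the n diagonal terms K(0)
    term2 = 2 * (k.sum() - n * k[0, 0]) / (n * (n - 1) * h)
    return term1 - term2

rng = np.random.default_rng(5)
sample = rng.normal(size=100)
grid = np.linspace(0.05, 1.5, 80)
scores = [lscv(sample, h) for h in grid]
h_ls = grid[int(np.argmin(scores))]
print(h_ls)  # the LSCV bandwidth on this grid
```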
SLIDE 10

Least squares CV

The resulting bandwidth $h_{ls}$, and thus the density estimator $\hat f_{n,h_{ls}}(x)$, is asymptotically optimal.

Theorem (Stone 1984). Assume the following: (a) $f$ is uniformly bounded; (b) $K$ is a kernel (a density symmetric around zero) with a unique mode at zero; (c) $K$ is compactly supported; (d) $K$ is Hölder continuous of order $\beta$, i.e. for $x_1, x_2 \in \mathbb{R}$, $|K(x_1) - K(x_2)| \le C\,|x_1 - x_2|^{\beta}$. Then
$$\frac{\int (\hat f_{n,h_{ls}}(x) - f(x))^2\,dx}{\int (\hat f_{n,h_{opt}}(x) - f(x))^2\,dx} - 1 \xrightarrow{a.s.} 0.$$
Remark. This result is regarded as a landmark in the cross-validation literature. The theorem asserts optimal performance of LSCV without practically any condition on $f$.

SLIDE 11

A few comments

All methods for choosing the smoothing parameter $h$ should be used with common sense. Recommended methods: the reference bandwidth and the cross validation approaches. In practice, always make plots and compare different choices of $h$.

SLIDE 12

Multivariate density estimators

(A bold letter denotes a vector in this section.) On the basis of $n$ i.i.d. random vectors $\mathbf{X}_i = (X_{i1}, \ldots, X_{id})$ from an unknown $F$, we wish to estimate $f$, the density of $F$. We consider $d$-dimensional kernel estimators: for $\mathbf{x} = (x_1, \ldots, x_d) \in \mathbb{R}^d$,
$$\hat f_n(\mathbf{x}) = \frac{1}{n h^d} \sum_{i=1}^{n} K\!\left(\frac{\mathbf{x} - \mathbf{X}_i}{h}\right),$$
where the kernel $K$ is a $d$-dimensional density. In practice, $K$ is often taken to be a product kernel or an ellipsoidal kernel.

SLIDE 13

Multivariate density estimators

Product kernel: $K(\mathbf{x}) = \prod_{i=1}^{d} K_0(x_i)$, with $K_0$ a univariate kernel.

Ellipsoidal kernels:
- Multivariate normal density: $(2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\mathbf{x}\mathbf{x}'\right)$.
- Multivariate Epanechnikov kernel: $\frac{d+2}{2 c_d}\,(1 - \mathbf{x}\mathbf{x}')\,\mathbf{1}\{\mathbf{x}\mathbf{x}' \le 1\}$, where $c_d$ is the volume of the $d$-dimensional unit ball: $c_1 = 2$, $c_2 = \pi$, $c_3 = 4\pi/3$.

One can also choose different amounts of smoothing along different directions: $\mathbf{h} = (h_1, \ldots, h_d)$.
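A Python sketch of the product-kernel estimator with $K_0$ the Epanechnikov kernel (illustrative, not from the slides; the bivariate normal test data and the rule-of-thumb per-coordinate bandwidths of order $n^{-1/(d+4)}$ are assumptions):

```python
import numpy as np

def epanechnikov(u):
    """Univariate Epanechnikov kernel K0."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def product_kde(x, data, h):
    """f_hat(x) = 1/(n * prod_k h_k) * sum_i prod_k K0((x_k - X_ik)/h_k)."""
    x = np.atleast_2d(x)            # (m, d) evaluation points
    data = np.asarray(data)         # (n, d) sample
    h = np.asarray(h)               # (d,) per-coordinate bandwidths
    u = (x[:, None, :] - data[None, :, :]) / h
    return epanechnikov(u).prod(axis=2).sum(axis=1) / (data.shape[0] * h.prod())

rng = np.random.default_rng(3)
data = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=500)
# d = 2, so bandwidths of order n^(-1/(d+4)) = n^(-1/6)
h = data.std(axis=0) * 500 ** (-1 / 6)
est = product_kde([0.0, 0.0], data, h)[0]
# true density of this N(0, Sigma) at the origin is 1/(2*pi*sqrt(0.75)) ~ 0.184
print(est)
```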

SLIDE 14

Multivariate density estimators

Assume that $h_i \to 0$ for $i = 1, \ldots, d$ and $n \prod_{i=1}^{d} h_i \to \infty$ as $n \to \infty$. Under some smoothness conditions on $f$ and $K$, $\hat f_n(\mathbf{x})$ is a consistent estimator of $f(\mathbf{x})$: $\hat f_n(\mathbf{x}) \xrightarrow{P} f(\mathbf{x})$. The optimal bandwidths are $h_{i,\mathrm{opt}} = c_i\, n^{-1/(d+4)}$, $i = 1, \ldots, d$, and the corresponding risk (MSE or MISE) tends to zero at the rate $n^{-4/(d+4)}$.

SLIDE 15

Curse of dimensionality

The curse of dimensionality refers to the fact that an (estimation) problem gets harder very quickly as the dimension of the data increases. This can be due to computational burden and/or statistical efficiency. We discuss here the statistical curse of dimensionality: to obtain an accurate estimator, an enormous sample size is required. Since $\mathrm{MSE}_{h_{\mathrm{opt}}} \approx c\, n^{-4/(d+4)}$, setting $\mathrm{MSE}_{h_{\mathrm{opt}}} = \delta$ and solving for $n$ gives $n \approx (c/\delta)^{(d+4)/4}$, which grows exponentially with the dimension $d$. Below we illustrate this phenomenon with two examples.

SLIDE 16

Curse of dimensionality

1st Example. Suppose that the data are multivariate Gaussian, $N(0, I_d)$, with $I_d$ the identity matrix. Choose the optimal $h$ and a Gaussian kernel to estimate $f(\mathbf{0})$. To achieve
$$\frac{E\big(\hat f_n(\mathbf{0}) - f(\mathbf{0})\big)^2}{f^2(\mathbf{0})} < 0.1,$$
the required number of observations $n$ is as in the following table (Table 4.2 of Silverman (1986)).

d:   2     4      6       8        10
n:  19   223   2790   43,700   842,000

SLIDE 17

Curse of dimensionality

Why does an accurate estimator require a large sample size in the multivariate case? The reason is that $f(\mathbf{x})$ is estimated using data points in a local neighborhood of $\mathbf{x}$, but in a high-dimensional setting the data are very sparse, so local neighborhoods contain very few points.

2nd Example. Suppose that we have $n$ data points uniformly distributed on the interval $[0, 1]$. How many data points fall in the interval $[0, 0.1]$? About $n/10$. Now suppose $n$ data points are uniformly distributed on the 10-dimensional unit cube $[0, 1]^{10} = [0, 1] \times \cdots \times [0, 1]$. How many data points fall in the cube $[0, 0.1]^{10}$? About $0.1^{10}\, n = n/10{,}000{,}000{,}000$.
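The second example is easy to verify by simulation (illustrative Python, not from the slides; the sample size of 100,000 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 100_000, 10
pts = rng.random((n, d))                      # n points uniform on [0,1]^d

frac_1d = np.mean(pts[:, 0] <= 0.1)           # per coordinate: about 1/10 of the points
frac_10d = np.mean((pts <= 0.1).all(axis=1))  # expected fraction 0.1**10 = 1e-10
# even with 100,000 points, the expected count in [0, 0.1]^10 is 1e-5,
# so typically not a single point lands there
print(frac_1d, frac_10d)
```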

SLIDE 18

Group Presentation (March 30)

Group 14

Data on the salaries of the CEOs of 60 companies are available at http://lib.stat.cmu.edu/DASL/Datafiles/ceodat.html. Investigate the distribution of the salaries using a kernel density estimator:
- use the Epanechnikov kernel;
- implement an R function to compute LSCV(h) and plot LSCV(h) against h; what is the LSCV bandwidth?
- try other bandwidths; what is your final kernel density estimate?

SLIDE 19

Group Presentation (March 30)

Group 15

Consider a bivariate kernel density estimator. Simulate data from the bivariate normal distribution N((0, 0), (1, 0.5; 0.5, 1)). Choose n = 50, 200. Try different bandwidths. Use the bivariate normal kernel and the product kernel with K0 the Epanechnikov kernel. Find ways to visually compare your estimates with the true density; for instance, make 3D plots of the density, or density contour lines. R functions such as kde2d from package MASS, contour, and persp might be useful.

SLIDE 20

Group Presentation (March 30)

Group 23

MLB (Major League Baseball) owners claim that they need limitations on player salaries to maintain competitiveness between richer and poorer teams. This argument assumes that higher salaries attract better players. The data contain the 2002 salaries and career batting averages of 50 randomly selected MLB players. Based on the data, address the following question: is there a relationship between an MLB player's salary and his performance? You might consider the correlation coefficient. Use the bootstrap method.