Lecture 10: Nonparametric Regression (2)
Applied Statistics 2015




  1. Outline: Consistency of the Nadaraya-Watson estimator; Local linear regression; Assignments.

  2. Consistency of the Nadaraya-Watson estimator

Here we consider the random design. There are $n$ pairs of IID observations $(X_1, Y_1), \dots, (X_n, Y_n)$ and

$$Y_i = r(X_i) + \epsilon_i, \qquad i = 1, \dots, n,$$

where the $\epsilon_i$'s and $X_i$'s are independent, $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2$. Recall that for a chosen smoothing parameter $h_n$ and kernel $K$, the Nadaraya-Watson estimator of $r$ is given by

$$\hat r_n(x) = \frac{\sum_{i=1}^n K\!\left(\frac{x - X_i}{h_n}\right) Y_i}{\sum_{i=1}^n K\!\left(\frac{x - X_i}{h_n}\right)}.$$
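As an illustration of the formula above, here is a minimal Python sketch of the Nadaraya-Watson estimator. The function names, the choice of a Gaussian kernel, and the simulated regression function $r(x) = \sin(2\pi x)$ are my own illustrative assumptions, not part of the lecture.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard Gaussian kernel K(u)
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x, X, Y, h):
    """Nadaraya-Watson estimate of r(x): a kernel-weighted average of the Y_i."""
    k = gaussian_kernel((x - X) / h)   # K((x - X_i) / h_n)
    return np.sum(k * Y) / np.sum(k)

# Toy usage on simulated data; r(x) = sin(2*pi*x) is only an illustrative choice
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 100)
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.3, 100)
print(nadaraya_watson(0.5, X, Y, h=0.1))
```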

  3. Theorem (Consistency of the Nadaraya-Watson estimator)

Let $h_n \to 0$ and $nh = n h_n \to \infty$ as $n \to \infty$. Let $f$ denote the density of $X_1$ and let $E(Y_1^2) < \infty$. Then for any $x_0$ at which $r$ and $f$ are continuous and $f(x_0) > 0$, the Nadaraya-Watson estimator $\hat r_n(x_0)$ is a consistent estimator of $r(x_0)$, that is,

$$\hat r_n(x_0) \xrightarrow{P} r(x_0), \quad \text{as } n \to \infty.$$
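A small simulation can illustrate the theorem. This sketch is my own addition, not from the lecture; it assumes a Gaussian kernel, a hypothetical $r(x) = \sin(2\pi x)$, a uniform design on $[0, 1]$, and the bandwidth choice $h_n = n^{-1/5}$ so that $h_n \to 0$ and $n h_n \to \infty$.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x, X, Y, h):
    k = gaussian_kernel((x - X) / h)
    return np.sum(k * Y) / np.sum(k)

def r(x):
    return np.sin(2 * np.pi * x)      # hypothetical r(x), only for illustration

x0 = 0.25                             # a point where r and f are continuous and f(x0) > 0
rng = np.random.default_rng(0)
for n in [100, 1_000, 10_000, 100_000]:
    h_n = n ** (-1 / 5)               # h_n -> 0 while n * h_n -> infinity
    X = rng.uniform(0, 1, n)
    Y = r(X) + rng.normal(0, 0.3, n)
    est = nadaraya_watson(x0, X, Y, h_n)
    print(f"n = {n:6d}: estimate = {est:.3f}, true r(x0) = {r(x0):.3f}")
```

As $n$ grows, the printed estimates settle around the true value $r(x_0) = 1$.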

  4. Proof of the theorem

To prove this theorem, we need the following result.

Lemma (Theorem 1A in Parzen (1962)). Suppose that $w(y)$ is a bounded and integrable function satisfying $\lim_{y \to \infty} |y\, w(y)| = 0$. Let $g$ be an integrable function. Then for $h_n$ such that $h_n \to 0$ as $n \to \infty$,

$$\lim_{n \to \infty} \frac{1}{h_n} \int w\!\left(\frac{u - x}{h_n}\right) g(u)\, du = g(x) \int w(u)\, du,$$

for every continuity point $x$ of $g$.

  5. Proof of the theorem

In the proof, we drop the subscript $n$ in $h_n$. Denote

$$\hat f_n(x_0) = \frac{1}{nh} \sum_{i=1}^n K\!\left(\frac{x_0 - X_i}{h}\right), \qquad \hat\psi_n(x_0) = \frac{1}{nh} \sum_{i=1}^n K\!\left(\frac{x_0 - X_i}{h}\right) Y_i.$$

Then $\hat r_n(x_0) = \hat\psi_n(x_0) / \hat f_n(x_0)$. Note that $\hat f_n(x_0)$ is the kernel density estimator of $f(x_0)$. It suffices to prove that $\hat f_n(x_0) \xrightarrow{P} f(x_0)$ and $\hat\psi_n(x_0) \xrightarrow{P} r(x_0) f(x_0)$. We will prove the latter using the lemma; the proof of the former is similar and simpler.

  6. Proof of the theorem: $\hat\psi_n(x_0) \xrightarrow{P} r(x_0) f(x_0)$

First we have

$$E\big(\hat\psi_n(x_0)\big) \overset{\text{IID}}{=} \frac{1}{h} E\!\left[K\!\left(\frac{x_0 - X_1}{h}\right) Y_1\right] = \frac{1}{h} E\!\left[K\!\left(\frac{x_0 - X_1}{h}\right) \big(r(X_1) + \epsilon_1\big)\right] \overset{E(\epsilon)=0}{=} \frac{1}{h} \int K\!\left(\frac{x_0 - x}{h}\right) r(x) f(x)\, dx \to r(x_0) f(x_0).$$

Note that the kernel $K$ satisfies the conditions on $w$ in the lemma. The last convergence follows from the lemma and the symmetry of $K$. Similarly we can show that

$$nh \, \mathrm{Var}\big(\hat\psi_n(x_0)\big) \to f(x_0)\big(r^2(x_0) + \sigma^2\big) \int K^2(u)\, du.$$

Hence $E\big(\hat\psi_n(x_0) - r(x_0) f(x_0)\big)^2 \to 0$, which implies $\hat\psi_n(x_0) \xrightarrow{P} r(x_0) f(x_0)$.

  7. MISE of the Nadaraya-Watson estimator

Theorem 5.44 in Wasserman (2005). The mean integrated squared error of the Nadaraya-Watson estimator is

$$\mathrm{MISE}(\hat r_n) = \frac{h_n^4}{4} \left(\int x^2 K(x)\, dx\right)^2 \int \left(r''(x) + 2\, r'(x) \frac{f'(x)}{f(x)}\right)^2 dx + \frac{\sigma^2 \int K^2(x)\, dx}{n h_n} \int \frac{1}{f(x)}\, dx + o\!\left(h_n^4 + \frac{1}{n h_n}\right).$$

The first term is the (integrated) squared bias. The term $r'(x) f'(x) / f(x)$ is called the design bias, as it depends on the design, that is, on the distribution of the $X_i$'s. It is also known that the kernel estimator has high bias near the boundaries of the data; this is known as boundary bias.
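The following short calculation is not on the slide, but it may help interpret the formula: writing the two leading terms as $A h_n^4 + B/(n h_n)$ and minimizing over $h_n$ recovers the familiar $n^{-1/5}$ bandwidth rate.

$$\frac{d}{dh}\left(A h^4 + \frac{B}{n h}\right) = 4 A h^3 - \frac{B}{n h^2} = 0 \;\Longrightarrow\; h^* = \left(\frac{B}{4 A n}\right)^{1/5} \propto n^{-1/5}, \qquad \mathrm{MISE}(\hat r_n) = O\!\left(n^{-4/5}\right).$$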

  8. Boundary bias

[Figure: Nadaraya-Watson fit with h = 0.2 and Gaussian kernel. Scatter plot of Y against x on [0, 1]; the blue curve is the N-W estimate and the black one is the true r(x).]

  9. Boundary bias

[Same figure as the previous slide.] To alleviate the boundary bias, the so-called local linear regression can be used.

  10. Local linear regression

Suppose that we want to estimate $r(x)$ and $X_i$ is an observation close to $x$. By Taylor expansion,

$$r(X_i) \approx r(x) + r'(x)(X_i - x) =: a + b(X_i - x).$$

Thus the problem of estimating $r(x)$ is equivalent to estimating $a$. Now, we replace $r(X_i)$ with $Y_i$, since we only observe $Y_i$ and not $r(X_i)$, and we want to find an $a$ such that $(Y_i - (a + b(X_i - x)))^2$ is small. Taking all the observations into account, let $\hat a$ and $\hat b$ be given by

$$(\hat a, \hat b) = \operatorname*{argmin}_{a, b} \sum_{i=1}^n K\!\left(\frac{x - X_i}{h}\right) \big(Y_i - (a + b(X_i - x))\big)^2.$$

The local linear estimator is defined as $\tilde r_n(x) := \hat a$. Compare it with the Nadaraya-Watson estimator

$$\hat r_n(x) = \operatorname*{argmin}_{c \in \mathbb{R}} \sum_{i=1}^n K\!\left(\frac{x - X_i}{h}\right) (Y_i - c)^2.$$

  11. Local linear regression

Write $L(a, b) = \sum_{i=1}^n K\!\left(\frac{x - X_i}{h}\right) \big(Y_i - (a + b(X_i - x))\big)^2$. With $k_i = K\!\left(\frac{x - X_i}{h}\right)$ and $z_i = X_i - x$, setting the partial derivatives $\partial L / \partial a$ and $\partial L / \partial b$ to zero gives the normal equations

$$a \sum_{i=1}^n k_i + b \sum_{i=1}^n k_i z_i = \sum_{i=1}^n k_i Y_i, \qquad a \sum_{i=1}^n k_i z_i + b \sum_{i=1}^n k_i z_i^2 = \sum_{i=1}^n k_i Y_i z_i.$$

Solving them yields $\hat a = \sum_{i=1}^n w_i(x) Y_i \big/ \sum_{i=1}^n w_i(x)$, and thus

$$\tilde r_n(x) = \sum_{i=1}^n w_i(x) Y_i \Big/ \sum_{i=1}^n w_i(x), \qquad \text{where } w_i(x) = k_i \left(\sum_{j=1}^n k_j z_j^2 - z_i \sum_{j=1}^n k_j z_j\right).$$
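Here is a minimal Python sketch of the closed-form solution above. The function and variable names are mine, and a Gaussian kernel and simulated data with $r(x) = \sin(2\pi x)$ are assumed purely for illustration.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard Gaussian kernel K(u)
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def local_linear(x, X, Y, h):
    """Local linear estimate of r(x) using the closed-form weights w_i(x)."""
    k = gaussian_kernel((x - X) / h)                   # k_i = K((x - X_i) / h)
    z = X - x                                          # z_i = X_i - x
    w = k * (np.sum(k * z**2) - z * np.sum(k * z))     # w_i(x)
    return np.sum(w * Y) / np.sum(w)                   # sum w_i Y_i / sum w_i

# Toy usage on simulated data (r(x) = sin(2*pi*x) is only an example)
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 100)
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.3, 100)
print(local_linear(0.5, X, Y, h=0.1))
```

Computing the weights directly is equivalent to solving the two normal equations for $\hat a$ at each evaluation point $x$.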

  12. Local linear regression

A linear smoother is an estimator given by a weighted average of the responses, $\sum_{i=1}^n l_i(x) Y_i$. Clearly the local linear estimator is a linear smoother, and so are the regressogram and the kernel estimator. Like the Nadaraya-Watson estimator, $\tilde r_n(x)$ depends on $h$, so we also need to choose $h$ when using the local linear estimator. Cross-validation can be done in the same manner as for the N-W estimator.

  13. Local linear regression: cross-validation

Write the local linear estimator as $\tilde r_h = \tilde r_{nh}$. The CV score is defined as

$$\mathrm{CV}(h) = \frac{1}{n} \sum_{i=1}^n \big(Y_i - \tilde r^{(i)}_{nh}(X_i)\big)^2,$$

where $\tilde r^{(i)}_{nh}(X_i)$ is the estimator computed without using the observation $(X_i, Y_i)$. Again, to compute the CV score there is no need to fit the curve $n$ times. We have the following relation, with $l_i(X_i) = w_i(X_i) \big/ \sum_{j=1}^n w_j(X_i)$:

$$\mathrm{CV}(h) = \frac{1}{n} \sum_{i=1}^n \left(\frac{Y_i - \tilde r_{nh}(X_i)}{1 - l_i(X_i)}\right)^2.$$

Hence $h_{cv} = \operatorname*{argmin}_h \frac{1}{n} \sum_{i=1}^n \left(\frac{Y_i - \tilde r_{nh}(X_i)}{1 - l_i(X_i)}\right)^2$.
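A minimal sketch of the shortcut formula in Python, under the same illustrative assumptions as before (Gaussian kernel, simulated data, my own function names): for each $X_i$, it computes the full-data fit and the leverage $l_i(X_i)$, then scores each bandwidth on a grid.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def ll_weights(x, X, h):
    # w_i(x) from the closed-form local linear solution
    k = gaussian_kernel((x - X) / h)
    z = X - x
    return k * (np.sum(k * z**2) - z * np.sum(k * z))

def cv_score(h, X, Y):
    """CV(h) via the leverage shortcut: no need to refit n times."""
    resid2 = np.empty(len(X))
    for i, x in enumerate(X):
        w = ll_weights(x, X, h)
        l_i = w[i] / np.sum(w)                 # l_i(X_i)
        fit = np.sum(w * Y) / np.sum(w)        # full-data fit at X_i
        resid2[i] = ((Y[i] - fit) / (1.0 - l_i)) ** 2
    return resid2.mean()

# Choose h_cv by minimizing CV(h) over a grid of candidate bandwidths
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, 100)
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.3, 100)
grid = np.linspace(0.02, 0.3, 15)
h_cv = grid[np.argmin([cv_score(h, X, Y) for h in grid])]
print("h_cv =", h_cv)
```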

  14. Comparison: Nadaraya-Watson estimator vs. local linear estimator

Theorem 5.65 in Wasserman (2005); see also Fan (1992). Let $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$. Under some smoothness conditions on $f(x)$ and $r(x)$, both $\hat r_n(x)$ and $\tilde r_n(x)$ have variance

$$\frac{\sigma^2}{n h_n f(x)} \int K^2(u)\, du + o\!\left(\frac{1}{n h_n}\right).$$

The bias of $\hat r_n(x)$ is

$$h_n^2 \left(\frac{1}{2} r''(x) + \frac{r'(x) f'(x)}{f(x)}\right) \int u^2 K(u)\, du + o(h_n^2),$$

whereas $\tilde r_n(x)$ has bias $\frac{1}{2} h_n^2\, r''(x) \int u^2 K(u)\, du + o(h_n^2)$.

At boundary points, the N-W estimator typically has high bias due to the large absolute value of $f'(x)/f(x)$ there. In this sense, local linear estimation eliminates boundary bias and is free from design bias.
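To make the boundary-bias comparison concrete, here is a small simulation sketch. It is entirely my own illustration, not part of the lecture: it assumes a Gaussian kernel, a hypothetical regression function $r(x) = \sin(2\pi x)$, and a uniform design on $[0, 1]$, and compares the two estimators at the boundary point $x_0 = 0$.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x, X, Y, h):
    k = gaussian_kernel((x - X) / h)
    return np.sum(k * Y) / np.sum(k)

def local_linear(x, X, Y, h):
    k = gaussian_kernel((x - X) / h)
    z = X - x
    w = k * (np.sum(k * z**2) - z * np.sum(k * z))
    return np.sum(w * Y) / np.sum(w)

def r(x):
    return np.sin(2 * np.pi * x)      # hypothetical regression function

n, h, x0, reps = 200, 0.1, 0.0, 500   # x0 = 0 is a boundary point of [0, 1]
rng = np.random.default_rng(0)
nw_est, ll_est = [], []
for _ in range(reps):
    X = rng.uniform(0, 1, n)
    Y = r(X) + rng.normal(0, 0.3, n)
    nw_est.append(nadaraya_watson(x0, X, Y, h))
    ll_est.append(local_linear(x0, X, Y, h))

print("true r(0)         :", r(x0))
print("mean N-W estimate :", np.mean(nw_est))   # visibly biased at the boundary
print("mean local linear :", np.mean(ll_est))   # much closer to r(0)
```

Averaged over the replications, the Nadaraya-Watson estimate at the boundary is noticeably shifted away from $r(0)$, while the local linear estimate is close to it, in line with the theorem.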

