
Lectures 2 and 3: Goodness of Fit, Applied Statistics 2014



  1. Lectures 2 and 3: Goodness of Fit. Applied Statistics 2014.
     Sections: GoF testing, EDF tests, Chi-square tests, Probability plotting, Assignment.

  2. Goodness of Fit (GoF) testing
     Let the observations $X_1, \dots, X_n$ be i.i.d. from some unknown distribution $F$.
     1. Simple GoF: $H_0: F = F_0$ versus $H_1: F \neq F_0$, where $F_0$ is completely specified.
     2. Composite GoF: $H_0: F \in \mathcal{F}$ versus $H_1: F \notin \mathcal{F}$, where $\mathcal{F}$ is some parametric class of distribution functions.

  3. Overview of the methods
     Empirical distribution function (EDF) based tests: used for continuous distributions; typically more powerful than chi-square tests.
     Chi-square tests: applicable to both continuous and discrete random variables; also useful for random vectors, i.e. multi-dimensional data.
     Probability plotting: a graphical tool.

  4. Simple GoF: EDF-based statistics
     Under $H_0$ we have, by the Glivenko-Cantelli theorem, almost surely
     $$\sup_{t} |\hat{F}_n(t) - F_0(t)| \to 0, \quad n \to \infty.$$
     Any discrepancy measure between $\hat{F}_n$ and $F_0$ can serve as a test statistic. Well-known test statistics for continuous $F_0$:
     $$D_n = \sup_{x \in \mathbb{R}} |\hat{F}_n(x) - F_0(x)| \quad \text{(Kolmogorov-Smirnov)}$$
     $$C_n = n \int_{-\infty}^{\infty} (\hat{F}_n(x) - F_0(x))^2 \, dF_0(x) \quad \text{(Cramér-von Mises)}$$
     $$A_n = n \int_{-\infty}^{\infty} \frac{(\hat{F}_n(x) - F_0(x))^2}{F_0(x)(1 - F_0(x))} \, dF_0(x) = \int_{-\infty}^{\infty} \left( \frac{\hat{F}_n(x) - E_0 \hat{F}_n(x)}{\mathrm{sd}_0 \hat{F}_n(x)} \right)^2 dF_0(x) \quad \text{(Anderson-Darling)}$$

  5. Simple GoF: Computational formulae
     Put $U_{(i)} = F_0(X_{(i)})$. Then
     $$D_n = \max_{1 \le i \le n} \max\{ |\hat{F}_n(X_{(i)}) - F_0(X_{(i)})|, \; |\hat{F}_n(X_{(i)}-) - F_0(X_{(i)})| \} = \max_{1 \le i \le n} \max\{ |i/n - U_{(i)}|, \; |U_{(i)} - (i-1)/n| \};$$
     $$C_n = \frac{1}{12n} + \sum_{i=1}^{n} \left( U_{(i)} - \frac{2i-1}{2n} \right)^2;$$
     $$A_n = -n - \frac{1}{n} \sum_{i=1}^{n} (2i-1) \left[ \log U_{(i)} + \log(1 - U_{(n-i+1)}) \right].$$
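The computational formulae translate almost line by line into R. The following is a minimal sketch for a simple null hypothesis; the helper name `edf_stats` and the simulated sample are assumptions made for illustration, not part of the lecture.

```r
# EDF statistics D_n, C_n, A_n for a fully specified F0.
# `edf_stats` is a hypothetical helper; F0 is a CDF such as pnorm or punif.
edf_stats <- function(x, F0, ...) {
  n <- length(x)
  u <- F0(sort(x), ...)                  # U_(i) = F0(X_(i))
  i <- seq_len(n)
  D <- max(pmax(abs(i / n - u), abs(u - (i - 1) / n)))
  C <- 1 / (12 * n) + sum((u - (2 * i - 1) / (2 * n))^2)
  # rev(u)[i] equals U_(n-i+1)
  A <- -n - mean((2 * i - 1) * (log(u) + log(1 - rev(u))))
  c(D = D, C = C, A = A)
}

set.seed(1)
x <- rnorm(30)                           # illustrative data
stats <- edf_stats(x, pnorm)
print(stats)

# Cross-check: D_n should agree with the statistic reported by ks.test
d_ks <- unname(ks.test(x, "pnorm")$statistic)
print(d_ks)
```

The cross-check against `ks.test` is a quick way to confirm that the two-sided maximum is coded correctly.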

  6. Simple GoF: Distribution of test statistic under $H_0$ (1)
     Probability integral transform: if $X$ has a continuous distribution $F_0$, then $F_0(X)$ has a uniform distribution on $[0,1]$, i.e. $\mathrm{UN}[0,1]$.
     Under $H_0$, $\{U_{(i)}, 1 \le i \le n\}$ are order statistics from $\mathrm{UN}[0,1]$.
     Corollary: under $H_0$, the distributions of $D_n$, $C_n$ and $A_n$ do not depend on $F_0$. Thus, the tests are distribution free.
     To compute the critical value: for small $n$ use tables, for large $n$ use asymptotics.
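The distribution-free property can be checked by simulation. The sketch below (sample size, number of replications and the two null distributions are arbitrary choices for illustration) simulates $D_n$ under $H_0$ for two different $F_0$ and compares the 95% quantiles, which should be close to each other up to Monte Carlo error.

```r
set.seed(42)
n <- 20      # sample size (illustrative)
B <- 2000    # number of Monte Carlo replications (illustrative)

# D_n computed from the transformed values U_i = F0(X_i)
ks_stat <- function(u) {
  u <- sort(u)
  m <- length(u)
  i <- seq_len(m)
  max(pmax(abs(i / m - u), abs(u - (i - 1) / m)))
}

d_norm <- replicate(B, ks_stat(pnorm(rnorm(n))))  # H0 with F0 = N(0, 1)
d_exp  <- replicate(B, ks_stat(pexp(rexp(n))))    # H0 with F0 = Exp(1)

q_norm <- unname(quantile(d_norm, 0.95))
q_exp  <- unname(quantile(d_exp, 0.95))
print(c(normal = q_norm, exponential = q_exp))
```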

  7. Simple GoF: Distribution of test statistic under $H_0$ (2)
     Under $H_0$,
     $$\sup_{t \in \mathbb{R}} |\hat{F}_n(t) - F_0(t)| \overset{d}{=} \sup_{t \in [0,1]} |U_n(t) - t|, \quad \text{where } U_n(t) = \frac{1}{n} \sum_{i=1}^{n} 1\{U_i \le t\},$$
     with $U_i$ i.i.d. from $\mathrm{UN}[0,1]$.
     Theorem (Donsker): $\{\sqrt{n}(U_n(t) - t)\}_{t \in [0,1]}$ converges in distribution (in the space $D[0,1]$) to a standard Brownian bridge $B_0$, which is a Gaussian process with $E(B_0(t)) = 0$ and $E(B_0(t_1) B_0(t_2)) = t_1(1 - t_2)$ for $0 \le t_1 \le t_2 \le 1$.
     This implies (by the continuous-mapping theorem) that, under $H_0$,
     $$\sqrt{n} D_n \overset{d}{\to} \sup_{t \in [0,1]} |B_0(t)|,$$
     and
     $$C_n \overset{d}{\to} \int_0^1 B_0^2(t) \, dt \quad \text{and} \quad A_n \overset{d}{\to} \int_0^1 \frac{B_0^2(t)}{t(1-t)} \, dt.$$
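For large $n$, a critical value for $\sqrt{n} D_n$ can be obtained from the distribution of $\sup_t |B_0(t)|$. The sketch below relies on Kolmogorov's series $P(\sup_t |B_0(t)| \le x) = 1 - 2 \sum_{k \ge 1} (-1)^{k-1} e^{-2k^2x^2}$, a standard result that is not stated on the slide; the helper names `pkolm` and `qkolm` are hypothetical.

```r
# CDF of sup |B_0(t)| via Kolmogorov's series, truncated after `terms` terms.
pkolm <- function(x, terms = 100) {
  k <- seq_len(terms)
  1 - 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * x^2))
}

# Quantile by numerical root finding on a bracket that contains the usual levels.
qkolm <- function(p) {
  uniroot(function(x) pkolm(x) - p, interval = c(0.2, 5), tol = 1e-8)$root
}

crit <- qkolm(0.95)   # asymptotic critical value for sqrt(n) * D_n at alpha = 0.05
print(crit)
```

The result should be approximately 1.36, so for large $n$ the test rejects at the 5% level when $D_n$ exceeds roughly $1.36/\sqrt{n}$.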

  8. Simple GoF: Consistency under alternatives
     Let $G_n(t) = P_{F_0}(\sqrt{n} D_n \le t)$. Reject $H_0: F = F_0$ at significance level $\alpha$, with $0 < \alpha < 1$, if $\sqrt{n} D_n > G_n^{-1}(1 - \alpha)$.
     Lemma (consistency under any alternative): if the data $\{X_i\}_{i=1}^{n}$ come from a distribution $F \neq F_0$, then
     $$P_F(\sqrt{n} D_n > G_n^{-1}(1 - \alpha)) \to 1, \quad n \to \infty.$$
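Consistency can be illustrated numerically: for a fixed alternative, the rejection rate should increase towards 1 as $n$ grows. The sketch below uses an arbitrarily chosen alternative, $N(0.5, 1)$ data tested against $F_0 = N(0, 1)$; the helper `power_ks`, the sample sizes and the number of replications are assumptions for illustration.

```r
set.seed(7)

# Monte Carlo estimate of the rejection probability of the KS test
# when the data come from N(0.5, 1) but H0 states F0 = N(0, 1).
power_ks <- function(n, B = 500, alpha = 0.05) {
  mean(replicate(B, ks.test(rnorm(n, mean = 0.5), "pnorm")$p.value < alpha))
}

sizes <- c(10, 50, 200)
pw <- sapply(sizes, power_ks)
names(pw) <- paste0("n=", sizes)
print(pw)
```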

  9. Some comments
     The KS test, $D_n$, is probably the best known. However, it is often (much) less powerful than the quadratic statistics $C_n$ and $A_n$.
     $A_n$ behaves similarly to $C_n$, but is more powerful when $F_0$ departs from the true (underlying) distribution in the tails.
     EDF statistics are usually more powerful than the Pearson chi-square statistics, which we shall discuss later on.

  10. Composite tests: location-scale families
     A location-scale family of distributions: $F_\theta(x) = H\left( \frac{x - \theta_1}{\theta_2} \right)$, where $H$ is some known distribution function and $\theta = (\theta_1, \theta_2)$, with location parameter $\theta_1 \in \mathbb{R}$ and scale parameter $\theta_2 > 0$.
     $H_0: F = F_\theta$ for some $\theta \in \Theta \subseteq \mathbb{R} \times (0, \infty)$.
     General idea: compare $\hat{F}_n$ and $F_{\hat{\theta}_n}$, where $\hat{\theta}_n$ is an efficient estimator of $\theta$.
     Theorem (Antle and Bain 1969): let $(\hat{\theta}_1, \hat{\theta}_2)$ be the MLE of $(\theta_1, \theta_2)$. Then the quantities $\hat{\theta}_2/\theta_2$, $(\hat{\theta}_1 - \theta_1)/\theta_2$ and $(\hat{\theta}_1 - \theta_1)/\hat{\theta}_2$ each have a distribution that does not depend on $\theta_1$ and $\theta_2$.
     Replacing $F_0$ with $F_{\hat{\theta}}$ in the definitions of $D_n$, $C_n$ and $A_n$, we obtain the (EDF) test statistics for the composite test.
     The distribution of these test statistics under $H_0$ depends on the distribution function $H$. The critical values are often computed via simulations.
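Because the null distribution does not depend on $(\theta_1, \theta_2)$, critical values can be simulated from a single member of the family. The sketch below does this for the normal family with $D_n$; it plugs in the sample mean and the sample standard deviation (with divisor $n - 1$, which differs slightly from the MLE), and the helper name `d_fitted` and the simulation sizes are assumptions for illustration.

```r
set.seed(123)

# D_n with F0 replaced by the fitted normal distribution.
d_fitted <- function(x) {
  n <- length(x)
  u <- pnorm(sort(x), mean = mean(x), sd = sd(x))
  i <- seq_len(n)
  max(pmax(abs(i / n - u), abs(u - (i - 1) / n)))
}

n <- 20      # sample size (illustrative)
B <- 5000    # number of Monte Carlo replications (illustrative)

# Simulate under theta = (0, 1); any other theta gives the same distribution.
null_d <- replicate(B, d_fitted(rnorm(n)))
crit <- unname(quantile(null_d, 0.95))
print(crit)

# The statistic itself is unchanged by a shift and rescaling of the data.
x <- rnorm(n)
invariant <- isTRUE(all.equal(d_fitted(x), d_fitted(10 + 3 * x)))
print(invariant)
```

The simulated critical value is noticeably smaller than the one for the simple null hypothesis with the same $n$, which is why tables for the simple case should not be reused after estimating parameters.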

  11. Composite tests for normality
     $H_0$: $F_0$ is a normal distribution function.
     Example: AD test statistic for normality
     $$A_n = n \int \frac{(\hat{F}_n(t) - \Phi_n(t))^2}{\Phi_n(t)(1 - \Phi_n(t))} \, d\Phi_n(t), \qquad \Phi_n(t) = \Phi\left( \frac{t - \bar{X}_n}{S_n} \right).$$
     Computational formula:
     $$A_n = -n - \frac{1}{n} \sum_{i=1}^{n} (2i-1) \left[ \log U_{(i)} + \log(1 - U_{(n-i+1)}) \right], \quad \text{with } U_{(i)} = \Phi\left( \frac{X_{(i)} - \bar{X}_n}{S_n} \right).$$
     Similarly: Cramér-von Mises $C_n$ and the KS test $D_n$ (known as the Lilliefors test).
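A direct implementation of the computational formula is sketched below. It assumes $S_n$ is the sample standard deviation with divisor $n - 1$ (the slide does not specify the divisor), and the helper name `ad_normal` and the simulated samples are illustrative.

```r
# Anderson-Darling statistic for normality with estimated mean and sd.
# Note: log(1 - u) is -Inf if a standardized value is so extreme that
# pnorm returns exactly 1, so very heavy-tailed data can give non-finite results.
ad_normal <- function(x) {
  n <- length(x)
  u <- pnorm((sort(x) - mean(x)) / sd(x))   # U_(i)
  i <- seq_len(n)
  -n - mean((2 * i - 1) * (log(u) + log(1 - rev(u))))
}

set.seed(2014)
a_norm <- ad_normal(rnorm(100))   # data for which H0 holds
a_exp  <- ad_normal(rexp(100))    # skewed data for which H0 fails
print(c(normal = a_norm, exponential = a_exp))
```

The statistic for the exponential sample should be clearly larger than for the normal sample; a p-value still requires the null distribution, for instance by simulation.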

  12. Other specialized tests for assessing normality
     Jarque-Bera test (1980).
     $$\text{empirical skewness: } b_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^3}{S^3}, \qquad \text{empirical kurtosis: } b_2 = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^4}{S^4}.$$
     Now $\sqrt{n}\, b_1 \overset{w}{\to} N(0, 6)$ and $\sqrt{n}(b_2 - 3) \overset{w}{\to} N(0, 24)$.
     The Jarque-Bera statistic is given by
     $$JB = n \left( \frac{b_1^2}{6} + \frac{(b_2 - 3)^2}{24} \right).$$
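The statistic is simple enough to compute by hand. In the sketch below, `jb_stat` is a hypothetical helper, $S$ is taken with divisor $n$ (an assumption, since the slide does not say), and the p-value uses the chi-square distribution with 2 degrees of freedom, the standard large-sample reference distribution for $JB$, which is not stated on the slide.

```r
# Jarque-Bera statistic from empirical skewness and kurtosis.
jb_stat <- function(x) {
  n <- length(x)
  m <- x - mean(x)
  s <- sqrt(mean(m^2))              # assumption: S uses the 1/n divisor
  b1 <- mean(m^3) / s^3             # empirical skewness
  b2 <- mean(m^4) / s^4             # empirical kurtosis
  c(b1 = b1, b2 = b2, JB = n * (b1^2 / 6 + (b2 - 3)^2 / 24))
}

set.seed(99)
res <- jb_stat(rnorm(500))          # illustrative normal sample
print(res)

# Approximate p-value from the chi-square(2) limit
p <- pchisq(res[["JB"]], df = 2, lower.tail = FALSE)
print(p)
```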

  13. Other specialized tests for assessing normality
     Shapiro-Wilk test statistic:
     $$W = \frac{\left( \sum_{i=1}^{n} a_i X_{(i)} \right)^2}{\sum_{i=1}^{n} (X_i - \bar{X}_n)^2} \quad (\in (0, 1]),$$
     for certain fixed $a_1, \dots, a_n$.
     Under $H_0$, the numerator is an estimator for a multiple of $\sigma^2$, and the denominator is an estimator for $(n-1)\sigma^2$.
     Under $H_0$, $W \approx 1$. Under $H_1$, the numerator is usually smaller.

  14. Tests for normality in R
     In R:
         ks.test(x, "pnorm")      # can also be used for other parametric families;
                                  # as written, tests against the fully specified N(0, 1)
         shapiro.test(x)
         library(nortest)
         ad.test(x)
         cvm.test(x)
         library(tseries)
         jarque.bera.test(x)
     These are tests for a composite null hypothesis, except for ks.test(x, "pnorm"), which with its default arguments tests the simple null hypothesis of a standard normal distribution.
     Simulation studies: the Anderson-Darling test is preferable.

  15. A simulation study to assess the performance of tests for normality
     We compute the fraction of times that the null hypothesis of normality is rejected for a number of distributions (in total we simulated 1000 times).
     Results for n = 20:
                    norm   cauchy  exp    t5     t10    t15
         Shapiro    0.054  0.852   0.852  0.182  0.095  0.080
         KS         0.038  0.206   1.000  0.067  0.050  0.046
         AD         0.043  0.863   0.799  0.166  0.092  0.074
         CvM        0.050  0.864   0.751  0.157  0.081  0.070
         JB         0.025  0.807   0.516  0.162  0.067  0.060

  16. A simulation study...
     Results for n = 50:
                    norm   cauchy  exp    t5     t10    t15
         Shapiro    0.065  0.994   1.000  0.360  0.152  0.100
         KS         0.062  0.472   1.000  0.066  0.045  0.054
         AD         0.055  0.994   0.998  0.289  0.123  0.073
         CvM        0.055  0.738   0.989  0.249  0.113  0.070
         JB         0.043  0.993   0.953  0.396  0.172  0.106

  17. A simulation study...
     Results for n = 200:
                    norm   cauchy  exp    t5     t10    t15
         Shapiro    0.054  1.000   1.000  0.825  0.362  0.223
         KS         0.044  0.999   1.000  0.084  0.058  0.047
         AD         0.052  NA      NA     NA     0.258  0.136
         CvM        0.044  0.003   0.981  0.689  0.213  0.107
         JB         0.049  1.000   1.000  0.869  0.436  0.291

  18. A simulation study...
     Results for n = 5000:
                    norm   cauchy  exp    t5     t10    t15
         Shapiro    0.056  1       1      1.000  1.000  0.997
         KS         0.047  1       1      1.000  0.693  0.205
         AD         0.058  NA      NA     NA     NA     0.989
         CvM        0.061  1       1      1.000  1.000  0.962
         JB         0.057  1       1      1.000  1.000  1.000
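A study of this kind can be sketched in a few lines of R. The code below is not the lecturer's script: it is a reduced version under stated assumptions, using only the two tests available in base R (`shapiro.test` and `ks.test` against N(0, 1)), four of the six distributions, n = 20 and the 5% level. Its numbers will therefore differ from the tables by Monte Carlo error and by any differences in setup.

```r
set.seed(2014)
n <- 20        # sample size
B <- 1000      # number of simulated data sets per distribution
alpha <- 0.05  # nominal level (assumed; not stated on the slides)

samplers <- list(
  norm   = function(n) rnorm(n),
  cauchy = function(n) rcauchy(n),
  exp    = function(n) rexp(n),
  t5     = function(n) rt(n, df = 5)
)
tests <- list(
  Shapiro = function(x) shapiro.test(x)$p.value,
  KS      = function(x) ks.test(x, "pnorm")$p.value
)

# For each distribution: simulate B samples, apply every test to the same
# sample, and record the fraction of p-values below alpha.
reject <- sapply(samplers, function(draw) {
  p <- replicate(B, {
    x <- draw(n)
    sapply(tests, function(p_value) p_value(x))
  })
  rowMeans(p < alpha)
})
print(round(reject, 3))
```

Rows correspond to tests and columns to the distribution that generated the data, matching the layout of the tables above.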

  19. Simple GoF: Chi-square type tests
     $H_0: F = F_0$ versus $H_1: F \neq F_0$.
     Let $S = \mathrm{supp}(F_0)$. Fix a positive integer $k$. Let
     $$S = \bigcup_{i=1}^{k} A_{k,i}$$
     be a partition of $S$. Define $N_i := \#\{ j : X_j \in A_{k,i} \}$.
     Under $H_0$: $e_i := E_0 N_i = n P_0(X \in A_{k,i})$, and we expect $e_i$ and $N_i$ to be close.
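The observed and expected cell counts are easy to compute once a partition is fixed. The sketch below takes $F_0 = N(0, 1)$ and cells with equal probability under $F_0$; both choices, as well as $n$ and $k$, are assumptions made for illustration, since the slide leaves the partition open.

```r
set.seed(5)
n <- 200   # sample size (illustrative)
k <- 5     # number of cells (illustrative)
x <- rnorm(n)

# Cell boundaries: quantiles of F0 at 0, 1/k, ..., 1, so the first and last
# boundaries are -Inf and Inf and every cell has probability 1/k under F0.
breaks <- qnorm(seq(0, 1, length.out = k + 1))

N <- as.vector(table(cut(x, breaks)))   # observed counts N_i
e <- n * diff(pnorm(breaks))            # expected counts e_i = n * P0(A_{k,i})
print(rbind(observed = N, expected = e))
```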
