Recall the Basics of Hypothesis Testing

F. James (CERN), Statistics for Physicists, 5: Goodness-of-Fit. April 2012, DESY.
1. Recall the Basics of Hypothesis Testing

The level of significance α (the size of the test) is defined as the probability of X falling in w (rejecting H0) when H0 is true: P(X ∈ w | H0) = α.

                                 H0 TRUE                        H1 TRUE
   Acceptance                    ACCEPT H0: good                error of the second kind
   X ∉ w                         Prob = 1 − α                   Prob = β  (contamination)

   Rejection                     error of the first kind        REJECT H0: good
   X ∈ w (critical region)       Prob = α  (loss)               Prob = 1 − β
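As a minimal illustration of these definitions, assuming hypothetical Gaussian hypotheses H0: X ~ N(0, 1) and H1: X ~ N(2, 1), with an arbitrary cut c defining the critical region w = {X > c}:

```python
# Minimal sketch: alpha and beta for a one-sided cut on a Gaussian statistic.
# H0, H1, and the cut value c are invented for illustration.
from scipy.stats import norm

c = 1.64                      # critical region w = {X > c}
alpha = norm.sf(c, loc=0)     # P(X in w | H0): error of the first kind (loss)
beta = norm.cdf(c, loc=2)     # P(X not in w | H1): error of the second kind (contamination)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, power = {1 - beta:.3f}")
```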

2. Goodness-of-Fit Testing (GOF)

As in hypothesis testing, we are again concerned with the test of a null hypothesis H0 with a test statistic T, in a critical region w_α, at a significance level α. Unlike the previous situations, however, the alternative hypothesis H1 is now the set of all possible alternatives to H0. Thus H1 cannot be formulated, the risk of second kind, β, is unknown, and the power of the test is undefined.

Since it is in general impossible to know whether one test is more powerful than another, the theoretical basis for goodness-of-fit (GOF) testing is much less satisfactory than the basis for classical hypothesis testing. Nevertheless, GOF testing is quantitatively the most successful area of statistics. In particular, Pearson's venerable chi-square test is the most heavily used method in all of statistics.

3. GOF Testing: From the test statistic to the P-value

Goodness-of-fit tests compare the experimental data with their p.d.f. under the null hypothesis H0, leading to the statement:

If H0 were true and the experiment were repeated many times, one would obtain data as far away (or further) from H0 as the observed data with probability P.

The quantity P is then called the P-value of the test for this data set and hypothesis. A small value of P is taken as evidence against H0, which the physicist calls a bad fit.

4. From the test statistic to the P-value

It is clear from the above that in order to construct a GOF test we need:

1. A test statistic, that is, a function of the data and of H0 which is a measure of the "distance" between the data and the hypothesis, and
2. A way to calculate the probability of exceeding the observed value of the test statistic for H0. That is, a function to map the value of the test statistic into a P-value.

If the data X are discrete and our test statistic is t = t(X), which takes on the value t0 = t(X0) for the data X0, the P-value is given by:

    P = Σ_{X : t(X) ≥ t0} P(X | H0),

where the sum is taken over all values of X for which t(X) ≥ t0.
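A minimal sketch of this recipe in Python (scipy assumed; the truncation of the infinite support and the choice of statistic t are illustrative), anticipating the Poisson example on the next slide:

```python
# Brute-force P-value for a discrete statistic: sum P(X|H0) over t(X) >= t(x_obs).
from scipy.stats import poisson

def pvalue_discrete(t, x_obs, pmf, support):
    """Sum P(X | H0) over all X in `support` with t(X) >= t(x_obs)."""
    t0 = t(x_obs)
    return sum(pmf(x) for x in support if t(x) >= t0)

# Illustrative H0: Poisson with mean mu; distance statistic t(n) = |n - mu|.
mu = 17.3
p = pvalue_discrete(t=lambda n: abs(n - mu),
                    x_obs=12,
                    pmf=lambda n: poisson.pmf(n, mu),
                    support=range(200))   # 200 safely truncates the Poisson tail
print(f"P = {p:.3f}")                     # ~0.229, the value on the next slide
```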

5. Example: Test of Poisson counting rate

Example of discrete counting data: we have recorded 12 counts in one year, and we wish to know if this is compatible with the theory, which predicts μ = 17.3 counts per year. The obvious test statistic is the absolute difference |N − μ|, and assuming that the probability of n decays is given by the Poisson distribution, we can calculate the P-value by taking the sum in the previous slide:

    P_12 = Σ_{n : |n−μ| ≥ 5.3} e^{−μ} μ^n / n!
         = Σ_{n=0}^{12} e^{−17.3} 17.3^n / n!  +  Σ_{n=23}^{∞} e^{−17.3} 17.3^n / n!

Evaluating the above P-value, we get P_12 = 0.229. The interpretation is that the observation is not significantly different from the expected value, since one should observe a number of counts at least as far from the expected value about 23% of the time.
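The same number follows in closed form from the Poisson CDF and survival function (a sketch, with scipy assumed):

```python
# |n - mu| >= |12 - 17.3| = 5.3  <=>  n <= 12 or n >= 23
from scipy.stats import poisson

mu, n_obs = 17.3, 12
p = poisson.cdf(n_obs, mu) + poisson.sf(22, mu)   # sf(22) = P(N >= 23)
print(f"P_12 = {p:.3f}")                          # 0.229
```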

6. Poisson Test Example seen visually

The length of the vertical bars is proportional to the Poisson probability of n (the x-axis) for H0: μ = 17.3.

[Figure: Poisson probabilities vs. n, with ticks at 10, 20, 30, 40; the observed value n_obs = 12 and the H0 expectation are marked.]

7. Distribution-free Tests

When the data are continuous, the sum becomes an integral:

    P = ∫_{X : t(X) > t0} P(X | H0) dX,    (1)

and this can become quite complicated to compute, so one tries to avoid using this form. Instead, one looks for a test statistic such that the distribution of t is known independently of H0. Such a test is called a distribution-free test.

We consider here mainly distribution-free tests, such that the P-value does not depend on the details of the hypothesis H0, but only on the value of t, and possibly on one or two integers such as the number of events, the number of bins in a histogram, or the number of constraints in a fit. Then the mapping from t to P-value can be calculated once for all and published in tables, of which the well-known χ² tables are an example.

8. Pearson's Chi-square Test

An obvious way to measure the distance between the data and the hypothesis H0 is to:

1. Determine the expectation of the data under the hypothesis H0.
2. Find a metric in the space of the data to measure the distance of the observed data from its expectation under H0.

When the data consist of measurements Y = Y1, Y2, …, Yk of quantities which, under H0, are equal to f = f1, f2, …, fk with covariance matrix V, the distance between the data and H0 is clearly:

    T = (Y − f)ᵀ V⁻¹ (Y − f).

This is just the Pearson test statistic, usually called chi-square, because it is distributed as χ²(k) under H0 if the measurements Y are Gaussian-distributed. That means the P-value may be found from a table of χ²(k), or by calling PROB(T,k).
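A sketch of this quadratic form in Python, where scipy's chi2.sf plays the role of the CERNLIB routine PROB(T,k); the measurements, predictions, and covariance matrix are invented for illustration:

```python
import numpy as np
from scipy.stats import chi2

def pearson_T(y, f, V):
    """T = (y - f)^T V^{-1} (y - f); distributed as chi2(k) for Gaussian y under H0."""
    r = np.asarray(y, float) - np.asarray(f, float)
    return float(r @ np.linalg.solve(V, r))

# Invented example: k = 3 uncorrelated measurements (variances on the diagonal).
y = [1.2, 0.7, 2.1]                 # observed values
f = [1.0, 1.0, 2.0]                 # expectations under H0
V = np.diag([0.04, 0.09, 0.01])     # covariance matrix
T = pearson_T(y, f, V)
p = chi2.sf(T, df=len(y))           # the equivalent of PROB(T, k)
print(f"T = {T:.2f}, P-value = {p:.3f}")
```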

9. Pearson's Chi-square test for histograms

Karl Pearson made use of the asymptotic normality of a multinomial p.d.f. in order to find the (asymptotic) distribution of:

    T = (n − Np)ᵀ V⁻¹ (n − Np),

where V is the covariance matrix of the observations (bin contents) n and N is the total number of events in the histogram. In the usual case where the bins are independent, we have:

    T = Σ_{i=1}^{k} (n_i − Np_i)² / (Np_i) = [ Σ_{i=1}^{k} n_i² / (Np_i) ] − N.

This is the usual χ² goodness-of-fit test for histograms. The distribution of T is generally accepted as close enough to χ²(k − 1) when all the expected numbers of events per bin (Np_i) are greater than 5. Cochran relaxes this restriction, claiming the approximation to be good if not more than 20% of the bins have expectations between 1 and 5.
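A sketch of the independent-bin form (the bin contents are invented, and H0 is taken to predict equal bin probabilities):

```python
import numpy as np
from scipy.stats import chi2

def hist_chisq(n, p):
    """Pearson T = sum_i (n_i - N p_i)^2 / (N p_i); ~ chi2(k-1) under H0."""
    n = np.asarray(n, float)
    expected = n.sum() * np.asarray(p, float)
    return float(((n - expected) ** 2 / expected).sum())

n = [18, 22, 25, 16, 19]   # invented histogram: N = 100 events in k = 5 bins
p = [0.2] * 5              # H0 bin probabilities
T = hist_chisq(n, p)
print(f"T = {T:.2f}, P-value = {chi2.sf(T, df=len(n) - 1):.3f}")
```

The same numbers come from scipy.stats.chisquare(n, f_exp=...), which also uses k − 1 degrees of freedom by default.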

10. Chi-square test with estimation of parameters

If the parent distribution depends on a vector of parameters θ = θ1, θ2, …, θr to be estimated from the data, one does not expect the T statistic to behave as a χ²(k − 1), except in the limiting case where the r parameters do not actually affect the goodness of fit.

In the more general and usual case, one can show that when r parameters are estimated from the same data, the cumulative distribution of T is intermediate between a χ²(k − 1) (which holds when θ is fixed) and a χ²(k − r − 1) (which holds for an optimal estimation method), always assuming the null hypothesis. The test is no longer distribution-free, but when k is large and r small, the two boundaries χ²(k − 1) and χ²(k − r − 1) become close enough to make the test practically distribution-free. In practice, the r parameters will be well chosen, and T will usually behave as χ²(k − r − 1).
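The practical consequence, as a small sketch with invented numbers: the true P-value is bracketed by the two reference distributions, and the usual approximation quotes the χ²(k − r − 1) value.

```python
from scipy.stats import chi2

# Invented example: T observed on a k-bin histogram after fitting r parameters.
T, k, r = 14.2, 12, 2
p_fitted = chi2.sf(T, df=k - r - 1)   # bound for an optimal estimation method
p_fixed = chi2.sf(T, df=k - 1)        # bound if theta were fixed a priori
print(f"{p_fitted:.3f} <= P <= {p_fixed:.3f}")   # quote the k - r - 1 value in practice
```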

11. Binned Likelihood (Likelihood Chi-square)

Pearson's chi-square is a good test statistic when fitting a curve to points with Gaussian errors, but for fitting histograms we can make use of the known distribution of events in a bin, which is not exactly Gaussian:

◮ It is Poisson-distributed if the bin contents are independent (no constraint on the total number of events).
◮ Or it is multinomial-distributed if the total number of events in the histogram is fixed.

Reference: Baker and Cousins, "Clarification of the Use of Chi-square and Likelihood Functions in Fits to Histograms", NIM 221 (1984) 437.

Our test statistic will be the binned likelihood, which Baker and Cousins called the likelihood chi-square because it behaves as a χ², although it is derived as a likelihood ratio.
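For the Poisson case, the Baker–Cousins likelihood chi-square is χ²_λ = 2 Σ_i [ν_i − n_i + n_i ln(n_i/ν_i)], where n_i is the observed and ν_i the predicted content of bin i (the logarithmic term is dropped for empty bins). A minimal sketch with invented bin contents:

```python
import numpy as np

def likelihood_chisq_poisson(n, nu):
    """Baker-Cousins chi2_lambda = 2 * sum(nu_i - n_i + n_i * ln(n_i / nu_i)).
    Empty bins contribute nu_i only; behaves asymptotically as a chi-square."""
    n = np.asarray(n, float)
    nu = np.asarray(nu, float)
    safe_n = np.where(n > 0, n, 1.0)              # avoid log(0); term is zeroed below
    log_term = np.where(n > 0, n * np.log(safe_n / nu), 0.0)
    return float(2.0 * np.sum(nu - n + log_term))

n = [18, 22, 25, 16, 19]    # invented observed bin contents
nu = [20, 20, 20, 20, 20]   # predicted bin contents under H0
print(f"chi2_lambda = {likelihood_chisq_poisson(n, nu):.2f}")   # ~2.47
```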
