Marc Mehlman
Goodness of Fit Tests
Marc H. Mehlman
marcmehlman@yahoo.com
University of New Haven
Marc Mehlman (University of New Haven) Goodness of Fit Tests 1 / 38
Table of Contents
1 Goodness of Fit Chi–Squared Test
2 Tests of Independence
3 Test of Homogeneity
4 Chapter #9 R Assignment
Goodness of Fit Chi–Squared Test
The chi-square (χ²) test is used when the data are categorical. It measures how different the observed data are from what we would expect if H0 were true.
[Figure: bar charts of the percent of births on each day of the week (Mon. through Sun.). Panel "Sample composition": observed sample proportions (one SRS of 700 births). Panel "Expected composition": expected proportions under H0: p1 = p2 = p3 = p4 = p5 = p6 = p7 = 1/7.]
Published tables and software give the upper-tail area for critical values of many χ² distributions.
The χ² distributions are a family of distributions that take only positive values, are skewed to the right, and are each specified by their degrees of freedom.
Upper-tail critical values of the χ² distribution (p = upper-tail area, df = degrees of freedom):

 df   0.25    0.2   0.15    0.1   0.05  0.025   0.02   0.01  0.005 0.0025  0.001 0.0005
  1   1.32   1.64   2.07   2.71   3.84   5.02   5.41   6.63   7.88   9.14  10.83  12.12
  2   2.77   3.22   3.79   4.61   5.99   7.38   7.82   9.21  10.60  11.98  13.82  15.20
  3   4.11   4.64   5.32   6.25   7.81   9.35   9.84  11.34  12.84  14.32  16.27  17.73
  4   5.39   5.99   6.74   7.78   9.49  11.14  11.67  13.28  14.86  16.42  18.47  20.00
  5   6.63   7.29   8.12   9.24  11.07  12.83  13.39  15.09  16.75  18.39  20.51  22.11
  6   7.84   8.56   9.45  10.64  12.59  14.45  15.03  16.81  18.55  20.25  22.46  24.10
  7   9.04   9.80  10.75  12.02  14.07  16.01  16.62  18.48  20.28  22.04  24.32  26.02
  8  10.22  11.03  12.03  13.36  15.51  17.53  18.17  20.09  21.95  23.77  26.12  27.87
  9  11.39  12.24  13.29  14.68  16.92  19.02  19.68  21.67  23.59  25.46  27.88  29.67
 10  12.55  13.44  14.53  15.99  18.31  20.48  21.16  23.21  25.19  27.11  29.59  31.42
 11  13.70  14.63  15.77  17.28  19.68  21.92  22.62  24.72  26.76  28.73  31.26  33.14
 12  14.85  15.81  16.99  18.55  21.03  23.34  24.05  26.22  28.30  30.32  32.91  34.82
 13  15.98  16.98  18.20  19.81  22.36  24.74  25.47  27.69  29.82  31.88  34.53  36.48
 14  17.12  18.15  19.41  21.06  23.68  26.12  26.87  29.14  31.32  33.43  36.12  38.11
 15  18.25  19.31  20.60  22.31  25.00  27.49  28.26  30.58  32.80  34.95  37.70  39.72
 16  19.37  20.47  21.79  23.54  26.30  28.85  29.63  32.00  34.27  36.46  39.25  41.31
 17  20.49  21.61  22.98  24.77  27.59  30.19  31.00  33.41  35.72  37.95  40.79  42.88
 18  21.60  22.76  24.16  25.99  28.87  31.53  32.35  34.81  37.16  39.42  42.31  44.43
 19  22.72  23.90  25.33  27.20  30.14  32.85  33.69  36.19  38.58  40.88  43.82  45.97
 20  23.83  25.04  26.50  28.41  31.41  34.17  35.02  37.57  40.00  42.34  45.31  47.50
 21  24.93  26.17  27.66  29.62  32.67  35.48  36.34  38.93  41.40  43.78  46.80  49.01
 22  26.04  27.30  28.82  30.81  33.92  36.78  37.66  40.29  42.80  45.20  48.27  50.51
 23  27.14  28.43  29.98  32.01  35.17  38.08  38.97  41.64  44.18  46.62  49.73  52.00
 24  28.24  29.55  31.13  33.20  36.42  39.36  40.27  42.98  45.56  48.03  51.18  53.48
 25  29.34  30.68  32.28  34.38  37.65  40.65  41.57  44.31  46.93  49.44  52.62  54.95
 26  30.43  31.79  33.43  35.56  38.89  41.92  42.86  45.64  48.29  50.83  54.05  56.41
 27  31.53  32.91  34.57  36.74  40.11  43.19  44.14  46.96  49.64  52.22  55.48  57.86
 28  32.62  34.03  35.71  37.92  41.34  44.46  45.42  48.28  50.99  53.59  56.89  59.30
 29  33.71  35.14  36.85  39.09  42.56  45.72  46.69  49.59  52.34  54.97  58.30  60.73
 30  34.80  36.25  37.99  40.26  43.77  46.98  47.96  50.89  53.67  56.33  59.70  62.16
 40  45.62  47.27  49.24  51.81  55.76  59.34  60.44  63.69  66.77  69.70  73.40  76.09
 50  56.33  58.16  60.35  63.17  67.50  71.42  72.61  76.15  79.49  82.66  86.66  89.56
 60  66.98  68.97  71.34  74.40  79.08  83.30  84.58  88.38  91.95  95.34  99.61 102.70
 80  88.13  90.41  93.11  96.58 101.90 106.60 108.10 112.30 116.30 120.10 124.80 128.30
100 109.10 111.70 114.70 118.50 124.30 129.60 131.10 135.80 140.20 144.30 149.40 153.20
Example: df = 6.
If χ² = 15.9, the P-value is between 0.01 and 0.02, since 15.03 < 15.9 < 16.81 in the df = 6 row.
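Table lookups like this can be checked in R, the language used in the examples of these notes. A minimal sketch, assuming only base R's `pchisq()` and `qchisq()`: the first call gives the exact upper-tail area beyond 15.9 with df = 6, and the second recovers the two table entries that bracket it.

```r
# Exact upper-tail (right-tail) area of chi-squared(6) beyond 15.9: the P-value
p <- pchisq(15.9, df = 6, lower.tail = FALSE)
print(round(p, 4))   # 0.0143, which lies between 0.01 and 0.02

# Critical values with upper-tail areas 0.02 and 0.01 (the df = 6 table entries)
crit <- qchisq(c(0.02, 0.01), df = 6, lower.tail = FALSE)
print(round(crit, 2))   # 15.03 16.81
```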
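For each of the k categories, let $O_i \overset{def}{=}$ the observed count in category i, and $E_i \overset{def}{=} n p_i$ = the count expected in category i under $H_0$, where n is the sample size and $p_i$ is the proportion specified by $H_0$.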
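The chi-squared goodness of fit statistic is
$$X^2 \overset{def}{=} \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$
where $O_i$ and $E_i$ are the observed and expected counts in category i. Under $H_0$, $X^2$ is approximately $\chi^2(k-1)$, and one uses a right-tail test.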
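The statistic can be computed directly from its definition. The sketch below assumes base R; `gof_chisq` is a hypothetical helper name, not a function from these notes or from R, and the river-ecology counts from the next example are used only as a check against R's built-in `chisq.test()`.

```r
# Hypothetical helper (not from the notes): Pearson goodness-of-fit test
# obs = observed counts O_1, ..., O_k; p = null proportions p_1, ..., p_k
gof_chisq <- function(obs, p) {
  stopifnot(length(obs) == length(p), abs(sum(p) - 1) < 1e-8)
  expected <- sum(obs) * p                        # E_i = n * p_i under H0
  x2 <- sum((obs - expected)^2 / expected)        # X^2 = sum (O_i - E_i)^2 / E_i
  df <- length(obs) - 1                           # k - 1 degrees of freedom
  p_value <- pchisq(x2, df = df, lower.tail = FALSE)  # right-tail area
  list(statistic = x2, df = df, p.value = p_value)
}

# River-ecology counts for species A, B, C with H0: equal proportions
res <- gof_chisq(c(89, 120, 91), rep(1/3, 3))
print(res$statistic)   # 6.02
```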
River ecology: Three species of large fish (A, B, C) that are native to a certain river have been
A recent random sample of 300 large fish found 89 of species A, 120 of species B, and 91 of species C. Do the data provide evidence that the river's ecosystem has been upset?
H0: pA = pB = pC = 1/3
Ha: H0 is not true
Number of proportions compared: k = 3
All the expected counts are n/k = 300/3 = 100.
Degrees of freedom: k – 1 = 3 – 1 = 2
X² calculations:
$$X^2 = \frac{(89-100)^2}{100} + \frac{(120-100)^2}{100} + \frac{(91-100)^2}{100} = 1.21 + 4.00 + 0.81 = 6.02$$
If H0 were true, how likely would it be to find by chance a discrepancy between
Using a typical significance level of 5%, we conclude that the results are
currently equally represented in this ecosystem (P < 0.05). From Table E, we find 5.99 < X² < 7.38, so 0.025 < P < 0.05. Software gives P-value = 0.049.
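In R the same test is a single call to `chisq.test()`, whose default null hypothesis for a vector of counts is equal proportions. The sketch (base R only) also uses the fact that χ²(2) is an exponential distribution with mean 2, so for df = 2 the P-value is exactly exp(-X²/2), and it recovers the Table E cut-offs 5.99 and 7.38.

```r
obs <- c(A = 89, B = 120, C = 91)
test <- chisq.test(obs)   # default H0: p = rep(1/3, 3)
print(test)

# For df = 2, chi-squared is exponential with mean 2: P(X > x) = exp(-x / 2)
x2 <- unname(test$statistic)
print(round(exp(-x2 / 2), 4))   # 0.0493

# Table E entries for df = 2 at upper-tail areas 0.05 and 0.025
print(round(qchisq(c(0.05, 0.025), df = 2, lower.tail = FALSE), 2))   # 5.99 7.38
```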
The individual values summed in the χ² statistic are the χ² components.
When the test is statistically significant, the largest components indicate which condition(s) differ most from what is expected under H0.
You can also compare the actual proportions qualitatively in a graph.
The largest X² component, 4.00, is for species B. The increase in species B contributes the most to significance.
$$X^2 = \frac{(89-100)^2}{100} + \frac{(120-100)^2}{100} + \frac{(91-100)^2}{100} = 1.21 + 4.00 + 0.81 = 6.02$$
[Figure: bar chart of the percent of total for species A, B, and C (labeled gumpies, sticklebarbs, and spotheads).]
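`chisq.test()` also returns the Pearson residuals (O − E)/√E; squaring them gives exactly the χ² components, so the largest contributor can be found programmatically. A sketch assuming base R:

```r
obs <- c(A = 89, B = 120, C = 91)
test <- chisq.test(obs)

# Squared Pearson residuals are the individual X^2 components
components <- test$residuals^2
print(round(components, 2))              # A 1.21, B 4.00, C 0.81
print(names(which.max(components)))      # "B": the largest contributor
```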
Goodness of fit for a genetic model: Under a genetic model of dominant epistasis, a cross of white and yellow summer squash will yield white, yellow, and green squash with probabilities 12/16, 3/16, and 1/16, respectively (expected ratios 12:3:1). Suppose we observe the following data: 155 white, 40 yellow, and 10 green squash (n = 205). Are they consistent with the genetic model?
H0: p_white = 12/16, p_yellow = 3/16, p_green = 1/16
Ha: H0 is not true
We use H0 to compute the expected counts for each squash type: 205 × 12/16 = 153.75 white, 205 × 3/16 = 38.4375 yellow, and 205 × 1/16 = 12.8125 green.
We then compute the chi-square statistic:
$$X^2 = \frac{(155-153.75)^2}{153.75} + \frac{(40-38.4375)^2}{38.4375} + \frac{(10-12.8125)^2}{12.8125} = 0.01016 + 0.06352 + 0.61738 = 0.69106$$
Degrees of freedom = k – 1 = 2, and X² = 0.691. Using Table D we find P > 0.25. Software gives P = 0.708. This is not significant, and we fail to reject H0. The observed data are consistent with a dominant epistatic genetic model (12:3:1). The small observed deviations from the model could simply have arisen from the random sampling process alone.
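A sketch of the squash test with base R's `chisq.test()`, passing the 12:3:1 model through its `p` argument. The counts 155, 40, and 10 are read off the statistic computed above; if the original data table differs, substitute it.

```r
obs <- c(white = 155, yellow = 40, green = 10)
p0 <- c(12, 3, 1) / 16                  # dominant epistasis model, 12:3:1
test <- chisq.test(obs, p = p0)

print(test$expected)                    # 153.75, 38.4375, 12.8125
print(round(unname(test$statistic), 3)) # 0.691
print(round(test$p.value, 3))           # 0.708
```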
Tests of Independence
Given two different finite partitions of the population, namely $\{A_i\}_{i=1}^{r}$ and $\{B_j\}_{j=1}^{c}$, one wants to test whether the two partitions are independent:
H0 : P(A_i ∩ B_j) = P(A_i)P(B_j) for every 1 ≤ i ≤ r and 1 ≤ j ≤ c versus HA : not H0.
One takes a random sample, x_1, · · · , x_n, from the population. Let $O_{ij} \overset{def}{=}$ the number of sample points that fall in $A_i \cap B_j$, $C_j \overset{def}{=} \sum_{i=1}^{r} O_{ij}$ and $R_i \overset{def}{=} \sum_{j=1}^{c} O_{ij}$.
The data for the test of independence are given in an r × c contingency table:

               B_1     B_2     · · ·   B_c     Row Totals
A_1            O_11    O_12    · · ·   O_1c    R_1
A_2            O_21    O_22    · · ·   O_2c    R_2
 .              .       .               .       .
 .              .       .               .       .
A_r            O_r1    O_r2    · · ·   O_rc    R_r
Column Totals  C_1     C_2     · · ·   C_c     Grand Total = n

The name "contingency table" was given by Karl Pearson.
An experiment has a two-way, or block, design if two categorical factors are studied with several levels of each factor. Two-way tables organize data about two categorical variables with any number of levels/treatments obtained from a two-way, or block, design.
High school students were asked whether they smoke and whether their parents smoke.
First factor: parent smoking status
Second factor: student smoking status
                         Student smokes   Student doesn't smoke   Total
Both parents smoke             400              1,380             1,780
One parent smokes              416              1,823             2,239
Neither parent smokes          188              1,168             1,356
Total                        1,004              4,371             5,375

Assuming the observed proportions correspond to the population, i.e., using empirical probabilities in place of actual probabilities:
P(student & one parent smokes) = P(being in row #2 & column #1) = (2, 1) entry / grand total = 416/5,375 = 0.077
P(student smokes) = P(being in column #1) = column #1 total / grand total = 1,004/5,375 = 0.187
P(one parent smokes) = P(being in row #2) = row #2 total / grand total = 2,239/5,375 = 0.417.
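The empirical probabilities above can be reproduced in R. This base-R sketch (the row and column labels are assumptions added for readability) also compares the joint probability with the product of the two marginal probabilities, which is what independence would require.

```r
tab <- rbind(both = c(400, 1380), one = c(416, 1823), neither = c(188, 1168))
colnames(tab) <- c("smokes", "does_not")

n <- sum(tab)                             # grand total, 5375
p_joint   <- tab["one", "smokes"] / n     # 416 / 5375
p_student <- sum(tab[, "smokes"]) / n     # column #1 total / n
p_parent  <- sum(tab["one", ]) / n        # row #2 total / n

# Under independence, p_joint would equal p_student * p_parent
print(round(c(joint = p_joint, product = p_student * p_parent), 4))   # 0.0774 0.0778
```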
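Let $E_{ij} = R_i C_j / n$ be the count expected in cell (i, j) under H0, where $R_i$ and $C_j$ are the row and column totals. With $O_{ij}$ the observed count in cell (i, j), the test statistic is
$$X^2 \overset{def}{=} \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$
which under H0 is approximately χ²((r − 1)(c − 1)). One uses a right-tail test.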
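The expected counts and the statistic can be built by hand from the row and column totals. In this base-R sketch (variable names are hypothetical) `outer()` forms every product R_i C_j at once, using the parental smoking data. For tables larger than 2 × 2, `chisq.test()` applies no continuity correction, so the hand computation should match its statistic.

```r
obs <- rbind(c(400, 1380), c(416, 1823), c(188, 1168))
n <- sum(obs)
row_tot <- rowSums(obs)                      # R_i
col_tot <- colSums(obs)                      # C_j

expected <- outer(row_tot, col_tot) / n      # E_ij = R_i * C_j / n
x2 <- sum((obs - expected)^2 / expected)     # sum over all r x c cells
df <- (nrow(obs) - 1) * (ncol(obs) - 1)      # (r - 1)(c - 1)
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

print(c(X2 = x2, df = df, p = p_value))
```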
Influence of parental smoking: Here is computer output for a chi-square test performed on the data from a random sample of high school students (rows are parental smoking habits, columns are the students' smoking habits). What does it tell you? Sample size? Hypotheses? Are the data OK for a χ² test? Interpretation?
Example (cont.)
> row1=c(400,1380)
> row2=c(416,1823)
> row3=c(188,1168)
> obs = rbind(row1,row2,row3)
> chisq.test(obs)

	Pearson's Chi-squared test

data:  obs
X-squared = 37.5663, df = 2, p-value = 6.959e-09

> exp=chisq.test(obs)$expected
> exp
         [,1]     [,2]
row1 332.4874 1447.513
row2 418.2244 1820.776
row3 253.2882 1102.712
> (obs-exp)^2/exp
            [,1]       [,2]
row1 13.70862455 3.14881241
row2  0.01183057 0.00271743
row3 16.82884348 3.86551335
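Two of the questions posed for this output can be answered in a few lines of base R. The check that every expected count is at least 5 is a common rule of thumb and an assumption here; the slide does not state which condition it has in mind.

```r
obs <- rbind(c(400, 1380), c(416, 1823), c(188, 1168))
test <- chisq.test(obs)

print(sum(obs))                  # sample size n = 5375
print(min(test$expected))        # smallest expected count, about 253.3
print(all(test$expected >= 5))   # TRUE: rule of thumb satisfied
```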
1 z test for comparing two proportions.
2 Goodness of Fit Chi–Squared Test for Independence.
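For a 2 × 2 table the two approaches agree: the square of the pooled two-proportion z statistic equals the chi-squared statistic computed without continuity correction. The base-R sketch below checks this on the reorganized hand data from the McNemar example later in these notes (72 of 408 right hands and 77 of 408 left hands soft), treating the two groups as independent samples purely for illustration.

```r
x <- c(right = 72, left = 77)         # soft hands in each group
n <- c(right = 408, left = 408)       # hands in each group
p_hat  <- x / n
p_pool <- sum(x) / sum(n)             # pooled proportion under H0

# Pooled two-proportion z statistic
z <- (p_hat[1] - p_hat[2]) /
  sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))

# Same data as a 2 x 2 contingency table, chi-squared without correction
tab <- rbind(soft = x, callused = n - x)
x2 <- chisq.test(tab, correct = FALSE)$statistic

print(c(z_squared = unname(z^2), X2 = unname(x2)))   # both about 0.2053
```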
Test of Homogeneity
Definition: A test of homogeneity tests whether two different populations have the same proportion of some trait, i.e., whether the row and column variables of the corresponding 2 × 2 contingency table are independent.
Example: Computer chips are manufactured at two different fab plants. Let
n $\overset{def}{=}$ # computer chips,
j $\overset{def}{=}$ # defective chips,
m $\overset{def}{=}$ # chips from fab plant A,
X $\overset{def}{=}$ # defective chips from fab plant A.
Question: Does one of the fab plants have a greater chance of creating defects than the other? Consider

               Fab Plant A    Fab Plant B        Totals
Defective      X              j − X              j
Nondefective   m − X          n − m − j + X      n − j
Totals         m              n − m              n

Notice that with n, m, and j fixed, the four inner entries are determined solely by X.
Theorem (Fisher's Exact Test): Assume j of n objects are of Type A and the rest are of Type B. Given m of the n objects, one has the hypotheses H0 : the m objects were chosen independently of type from the n objects, versus H1 : not H0.
Test statistic: X = # of Type A objects in the set of m objects; X ∼ HYP(n, j, m) under H0. Reject H0 when X takes on extreme values in either tail.
The model for X ∼ HYP(n, j, m), the hypergeometric distribution, is X = # of defective items in a sample of m items chosen from n items, of which j are defective.
Note: Fisher's Exact Test avoids using the chi-squared test in the 2 × 2 case with small samples. One uses computer programs to calculate p-values.
Example: A C. difficile experiment involved 29 patients with inflamed colons. Sixteen were given fecal implants (to introduce beneficial bacteria to the colon) and 13 were treated with the antibiotic vancomycin. There were 3 sick and 13 cured fecal-transplant patients, and 9 sick and 4 cured vancomycin patients.

          fecal   vancomycin
sick        3         9
cured      13         4

Find the p-value of H0 : fecal/vancomycin is independent of sick/cured.
Solution: Using R:
> fisher.test(rbind(c(3,9),c(13,4)))

	Fisher's Exact Test for Count Data

data:  rbind(c(3, 9), c(13, 4))
p-value = 0.00953
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.0126885 0.7278730
sample estimates:
odds ratio 
 0.1130106
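To see where the p-value from `fisher.test()` comes from, the base-R sketch below sums hypergeometric probabilities with `dhyper()`, using the notation of the theorem: n = 29 patients, j = 12 sick (Type A), m = 16 fecal-implant patients, and X = 3 sick fecal patients. The two-sided rule, summing the probabilities of all tables no more likely than the observed one, is the convention documented for R's `fisher.test()`; the small relative tolerance mirrors its handling of near-ties.

```r
tab <- rbind(sick  = c(fecal = 3,  vancomycin = 9),
             cured = c(fecal = 13, vancomycin = 4))

x <- tab["sick", "fecal"]      # observed X = 3
j <- sum(tab["sick", ])        # 12 Type A (sick) patients
m <- sum(tab[, "fecal"])       # 16 patients given fecal implants
n <- sum(tab)                  # 29 patients in total

support <- max(0, m - (n - j)):min(j, m)     # possible values of X
probs <- dhyper(support, j, n - j, m)        # HYP(n, j, m) pmf under H0
p_obs <- dhyper(x, j, n - j, m)

# Two-sided p-value: total probability of outcomes no more likely than X = 3
p_value <- sum(probs[probs <= p_obs * (1 + 1e-7)])
print(round(p_value, 5))       # 0.00953
```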
Test of Homogeneity McNemar Test (Matched Pairs)
n) is similar to the ratio of after "yes" votes to votes cast (b/n), the χ²
H0 : a = b.
H0 is that the contingency table above be symmetric, not that before/after
H0 : ˜
H0 would
a/n = b/n, so the χ² test for independence says the data are consistent with independence
Theorem (McNemar's Test; Quinn McNemar, psychologist, 1947): Let (x_1, y_1), · · · , (x_n, y_n) be a paired random sample where X ∼ BIN(1, p_X) and Y ∼ BIN(1, p_Y). Define
$b \overset{def}{=} \sum_{j=1}^{n} x_j (1 - y_j)$ = # of pairs with x_j = 1 and y_j = 0, and
$c \overset{def}{=} \sum_{j=1}^{n} y_j (1 - x_j)$ = # of pairs with x_j = 0 and y_j = 1.
For an approximate test of H0 : the discordant pairs counted by b and c occur in the same proportion, assume b + c ≥ 10 and use the test statistic
$$\tilde{\chi}^2 = \frac{(|b - c| - 1)^2}{b + c},$$
which is approximately χ²(1) under H0. One uses a right-tail test.
It is entirely possible for Fisher's Exact Test for independence to return an insignificant result while McNemar's Test returns a significant one: McNemar's Test tests for symmetry about the diagonal of the contingency table, not for independence.
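The counts b and c come straight from the raw pairs. This base-R sketch rebuilds 408 hypothetical individual pairs from the hand table in the example that follows, with an assumed coding x = 1 if the left hand is soft and y = 1 if the right hand is soft, and checks the hand-computed statistic against `mcnemar.test()`.

```r
# Pair counts: (soft, soft), (soft, callused), (callused, soft), (callused, callused)
counts <- c(14, 63, 58, 273)
x <- rep(c(1, 1, 0, 0), times = counts)   # 1 = left hand soft (assumed coding)
y <- rep(c(1, 0, 1, 0), times = counts)   # 1 = right hand soft (assumed coding)

b_count <- sum(x == 1 & y == 0)           # b: left soft, right callused
c_count <- sum(x == 0 & y == 1)           # c: left callused, right soft

stat <- (abs(b_count - c_count) - 1)^2 / (b_count + c_count)
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)   # right-tail test
print(round(c(stat = stat, p = p_value), 4))          # 0.1322 0.7161

# Built-in version (continuity correction is its default)
builtin <- mcnemar.test(table(x, y))
```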
Example: Suppose the softness or callousness of hands was tallied in the following table for randomly selected men.

                          Right Hand
                       Soft    Callused
Left Hand   Soft        14        63
            Callused    58       273

If a person has one soft and one callused hand, is it equally likely that the callused hand is the right or the left hand? Use McNemar's Test to get a p-value.
Solution: Here n = 14 + 63 + 58 + 273 = 408. Using McNemar's Test,
$$\tilde{\chi}^2 = \frac{(|63 - 58| - 1)^2}{63 + 58} = \frac{16}{121} = 0.1322.$$
Since this statistic is approximately χ²(1) under H0, the p-value is 0.7161 and the test is insignificant. One cannot reject the hypothesis that, for a person with one callused hand and one soft hand, the callused hand is equally likely to be the left hand or the right hand.
Notice that one can reorganize the data, losing the information about which left hand goes with which right hand, and obtain

               Soft   Callused
Right Hands     72      336
Left Hands      77      331

Fisher's Exact Test produces a p-value of 0.7171. One cannot reject the hypothesis that which hand it is (left or right) and callousness are independent.
Example (cont.): Notice that a chi-squared test of independence instead of Fisher's Exact Test yields a p-value of 0.6505. The difference arises because Fisher's Exact Test is exact, while the chi-squared test of independence is approximate.
> mcnemar.test(matrix(c(14,63, 58,273),nrow=2))

	McNemar's Chi-squared test with continuity correction

data:  matrix(c(14, 63, 58, 273), nrow = 2)
McNemar's chi-squared = 0.1322, df = 1, p-value = 0.7161

> chisq.test(matrix(c(72,336,77,331),nrow=2),correct=FALSE) # no continuity correction

	Pearson's Chi-squared test

data:  matrix(c(72, 336, 77, 331), nrow = 2)
X-squared = 0.2053, df = 1, p-value = 0.6505

> fisher.test(matrix(c(72,336,77,331),nrow=2))

	Fisher's Exact Test for Count Data

data:  matrix(c(72, 336, 77, 331), nrow = 2)
p-value = 0.7171
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.6350921 1.3351891
sample estimates:
odds ratio 
 0.9212498
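The reorganization can be done mechanically by taking the row and column sums of the matched table. A base-R sketch (the dimension names are assumptions for readability) that derives the marginal table and runs both tests side by side:

```r
# Matched-pairs table: rows = left hand, columns = right hand
paired <- matrix(c(14, 63, 58, 273), nrow = 2, byrow = TRUE,
                 dimnames = list(Left = c("Soft", "Callused"),
                                 Right = c("Soft", "Callused")))

# Collapse to one row per hand; this discards which left hand goes with which right hand
marginal <- rbind(Right = colSums(paired), Left = rowSums(paired))
print(marginal)   # Right: 72 336; Left: 77 331

# Fisher tests independence of the collapsed table; McNemar tests symmetry of the paired one
print(round(c(fisher = fisher.test(marginal)$p.value,
              mcnemar = mcnemar.test(paired)$p.value), 4))
```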
Chapter #9 R Assignment
1
2
3 A particular gene site in the common housefly is either deemed