Goodness of Fit Tests, Marc H. Mehlman (marcmehlman@yahoo.com)


SLIDE 1

Goodness of Fit Tests

Marc H. Mehlman

marcmehlman@yahoo.com

University of New Haven

Marc Mehlman (University of New Haven) Goodness of Fit Tests 1 / 38

SLIDE 2

Table of Contents

1 Goodness of Fit Chi–Squared Test
2 Tests of Independence
3 Test of Homogeneity
    McNemar Test (Matched Pairs)
4 Chapter #9 R Assignment

SLIDE 3

Goodness of Fit Chi–Squared Test

SLIDE 4

Goodness of Fit Chi–Squared Test

Idea of the chi-square test

The chi-square ($\chi^2$) test is used when the data are categorical. It measures how different the observed data are from what we would expect if $H_0$ were true.

[Figure: two bar charts of the percentage of births by day of the week (Mon. through Sun.; vertical axis 0% to 20%). Panel "Sample composition": observed sample proportions (1 SRS of 700 births). Panel "Expected composition": expected proportions under $H_0$: $p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = 1/7$.]

SLIDE 5

Goodness of Fit Chi–Squared Test

The chi-square distributions

The $\chi^2$ distributions are a family of distributions that take only positive values, are skewed to the right, and are each described by a specific number of degrees of freedom.

Published tables and software give the upper-tail area for critical values of many $\chi^2$ distributions.

SLIDE 6

Goodness of Fit Chi–Squared Test

Table D: $\chi^2$ critical values. Columns are upper-tail probabilities p; rows are degrees of freedom (df).

 df    0.25     0.2    0.15     0.1    0.05   0.025    0.02    0.01   0.005  0.0025   0.001  0.0005
  1    1.32    1.64    2.07    2.71    3.84    5.02    5.41    6.63    7.88    9.14   10.83   12.12
  2    2.77    3.22    3.79    4.61    5.99    7.38    7.82    9.21   10.60   11.98   13.82   15.20
  3    4.11    4.64    5.32    6.25    7.81    9.35    9.84   11.34   12.84   14.32   16.27   17.73
  4    5.39    5.99    6.74    7.78    9.49   11.14   11.67   13.28   14.86   16.42   18.47   20.00
  5    6.63    7.29    8.12    9.24   11.07   12.83   13.39   15.09   16.75   18.39   20.51   22.11
  6    7.84    8.56    9.45   10.64   12.59   14.45   15.03   16.81   18.55   20.25   22.46   24.10
  7    9.04    9.80   10.75   12.02   14.07   16.01   16.62   18.48   20.28   22.04   24.32   26.02
  8   10.22   11.03   12.03   13.36   15.51   17.53   18.17   20.09   21.95   23.77   26.12   27.87
  9   11.39   12.24   13.29   14.68   16.92   19.02   19.68   21.67   23.59   25.46   27.88   29.67
 10   12.55   13.44   14.53   15.99   18.31   20.48   21.16   23.21   25.19   27.11   29.59   31.42
 11   13.70   14.63   15.77   17.28   19.68   21.92   22.62   24.72   26.76   28.73   31.26   33.14
 12   14.85   15.81   16.99   18.55   21.03   23.34   24.05   26.22   28.30   30.32   32.91   34.82
 13   15.98   16.98   18.20   19.81   22.36   24.74   25.47   27.69   29.82   31.88   34.53   36.48
 14   17.12   18.15   19.41   21.06   23.68   26.12   26.87   29.14   31.32   33.43   36.12   38.11
 15   18.25   19.31   20.60   22.31   25.00   27.49   28.26   30.58   32.80   34.95   37.70   39.72
 16   19.37   20.47   21.79   23.54   26.30   28.85   29.63   32.00   34.27   36.46   39.25   41.31
 17   20.49   21.61   22.98   24.77   27.59   30.19   31.00   33.41   35.72   37.95   40.79   42.88
 18   21.60   22.76   24.16   25.99   28.87   31.53   32.35   34.81   37.16   39.42   42.31   44.43
 19   22.72   23.90   25.33   27.20   30.14   32.85   33.69   36.19   38.58   40.88   43.82   45.97
 20   23.83   25.04   26.50   28.41   31.41   34.17   35.02   37.57   40.00   42.34   45.31   47.50
 21   24.93   26.17   27.66   29.62   32.67   35.48   36.34   38.93   41.40   43.78   46.80   49.01
 22   26.04   27.30   28.82   30.81   33.92   36.78   37.66   40.29   42.80   45.20   48.27   50.51
 23   27.14   28.43   29.98   32.01   35.17   38.08   38.97   41.64   44.18   46.62   49.73   52.00
 24   28.24   29.55   31.13   33.20   36.42   39.36   40.27   42.98   45.56   48.03   51.18   53.48
 25   29.34   30.68   32.28   34.38   37.65   40.65   41.57   44.31   46.93   49.44   52.62   54.95
 26   30.43   31.79   33.43   35.56   38.89   41.92   42.86   45.64   48.29   50.83   54.05   56.41
 27   31.53   32.91   34.57   36.74   40.11   43.19   44.14   46.96   49.64   52.22   55.48   57.86
 28   32.62   34.03   35.71   37.92   41.34   44.46   45.42   48.28   50.99   53.59   56.89   59.30
 29   33.71   35.14   36.85   39.09   42.56   45.72   46.69   49.59   52.34   54.97   58.30   60.73
 30   34.80   36.25   37.99   40.26   43.77   46.98   47.96   50.89   53.67   56.33   59.70   62.16
 40   45.62   47.27   49.24   51.81   55.76   59.34   60.44   63.69   66.77   69.70   73.40   76.09
 50   56.33   58.16   60.35   63.17   67.50   71.42   72.61   76.15   79.49   82.66   86.66   89.56
 60   66.98   68.97   71.34   74.40   79.08   83.30   84.58   88.38   91.95   95.34   99.61  102.70
 80   88.13   90.41   93.11   96.58  101.90  106.60  108.10  112.30  116.30  120.10  124.80  128.30
100  109.10  111.70  114.70  118.50  124.30  129.60  131.10  135.80  140.20  144.30  149.40  153.20

Ex: df = 6. If $\chi^2 = 15.9$, the P-value is between 0.01 and 0.02, because 15.03 < 15.9 < 16.81.
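The table lookup above can also be done in software. The following is a minimal sketch in R, using the base functions `qchisq` and `pchisq`; the variable names `crit` and `p_value` are illustrative and not from the slides.

```r
# Upper-tail critical value: the Table D entry for df = 6, p = 0.05.
crit <- qchisq(0.05, df = 6, lower.tail = FALSE)

# Upper-tail area (P-value) for an observed statistic of 15.9 with df = 6.
p_value <- pchisq(15.9, df = 6, lower.tail = FALSE)

print(round(crit, 2))  # compare with the df = 6, p = 0.05 entry of the table
print(p_value)         # should fall between 0.01 and 0.02
```

Setting `lower.tail = FALSE` matters here: by default both functions work with the lower-tail area, whereas the table is indexed by the upper-tail area.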

SLIDE 7

Goodness of Fit Chi–Squared Test

Data for n observations of a categorical variable with k possible outcomes are summarized as observed counts, $n_1, n_2, \dots, n_k$, in k cells. Let $H_0$ specify the cell probabilities $p_1, p_2, \dots, p_k$ for the k possible outcomes.

Definition
$o_j \overset{\text{def}}{=}$ observed count in cell j
$e_j \overset{\text{def}}{=} n p_j$ = expected count in cell j

Example
Three species of large fish (A, B, C) that are native to a certain river have been observed to exist in equal proportions. A recent survey of 300 large fish found 89 of species A, 120 of species B and 91 of species C. What are the observed and expected counts?

Solution: $o_1 = 89$, $o_2 = 120$ and $o_3 = 91$.
$$e_1 = e_2 = e_3 = n p_j = 300 \left( \frac{1}{3} \right) = 100.$$

SLIDE 8

Goodness of Fit Chi–Squared Test

Chi–Squared Goodness of Fit Test

Theorem (Chi–Squared Goodness of Fit Test)
The chi–square statistic, which measures how much the observed cell counts differ from the expected cell counts, is
$$x \overset{\text{def}}{=} \sum_{j=1}^{k} \frac{(o_j - e_j)^2}{e_j}.$$
Let $H_0$ : the cell probabilities are $p_1, \dots, p_k$. If $H_0$ is true and
  • all expected counts are ≥ 1, and
  • no more than 20% of the expected counts are < 5,
then the chi–squared statistic is approximately $\chi^2(k - 1)$. In that case, the p–value of the test $H_0$ versus $H_A$ : not $H_0$ is approximately the upper-tail probability $P(C \geq x)$ where $C \sim \chi^2(k - 1)$.
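The theorem can be sketched as a small R function. This is a hypothetical helper (the name `gof_chisq` and its return fields are assumptions, not part of the slides or of R itself) that computes the statistic, the degrees of freedom, the upper-tail p-value, and whether the two expected-count conditions hold. It is applied here to the fish counts from the earlier example.

```r
# Hypothetical helper (not from the slides): chi-squared goodness of fit by hand.
gof_chisq <- function(obs, p) {
  stopifnot(length(obs) == length(p), abs(sum(p) - 1) < 1e-8)
  n <- sum(obs)
  e <- n * p                      # expected counts e_j = n * p_j
  x <- sum((obs - e)^2 / e)       # chi-squared statistic
  k <- length(obs)
  list(
    statistic = x,
    df = k - 1,
    p.value = pchisq(x, df = k - 1, lower.tail = FALSE),  # P(C >= x)
    # Conditions from the theorem: all e_j >= 1, at most 20% of e_j below 5.
    conditions_ok = all(e >= 1) && mean(e < 5) <= 0.20
  )
}

fish <- gof_chisq(c(89, 120, 91), rep(1/3, 3))
print(fish)
```

In practice the built-in `chisq.test(obs, p = p)` does the same calculation; the hand-written version only makes each step of the theorem visible.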

SLIDE 9

Goodness of Fit Chi–Squared Test

Example

River ecology

Three species of large fish (A, B, C) that are native to a certain river have been observed to co-exist in equal proportions. A recent random sample of 300 large fish found 89 of species A, 120 of species B, and 91 of species C. Do the data provide evidence that the river’s ecosystem has been upset?

$H_0$: $p_A = p_B = p_C = 1/3$
$H_a$: $H_0$ is not true

Number of proportions compared: k = 3
All the expected counts are n / k = 300 / 3 = 100
Degrees of freedom: k − 1 = 3 − 1 = 2

$X^2$ calculations:
$$X^2 = \frac{(89 - 100)^2}{100} + \frac{(120 - 100)^2}{100} + \frac{(91 - 100)^2}{100} = 1.21 + 4.0 + 0.81 = 6.02$$

SLIDE 10

Goodness of Fit Chi–Squared Test

Example (cont.)

If $H_0$ were true, how likely would it be to find by chance a discrepancy between observed and expected frequencies yielding an $X^2$ value of 6.02 or greater?

From Table D (df = 2), we find 5.99 < $X^2$ < 7.38, so 0.05 > P > 0.025. Software gives P-value = 0.049.

Using a typical significance level of 5%, we conclude that the results are significant. We have found evidence that the 3 fish populations are not currently equally represented in this ecosystem (P < 0.05).

SLIDE 11

Goodness of Fit Chi–Squared Test

Example (cont.)

Interpreting the $\chi^2$ output

The individual values summed in the $\chi^2$ statistic are the $\chi^2$ components.
  • When the test is statistically significant, the largest components indicate which condition(s) are most different from what is expected under $H_0$.
  • You can also compare the actual proportions qualitatively in a graph.

$$X^2 = \frac{(89 - 100)^2}{100} + \frac{(120 - 100)^2}{100} + \frac{(91 - 100)^2}{100} = 1.21 + 4.0 + 0.81 = 6.02$$

The largest $X^2$ component, 4.0, is for species B. The increase in species B contributes the most to significance.

[Figure: bar chart of percent of total (0% to 40%) for the three species; axis labels: gumpies, sticklebarbs, spotheads; A, B, C.]
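The components can be inspected directly in R. This is a small sketch for the fish example; the variable names (`obs`, `expected`, `components`) are chosen here for illustration.

```r
# Observed counts for the three fish species and expected counts under H0.
obs <- c(A = 89, B = 120, C = 91)
expected <- sum(obs) * rep(1/3, 3)

# One chi-squared component per cell; their sum is the test statistic.
components <- (obs - expected)^2 / expected
print(components)

# The cell that contributes most to the statistic.
print(names(which.max(components)))
print(sum(components))
```

Looking at the sign of `obs - expected` alongside the components shows the direction of each deviation, which the squared components alone hide.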

SLIDE 12

Goodness of Fit Chi–Squared Test

Example

Goodness of fit for a genetic model

Under a genetic model of dominant epistasis, a cross of white and yellow summer squash will yield white, yellow, and green squash with probabilities 12/16, 3/16 and 1/16 respectively (expected ratios 12:3:1). Suppose we observe the following data:

            White   Yellow   Green   Total
Observed      155       40      10     205

Are they consistent with the genetic model?

$H_0$: $p_{white} = 12/16$; $p_{yellow} = 3/16$; $p_{green} = 1/16$
$H_a$: $H_0$ is not true

We use $H_0$ to compute the expected counts for each squash type.

SLIDE 13

Goodness of Fit Chi–Squared Test

Example (cont.)

We then compute the chi-square statistic:
$$X^2 = \frac{(155 - 153.75)^2}{153.75} + \frac{(40 - 38.4375)^2}{38.4375} + \frac{(10 - 12.8125)^2}{12.8125} = 0.01016 + 0.06352 + 0.61738 = 0.69106$$

Degrees of freedom = k − 1 = 2, and $X^2$ = 0.691. Using Table D we find P > 0.25. Software gives P = 0.708.

This is not significant and we fail to reject $H_0$. The observed data are consistent with a dominant epistatic genetic model (12:3:1). The small observed deviations from the model could simply have arisen from the random sampling process alone.

SLIDE 14

Goodness of Fit Chi–Squared Test

Example (cont.)

> obs=c(155,40,10)                      # observed counts: white, yellow, green
> tprob=c(12/16, 3/16, 1/16)            # cell probabilities under H0
> chisq.test(obs,p=tprob)

        Chi-squared test for given probabilities

data:  obs
X-squared = 0.6911, df = 2, p-value = 0.7078

> exp=chisq.test(obs,p=tprob)$expected  # expected counts (note: this name masks base::exp)
> exp
[1] 153.7500  38.4375  12.8125
> (obs-exp)^2/exp                       # chi-squared components
[1] 0.01016260 0.06351626 0.61737805

SLIDE 15

Tests of Independence

SLIDE 16

Tests of Independence

r × c Contingency Tables

Given two different finite partitions of the population, namely $\{A_i\}_{i=1}^{r}$ and $\{B_j\}_{j=1}^{c}$, one wants to test whether the two partitions are independent:

$H_0 : P(A_i \cap B_j) = P(A_i) P(B_j)$ for every $1 \leq i \leq r$ and $1 \leq j \leq c$ versus $H_A$ : not $H_0$.

One takes a random sample, $x_1, \dots, x_n$, from the population. Let
$$o_{ij} \overset{\text{def}}{=} \text{the number of } x_k\text{'s that fall in } A_i \cap B_j, \qquad C_j \overset{\text{def}}{=} \sum_{i=1}^{r} o_{ij} \qquad \text{and} \qquad R_i \overset{\text{def}}{=} \sum_{j=1}^{c} o_{ij}.$$

The data for the test of independence are given in an r × c contingency table:

                 B1     B2     · · ·   Bc     Row Totals
A1               o11    o12    · · ·   o1c    R1
A2               o21    o22    · · ·   o2c    R2
...              ...    ...            ...    ...
Ar               or1    or2    · · ·   orc    Rr
Column Totals    C1     C2     · · ·   Cc     Grand Total = n

The name “contingency table” was given by Karl Pearson.

SLIDE 17

Tests of Independence

Example

Two-way tables

An experiment has a two-way, or block, design if two categorical factors are studied with several levels of each factor. Two-way tables organize data about two categorical variables with any number of levels/treatments obtained from a two-way, or block, design.

High school students were asked whether they smoke, and whether their parents smoke:

First factor: Parent smoking status
Second factor: Student smoking status

                         student smokes   student doesn’t smoke
both parents smoke                  400                    1380
one parent smokes                   416                    1823
neither parent smokes               188                    1168

SLIDE 18

Tests of Independence

Example (cont.)

                         student smokes   student doesn’t smoke   Total
both parents smoke                  400                   1,380   1,780
one parent smokes                   416                   1,823   2,239
neither parent smokes               188                   1,168   1,356
Total                             1,004                   4,371   5,375

Assuming the observed data correspond to the population, i.e., using empirical probabilities in place of actual probabilities:

$$P(\text{student \& one parent smokes}) = P(\text{being in row \#2 \& column \#1}) = \frac{(2,1)\text{ entry}}{\text{grand total}} = \frac{416}{5{,}375} = 0.077$$
$$P(\text{student smokes}) = P(\text{being in column \#1}) = \frac{\text{column \#1 total}}{\text{grand total}} = \frac{1{,}004}{5{,}375} = 0.187$$
$$P(\text{one parent smokes}) = P(\text{being in row \#2}) = \frac{\text{row \#2 total}}{\text{grand total}} = \frac{2{,}239}{5{,}375} = 0.417.$$
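These empirical probabilities can be reproduced with a few lines of R. The sketch below assumes illustrative row and column names (`both`, `one`, `neither`, `student_smokes`, `student_does_not`) that do not appear in the slides.

```r
# Parental-smoking table; the dimension names are illustrative labels.
smoke <- rbind(
  both    = c(400, 1380),
  one     = c(416, 1823),
  neither = c(188, 1168)
)
colnames(smoke) <- c("student_smokes", "student_does_not")

n <- sum(smoke)                                   # grand total
p_joint <- smoke["one", "student_smokes"] / n     # row #2 and column #1
p_col   <- sum(smoke[, "student_smokes"]) / n     # column #1 total / grand total
p_row   <- sum(smoke["one", ]) / n                # row #2 total / grand total
print(round(c(p_joint, p_col, p_row), 3))

# Under independence the joint probability would equal this product.
print(round(p_row * p_col, 3))
```

Comparing `p_joint` with `p_row * p_col` for every cell is exactly what the test of independence formalizes.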

SLIDE 19

Tests of Independence

Expected Counts for r × c Contingency Tables

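Observe: Assuming $H_0$ : row variable and column variable are independent,
$$\begin{aligned}
e_{ij} &= (\text{grand total}) \cdot P(\text{being in } ij\text{th cell}) \\
&= (\text{grand total}) \cdot P(\text{being in row \#}i) \cdot P(\text{being in column \#}j) \\
&= (\text{grand total}) \cdot \left( \frac{\text{row \#}i\text{ total}}{\text{grand total}} \right) \left( \frac{\text{column \#}j\text{ total}}{\text{grand total}} \right) \\
&= \frac{(\text{row \#}i\text{ total}) \cdot (\text{column \#}j\text{ total})}{\text{grand total}}.
\end{aligned}$$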
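The final formula is an outer product of the row totals and the column totals divided by the grand total. A minimal R sketch, using the parental-smoking counts from the earlier example (the variable names are illustrative):

```r
# Observed counts: rows = both / one / neither parent smokes,
# columns = student smokes / student doesn't smoke.
obs <- rbind(c(400, 1380), c(416, 1823), c(188, 1168))

# e_ij = (row i total) * (column j total) / (grand total)
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
print(round(expected, 2))
```

The same matrix is returned by `chisq.test(obs)$expected`, so the sketch can be used to check that output.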

SLIDE 20

Tests of Independence

Example (cont.)

                         student smokes   student doesn’t smoke   Total
both parents smoke                  400                   1,380   1,780
one parent smokes                   416                   1,823   2,239
neither parent smokes               188                   1,168   1,356
Total                             1,004                   4,371   5,375

The expected counts of the six cells are:
$$e_{11} = \frac{1{,}780 \cdot 1{,}004}{5{,}375} = 332.49 \qquad e_{12} = \frac{1{,}780 \cdot 4{,}371}{5{,}375} = 1{,}447.51$$
$$e_{21} = \frac{2{,}239 \cdot 1{,}004}{5{,}375} = 418.22 \qquad e_{22} = \frac{2{,}239 \cdot 4{,}371}{5{,}375} = 1{,}820.78$$
$$e_{31} = \frac{1{,}356 \cdot 1{,}004}{5{,}375} = 253.29 \qquad e_{32} = \frac{1{,}356 \cdot 4{,}371}{5{,}375} = 1{,}102.71$$

SLIDE 21

Tests of Independence

Chi–Squared Test for Two–Way Tables

Theorem (Chi–Squared Test for Two–Way Tables)
The chi–square statistic from a two–way r × c table,
$$x \overset{\text{def}}{=} \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},$$
measures how much the observed cell counts differ from the expected cell counts when
  $H_0$ : row variable and column variable are independent
holds. If $H_0$ is true and
  • all expected counts are ≥ 1, and
  • no more than 20% of the expected counts are < 5,
then the chi–squared statistic is approximately $\chi^2((r - 1)(c - 1))$. In that case, the p–value of the test $H_0$ versus $H_A$ : not $H_0$ is approximately the upper-tail probability $P(C \geq x)$ where $C \sim \chi^2((r - 1)(c - 1))$.
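As a rough sketch of the theorem in R, the statistic, degrees of freedom and p-value can be computed by hand for the parental-smoking table. The variable names are illustrative; `chisq.test` is the built-in that performs the same test.

```r
# Observed two-way table (3 x 2): parental smoking by student smoking.
obs <- rbind(c(400, 1380), c(416, 1823), c(188, 1168))

# Expected counts under independence.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Chi-squared statistic and degrees of freedom (r - 1)(c - 1).
x  <- sum((obs - expected)^2 / expected)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# Upper-tail p-value P(C >= x) with C ~ chi-squared(df).
p_value <- pchisq(x, df = df, lower.tail = FALSE)

# Conditions of the theorem on the expected counts.
conditions_ok <- all(expected >= 1) && mean(expected < 5) <= 0.20

print(c(statistic = x, df = df, p.value = p_value))
print(conditions_ok)
```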

SLIDE 22

Tests of Independence

Example (cont.)

Influence of parental smoking

Here is a computer output for a chi-square test performed on the data from a random sample of high school students (rows are parental smoking habits, columns are the students’ smoking habits). What does it tell you?

  • Sample size?
  • Hypotheses?
  • Are the data ok for a $\chi^2$ test?
  • Interpretation?

SLIDE 23

Tests of Independence

Example (cont.)

> row1=c(400,1380)                  # both parents smoke
> row2=c(416,1823)                  # one parent smokes
> row3=c(188,1168)                  # neither parent smokes
> obs = rbind(row1,row2,row3)
> chisq.test(obs)

        Pearson’s Chi-squared test

data:  obs
X-squared = 37.5663, df = 2, p-value = 6.959e-09

> exp=chisq.test(obs)$expected      # expected counts (note: this name masks base::exp)
> exp
         [,1]     [,2]
row1 332.4874 1447.513
row2 418.2244 1820.776
row3 253.2882 1102.712
> (obs-exp)^2/exp                   # chi-squared components
            [,1]       [,2]
row1 13.70862455 3.14881241
row2  0.01183057 0.00271743
row3 16.82884348 3.86551335

SLIDE 24

Tests of Independence

Equivalence of Tests

Consider a 2 × 2 two–way table:

          bad driver   good driver
male             789           563
female           823           575

One can test whether being a bad/good driver has nothing to do with gender by

1 a z test for comparing two proportions, or
2 a chi–squared test for independence.

The two approaches are equivalent: without a continuity correction the chi–squared statistic equals $z^2$, so the two–sided z test and the chi–squared test yield the same p–value.
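The equivalence can be checked numerically. The sketch below computes the pooled two-proportion z statistic by hand and compares it with `chisq.test` run without the continuity correction; the variable names are illustrative, and the dimension names on `drivers` are labels added here.

```r
# 2 x 2 table of drivers; rows are genders, columns are bad/good.
drivers <- rbind(male   = c(bad = 789, good = 563),
                 female = c(bad = 823, good = 575))

n      <- rowSums(drivers)                      # group sizes
p_hat  <- drivers[, "bad"] / n                  # proportion of bad drivers per group
p_pool <- sum(drivers[, "bad"]) / sum(n)        # pooled proportion under H0

# Two-sample z statistic with pooled standard error, and two-sided p-value.
z <- (p_hat[["male"]] - p_hat[["female"]]) /
  sqrt(p_pool * (1 - p_pool) * sum(1 / n))
p_z <- 2 * pnorm(-abs(z))

# Chi-squared test of independence without continuity correction.
chi <- chisq.test(drivers, correct = FALSE)

print(c(z_squared = z^2, X_squared = unname(chi$statistic)))
print(c(p_z = p_z, p_chi = chi$p.value))
```

Note that `chisq.test` applies Yates' continuity correction to 2 × 2 tables by default, so `correct = FALSE` is needed for the statistics to agree exactly.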

SLIDE 25

Test of Homogeneity

SLIDE 26

Test of Homogeneity

Test of Homogeneity (No Matched Pairs)

Definition
A test of homogeneity tests if two different populations have the same proportion of some trait, i.e., the corresponding 2 × 2 contingency table has independent row and column variables.

Example
Computer chips are manufactured at two different fab plants. Let
  $n \overset{\text{def}}{=}$ # computer chips
  $j \overset{\text{def}}{=}$ # defective
  $m \overset{\text{def}}{=}$ # from fab plant A
  $X \overset{\text{def}}{=}$ # defects from fab plant A

Question: Does one of the fab plants have a greater chance of creating defects than the other? Consider

                Fab Plant A   Fab Plant B       Totals
Defective       X             j − X             j
Nondefective    m − X         n − m − j + X     n − j
Totals          m             n − m             n

Notice that with n, m and j fixed, the inner four entries are determined solely by X.

SLIDE 27

Test of Homogeneity

Fisher’s Exact Test (No Matched Pairs)

Theorem (Fisher’s Exact Test)
Assume j of n objects are of Type A and the rest are of Type B. Given m of the n objects, one has the hypotheses
  $H_0$ : the m objects were chosen independently of type from the n objects, versus $H_1$ : not $H_0$.
Test statistic: X = # of Type A objects in the set of m objects, where X ∼ HYP(n, j, m) under $H_0$. Reject $H_0$ when X takes on extreme values in either tail.

The model for X ∼ HYP(n, j, m), the hypergeometric distribution, is X = # of defective items in a sample of m items chosen from n items of which j are defective.

Note: Fisher’s Exact Test avoids using the chi–squared approximation in the 2 × 2 case with small samples. One uses computer programs to calculate p–values.

SLIDE 28

Test of Homogeneity

Example of Fisher’s Exact Test

Example
A C. difficile experiment involved 29 patients with inflamed colons. Sixteen were given fecal transplants (to introduce beneficial bacteria to the colon) and 13 were treated with the antibiotic vancomycin. There were 3 sick and 13 cured fecal–transplant patients, and 9 sick and 4 cured vancomycin patients.

         fecal   vancomycin
sick         3            9
cured       13            4

Find the p–value of $H_0$ : fecal/vancomycin is independent of sick/cured.

Solution: Using R:

> fisher.test(rbind(c(3,9),c(13,4)))

        Fisher’s Exact Test for Count Data

data:  rbind(c(3, 9), c(13, 4))
p-value = 0.00953
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.0126885 0.7278730
sample estimates:
odds ratio
 0.1130106

SLIDE 29

Test of Homogeneity

Example of Fisher’s Exact Test (cont.)

Example (cont.)
One can also use the hypergeometric distribution directly. The probability of a count as extreme as 3, or more extreme in the same (lower) direction, is

> phyper(3,16,13,12)
[1] 0.008401063

The reason this does not match the p–value R gave with fisher.test is that fisher.test performed a two–sided test, whereas only one extreme side was calculated above. Since X ∼ HYP(29, 12, 16) is a discrete, non–symmetric distribution, it is not trivial to measure the probability of being just as extreme in the other direction (big instead of small). A typical way of doing this is to add together the probabilities of all outcomes whose probability is no larger than that of the observed data.
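The two-sided rule just described can be sketched with `dhyper`. The code below sums the probabilities of every outcome that is no more likely than the observed count of 3 and compares the result with `fisher.test`. The variable names are illustrative, and the small relative tolerance `1e-7` is an assumption made here to guard against floating-point ties.

```r
# X = number of fecal-transplant patients among the 12 sick patients:
# 16 fecal, 13 vancomycin, 12 sick in total, observed X = 3.
x_obs   <- 3
support <- 0:12
probs   <- dhyper(support, m = 16, n = 13, k = 12)
p_obs   <- dhyper(x_obs, m = 16, n = 13, k = 12)

# One-sided (lower tail) probability, as computed with phyper on the slide.
p_lower <- sum(probs[support <= x_obs])

# Two-sided p-value: add up all outcomes no more probable than the observed one.
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

# Built-in two-sided Fisher's exact test for comparison.
p_fisher <- fisher.test(rbind(c(3, 9), c(13, 4)))$p.value

print(c(lower = p_lower, two_sided = p_two_sided, fisher = p_fisher))
```

The gap between `p_lower` and `p_two_sided` is the contribution of the few very large counts in the upper tail that are at most as likely as the observed count.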

SLIDE 30

Test of Homogeneity McNemar Test (Matched Pairs)

Contingency Tables: Two Viewpoints

Suppose n voters are asked if they would vote for a candidate before a debate and then, again, after the debate. The 2 × 2 contingency table of the 2n unpaired votes is

          Yes      No
Before    a        n − a          n
After     b        n − b          n
          a + b    2n − a − b     2n

To test for independence of vote totals, one tests
  $H_0$ : vote totals were not affected by the debate versus $H_1$ : vote totals were affected by the debate
using a $\chi^2$ test with one degree of freedom. If the ratio of before “yes” votes to votes cast ($\frac{a}{n}$) is similar to the ratio of after “yes” votes to votes cast ($\frac{b}{n}$), the $\chi^2$ test will conclude the data are consistent with independence of before and after vote tallies.

SLIDE 31

Test of Homogeneity McNemar Test (Matched Pairs)

Contingency Tables: Two Viewpoints

A second way of thinking of the data is to consider the n paired votes of each of the n voters, (before yes/no, after yes/no). The before and after total vote tallies will remain as before (a and b will be considered fixed).

                      After
                 Yes       No
Before   Yes     x         a − x             a
         No      b − x     n + x − b − a     n − a
                 b         n − b             n

Notice that given x, the above table is completely determined! Furthermore, the difference along the anti–diagonal, (b − x) − (a − x), will be b − a no matter what x is. Instead of testing $H_0$, one tests
  $H'_0$ : a = b.
In other words, the number of “yes → no” voters equals the number of “no → yes” voters ⇔ the vote tallies for before and after are the same.

SLIDE 32

Test of Homogeneity McNemar Test (Matched Pairs)

Contingency Tables: Two Viewpoints

Hypothesis $H'_0$ is that the contingency table above be symmetric, not that before/after and yes/no voting tallies be independent. Equivalently, under $H'_0$ the cell probabilities are

                      After
                 Yes                              No
Before   Yes     $\tilde p_{11}$                  $\tilde p_{12}$                  $\tilde p_{11} + \tilde p_{12}$
         No      $\tilde p_{12}$                  $\tilde p_{22}$                  $\tilde p_{12} + \tilde p_{22}$
                 $\tilde p_{11} + \tilde p_{12}$  $\tilde p_{12} + \tilde p_{22}$  1

and $H'_0 : \tilde p_{12} = \tilde p_{21}$.

Independence of the yes/no–voting–tally variable and the before/after variable is different from independence of the before and after votes of each voter. For instance, if every voter voted the same before and after the debate, then both $H_0$ and $H'_0$ would hold, yet
  • $\frac{a}{n} = \frac{b}{n}$, so the $\chi^2$ test for independence says the data are consistent with independence of before/after voting tallies, but
  • the before and after votes of a voter would be as dependent as they possibly can be (one could predict the after debate vote of a voter knowing the voter’s before debate vote).

SLIDE 33

Test of Homogeneity McNemar Test (Matched Pairs)

McNemar Test (Matched Pairs)

Theorem (McNemar’s Test (Quinn McNemar, psychologist (1947)))
Let $(x_1, y_1), \dots, (x_n, y_n)$ be a paired random sample where $X \sim \text{BIN}(1, p_X)$ and $Y \sim \text{BIN}(1, p_Y)$. Define the counts of the two kinds of discordant pairs,
$$b \overset{\text{def}}{=} \sum_{j=1}^{n} x_j (1 - y_j) = \#\text{ of pairs with } x_j = 1 \text{ and } y_j = 0 \qquad \text{and} \qquad c \overset{\text{def}}{=} \sum_{j=1}^{n} (1 - x_j)\, y_j = \#\text{ of pairs with } x_j = 0 \text{ and } y_j = 1.$$
For an approximate test of $H_0$ : the two kinds of discordant pairs occur in the same proportion (equivalently, $p_X = p_Y$), assume $b + c \geq 10$ and use the test statistic
$$\tilde c^2 = \frac{(|b - c| - 1)^2}{b + c},$$
which is approximately $\chi^2(1)$ under $H_0$. One uses a right tail test.

It is entirely possible for Fisher’s Exact Test for independence to return an insignificant result while McNemar’s Test returns a significant result. McNemar’s Test tests for symmetry about the diagonal in the contingency table, not independence.

SLIDE 34

Test of Homogeneity McNemar Test (Matched Pairs)

Example
Suppose the softness or callusedness of hands was tallied in the following table from randomly selected men.

                          Right Hand
                        Soft   Callused
Left Hand   Soft          14         63
            Callused      58        273

If a person is to have one soft and one callused hand, is it equally likely that the callused hand be the right or the left hand? Use McNemar’s Test to get a p–value.

Solution: Here n = 14 + 63 + 58 + 273 = 408. Using McNemar’s Test,
$$\tilde c^2 = \frac{(|63 - 58| - 1)^2}{63 + 58} = \frac{16}{121} = 0.1322.$$
Since this is sampled from $\chi^2(1)$ under $H_0$, one has a p–value of 0.7161 and the test is insignificant. One cannot reject the hypothesis that, for a person with one callused hand and one soft hand, the callused hand is equally likely to be the left hand or the right hand.

Notice, one can reorganize the data, losing the information of which left hand goes with which right hand, and obtain

               Soft   Callused
Right Hands      72        336
Left Hands       77        331

Fisher’s Exact Test produces a p–value of 0.7171. One cannot reject the hypothesis that handedness and callusedness are independent.
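The hand calculation in the solution can be reproduced in R as follows. This is a sketch; the names `b`, `c_count`, `stat` and `hands` are chosen here for illustration (`c_count` avoids masking R's `c` function).

```r
# Discordant pairs from the hands table:
# left soft / right callused = 63, left callused / right soft = 58.
b       <- 63
c_count <- 58

# McNemar statistic with continuity correction, and right-tail p-value on 1 df.
stat    <- (abs(b - c_count) - 1)^2 / (b + c_count)
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)
print(c(statistic = stat, p.value = p_value))

# The built-in test on the full paired table gives the same numbers.
hands   <- rbind(c(14, 63), c(58, 273))
builtin <- mcnemar.test(hands)
print(builtin)
```

Only the off-diagonal counts enter the statistic; the 14 and 273 concordant pairs affect n but not the test.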

SLIDE 35

Test of Homogeneity McNemar Test (Matched Pairs)

Example (cont.)
Notice that a chi–square independence test instead of Fisher’s Exact Test yields a p–value of 0.6505. The difference arises because Fisher’s Exact Test is exact, while the chi–squared independence test is approximate.

> mcnemar.test(matrix(c(14,63, 58,273),nrow=2))

        McNemar’s Chi-squared test with continuity correction

data:  matrix(c(14, 63, 58, 273), nrow = 2)
McNemar’s chi-squared = 0.1322, df = 1, p-value = 0.7161

> chisq.test(matrix(c(72,336,77,331),nrow=2),correct=FALSE) # no continuity correction

        Pearson’s Chi-squared test

data:  matrix(c(72, 336, 77, 331), nrow = 2)
X-squared = 0.2053, df = 1, p-value = 0.6505

> fisher.test(matrix(c(72,336,77,331),nrow=2))

        Fisher’s Exact Test for Count Data

data:  matrix(c(72, 336, 77, 331), nrow = 2)
p-value = 0.7171
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.6350921 1.3351891
sample estimates:
odds ratio
 0.9212498

SLIDE 36

Chapter #9 R Assignment

SLIDE 37

Chapter #9 R Assignment

1 A car expert claims that 30% of all cars in Johnstown are American made, 35% are Japanese made, 20% are Korean made and 15% are European. Of 156 cars randomly observed in Johnstown, 67 were American, 42 were Japanese, 24 were Korean and 23 were European. Find the p–value of a goodness of fit test between what was expected and what was observed.

2 Senie et al. (1981) investigated the relationship between age and frequency of breast self–examination in a sample of women (Senie, R. T., Rosen, P. P., Lesser, M. L., and Kinne, D. W. Breast self–examinations and medical examination relating to breast cancer stage. American Journal of Public Health, 71, 583–590.) A summary of the results is presented in the following table:

                 Frequency of breast self–examination
Age              Monthly   Occasionally   Never
under 45              91             90      51
45 - 59              150            200     155
60 and over          109            198     172

From Hand et al., page 307, table 368. Do an independence test to see if age and frequency of breast self–examination are independent.

SLIDE 38

Chapter #9 R Assignment

3 Particular gene sites in the common housefly were deemed “synonymous” if they did not affect amino acids and “replacement” if they did. These sites were also deemed “polymorphisms” if they varied among subspecies and “fixed” if they did not. The following data were collected:

                 Synonymous   Replacement
polymorphisms            43             2
fixed                    17             7

Find the p–value of $H_0$ : synonymous/replacement is independent of polymorphisms/fixed.
