Statistical Methods: Lecture 10, Dennis Dobler, Vrije Universiteit Amsterdam

SLIDE 1

Goodness-of-fit Test of independence Test of homogeneity Fisher’s Exact Test

Statistical Methods: Lecture 10

Dennis Dobler

Vrije Universiteit Amsterdam

December 6, 2017

Dennis Dobler Vrije Universiteit Amsterdam Statistical Methods: Lecture 10

SLIDE 2

Lecture Overview

◮ Goodness-of-fit
◮ Test of independence
◮ Test of homogeneity
◮ Fisher’s Exact Test

The test of homogeneity and Fisher’s exact test are mentioned in Section 10.3 of the book, but no procedure is given there. These slides give the procedures, which are required for the assignments and the exam.

SLIDE 3

10.2 Goodness-of-Fit

Recall: a frequency distribution gives the counts of data in different categories and is usually presented as a table. Idea of goodness-of-fit: we would like to know whether an observed frequency distribution fits some claimed distribution.

Exercise 19 (10.2): M&M Candies

Mars, Inc. claims that the colours of M&M’s are distributed according to the following percentages:

Colour      Red  Orange  Yellow  Brown  Blue  Green
Percentage  13%  20%     14%     13%    24%   16%

We would like to test whether the colour distribution is as claimed with significance level α = 5%. We collected a random sample of n = 100 M&M’s. The observed frequency distribution is as follows:

Colour      Red  Orange  Yellow  Brown  Blue  Green
Frequency   13   25      8       8      27    19

How do we decide whether the observed frequencies do not match the claimed distribution?

SLIDE 4

10.2 Goodness-of-Fit

Exercise 19 (10.2): M&M Candies

Recall, the claimed distribution is as follows:

Colour      Red  Orange  Yellow  Brown  Blue  Green
Percentage  13%  20%     14%     13%    24%   16%

Since n = 100, we expect 100 · 0.13 = 13 red coloured M&M’s in the sample if the colour distribution is as claimed. Similarly for the other colours, so we obtain the following expected frequencies:

Colour              Red  Orange  Yellow  Brown  Blue  Green
Expected frequency  13   20      14      13     24    16

NB: in general n ≠ 100, so this calculation is then slightly more complicated.

SLIDE 5

10.2 Goodness-of-Fit

Expected frequencies

Suppose there are k different categories and a random sample of size n is drawn. Let $p_1$ be the claimed probability that a subject falls in category 1, and similarly for the probabilities $p_2, \ldots, p_k$.

H0: frequency counts agree with the claimed distribution;
Ha: frequency counts do not agree with the claimed distribution.

The expected frequency $E_i$ is the expected number of occurrences of category i in the sample under the assumption that H0 is true. It is computed by $E_i = n \cdot p_i$. Note: the $E_i$ do not have to be integers.
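The rule for expected frequencies can be sketched in R as follows. This is a minimal illustration using the claimed M&M percentages from the exercise; the helper name expected_counts and the sample sizes other than n = 100 are assumptions made only for this example.

```r
# Claimed colour distribution from the M&M exercise, as proportions
p_claimed <- c(Red = 0.13, Orange = 0.20, Yellow = 0.14,
               Brown = 0.13, Blue = 0.24, Green = 0.16)

# Hypothetical helper: expected counts E_i = n * p_i under H0
expected_counts <- function(n, p) {
  stopifnot(abs(sum(p) - 1) < 1e-8)  # claimed probabilities must sum to 1
  n * p
}

e100  <- expected_counts(100, p_claimed)  # sample size used in the exercise
e_odd <- expected_counts(77, p_claimed)   # illustrative n, not from the exercise

e100
e_odd  # these expected frequencies are not integers
```

With n = 77 the expected counts are non-integer, which is fine: they are expectations, not observable counts.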

SLIDE 6

10.2 Goodness-of-Fit

Exercise 19 (10.2): M&M Candies

If the distribution is as claimed by Mars, Inc., we would expect the following frequencies:

Colour              Red  Orange  Yellow  Brown  Blue  Green
Expected frequency  13   20      14      13     24    16

And recall that we observed the following frequencies:

Colour              Red  Orange  Yellow  Brown  Blue  Green
Observed frequency  13   25      8       8      27    19

Idea: take as test statistic the sum of the squared and scaled differences

$$\frac{(\text{Observed frequency} - \text{Expected frequency})^2}{\text{Expected frequency}}$$

If certain requirements are met, this test statistic has under H0 approximately a known theoretical distribution.

SLIDE 7

10.2 Goodness-of-Fit

Goodness-of-fit test

Suppose there are k different categories and a random sample of size n is drawn.

H0: frequency counts agree with the claimed distribution p1 = value, . . .
Ha: frequency counts do not agree with the claimed distribution.

Let $O_i$ be the observed frequency count of category i. The expected frequency $E_i$ is computed by $E_i = n \cdot p_i$.

Requirements: all $E_i \ge 5$.

If the requirements are met, then the test statistic

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

has approximately a chi-square distribution with k − 1 degrees of freedom under H0. H0 is rejected for large values of the observed value $\chi^2$:

◮ Critical value method: reject H0 if $\chi^2 > \chi^2_{k-1,\alpha}$, where $\chi^2_{k-1,\alpha}$ can be found in Table 4 of the Appendix.
◮ P-value method: reject H0 if $P(X^2 \ge \chi^2) < \alpha$. Use R for this.
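Both decision methods can be sketched in a few lines of R. The function name gof_test and the three-category counts below are made up for illustration; only qchisq() and pchisq() are standard R functions.

```r
# Hypothetical helper sketching the goodness-of-fit procedure
gof_test <- function(observed, p, alpha = 0.05) {
  n <- sum(observed)
  expected <- n * p                            # E_i = n * p_i
  if (any(expected < 5)) {
    warning("Requirement violated: some expected frequencies are below 5")
  }
  df   <- length(observed) - 1                 # k - 1 degrees of freedom
  chi2 <- sum((observed - expected)^2 / expected)
  crit <- qchisq(1 - alpha, df = df)           # critical value chi^2_{k-1, alpha}
  pval <- pchisq(chi2, df = df, lower.tail = FALSE)  # P(X^2 >= chi2)
  list(statistic = chi2, df = df, critical = crit,
       p.value = pval, reject = chi2 > crit)
}

# Made-up counts for three categories with claimed probabilities 0.5, 0.3, 0.2
res <- gof_test(c(48, 35, 17), p = c(0.5, 0.3, 0.2))
res$reject  # the critical value method and the P-value method give the same decision
```

The two methods agree by construction, since the P-value falls below α exactly when the observed value exceeds the critical value.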

SLIDE 8

10.2 Goodness-of-Fit

Intermezzo: Chi-square distribution

A random variable having a chi-square distribution with k degrees of freedom is a continuous random variable, whose distribution is asymmetric. It only takes positive values and is right-skewed. Furthermore, different degrees of freedom yield different distributions.

[Figure: chi-square densities for df = 3, df = 5, df = 10 and df = 20; vertical axis: density]
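The densities can be redrawn with a short R sketch. The grid of x values and the plotting choices are assumptions; the degrees of freedom are those named on the slide.

```r
# Chi-square densities for the degrees of freedom shown on the slide
dfs  <- c(3, 5, 10, 20)
x    <- seq(0, 40, length.out = 401)
dens <- sapply(dfs, function(d) dchisq(x, df = d))  # one column per df

matplot(x, dens, type = "l", lty = 1, col = 1:4,
        xlab = "x", ylab = "density")
legend("topright", legend = paste0("df=", dfs), lty = 1, col = 1:4)

# Right-skewness: the mode (df - 2 for df >= 2) lies below the mean (df)
modes <- x[apply(dens, 2, which.max)]
modes
```

Each curve peaks to the left of its mean, which is the right-skewness mentioned above, and the curves shift to the right and flatten as the degrees of freedom grow.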

SLIDE 9

10.2 Goodness-of-Fit

Exercise 19 (10.2): M&M Candies

Recall that we expected and observed the following frequencies:

Colour              Red  Orange  Yellow  Brown  Blue  Green
Observed frequency  13   25      8       8      27    19
Expected frequency  13   20      14      13     24    16

Since all $E_i$ are larger than 5, the requirements are met, so the test statistic

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

has under H0 approximately a chi-square distribution with k − 1 = 5 degrees of freedom. The observed value of the test statistic is

$$\chi^2 = \sum_{i=1}^{6} \frac{(o_i - E_i)^2}{E_i} = \frac{(13 - 13)^2}{13} + \frac{(25 - 20)^2}{20} + \frac{(8 - 14)^2}{14} + \frac{(8 - 13)^2}{13} + \frac{(27 - 24)^2}{24} + \frac{(19 - 16)^2}{16} \approx 6.68$$

The critical value is $\chi^2_{k-1,\alpha} = \chi^2_{5,0.05} = 11.07$. Since $\chi^2 = 6.68 < 11.07 = \chi^2_{5,0.05}$, we fail to reject H0. There is not sufficient evidence to reject the claim by Mars, Inc. about the colour distribution.
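The same worked example can be checked in R. When chisq.test() is given a vector of counts together with the argument p, it carries out a goodness-of-fit test; the variable names below are chosen only for this sketch.

```r
observed <- c(Red = 13, Orange = 25, Yellow = 8, Brown = 8, Blue = 27, Green = 19)
claimed  <- c(0.13, 0.20, 0.14, 0.13, 0.24, 0.16)

gof <- chisq.test(observed, p = claimed)  # goodness-of-fit test because p is supplied
gof$statistic  # observed value of the test statistic
gof$parameter  # degrees of freedom, k - 1
gof$expected   # expected frequencies n * p_i
gof$p.value    # P(X^2 >= chi2)

crit <- qchisq(0.95, df = 5)  # critical value for alpha = 0.05
unname(gof$statistic) > crit  # FALSE means: fail to reject H0
```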

SLIDE 10

10.2 Goodness-of-Fit

Degrees of freedom and critical region

Note that the degrees of freedom of the goodness-of-fit test are determined by the number of categories, not by the sample size! Furthermore, note that the alternative hypothesis of a goodness-of-fit test is undirected: Ha: frequency counts do not agree with the claimed distribution. But H0 is only rejected for large values of the observed value $\chi^2$ of the test statistic. So the alternative hypothesis is undirected, but the test is right-tailed!

SLIDE 11

10.3 Contingency Tables

Interested in inference about two categorical variables, in particular whether there is a relationship between the two variables. For instance, are gender and left-handedness independent? Suppose these two variables are measured in a sample. How to present results?

Contingency table

A contingency table (or two-way table) is a table consisting of frequency counts of categorical data corresponding to two different variables.

◮ Row variable: r categories.
◮ Column variable: c categories.
◮ So the table consists of r × c cells/entries.
◮ $O_{ij}$: cell (i, j) of the given contingency table, which corresponds to the number of subjects in the i-th category of the row variable and the j-th category of the column variable.

Exercise 12 (10.3): Lefties

          Left-handed  Right-handed  Total
Male      23           217           240
Female    65           455           520
Total     88           672           760
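As a small sketch, the table can be entered in R as a labelled matrix. The object names and the dimension labels Gender and Hand are assumptions for this example; addmargins() appends the row and column totals.

```r
lefties <- matrix(c(23, 217,
                    65, 455),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Gender = c("Male", "Female"),
                                  Hand   = c("Left-handed", "Right-handed")))

with_totals <- addmargins(lefties)  # adds totals, labelled "Sum"
with_totals
```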

SLIDE 12

Exercise 12 (10.3): Lefties

We would like to test the claim that gender and left-handedness are independent:

H0: gender and left-handedness are independent;
Ha: gender and left-handedness are dependent.

We take α = 5%. A sample of size 760 was taken:

          Left-handed  Right-handed  Total
Male      23           217           240
Female    65           455           520
Total     88           672           760

If H0 is true, which contingency table would we expect?

          Left-handed  Right-handed  Total
Male      ?            ?             240
Female    ?            ?             520
Total     88           672           760

SLIDE 13

Exercise 12 (10.3): Lefties

          Left-handed  Right-handed  Total
Male      ?            ?             240
Female    ?            ?             520
Total     88           672           760

We expect 760 · P(male and left-handed) in cell (1, 1) of the expected frequencies table, where P(male and left-handed) is the probability that a randomly selected person from this sample is both male and left-handed. If H0 is true, then gender and left-handedness are independent, thus:

$$P(\text{male and left-handed}) = P(\text{male}) \cdot P(\text{left-handed}).$$

We have $P(\text{male}) = \frac{240}{760}$ and $P(\text{left-handed}) = \frac{88}{760}$. Hence,

$$P(\text{male and left-handed}) = P(\text{male}) \cdot P(\text{left-handed}) = \frac{240}{760} \cdot \frac{88}{760}.$$

Therefore, the expected number of persons in this sample who are both male and left-handed if H0 is true is

$$760 \cdot P(\text{male and left-handed}) = 760 \cdot \frac{240}{760} \cdot \frac{88}{760} = \frac{240 \cdot 88}{760}.$$

SLIDE 14

10.3 Contingency Tables

Expected frequencies table

If the row and column variables are independent, then the expected frequencies table is an r × c table whose entries are computed by

$$E_{ij} = \frac{(\text{i-th row total}) \cdot (\text{j-th column total})}{\text{grand total}}.$$

Exercise 12 (10.3): Lefties

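The calculation on the previous slide gave $E_{11} = \frac{240 \cdot 88}{760} \approx 27.8$. Similarly,

$$E_{12} = \frac{240 \cdot 672}{760} \approx 212.2, \quad E_{21} = \frac{520 \cdot 88}{760} \approx 60.2, \quad E_{22} = \frac{520 \cdot 672}{760} \approx 459.8.$$

The idea is now to compare the values $o_{ij}$ of the observed frequencies $O_{ij}$ with the expected frequencies.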
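The formula for the expected frequencies translates directly into R with outer(), which multiplies every row total by every column total. This is a sketch with variable names chosen for the example.

```r
observed <- matrix(c(23, 217, 65, 455), nrow = 2, byrow = TRUE)

# E_ij = (i-th row total) * (j-th column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 1)
```

Because outer() works for any number of rows and columns, the same line gives the expected frequencies table for larger r × c tables as well.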

SLIDE 15

10.3 Contingency Tables

Test statistic for test of independence

Row variable has r categories, column variable has c categories.

H0: row and column variable are independent;
Ha: row and column variable are dependent.

Requirements:

◮ 2 × 2: all $E_{ij} \ge 5$.
◮ larger tables: all $E_{ij} \ge 1$ and at least 80% of the $E_{ij}$ larger than 5.

If the requirements are met, then the test statistic

$$X^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E} = \sum_{(i,j)} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

has under H0 approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom. The null hypothesis is rejected for large values of the observed value $\chi^2$: compare with the critical value $\chi^2_{(r-1)(c-1),\alpha}$ (can be found in Table 4 of the Appendix) or compute the P-value $P(X^2 \ge \chi^2)$ with R.
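The requirements can be checked mechanically before running the test. The helper requirements_met below is hypothetical (it is not part of base R) and simply encodes the two rules stated above, reading the 80% rule as "at least 80%".

```r
# Hypothetical helper: checks the requirements of the chi-square test
requirements_met <- function(observed) {
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  if (all(dim(observed) == 2)) {
    all(expected >= 5)                                # rule for 2 x 2 tables
  } else {
    all(expected >= 1) && mean(expected > 5) >= 0.8   # rule for larger tables
  }
}

ok_lefties <- requirements_met(matrix(c(23, 217, 65, 455), nrow = 2, byrow = TRUE))
ok_small   <- requirements_met(matrix(c(5, 4, 8, 3), nrow = 2))  # small made-up counts

ok_lefties
ok_small
```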

SLIDE 16

10.3 Contingency Tables

Exercise 12 (10.3): Lefties

Recall, the test statistic is

$$X^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E} = \sum_{(i,j)} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}.$$

Since all $E_{ij}$ are larger than 5, the requirements are met. Since r = 2 and c = 2, $X^2$ has approximately a chi-square distribution with (r − 1)(c − 1) = 1 degree of freedom. The observed value $\chi^2$ of the test statistic is

$$\chi^2 = \sum_{(i,j)} \frac{(o_{ij} - E_{ij})^2}{E_{ij}} = \frac{(23 - 27.8)^2}{27.8} + \frac{(217 - 212.2)^2}{212.2} + \frac{(65 - 60.2)^2}{60.2} + \frac{(455 - 459.8)^2}{459.8} \approx 1.36.$$

Once again r = 2 and c = 2, and recall that α = 5%, so the critical value is $\chi^2_{1,0.05} = 3.84$, which can be found in Table 4 of the Appendix. We see that $\chi^2 = 1.36 < 3.84 = \chi^2_{1,0.05}$, so we fail to reject H0. There is not sufficient evidence to warrant rejection of the claim that gender and left-handedness are independent variables.

SLIDE 17

10.3 Contingency Tables

Test of independence: Recap

◮ Row variable has r categories, column variable has c categories.
◮ H0: row and column variable are independent;
  Ha: row and column variable are dependent;
  Significance level α.
◮ Requirements ($E_{ij}$ is the expected frequency count in cell (i, j) under H0):
    ◮ 2 × 2: all $E_{ij} \ge 5$.
    ◮ larger tables: all $E_{ij} \ge 1$ and at least 80% of the $E_{ij}$ larger than 5.
◮ If the requirements are met, the test statistic

$$X^2 = \sum \frac{(O - E)^2}{E} = \sum_{(i,j)} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

  has under H0 approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom.
◮ Critical value method: reject H0 if the observed value $\chi^2$ of the test statistic is larger than the critical value $\chi^2_{(r-1)(c-1),\alpha}$.
◮ P-value method: reject H0 if P-value $= P(X^2 \ge \chi^2) < \alpha$.

Undirected alternative hypothesis, but right-tailed test!

Note that the alternative hypothesis is undirected, but the test is right-tailed: H0 is rejected for large values of $\chi^2$.

SLIDE 18

10.3 Contingency Tables

Chi-square test in R; compute P-value manually

In R we can compute a P-value with pchisq(...,df=...) “semi-manually” for the previous example like this:

    o11=23; o12=217; o21=65; o22=455
    left=matrix(c(o11,o12,o21,o22),nrow=2,byrow=TRUE); left
    ##      [,1] [,2]
    ## [1,]   23  217
    ## [2,]   65  455
    e11=88*240/760; e12=672*240/760; e21=88*520/760; e22=672*520/760
    expfreq=matrix(c(e11,e12,e21,e22),nrow=2,byrow=TRUE); expfreq
    ##       [,1]  [,2]
    ## [1,] 27.79 212.2
    ## [2,] 60.21 459.8
    chi=(o11-e11)^2/e11+(o12-e12)^2/e12+(o21-e21)^2/e21+(o22-e22)^2/e22; chi
    ## [1] 1.364
    #btw, a shorter way to compute the observed value is sum((left-expfreq)^2/expfreq)
    pvalue=1-pchisq(chi,df=1); pvalue
    ## [1] 0.2428

SLIDE 19

10.3 Contingency Tables

Chi-square test in R: use chisq.test()

The previous calculation can also be done completely by R, using chisq.test():

    o11=23; o12=217; o21=65; o22=455
    left=matrix(c(o11,o12,o21,o22),nrow=2,byrow=TRUE); left
    ##      [,1] [,2]
    ## [1,]   23  217
    ## [2,]   65  455
    chisq.test(left)
    ##
    ##  Pearson's Chi-squared test with Yates' continuity correction
    ##
    ## data:  left
    ## X-squared = 1.1, df = 1, p-value = 0.3

Why the difference with the previous slide? For 2 × 2 tables, R applies a continuity correction by default (since the chi-square distribution is a continuous approximation of a discrete distribution). The correction can be switched off:

    chisq.test(left,correct=FALSE)
    ##
    ##  Pearson's Chi-squared test
    ##
    ## data:  left
    ## X-squared = 1.4, df = 1, p-value = 0.2

SLIDE 20

10.3 Contingency Tables

Chi-square test in R: use chisq.test()

The function chisq.test() can also be used to obtain the table with expected frequencies under H0:

    o11=23; o12=217; o21=65; o22=455
    left=matrix(c(o11,o12,o21,o22),nrow=2,byrow=TRUE); left
    ##      [,1] [,2]
    ## [1,]   23  217
    ## [2,]   65  455
    chisq.test(left)$expected
    ##       [,1]  [,2]
    ## [1,] 27.79 212.2
    ## [2,] 60.21 459.8

R warns if the requirements are not met:

    chisq.test(matrix(c(5, 4, 8, 3), nrow = 2))
    ## Warning in chisq.test(matrix(c(5, 4, 8, 3), nrow = 2)): Chi-squared approximation may be incorrect
    ##
    ##  Pearson's Chi-squared test with Yates' continuity correction
    ##
    ## data:  matrix(c(5, 4, 8, 3), nrow = 2)
    ## X-squared = 0.11, df = 1, p-value = 0.7

SLIDE 21

10.3 Contingency Tables

Part 2

Seen: testing independence of two variables in one population.

Now: do different populations have the same proportions of some characteristics? I.e., are the different populations homogeneous?

Setup: take different samples from the different populations and count the occurrences of the different categories.

Although the motivation and hypotheses are different, a test of homogeneity is carried out according to the same procedure as a test of independence.

SLIDE 22

10.3 Contingency Tables

Exercise 12 (10.3): Lefties

Consider men and women as two different populations. We are interested in whether men and women have the same proportion of left-handedness:

H0: Men and women have the same proportion of left-handedness.
Ha: Men and women do not have the same proportion of left-handedness.

          Left-handed  Not left-handed  Total
Male      23           217              240
Female    65           455              520
Total     88           672              760

We consider these data as obtained from two samples: one of size 240 from the population of men and one of size 520 from the population of women. What are the expected frequencies in this case? Under H0: $E_{11} = 240 \cdot P(\text{left-handedness})$. We have $P(\text{left-handedness}) = \frac{88}{760}$. So:

$$E_{11} = 240 \cdot P(\text{left-handedness}) = 240 \cdot \frac{88}{760} \approx 27.8.$$

Similarly, $E_{12} \approx 212.2$, $E_{21} \approx 60.2$ and $E_{22} \approx 459.8$.
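The homogeneity viewpoint can be sketched in R: each sample size is multiplied by the pooled proportions, which are the common proportions estimated under H0. Variable names are chosen for this example only.

```r
observed <- matrix(c(23, 217, 65, 455), nrow = 2, byrow = TRUE,
                   dimnames = list(c("Male", "Female"),
                                   c("Left-handed", "Not left-handed")))

sample_sizes <- rowSums(observed)             # 240 men, 520 women
pooled <- colSums(observed) / sum(observed)   # common proportions under H0

# E_ij = (size of sample i) * (pooled proportion of category j)
expected <- outer(sample_sizes, pooled)
round(expected, 1)
```

The result coincides with (row total · column total) / grand total, which is why the test of homogeneity uses the same expected frequencies as the test of independence.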

SLIDE 23

10.3 Contingency Tables

Exercise 12 (10.3): Lefties

So, in general we have again

$$E_{ij} = \frac{(\text{i-th row total}) \cdot (\text{j-th column total})}{\text{grand total}}.$$

Furthermore, the test statistic

$$X^2 = \sum \frac{(O - E)^2}{E} = \sum_{(i,j)} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

also has under H0 approximately a chi-square distribution with (r − 1)(c − 1) = 1 degree of freedom. NB: all $E_{ij} \ge 5$, so the requirements are met. The observed value is

$$\chi^2 = \frac{(23 - 27.8)^2}{27.8} + \frac{(217 - 212.2)^2}{212.2} + \frac{(65 - 60.2)^2}{60.2} + \frac{(455 - 459.8)^2}{459.8} \approx 1.36$$

With α = 5% the critical value is $\chi^2_{1,0.05} = 3.84$. Since $\chi^2 = 1.36 < 3.84 = \chi^2_{1,0.05}$, we fail to reject H0.

There is not sufficient evidence to warrant rejection of the claim that men and women have the same proportion of left-handedness.
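As a cross-check that is not part of the slides: for two samples, R's prop.test() compares two proportions and, without continuity correction, yields the same statistic as chisq.test() on the 2 × 2 table. The variable names are assumptions for this sketch.

```r
lefties <- c(Male = 23, Female = 65)    # left-handed counts per sample
sizes   <- c(Male = 240, Female = 520)  # sample sizes

pt <- prop.test(lefties, sizes, correct = FALSE)
ct <- chisq.test(matrix(c(23, 217, 65, 455), nrow = 2, byrow = TRUE),
                 correct = FALSE)

pt$statistic  # same observed value as in the hand calculation
ct$statistic
pt$p.value
```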

SLIDE 24

10.3 Contingency Tables

Test of homogeneity: Recap

◮ r different populations and c different categories of some categorical variable.
◮ H0: Different populations have the same proportions of some characteristics;
  Ha: Different populations do not have the same proportions of some characteristics.
  Significance level α.
◮ Requirements ($E_{ij}$ is the expected frequency count in cell (i, j) under H0):
    ◮ 2 × 2: all $E_{ij} \ge 5$.
    ◮ larger tables: all $E_{ij} \ge 1$ and at least 80% of the $E_{ij}$ larger than 5.
◮ If the requirements are met, the test statistic

$$X^2 = \sum \frac{(O - E)^2}{E} = \sum_{(i,j)} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

  has under H0 approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom.
◮ Critical value method: reject H0 if the observed value $\chi^2$ of the test statistic is larger than the critical value $\chi^2_{(r-1)(c-1),\alpha}$.
◮ P-value method: reject H0 if P-value $= P(X^2 \ge \chi^2) < \alpha$.

SLIDE 25

10.3 Contingency Tables

Recall:

◮ The test statistic $X^2 = \sum (O - E)^2/E$ has approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom under H0. Requirements have to be met in order for this approximation to be reasonable.
◮ With chi-square tests the alternative hypothesis has to be undirected: Ha: the row and column variable are dependent (or two populations have different proportions of one characteristic). NB: the test is right-tailed though, since the null hypothesis only gets rejected for large values of the test statistic.

If the requirements are not met, or if we want to test a directed claim for a 2 × 2 contingency table: use Fisher’s exact test.

Idea: the marginals (row/column totals) are assumed to be fixed. Under H0, the random variable “frequency count in cell (1, 1)” then has a known theoretical distribution, with certain parameters. So for an observed value $o_{11}$ we can compute the P-value, i.e. the probability under H0 of a value at least as extreme as $o_{11}$. Use fisher.test() in R to obtain this P-value.

SLIDE 26

10.3 Contingency Tables

Exercise 12 (10.3): Lefties

Recall the contingency table

          Left-handed  Not left-handed  Total
Male      23           217              240
Female    65           455              520
Total     88           672              760

We would like to test the directed claim that left-handedness is more common among men than women:

H0: Left-handedness is independent of gender;
Ha: Left-handedness is more common among men than women.

Assume that the marginals are fixed and H0 is true. The random variable “frequency count in cell (1, 1)” (i.e. the number of left-handed men) has the same distribution as the random variable “number of men in a random sample without replacement of size k = 88 from a group of N = 760 people, of which m = 240 are men.” The latter random variable has a known discrete distribution: a hypergeometric distribution with parameters m = 240, N = 760 and k = 88.

SLIDE 27

10.3 Contingency Tables

Exercise 12 (10.3): Lefties

The random variable “frequency count in cell (1, 1)” (i.e. the number of left-handed men) has, given that the marginals are fixed and H0 is true, a hypergeometric distribution with parameters m = 240, N = 760 and k = 88. The expected value of this random variable is

$$\frac{mk}{N} = \frac{240 \cdot 88}{760} \approx 27.8.$$

General idea: Fisher’s exact test rejects H0 if the observed value $o_{11}$ deviates too much from the expected value. Note that the critical value (or P-value) depends on the particular claim about the frequency count in cell (1, 1). So in this case, we reject H0 for large values of $O_{11}$. (If Ha were ‘Left-handedness is more common among women than men’, we would reject for small values of $O_{11}$.)

In R: use fisher.test(data,alternative="greater"), where "greater" specifies that H0 gets rejected for higher values of the frequency count in (1, 1) than what one would expect under H0. Adjust to "less" if H0 should be rejected for smaller values.

    > left=matrix(c(23,217,65,455),nrow=2,ncol=2,byrow=TRUE)
    > fisher.test(left,alternative="greater")

            Fisher's Exact Test for Count Data

    data:  left
    p-value = 0.903
    #ignore rest of output
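The one-sided P-value can also be obtained directly from the hypergeometric distribution with phyper(). This is a sketch under the slide's parametrisation (m men among N people, sample of size k); note that phyper() takes the number of non-men, N − m, as its third argument.

```r
m   <- 240  # men in the group (first row total)
N   <- 760  # group size (grand total)
k   <- 88   # sample size drawn without replacement (first column total)
o11 <- 23   # observed number of left-handed men

expected_o11 <- m * k / N  # mean of the hypergeometric distribution

# P(O11 >= 23): with lower.tail = FALSE phyper gives P(X > q), so use q = o11 - 1
p_manual <- phyper(o11 - 1, m, N - m, k, lower.tail = FALSE)

left <- matrix(c(23, 217, 65, 455), nrow = 2, byrow = TRUE)
p_fisher <- fisher.test(left, alternative = "greater")$p.value

c(manual = p_manual, fisher = p_fisher)
```

For the alternative "less", the corresponding probability would be P(O11 ≤ 23), i.e. phyper(o11, m, N - m, k).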

SLIDE 28

10.3 Contingency Tables

Fisher’s exact test for 2 × 2 contingency table

◮ H0: row and column variables are independent;
  Ha: occurrence of “first column category” is more common in the group of “first row category” than in the group of “second row category”.

  or

  H0: two populations have the same proportion of one characteristic;
  Ha: the proportion of the characteristic is bigger/smaller in one population.

  Significance level α.
◮ Test statistic: the frequency count in cell (1, 1) has under H0 and given the marginals a hypergeometric distribution with parameters m = first row total, N = grand total and k = first column total.
◮ Compute the P-value in R: use fisher.test(data,alternative="greater") in this case.

SLIDE 29

What’s next?

◮ Assignment 4 – will be on Canvas later today
◮ Four more meetings:
    ◮ Thursday, December 7: Exercise class
    ◮ Monday, December 11: Question session + overview Lectures 5–10
    ◮ Tuesday, December 12: Computer session (Assignment 4)
    ◮ Thursday, December 14: Exercise class – exams from previous years
◮ Details about the final exam: soon on Canvas