Lecture 4: Permutation Methods (Applied Statistics 2014)



SLIDE 1

Randomization Model Population Model Rank Tests Assignment

Lecture 4: Permutation Methods

Applied Statistics 2014

SLIDE 2

Permutation Methods

Non-parametric methods for testing differences among samples (or groups).
These tests can serve as alternatives to some classical tests, such as two-sample t-tests and ANOVA.
First introduced in Fisher (1935) and Pitman (1937).
There are two typical settings:

  Randomization Model: randomization tests
  Population Model: permutation tests

They provide a unified framework for rank-based tests such as the Wilcoxon rank-sum test.
They are computationally intensive.

SLIDE 3

Randomization Model

Basis: subjects are randomly assigned to different treatments (the usual practice in medicine).
The only random aspect of the model is the assignment of treatments.
Inference is limited to the subjects under study; there is no population.

SLIDE 4

Randomization Model - Example

Example (Ernst (2004)). A new treatment for post-surgical recovery is compared with a standard treatment. Of the $n$ subjects available for the study, $n_1$ are randomly assigned to receive the new treatment, while the remaining $n_2 = n - n_1$ receive the standard treatment. The corresponding recovery times (in days) are recorded: $X_1, \dots, X_{n_1}$ and $Y_1, \dots, Y_{n_2}$ for the new and standard treatments, respectively.

$H_0$: There is no difference between the treatments.
$H_a$: The new treatment decreases the recovery times.
Test statistic: $T_d = \bar{X} - \bar{Y}$.

SLIDE 5

Randomization Model - Example

Specifically, $n = 7$, $n_1 = 4$, $n_2 = 3$, and
$(x_1, x_2, x_3, x_4) = (19, 22, 25, 26)$, $(y_1, y_2, y_3) = (23, 33, 40)$,
$t_d = \bar{x} - \bar{y} = 23 - 32 = -9$.

How to compute the p-value? The only random aspect is the random assignment of treatments. So if $H_0$ is true, the recovery time for each subject will be the same regardless of which treatment is received. Under $H_0$, the distribution of $T_d$ is obtained from the permutations of the values of the $x_i$'s and $y_i$'s.

SLIDE 6

Randomization Model - Example

There are in total $\binom{7}{4} = 35$ equally likely randomizations.

  i   X1  X2  X3  X4   Y1  Y2  Y3     t_i
  1   19  22  25  26   23  33  40   -9.00
  2   22  23  25  26   19  33  40   -6.67
  3   22  33  25  26   19  23  40   -0.83
  4   22  25  26  40   19  23  33    3.25
...
 35   19  23  33  40   22  25  26    4.42

The p-value is given by
$$p = P_{H_0}(T_d \le t_d) = \frac{1}{35}\sum_{i=1}^{35} I(t_i \le t_d) = \frac{3}{35} \approx 0.0857.$$
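The enumeration behind this p-value can be reproduced with a short script. The following is a minimal sketch in Python (the lecture itself points to R); the helper name `mean_diff` is an illustrative choice, and exact fractions are used so that ties are compared without rounding error.

```python
from fractions import Fraction
from itertools import combinations

# Recovery times from the example (new treatment x, standard treatment y).
x = [19, 22, 25, 26]
y = [23, 33, 40]

def mean_diff(group1, group2):
    """Difference of group means, kept exact with Fraction."""
    return Fraction(sum(group1), len(group1)) - Fraction(sum(group2), len(group2))

pooled = x + y
n, n1 = len(pooled), len(x)
t_obs = mean_diff(x, y)

# Enumerate every way of choosing which n1 subjects get the new treatment.
t_values = []
for idx in combinations(range(n), n1):
    chosen = set(idx)
    g1 = [pooled[i] for i in idx]
    g2 = [pooled[i] for i in range(n) if i not in chosen]
    t_values.append(mean_diff(g1, g2))

# One-sided p-value: small values of T_d support the alternative.
count = sum(t <= t_obs for t in t_values)
p_value = Fraction(count, len(t_values))
print(len(t_values), t_obs, count, float(p_value))
```

Three of the 35 assignments give a mean difference of at most -9, which is where the value 3/35 comes from.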

SLIDE 7

Randomization Model - Example

[Figure: Reference distribution (randomisation distribution of t; horizontal axis t, vertical axis prob)]

SLIDE 8

Randomization Model - Remarks

Since the subjects are not randomly chosen, the conclusion cannot be generalized beyond the subjects under study.

SLIDE 9

Population Model

Suppose there are two independent random samples: $X_1, \dots, X_{n_1}$ and $Y_1, \dots, Y_{n_2}$.
$H_0: X_1 \overset{d}{=} Y_1$ versus $H_1: E(X_1) > E(Y_1)$.
Test statistic: $T = \bar{X}_{n_1} - \bar{Y}_{n_2}$. We reject $H_0$ for large values of $T$.

Under $H_0$, the reference distribution of $T$ is obtained in the same way as in the randomization model.

SLIDE 10

Population Model

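Let $n = n_1 + n_2$. Define $Z_1 = X_1, \dots, Z_{n_1} = X_{n_1}, Z_{n_1+1} = Y_1, \dots, Z_n = Y_{n_2}$, and denote the observed values by $(z_1, \dots, z_n)$. Then
$$T = \frac{1}{n_1}\sum_{i=1}^{n_1} Z_i - \frac{1}{n_2}\sum_{i=1}^{n_2} Z_{i+n_1}.$$

Under $H_0$, the $Z_i$'s are iid. Define the event
$$E = \{(Z_1, \dots, Z_n) = (z_{\pi(1)}, \dots, z_{\pi(n)}) \text{ for some permutation } \pi\}.$$
Then for any permutation $\tilde{\pi}$,
$$P_{H_0}\big((Z_1, \dots, Z_n) = (z_{\tilde{\pi}(1)}, \dots, z_{\tilde{\pi}(n)}) \mid E\big) = \frac{1}{n!}.$$
Let
$$t_i = \frac{1}{n_1}\sum_{j=1}^{n_1} z_{\tilde{\pi}(j)} - \frac{1}{n_2}\sum_{j=1}^{n_2} z_{\tilde{\pi}(j+n_1)}$$
be the value of $T$ under the $i$-th of the $\binom{n}{n_1}$ possible splits into two groups. We have
$$P_{H_0}(T = t_i \mid E) = \frac{1}{\binom{n}{n_1}}.$$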
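The step from $1/n!$ to $1/\binom{n}{n_1}$ is a counting argument. As a sketch, assuming the observed values are distinct and different splits give different values of $T$: the statistic depends only on which $n_1$ observations fall in the first group, not on their order within each group, so each split is produced by exactly $n_1!\,n_2!$ of the $n!$ equally likely permutations. Hence

$$P_{H_0}(T = t_i \mid E) = \frac{n_1!\, n_2!}{n!} = \frac{1}{\binom{n}{n_1}}.$$

For the earlier example with $n = 7$ and $n_1 = 4$, this gives $4!\,3!/7! = 144/5040 = 1/35$.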

SLIDE 11

Population Model

We obtain the conditional sample of $T$: $\{t_1, \dots, t_m\}$, where $m = \binom{n}{n_1}$.
Write $t = \bar{x}_{n_1} - \bar{y}_{n_2}$. The p-value is given by
$$\frac{\#\{i : t_i \ge t\}}{\binom{n}{n_1}}.$$
Note that the p-value is at least $1/\binom{n}{n_1}$.

SLIDE 12

Population Model

Lemma

If the significance level is $\alpha = k/\binom{n}{n_1}$ and we take $[t_{(m-k+1)}, \infty)$ as the critical region, where $m = \binom{n}{n_1}$ and $t_{(m-k+1)}$ is the $k$-th largest value of the $t_i$'s, then the permutation test is exact, that is,
$$P_{H_0}(T \ge t_{(m-k+1)} \mid E) = \alpha.$$

SLIDE 13

Population Model

Remarks on the lemma:

Note that the critical value in the lemma is a random cut, as it depends on the data (the observations).
It is a conditional test, as it generates the permutation distribution of $T$ conditional on the observed values.
Conditional on the observed values, the permutation distribution of $T$ does not depend on the underlying population distributions $G$ and $F$. Hence, the test is distribution-free.

SLIDE 14

Population Model - Remarks

The basic idea is to generate a reference distribution by recalculating a statistic for many permutations of the data.
Not all statistics can be used in permutation methods.

Suppose $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$. Based on two independent samples, we want to test $H_0: \mu_1 = \mu_2$. The variances are unknown and not necessarily equal. Consider the t-statistic
$$T = \frac{\bar{X}_m - \bar{Y}_n}{\sqrt{S_X^2/m + S_Y^2/n}}.$$
The distribution of $T$ is not invariant under permutation: when $\sigma_1 \ne \sigma_2$, the observations are not exchangeable even under $H_0$.

SLIDE 15

Population Model - Remarks

Exhaustively computing all permutations is infeasible for large values of $n_1$ and $n_2$. For instance, if $n_1 = n_2 = 15$, then $\binom{30}{15} > 155$ million.

We can use Monte Carlo methods to estimate the p-value:

Generate $B$ samples from the permutation distribution. The function boot in the R package boot can be useful for this purpose.
Approximate the p-value by its sample counterpart,
$$\hat{p} = \frac{1 + \sum_{i=1}^{B} I(t_i \ge t)}{1 + B}.$$
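The Monte Carlo estimate can be sketched in a few lines. The version below is in Python rather than R and does not use the boot package; the function name `mc_permutation_pvalue`, the default `B = 9999`, the fixed seed, and the toy data are all illustrative assumptions, not part of the lecture.

```python
import random

def mc_permutation_pvalue(x, y, B=9999, seed=0):
    """Monte Carlo estimate of the one-sided permutation p-value for
    T = mean(x) - mean(y), rejecting for large T."""
    rng = random.Random(seed)          # fixed seed only for reproducibility
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    t_obs = sum(x) / n1 - sum(y) / n2
    tol = 1e-12                        # guards against floating-point ties
    exceed = 0
    for _ in range(B):
        rng.shuffle(pooled)            # a uniformly random relabelling
        t_i = sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2
        if t_i >= t_obs - tol:
            exceed += 1
    # The "+1" terms count the observed arrangement itself.
    return (1 + exceed) / (1 + B)

# Hypothetical toy data, not taken from the lecture.
x_demo = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
y_demo = [11.0, 12.4, 10.9, 12.8, 11.7, 12.0]
p_hat = mc_permutation_pvalue(x_demo, y_demo)
print(p_hat)
```

Because of the added 1 in numerator and denominator, the estimate can never be smaller than $1/(1 + B)$, so $B$ bounds the resolution of the reported p-value.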

SLIDE 16

Population Model - Example

Byzantine coins. This is Example 15.6 in Kvam and Vidakovic (2007). Researchers investigated the silver content (%Ag) of a number of Byzantine coins discovered in Cyprus. The coins are from the first and fourth coinage in the reign of King Manuel I Comnenus (1143-1180). Based on the following data, we want to test whether there is a significant difference between the two coinages in terms of silver content.

Coins from the first coinage ($X$): (5.9, 6.8, 6.4, 7.0, 6.6, 7.7, 7.2, 6.9, 6.2)
Coins from the fourth coinage ($Y$): (5.3, 5.6, 5.5, 5.1, 6.2, 5.8, 5.8)

$H_0: X \overset{d}{=} Y$ versus $H_1: E(X) \ne E(Y)$. This is a two-sided alternative.

SLIDE 17

Population Model - Example

We choose the test statistic $T = \bar{X} - \bar{Y}$. Note that $n_1 = 9$ and $n_2 = 7$. For each of the $\binom{16}{9} = 11440 =: m$ permutations, we calculate the value $t_i$.

[Figure: Permutation distribution of T, observed value in blue; horizontal axis T, vertical axis density]

The observed test statistic is $t = 1.13$. Let $\bar{t} = \frac{1}{m}\sum_{i=1}^{m} t_i$ be the mean of the permutation distribution. We define the two-sided p-value as
$$p = \frac{1}{m}\sum_{i=1}^{m} I(|t_i - \bar{t}| \ge |t - \bar{t}|) = 0.000699.$$
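With only 11440 splits, the two-sided p-value can be checked by full enumeration. The sketch below uses Python with exact fractions (an implementation choice made here so that splits tied with the observed value are counted reliably); the helper name `stat` is illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Silver content (%Ag) from the slide; strings keep the decimals exact.
x = [Fraction(v) for v in "5.9 6.8 6.4 7.0 6.6 7.7 7.2 6.9 6.2".split()]
y = [Fraction(v) for v in "5.3 5.6 5.5 5.1 6.2 5.8 5.8".split()]

pooled = x + y
n1, n2 = len(x), len(y)
total = sum(pooled)

def stat(sum_first):
    """T = mean of first group minus mean of second, from the first group's sum."""
    return sum_first / n1 - (total - sum_first) / n2

t_obs = stat(sum(x))

# One value t_i per choice of which n1 coins are labelled "first coinage".
t_values = [stat(sum(group)) for group in combinations(pooled, n1)]
m = len(t_values)
t_bar = sum(t_values) / m

# Two-sided p-value: distance from the permutation mean.
extreme = sum(abs(t - t_bar) >= abs(t_obs - t_bar) for t in t_values)
p_value = Fraction(extreme, m)
print(m, float(t_obs), extreme, float(p_value))
```

Eight of the 11440 splits are at least as far from the permutation mean as the observed one, which matches the reported 0.000699.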

SLIDE 18

Wilcoxon/Mann-Whitney test

Suppose $X_1, \dots, X_{n_1} \overset{iid}{\sim} F_X$ and $Y_1, \dots, Y_{n_2} \overset{iid}{\sim} F_Y$. Both $F_X$ and $F_Y$ are continuous.
$H_0: F_X = F_Y$ versus $H_1: F_X < F_Y$. Under $H_1$, $X_1$ is stochastically larger than $Y_1$.
Let $(R_1, \dots, R_{n_1+n_2})$ be the ranks of the pooled sample $(X_1, \dots, X_{n_1}, Y_1, \dots, Y_{n_2})$. So $R_1$ is the rank of $X_1$ among all $n = n_1 + n_2$ observations.
Wilcoxon's test statistic is $T = \sum_{i=1}^{n_1} R_i$. We reject $H_0$ for large values of $T$.
What is the reference distribution of $T$ under $H_0$?

SLIDE 19

Wilcoxon/Mann-Whitney test

Under $H_0$, $(R_1, \dots, R_{n_1})$ is a random sample without replacement from $\{1, 2, \dots, n_1 + n_2\}$; the distribution of $T$ is known and does NOT depend on $F_X$ ($= F_Y$).

SLIDE 20

Wilcoxon/Mann-Whitney test

For small $n$, there are tables of critical values.
For large sample sizes, it can be shown (for instance, by the Lindeberg central limit theorem) that as $n_1, n_2 \to \infty$, under $H_0$,
$$\frac{T - E(T)}{\sqrt{\mathrm{Var}(T)}} \overset{d}{\to} N(0, 1),$$
where $E(T) = \frac{1}{2} n_1 (n + 1)$ and $\mathrm{Var}(T) = \frac{n_1 n_2}{12}(n + 1)$.
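As a small illustration of these moments, the sketch below computes the rank sum and its standardized value in Python. The function name `rank_sum_normal_approx` is hypothetical, the code assumes no ties (as in the continuous setting above), and the recovery-time data from the randomization example are reused purely for convenience; with $n = 7$ the normal approximation is rough.

```python
import math

def rank_sum_normal_approx(x, y):
    """Wilcoxon rank-sum T for the x sample with its null mean, variance,
    and standardized value. Assumes no ties (continuous data)."""
    pooled = sorted(x + y)
    rank = {v: r for r, v in enumerate(pooled, start=1)}  # valid without ties
    n1, n2 = len(x), len(y)
    n = n1 + n2
    T = sum(rank[v] for v in x)
    mean_T = n1 * (n + 1) / 2
    var_T = n1 * n2 * (n + 1) / 12
    z = (T - mean_T) / math.sqrt(var_T)
    return T, mean_T, var_T, z

T, mean_T, var_T, z = rank_sum_normal_approx([19, 22, 25, 26], [23, 33, 40])
print(T, mean_T, var_T, z)
```

Here the x values take ranks 1, 2, 4 and 5, so $T = 12$, while $E(T) = 16$ and $\mathrm{Var}(T) = 8$ under $H_0$.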

SLIDE 21

Wilcoxon/Mann-Whitney test

The Mann-Whitney statistic, $U$, counts the number of times a $Y_j$ precedes an $X_i$:
$$T = U + \frac{1}{2} n_1 (n_1 + 1).$$
$T$ and $U$ are equivalent: they lead to the same test.
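The relation between $T$ and $U$ can be checked numerically. A minimal Python sketch follows, assuming no ties; the function names `mann_whitney_u` and `rank_sum` are illustrative, and the data are the recovery times from the randomization example.

```python
def mann_whitney_u(x, y):
    """Number of (i, j) pairs with y_j < x_i (assumes no ties)."""
    return sum(1 for xi in x for yj in y if yj < xi)

def rank_sum(x, y):
    """Sum of the ranks of the x values in the pooled sample (no ties)."""
    pooled = sorted(x + y)
    return sum(pooled.index(v) + 1 for v in x)

x = [19, 22, 25, 26]
y = [23, 33, 40]
n1 = len(x)
U = mann_whitney_u(x, y)
T = rank_sum(x, y)
print(U, T, U + n1 * (n1 + 1) // 2)
```

Only the pairs (23, 25) and (23, 26) have a y value below an x value, so $U = 2$ and $T = 2 + 10 = 12$.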

SLIDE 22

Group Presentation (March 3)

Group 5

Seven patients each underwent three different methods of kidney dialysis. The following values were obtained for weight change in kilograms between dialysis sessions.

Patient   Treatment A   Treatment B   Treatment C
1         2.90          2.97          2.67
2         2.56          2.45          2.62
3         2.88          2.76          1.84
4         2.73          2.20          2.33
5         2.50          2.16          1.27
6         3.18          2.89          2.39
7         2.83          2.87          2.39

We wish to test whether there is a significant difference in the mean weight change among the three methods. What is your test statistic? Implement your test in R code. What is the p-value? If necessary, please consider a Monte Carlo method.

SLIDE 23

Group Presentation (March 3)

Group 6

Give a presentation on the paper Ludbrook and Dudley (1998). Compare two methods for testing the equality of two means: the two-sample t-test (equal variance) and the permutation test. Compare the performance of the two tests via simulation in R. Consider $n_1 = n_2 = 10$ and the following cases:

$X \sim N(1, 4)$ and $Y \sim N(0, 4)$.
$X \sim N(0, 4)$ and $Y \sim N(0, 4)$.
$X \sim t_3$ and $Y \sim N(-0.25, 3)$.
