Sample Size Vorasith Sornsrivichai, MD., FETP Epidemiology Unit, - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Sample Size

Vorasith Sornsrivichai, MD., FETP Epidemiology Unit, Faculty of Medicine Prince of Songkla University

slide-2
SLIDE 2

“All nature is but art, unknown to thee; All chance, direction, which thou canst not see; All discord, harmony not understood; All partial evil, universal good; And spite of pride, in erring reason's spite, One truth is clear, Whatever is, is right” ~ Alexander Pope ~

slide-3
SLIDE 3

3

How Much Is Enough?

“Is a sample size of 30 subjects enough?”

“If I sample 10% of the population, will it be OK?”

“Can I just use all 24 patients I have?”

slide-4
SLIDE 4

4

Objectives

To learn how to calculate the sample size needed to obtain a specified precision for an estimate of a parameter

To learn how to calculate the sample size needed to provide a specified power for a comparative study

slide-5
SLIDE 5

5

Outline of Presentation

Review of basic principles
Determination of sample size
Sample size calculation

slide-6
SLIDE 6

6

Source: http://trochim.human.cornell.edu/kb/random.htm

slide-7
SLIDE 7

7

Two Types Of Study Objective

Estimation: approximation of some parameter (magnitude, difference, or ratio)
  Critical feature is the precision of the estimate.
  e.g. “A public health officer seeks to estimate the proportion of children in the district receiving vaccinations.”

Hypothesis testing: examination of a proposed assumption
  Critical feature is the power of the study.
  e.g. “Is drug B more effective than drug A?”

slide-8
SLIDE 8

8

Determinants of The Sample Size

Effect size
Level of significance
Power of the test
Variation of the outcome

slide-9
SLIDE 9

9

Other Determinants of The Sample Size

Research questions and objective of the study
Defining the population and the population size
Type of outcome, e.g. dichotomous, continuous
Outcome measurement, e.g. single, repeated measurements
Sampling technique, e.g. cluster sampling
Type of statistical methods
Type of analysis, e.g. subgroup analysis
Non-response or loss to follow-up

slide-10
SLIDE 10

10

Effect Size

Measured as RR, OR, RD, etc.
The larger the effect size, the smaller the sample size needed

slide-11
SLIDE 11

“To err is human” (to forgive, divine)

~ Alexander Pope ~

slide-12
SLIDE 12

12

Errors

                      Truth
Study result          H0 is not true        H0 is true
Reject H0             Power (1 − β)         Type I error (α)
Fail to reject H0     Type II error (β)     Confidence (1 − α)

slide-13
SLIDE 13

13

Significance

α, or Type I error: false detection of a difference/association by “chance”

Statistical significance vs. epidemiological and clinical significance

slide-14
SLIDE 14

14

Power of the test

Power (1 − β) is the probability of rejecting H0 when H0 is not true

[Figure: outcome (Ha or H0) plotted against study number]

Power = 9/10 × 100 = 90%

slide-15
SLIDE 15

“Knowledge is an unending adventure at the edge of uncertainty.”

~ Jacob Bronowski ~

slide-16
SLIDE 16

16

Uncertainty…

Variability in the population: not all samples would give exactly the same finding, i.e., there is uncertainty in making an inference

However, the uncertainty can usually be quantified

Uncertainty can be reduced by using a sufficiently large sample

slide-17
SLIDE 17

[Figure: population distribution and distributions of sample means for n = 2, n = 5, and n = 20]

slide-18
SLIDE 18

Central Limit Theorem

If samples are drawn from a non-normally distributed parent population, the frequency distribution of the population of sample means approaches the normal distribution as the sample size increases.

[Figure: population distribution and distributions of sample means for n = 2, n = 5, and n = 20]
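The behaviour described by the Central Limit Theorem can be checked with a small simulation. The sketch below is only an illustration, not part of the original slides: it assumes an exponential parent population (clearly non-normal, mean 1, SD 1), and the function name, seed, and number of replicates are arbitrary choices.

```python
import random
import statistics


def sd_of_sample_means(n, replicates=2000, seed=0):
    """Draw `replicates` samples of size n from an exponential(1) population
    and return the standard deviation of their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replicates)
    ]
    return statistics.stdev(means)


# The spread of the sample means should shrink roughly as sigma / sqrt(n).
for n in (2, 5, 20):
    print(n, round(sd_of_sample_means(n), 3), "theory:", round(1 / n ** 0.5, 3))
```

The sample sizes 2, 5, and 20 match the panels on the slide; the simulated spread should come out close to 1/√n in each case.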

slide-19
SLIDE 19

19

Sampling Distributions

As the sample size increases:
  the sample means tend to be distributed normally
  the width of the distribution decreases

As the number of samples increases:
  the mean of the distribution of sample means tends to the mean of the population

The above is also true for sample estimates of a population proportion, as long as the proportion is not too close to 0 or 1

slide-20
SLIDE 20

20

Standard Normal Distribution

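[Figure: standard normal curve. The z-axis is marked at 0, ±1, ±1.96, and ±2.58; the x-axis at X̄ ± 1 SE, X̄ ± 2 SE, and X̄ ± 3 SE. The areas within ±1, ±2, and ±3 SE of the mean are 0.6826, 0.9545, and 0.9973.]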

slide-21
SLIDE 21

21

Estimation

Distribution of the estimates of the mean from many samples

Big n: narrow SE
Small n: wide SE

slide-22
SLIDE 22

22

Estimation (large sample)

[Figure: sampling distributions from populations with various values of X̄, drawn around the study value of X̄. The range of population values of X̄ compatible with our study value extends a distance d on either side of it.]

d = precision

slide-23
SLIDE 23

23

Estimation (small sample)

[Figure: sampling distributions from populations with various values of X̄, drawn around the study value of X̄. The range of population values of X̄ compatible with our study value extends a distance d on either side of it.]

d = precision

slide-24
SLIDE 24

Estimate of mean height: α = 0.05, d = 3 cm

Population 1: μ1 ≈ 150 cm, σ1 ≈ 5 cm, n1 = 12
Population 2: μ2 ≈ 150 cm, σ2 ≈ 10 cm, n2 = 45

[Figure: distribution of means of hypothetical samples (x̄1, x̄2) within X̄ ± d for each population]

slide-25
SLIDE 25

25

[Figure: two samples drawn from the same population (N = ∞, mean μ, standard deviation σ), each shown with its data, the uncertainty in the measurement, and the estimation of X̄]

Sample A: n = 100, SD_A ≈ σ, SE_A = SD_A/√100
Sample B: n = 25, SD_B ≈ σ, SE_B = SD_B/√25

slide-26
SLIDE 26

Sample Size Calculation

slide-27
SLIDE 27

27

Sample Size Calculation

Available tables
Nomogram
Manual calculation
Software: EpiInfo, STATA, R, OpenEpi

slide-28
SLIDE 28

28

Available Table

e.g. sample size to estimate P to within d absolute percentage points with 99% confidence

slide-29
SLIDE 29

29

Nomogram

Source: http://ccforum.com/content/6/4/335

slide-30
SLIDE 30

OpenEpi

Open Source Epidemiologic Statistics for Public Health http://www.openepi.com

slide-49
SLIDE 49

49

Considerations

The appropriate sample size may not be the same for all objectives in a study.
  Therefore, calculate it for every objective, then decide.

All sample size calculations considered here, and in most computer programs, assume simple random sampling.
  Other sampling methods, e.g. cluster sampling, may require adjustments.

slide-50
SLIDE 50

50

Considerations

The calculated sample size is the minimum sample needed.

Add more (~10–30%) for non-response and loss to follow-up.

E.g. suppose 10% of the subjects in the study are expected to refuse to participate or to drop out before the study ends.

A total of n/(1 − 0.1) eligible subjects would then have to be approached in the first instance.
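The adjustment for non-response is a one-line calculation. The sketch below assumes a hypothetical calculated sample size of 200 purely for illustration; the function name is not from the slides.

```python
import math


def subjects_to_approach(n_required, loss_fraction):
    """Eligible subjects to approach so that about n_required remain
    after a fraction loss_fraction refuses or drops out."""
    if not 0 <= loss_fraction < 1:
        raise ValueError("loss_fraction must be in [0, 1)")
    # n / (1 - loss), rounded up because subjects are whole people
    return math.ceil(n_required / (1 - loss_fraction))


# Hypothetical study needing 200 completers with 10% expected loss
print(subjects_to_approach(200, 0.10))
```

Dividing by (1 − loss) rather than multiplying by (1 + loss) matters: adding 10% to 200 gives 220, of whom only 198 would be expected to remain.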

slide-51
SLIDE 51

51

Inappropriate Sample Size

Too SMALL
  wide CI
  unable to detect a real effect
  may miss an important association

Too BIG
  waste of resources (effort, time, money)
  even very small effects become statistically significant
  may be unethical

slide-52
SLIDE 52

“Although our intellect always longs for clarity and certainty, our nature often finds uncertainty fascinating.”

~ Karl von Clausewitz ~

slide-53
SLIDE 53

53

Sample Size Calculation

One sample
  Estimating: proportion, mean
  Hypothesis testing: proportion, mean

Two samples
  Estimating: difference between two proportions, two means
  Hypothesis testing: difference between two proportions, two means

slide-54
SLIDE 54

54

Sample size calculations for estimation are based on:

$$ d = Z_{1-\alpha/2} \times SE $$

In each case, we just put in the appropriate expression for the standard error, e.g. $SE = SD/\sqrt{n}$.

slide-55
SLIDE 55

55

Estimating A Population Mean

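$$ SE = \frac{\sigma}{\sqrt{n}} $$

$$ d = Z_{1-\alpha/2} \times SE $$

$$ \therefore\; d = Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} $$

$$ \therefore\; n = \frac{Z_{1-\alpha/2}^{2}\,\sigma^{2}}{d^{2}} $$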

slide-56
SLIDE 56

56

Example 1

(Estimating a population mean)

An estimate is desired of the average retail price of 20 tablets of a tranquilizer. It is required to be within 10% of the true average price with 95% confidence. The SD in price was estimated as 85%. How many pharmacies should be randomly selected?

$$ n = \frac{Z_{1-\alpha/2}^{2}\,\sigma^{2}}{d^{2}} = \frac{(1.96)^{2}(0.85)^{2}}{(0.1)^{2}} \approx 278 $$
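The calculation in Example 1 can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name is an illustrative choice, and the z-value is taken from the normal quantile function rather than the rounded 1.96.

```python
import math
from statistics import NormalDist


def n_to_estimate_mean(sd, d, confidence=0.95):
    """Sample size so that the estimate of a mean falls within d of the
    true value: n = z^2 * sd^2 / d^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((z * sd / d) ** 2)


# Example 1: SD = 0.85, precision d = 0.10, 95% confidence
print(n_to_estimate_mean(sd=0.85, d=0.10))
```

Because n depends on d squared, halving d to 0.05 roughly quadruples the required number of pharmacies.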

slide-57
SLIDE 57

57

Estimating A Population Proportion

$$ SE = \sqrt{\frac{p(1-p)}{n}} $$

$$ d = Z_{1-\alpha/2} \times SE $$

$$ \therefore\; n = \frac{Z_{1-\alpha/2}^{2}\; p(1-p)}{d^{2}} $$

slide-58
SLIDE 58

58

Example 2

(Estimating a population proportion)

A district public health officer seeks to estimate the proportion of children in the district receiving appropriate childhood vaccinations. How many children must be studied if the resulting estimate is to fall within 10 percentage points of the true proportion with 95% confidence? The proportion is unknown, so p = 0.5 is used, giving p(1 − p) = 0.25.

$$ n = \frac{Z_{1-\alpha/2}^{2}\; p(1-p)}{d^{2}} = \frac{(1.96)^{2}(0.25)}{(0.1)^{2}} = 96.04 $$

Rounded up, 97 children are needed.
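Example 2 follows the same pattern with the standard error of a proportion. The sketch below assumes p = 0.5 because the true proportion is unknown; the function name is illustrative only.

```python
import math
from statistics import NormalDist


def n_to_estimate_proportion(p, d, confidence=0.95):
    """Sample size so that the estimate of a proportion falls within d
    (absolute) of the true value: n = z^2 * p * (1 - p) / d^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


# Example 2: unknown proportion (use p = 0.5), d = 0.10, 95% confidence
print(n_to_estimate_proportion(p=0.5, d=0.10))
```

The formula gives about 96.04, which is rounded up to 97 because a fraction of a child cannot be sampled.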

slide-59
SLIDE 59

59

Parameter Estimation

The required sample size is largest when P = 0.5.

When one has no idea what the level of P is in the population, choosing 0.5 for P will always provide enough observations.

P      P(1 − P)
0.5    0.25
0.4    0.24
0.3    0.21
0.2    0.16
0.1    0.09
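The effect of the assumed P on the sample size can be tabulated directly. This sketch assumes d = 0.10 and the tabulated z-value 1.96 for 95% confidence; both are illustrative choices rather than values fixed by this slide.

```python
import math

Z_95 = 1.96  # tabulated two-sided 95% value


def n_for_proportion(p, d, z=Z_95):
    """n = z^2 * p * (1 - p) / d^2, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


# P(1 - P) and the resulting n for each P in the table
for p in (0.5, 0.4, 0.3, 0.2, 0.1):
    print(p, round(p * (1 - p), 2), n_for_proportion(p, d=0.10))
```

Since P(1 − P) peaks at P = 0.5, the sample size computed with P = 0.5 is never smaller than the one computed with any other P.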

slide-60
SLIDE 60

60

Hypothesis Testing

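[Figure: sampling distributions of (mean of B − mean of A) under H0 and under Ha, with tail areas α and β, the critical value Δ*, and the difference Δ]

$$ \Delta^{*} = Z_{1-\alpha/2} \times SE = \Delta - Z_{1-\beta} \times SE $$

$$ \therefore\; \Delta = Z_{1-\alpha/2} \times SE + Z_{1-\beta} \times SE $$

Δ is the minimum effect worth detecting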

slide-61
SLIDE 61

61

Basic Equations Underlying Sample Size

$$ d = Z_{1-\alpha/2} \times SE \qquad (1) $$

$$ \Delta = Z_{1-\alpha/2} \times SE_0 + Z_{1-\beta} \times SE_1 \qquad (2a) $$

If SE0 = SE1 = SE, then

$$ \Delta = SE\,(Z_{1-\alpha/2} + Z_{1-\beta}) \qquad (2b) $$

Most sample size calculations for estimation and hypothesis testing are based on these equations.

slide-62
SLIDE 62

62

How to Choose Δ

Δ should be the “minimum difference of clinical significance”, or the “minimum difference worth detecting”.

Previously reported differences may not be suitable for your study.

It may be useful to consider the standardized effect size (ε = Δ/σ) when the outcome is a continuous variable.

slide-63
SLIDE 63

63

Estimating The Difference Between Two Means

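$$ SE = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

If $n_2 = r\,n_1$, then

$$ SE = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{r\,n_1}} $$

$$ d = Z_{1-\alpha/2} \times SE $$

$$ \therefore\; n_1 = \left(1 + \frac{1}{r}\right)\frac{Z_{1-\alpha/2}^{2}\,\sigma^{2}}{d^{2}} $$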

slide-64
SLIDE 64

64

Example 3

(Estimating the difference between two means)

Nutritionists wish to estimate the difference in caloric intake at lunch between children in a school offering hot lunches and children in a school which does not. From other studies, they estimate that the SD of caloric intake among schoolchildren is 75 calories, and they wish to make their estimate to within 20 calories of the true difference with 95% confidence. (Equal numbers in each group)

$$ n_1 = \left(1 + \frac{1}{r}\right)\frac{Z_{1-\alpha/2}^{2}\,\sigma^{2}}{d^{2}} = \frac{2 \times (1.96)^{2} \times 75^{2}}{20^{2}} = 108.05 \approx 109 $$

(Note that r = n2/n1, so r = 1 here.)
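Example 3 can be checked with the two-group formula for estimating a difference in means. This is a sketch with illustrative names; r is the allocation ratio n2/n1 and defaults to equal groups.

```python
import math
from statistics import NormalDist


def n1_to_estimate_mean_difference(sd, d, r=1.0, confidence=0.95):
    """Size of group 1 so that the estimated difference in means falls
    within d of the truth: n1 = (1 + 1/r) * z^2 * sd^2 / d^2, rounded up.
    Group 2 then has r * n1 subjects."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((1 + 1 / r) * z ** 2 * sd ** 2 / d ** 2)


# Example 3: SD = 75 calories, d = 20 calories, equal groups
print(n1_to_estimate_mean_difference(sd=75, d=20))
```

With equal groups the factor (1 + 1/r) equals 2, so each group needs twice the number required for a single-sample estimate with the same σ and d.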

slide-65
SLIDE 65

65

Estimating the difference between two proportions

$$ SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{r\,n_1}} $$

$$ d = Z_{1-\alpha/2} \times SE $$

$$ \therefore\; d = Z_{1-\alpha/2}\sqrt{\frac{1}{n_1}\left(p_1(1-p_1) + \frac{p_2(1-p_2)}{r}\right)} $$

$$ \therefore\; n_1 = \frac{Z_{1-\alpha/2}^{2}}{d^{2}}\left(p_1(1-p_1) + \frac{p_2(1-p_2)}{r}\right) $$

slide-66
SLIDE 66

66

Example 4 (Estimating the difference between two proportions)

It is desired to estimate a risk difference between two industrial groups. How large a sample should be selected in each group for the estimate to be within 5 percentage points of the true difference with 95% confidence? It was observed that P1 = 0.40 and P2 = 0.32. (Equal numbers in each group)

$$ n_1 = \frac{Z_{1-\alpha/2}^{2}\left[p_1(1-p_1) + p_2(1-p_2)/r\right]}{d^{2}} = \frac{(1.96)^{2}\left[(0.40)(0.60) + (0.32)(0.68)\right]}{(0.05)^{2}} \approx 704 $$
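Example 4 can be verified in the same way. Note that the z-value enters the formula squared. The sketch below uses illustrative names, with r = n2/n1 as the allocation ratio.

```python
import math
from statistics import NormalDist


def n1_to_estimate_risk_difference(p1, p2, d, r=1.0, confidence=0.95):
    """Size of group 1 so that the estimated difference in proportions
    falls within d of the truth:
    n1 = z^2 * [p1(1-p1) + p2(1-p2)/r] / d^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    variance_term = p1 * (1 - p1) + p2 * (1 - p2) / r
    return math.ceil(z ** 2 * variance_term / d ** 2)


# Example 4: P1 = 0.40, P2 = 0.32, d = 0.05, equal groups
print(n1_to_estimate_risk_difference(p1=0.40, p2=0.32, d=0.05))
```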

slide-67
SLIDE 67

67

Testing the hypothesis of a difference between two means

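$$ SE = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{r\,n_1}} $$

$$ \Delta = SE\,(Z_{1-\alpha/2} + Z_{1-\beta}) $$

$$ \therefore\; \Delta = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{r\,n_1}}\;(Z_{1-\alpha/2} + Z_{1-\beta}) $$

$$ \therefore\; \Delta^{2} = \sigma^{2}\left(\frac{r+1}{r\,n_1}\right)(Z_{1-\alpha/2} + Z_{1-\beta})^{2} $$

$$ \therefore\; n_1 = \left(1 + \frac{1}{r}\right)\frac{\sigma^{2}\,(Z_{1-\alpha/2} + Z_{1-\beta})^{2}}{\Delta^{2}} $$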

slide-68
SLIDE 68

68

Example 5

(Testing the hypothesis of a difference between two means)

A study is being designed to measure the effect, on systolic blood pressure, of lowering sodium in the diet. From a pilot study it is observed that the SD of SBP in a community with a high sodium diet is 12 mm Hg, while that in a group with a low sodium diet is 10.3 mm Hg. If alpha is 0.05 and beta is 0.10, how large a sample from each community should be selected in order to detect a 2 mm Hg difference in blood pressure between the communities? (Equal group size and use pooled variance)

Pooled variance: σ² = (12² + 10.3²)/2 = 125.05

$$ n_1 = \left(1 + \frac{1}{r}\right)\frac{\sigma^{2}\,(Z_{1-\alpha/2} + Z_{1-\beta})^{2}}{\Delta^{2}} = \frac{2\,(1.96 + 1.28)^{2}\,(125.05)}{2^{2}} = 656.36 \approx 657 $$
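Example 5 can be reproduced as follows. This sketch assumes equal group sizes when pooling the two pilot variances (a simple average), and the function names are illustrative. It uses exact normal quantiles, so the intermediate value differs slightly from the hand calculation with 1.96 and 1.28, but the rounded-up answer is the same.

```python
import math
from statistics import NormalDist


def n1_to_detect_mean_difference(variance, delta, r=1.0, alpha=0.05, power=0.90):
    """Size of group 1 to detect a difference delta between two means:
    n1 = (1 + 1/r) * variance * (z_{1-alpha/2} + z_{1-beta})^2 / delta^2."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = nd.inv_cdf(power)           # about 1.28 for 90% power
    return math.ceil((1 + 1 / r) * variance * (z_alpha + z_beta) ** 2 / delta ** 2)


# Example 5: SDs of 12 and 10.3 mm Hg, equal groups, detect 2 mm Hg
pooled_variance = (12 ** 2 + 10.3 ** 2) / 2
print(round(pooled_variance, 2), n1_to_detect_mean_difference(pooled_variance, delta=2))
```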

slide-69
SLIDE 69

69

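Testing the hypothesis of a difference between two proportions

$$ \Delta = Z_{1-\alpha/2} \times SE_0 + Z_{1-\beta} \times SE_1 $$

H0 true:

$$ SE_0 = \sqrt{\bar{p}(1-\bar{p})\left(1 + \frac{1}{r}\right)\frac{1}{n_1}}, \qquad \bar{p} = \frac{p_1 + r\,p_2}{1 + r} $$

H0 not true:

$$ SE_1 = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{r\,n_1}} $$

$$ \therefore\; \Delta = Z_{1-\alpha/2}\sqrt{\frac{\bar{p}(1-\bar{p})(1 + 1/r)}{n_1}} + Z_{1-\beta}\sqrt{\frac{p_1(1-p_1) + p_2(1-p_2)/r}{n_1}} $$

$$ \therefore\; n_1 = \frac{\left[Z_{1-\alpha/2}\sqrt{\bar{p}(1-\bar{p})(1 + 1/r)} + Z_{1-\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)/r}\right]^{2}}{\Delta^{2}} $$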

slide-70
SLIDE 70

70

Example 6

(Testing the hypothesis of a difference between two proportions)

A case-control study is to be conducted with a case:control ratio of 1:2. Exposure to the potential risk factor of interest among controls is expected to be 20%. How many cases and controls will be needed to detect an odds ratio of at least 2.0, at a significance level of 0.05 with a power of 80 percent? (Let n1 = number of cases, and n2 = number of controls)

$$ n_1 = \frac{\left[Z_{1-\alpha/2}\sqrt{\bar{p}(1-\bar{p})(1 + 1/r)} + Z_{1-\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)/r}\right]^{2}}{\Delta^{2}}, \qquad \bar{p} = \frac{p_1 + r\,p_2}{1 + r} $$

With p2 = 0.20 and an odds ratio of 2.0, the expected exposure among cases is p1 = (2.0 × 0.20)/(1 + 0.20 × (2.0 − 1)) ≈ 0.33. With r = 2, p̄ = (0.33 + 2 × 0.20)/3 ≈ 0.243 and Δ = 0.33 − 0.20.

$$ n_1 = \frac{\left[1.96\sqrt{0.243(1-0.243)(1 + 1/2)} + 0.84\sqrt{0.33(1-0.33) + 0.20(1-0.20)/2}\right]^{2}}{(0.33 - 0.20)^{2}} $$

n1 = 132 and n2 = 2 × 132 = 264
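The arithmetic of Example 6 can be sketched in code. To match the hand calculation, this version deliberately uses the tabulated z-values 1.96 and 0.84 and the rounded exposure p1 = 0.33. The odds-ratio conversion is the standard identity p1 = OR·p2 / (1 + p2·(OR − 1)), and all function names are illustrative.

```python
import math


def exposure_in_cases(odds_ratio, p_controls):
    """Exposure proportion among cases implied by an odds ratio."""
    return odds_ratio * p_controls / (1 + p_controls * (odds_ratio - 1))


def n_cases(p1, p2, r, z_alpha=1.96, z_beta=0.84):
    """Number of cases (group 1) when there are r controls per case."""
    p_bar = (p1 + r * p2) / (1 + r)                                     # pooled proportion under H0
    term_h0 = z_alpha * math.sqrt(p_bar * (1 - p_bar) * (1 + 1 / r))
    term_h1 = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2) / r)
    return math.ceil((term_h0 + term_h1) ** 2 / (p1 - p2) ** 2)


print(round(exposure_in_cases(2.0, 0.20), 2))  # rounded exposure among cases
n1 = n_cases(p1=0.33, p2=0.20, r=2)
print(n1, 2 * n1)                              # cases, controls
```

The result is sensitive to rounding: using the unrounded p1 and exact normal quantiles gives a somewhat smaller number of cases.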

slide-71
SLIDE 71

71

Power Determination

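Power = 1 − β = 1 − f(A) = 1 − P(Z ≤ A), where f is the cumulative distribution function of the standard normal distribution

Continuous data

(nt = nc = n)

$$ A = Z_{1-\alpha/2} - \frac{|\mu_t - \mu_c|}{\sqrt{2\sigma^{2}/n}} $$

(nt ≠ nc)

$$ A = Z_{1-\alpha/2} - \frac{|\mu_t - \mu_c|}{\sqrt{\sigma^{2}(n_t + n_c)/(n_t\,n_c)}} $$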

slide-72
SLIDE 72

72

Exercise

A new antihypertensive drug was compared with the standard treatment (n = 150 in each group). The difference in BP between the two treatments was 4 mmHg. The variance was 140 mmHg². The significance level was 0.05. The researcher found no difference between the two drugs. Do you agree with this conclusion?

$$ A = 1.96 - \frac{4}{\sqrt{2(140)/150}} = -0.9677 $$

Power = 1 − f(A) = 1 − f(−0.9677) ≈ 1 − 0.167 = 0.833

Power ≈ 83%
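The power calculation for two means can be checked numerically. This sketch uses the standard normal distribution from the Python standard library; the function name is illustrative, and the general form σ²(1/nt + 1/nc) reduces to 2σ²/n when the groups are equal.

```python
import math
from statistics import NormalDist


def power_two_means(delta, variance, n_t, n_c, alpha=0.05):
    """Return (A, power) for a two-sided comparison of two means, where
    A = z_{1-alpha/2} - |delta| / sqrt(variance * (1/n_t + 1/n_c))
    and power = 1 - Phi(A)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    se = math.sqrt(variance * (1 / n_t + 1 / n_c))
    a = z - abs(delta) / se
    return a, 1 - nd.cdf(a)


# Exercise: difference 4 mmHg, variance 140, 150 patients per group
a, power = power_two_means(delta=4, variance=140, n_t=150, n_c=150)
print(round(a, 4), round(power, 3))
```

A power of roughly 83% means a true difference of 4 mmHg would probably have been detected, so the negative finding carries some weight.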

slide-73
SLIDE 73

73

Power Determination

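Discrete data and proportions

(nt = nc = n)

$$ A = \frac{Z_{1-\alpha}\sqrt{2\bar{P}\bar{Q}/n} - |P_t - P_c|}{\sqrt{(P_c Q_c + P_t Q_t)/n}} $$

(nt ≠ nc)

$$ A = \frac{Z_{1-\alpha}\sqrt{\bar{P}\bar{Q}/n_t + \bar{P}\bar{Q}/n_c} - |P_t - P_c|}{\sqrt{P_t Q_t/n_t + P_c Q_c/n_c}} $$

$$ \bar{P} = \tfrac{1}{2}(P_t + P_c), \qquad \bar{Q} = 1 - \bar{P} $$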

slide-74
SLIDE 74

74

Exercise

Two kinds of anti-UV cream, A and B, were compared. Seventy-five of 100 patients treated with cream A improved, whereas 65 of 100 patients treated with cream B improved. The researcher concluded that the two creams were not different at the 5% level of significance. Do you agree with this conclusion?

$$ A = \frac{1.645\sqrt{2(0.70)(0.30)/100} - (0.75 - 0.65)}{\sqrt{[(0.65)(0.35) + (0.75)(0.25)]/100}} = 0.1026 $$

Power = 1 − f(0.1026) ≈ 1 − 0.54 = 0.46, i.e. about 46%
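The same check can be made for the two-proportion exercise. The sketch below follows the hand calculation in using the z-value 1.645; the function name is illustrative. Evaluating 1 − Φ(A) at A ≈ 0.1026 gives a power of about 0.46.

```python
import math
from statistics import NormalDist


def power_two_proportions(p_t, p_c, n_t, n_c, z_alpha=1.645):
    """Return (A, power) for comparing two proportions, where
    A = (z * SE0 - |p_t - p_c|) / SE1 and power = 1 - Phi(A)."""
    p_bar = (p_t + p_c) / 2
    q_bar = 1 - p_bar
    se0 = math.sqrt(p_bar * q_bar / n_t + p_bar * q_bar / n_c)           # SE under H0
    se1 = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)       # SE under Ha
    a = (z_alpha * se0 - abs(p_t - p_c)) / se1
    return a, 1 - NormalDist().cdf(a)


# Exercise: 75/100 improved on cream A, 65/100 on cream B
a, power = power_two_proportions(p_t=0.75, p_c=0.65, n_t=100, n_c=100)
print(round(a, 4), round(power, 2))
```

With power this low, a 10-percentage-point difference would be missed more often than not, so "no difference" is not a safe conclusion from this study.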

slide-75
SLIDE 75

75

Interpretations Of “Negative Findings” - Power Calculations

For a hypothesis-testing study which fails to reject the null hypothesis, it is useful to conduct a post-hoc power calculation.

We can use a rearrangement of the relevant sample-size equation.

This should be done using the clinically relevant difference for the Δ of the equation (not the difference found in the study).

slide-76
SLIDE 76

76

Power (1 − β) depends on:

the size of the difference the treatment makes
the rate of events among control patients
the alpha level in use
the number of patients in the trial

[Figure: sampling distribution divided into a nonrejection region and a rejection region, with the area 1 − β marked]