

slide-1
SLIDE 1

Sampling and Sample Size

Rohit Naimpally J-PAL

YEF – ITCILO - JPAL

Evaluating Youth Employment Programmes: An Executive Course

22 – 26 June 2015 ǀ ITCILO Turin, Italy

slide-2
SLIDE 2

Course Overview

1. Introduction to Impact Evaluation
2. Measurement
3. Example of a Youth Evaluation Program in Uganda
4. How to Randomize
5. Sampling and Sample Size
6. Threats and Analysis
7. Example of a Youth Employment Evaluation from Kenya
8. Cost-Effectiveness Analysis and Scaling Up

slide-3
SLIDE 3

Learning Objectives

At the end of the presentation, you will:

1. Know the Central Limit Theorem and the Law of Large Numbers, and why they matter.
2. Know the difference between a Type I and a Type II error.
3. Know what the “power” of a study is and why you should care.
4. Be ready to tackle the power exercise in the next session!

slide-4
SLIDE 4

THE basic questions in statistics

  • How confident can you be in your results?

– This is given by the significance level of your results (remember the “asterisks”?)

  • How big does your sample need to be?

– This is given by the power of your design.

slide-5
SLIDE 5

Recap: Which of these is more accurate?

  • A. I.
  • B. II.
  • C. Don’t know

(Two diagrams, labeled I and II, showing scatters of estimates)

slide-6
SLIDE 6

Recap: Accuracy versus Precision

(Diagram: estimates scattered around the truth)

Precision (Sample Size)

Accuracy (Randomization)

slide-7
SLIDE 7

What’s the average result?

  • If you were to roll a die once, what is the

“expected result”? (i.e. the average)

slide-8
SLIDE 8

Possible results & probability: 1 die

(Chart: each result 1 through 6 occurs with probability 1/6)

slide-9
SLIDE 9

Rolling 1 die: possible results & average

(Chart: frequencies of the possible results from one roll, each 1/6, with the average of 3.5 marked)

slide-10
SLIDE 10

What’s the average result?

  • If you were to roll two dice once, what is the

expected average of the two dice?

slide-11
SLIDE 11

Rolling 2 dice: Possible totals & likelihood

Total:       2     3     4     5     6     7     8     9     10    11    12
Likelihood:  1/36  1/18  1/12  1/9   5/36  1/6   5/36  1/9   1/12  1/18  1/36

slide-12
SLIDE 12

Rolling 2 dice: 11 possible totals, 36 permutations

Die 1 \ Die 2:  1   2   3   4   5   6
           1    2   3   4   5   6   7
           2    3   4   5   6   7   8
           3    4   5   6   7   8   9
           4    5   6   7   8   9  10
           5    6   7   8   9  10  11
           6    7   8   9  10  11  12

slide-13
SLIDE 13

Rolling 2 dice: Average score of dice & likelihood

Average:     1     1.5   2     2.5   3     3.5   4     4.5   5     5.5   6
Likelihood:  1/36  1/18  1/12  1/9   5/36  1/6   5/36  1/9   1/12  1/18  1/36

slide-14
SLIDE 14

Outcomes and Permutations

Putting together permutations, you get:

  • 1. All possible outcomes
  • 2. The likelihood of each of those outcomes

Each block within a column represents one possible permutation (to obtain that average)

Each column represents one possible outcome (average result)


slide-15
SLIDE 15

Rolling 3 dice: 16 results, 216 permutations

(Chart: frequencies of the 16 possible averages, rising from 0% at the extremes to 13% near the middle)

slide-16
SLIDE 16

Rolling 4 dice: 21 results, 1296 permutations

(Chart: frequencies of the 21 possible averages, peaking around 3.5)

slide-17
SLIDE 17

Rolling 5 dice: 26 results, 7776 permutations

(Chart: frequencies of the 26 possible averages, peaking around 3.5)

slide-18
SLIDE 18

Rolling 10 dice: 50 results, >60 million permutations

(Chart: frequencies of the 50 possible averages)

Looks like a bell curve, or a normal distribution

slide-19
SLIDE 19

Rolling 30 dice: 150 results, 2 × 10²³ permutations

(Chart: frequencies of the possible averages)

>95% of all rolls will yield an average between 3 and 4

slide-20
SLIDE 20

Rolling 100 dice: 500 results, 6 × 10⁷⁷ permutations

(Chart: frequencies of the possible averages)

>99% of all rolls will yield an average between 3 and 4

slide-21
SLIDE 21

Rolling dice: 2 lessons

1. The more dice you roll, the closer most averages are to the true average (the distribution gets “tighter”)

  • THE LAW OF LARGE NUMBERS

2. The more dice you roll, the more the distribution of possible averages (the sampling distribution) looks like a bell curve (a normal distribution)

  • THE CENTRAL LIMIT THEOREM
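The two lessons can be checked with a quick simulation. A minimal sketch in Python (the dice counts and roll counts below are illustrative, not from the deck):

```python
import random
import statistics

random.seed(0)

def sample_means(n_dice, n_rolls=10_000):
    """Roll n_dice dice n_rolls times; return the average of each roll."""
    return [statistics.mean(random.randint(1, 6) for _ in range(n_dice))
            for _ in range(n_rolls)]

# Law of Large Numbers: the spread of the averages shrinks as the
# number of dice per roll grows.
spread_1 = statistics.stdev(sample_means(1))
spread_30 = statistics.stdev(sample_means(30))
print(spread_1, spread_30)  # the 30-dice spread is far smaller
```

Plotting a histogram of `sample_means(30)` would also show the bell shape the Central Limit Theorem predicts.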
slide-22
SLIDE 22

Accuracy versus Precision

(Diagram: three distributions of estimates around the truth, varying in accuracy and precision)

Precision (Sample Size)

Accuracy (Randomization)

slide-23
SLIDE 23

THAT WAS JUST THE INTRODUCTION

slide-24
SLIDE 24

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers/central limit theorem
– standard deviation and standard error

  • Detecting impact
slide-25
SLIDE 25

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers/central limit theorem
– standard deviation and standard error

  • Detecting impact
slide-26
SLIDE 26

Baseline test scores

(Histogram of baseline test scores)

slide-27
SLIDE 27

Mean = 26

(Histogram of test scores with the mean of 26 marked)

slide-28
SLIDE 28

Standard Deviation = 20

(Histogram of test scores with the mean of 26 and a band of 1 standard deviation marked)

slide-29
SLIDE 29

Let’s do an experiment

  • Take 1 random test score from the pile of 16,000 tests
  • Write down the value
  • Put the test back
  • Do these three steps again
  • And again
  • 8,000 times
  • This is like a random sample of 8,000 (with replacement)
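The experiment is easy to mimic in code. A hypothetical sketch in Python: the scores below are simulated stand-ins with a mean near 26 and SD near 20, since the deck's real data are not included.

```python
import random
import statistics

random.seed(1)

# A stand-in for the pile of 16,000 tests (assumed mean 26, SD 20).
population = [random.gauss(26, 20) for _ in range(16_000)]

# Draw one test at random, write down the value, put it back -- 8,000 times.
# This is a random sample of 8,000 with replacement.
sample = [random.choice(population) for _ in range(8_000)]

print(round(statistics.mean(sample), 1))  # close to the population mean of ~26
```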
slide-30
SLIDE 30

What can we say about this sample?

(Chart: frequencies from the 8,000 draws, with the mean of 26 marked)

Good, the average of the sample is about 26…

slide-31
SLIDE 31

But…

  • I remember that as my sample size goes up, isn’t the sampling distribution supposed to turn into a bell curve?

  • …(Central Limit Theorem)
  • Is it that my sample isn’t large enough?
slide-32
SLIDE 32

One limitation of statistical theory is that it assumes the population distribution is normally distributed

  • A. True
  • B. False
  • C. Depends
  • D. Don’t know
slide-33
SLIDE 33

The sampling distribution may not be normal if the population distribution is skewed

  • A. True
  • B. False
  • C. Depends
  • D. Don’t know
slide-34
SLIDE 34

Population vs. sampling distribution

(Chart: frequencies from the sample, mirroring the population distribution around the mean of 26)

This is the distribution of my sample of 8,000 students!

slide-35
SLIDE 35

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers/central limit theorem
– standard deviation and standard error

  • Detecting impact
slide-36
SLIDE 36

How do we get from here…

(Chart: skewed histogram of the population’s test scores)

To here…

This is the distribution of the population (Population Distribution)

This is the distribution of Means from all Random Samples (Sampling distribution)

slide-37
SLIDE 37

Draw 10 random students, take the average, plot it: Do this 5 & 10 times.

Inadequate sample size: no clear distribution around the population mean

(Chart: Frequency of Means with 5 Samples)

(Chart: Frequency of Means with 10 Samples)

slide-38
SLIDE 38

Draw 10 random students: 50 and 100 times

More sample means fall around the population mean, but they are still spread a good deal

(Chart: Frequency of Means with 50 Samples)

(Chart: Frequency of Means with 100 Samples)

slide-39
SLIDE 39

Draw 10 random students: 500 and 1000 times

The distribution is now significantly more normal, with clear peaks

(Chart: Frequency of Means with 500 Samples)

(Chart: Frequency of Means with 1000 Samples)

slide-40
SLIDE 40

Draw 10 Random students

  • This is like a sample size of 10
  • What happens if we take a sample size of 50?
slide-41
SLIDE 41

What happens to the sampling distribution if we draw a sample size of 50 instead of 10, and take the mean (thousands of times)?

  • A. We will approach a bell curve faster (than with a sample size of 10)
  • B. The bell curve will be narrower
  • C. Both A & B
  • D. Neither. The underlying sampling distribution does not change.

slide-42
SLIDE 42

N = 10 vs. N = 50

(Charts: Frequency of Means with 5 and with 10 Samples, shown side by side for N = 10 and N = 50)

slide-43
SLIDE 43

N = 10 vs. N = 50

(Charts: Frequency of Means with 50 and with 100 Samples, shown side by side for N = 10 and N = 50)

slide-44
SLIDE 44

N = 10 vs. N = 50

(Charts: Frequency of Means with 500 and with 1000 Samples; the N = 50 distributions are visibly narrower)

slide-45
SLIDE 45

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers/central limit theorem
– standard deviation and standard error

  • Detecting impact
slide-46
SLIDE 46

Population & sampling distribution: Draw 1 random student (from 8,000)

(Chart: population distribution and the N=1 sampling distribution, centered at 26)

slide-47
SLIDE 47

Sampling Distribution: Draw 4 random students (N=4)

(Chart: sampling distribution of means for N=4)

slide-48
SLIDE 48

Law of Large Numbers : N=9

(Chart: sampling distribution of means for N=9, tighter around 26)

slide-49
SLIDE 49

Law of Large Numbers: N =100

(Chart: sampling distribution of means for N=100, tightly clustered around 26)

slide-50
SLIDE 50

Central Limit Theorem: N=1

(Chart: N=1 frequencies with the theoretical distribution overlaid)

The white line is a theoretical distribution

slide-51
SLIDE 51

Central Limit Theorem : N=4

(Chart: N=4 frequencies with the theoretical distribution overlaid)

slide-52
SLIDE 52

Central Limit Theorem : N=9

(Chart: N=9 frequencies with the theoretical distribution overlaid)

slide-53
SLIDE 53

Central Limit Theorem : N =100

(Chart: N=100 frequencies with the theoretical distribution overlaid)

slide-54
SLIDE 54

So Why Do We Care?

  • The sampling distribution is a probability distribution
  • The sampling distribution is a bell curve (irrespective of what the underlying distribution is)
  • Why does it matter? Why do we care if the probability distribution looks like a bell curve?
  • Because we know how to calculate the area underneath!

slide-55
SLIDE 55

95% Confidence Interval

(Chart: normal curve; 95% of the area lies within 1.96 SD on either side of the mean)

slide-56
SLIDE 56

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers/central limit theorem
– standard deviation and standard error

  • Detecting impact
slide-57
SLIDE 57

Standard deviation/error

  • But wait! The regression results that I have seen typically report the standard error, not the standard deviation.
  • What’s the difference between the standard deviation and the standard error?

The standard error = the standard deviation of the sampling distribution
slide-58
SLIDE 58
Variance and Standard Deviation

  • Variance = 400

    σ² = Σ(Observation Value − Average)² / N

  • Standard Deviation = 20

    σ = √Variance

  • Standard Error = 20 / √N

    SE = σ / √N
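The three formulas are a few lines of code. A minimal sketch in Python; the two toy scores are hypothetical, chosen so the numbers come out to the slide's values:

```python
import math

scores = [6, 46]  # toy scores with mean 26, picked so the SD comes out to 20
n = len(scores)
mean = sum(scores) / n

variance = sum((x - mean) ** 2 for x in scores) / n  # sum of squared deviations / N
sd = math.sqrt(variance)                             # standard deviation
se = sd / math.sqrt(n)                               # standard error of the mean

print(variance, sd, se)  # 400.0, 20.0, and 20/sqrt(2)
```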

slide-59
SLIDE 59

Standard Deviation/ Standard Error

(Chart: sampling distribution for N=1, where the SE equals the SD)

slide-60
SLIDE 60

Sample size ↑ x4, SE ↓ ½

(Chart: sampling distribution for N=4; the SE is half the SD)

slide-61
SLIDE 61

Sample size ↑ x9, SE ↓ ?

(Chart: sampling distribution for N=9)

slide-62
SLIDE 62

Sample size ↑ x100, SE ↓?

(Chart: sampling distribution for N=100)
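The pattern across the last three slides follows directly from SE = σ/√N. A quick sketch, using the deck's SD of 20:

```python
import math

sd = 20.0  # population standard deviation, as in the slides

def standard_error(n):
    """Standard error of the mean of a sample of size n."""
    return sd / math.sqrt(n)

# Quadrupling the sample size halves the SE; multiplying it by 100
# shrinks the SE tenfold.
print(standard_error(1), standard_error(4), standard_error(9), standard_error(100))
```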

slide-63
SLIDE 63

Outline

  • Sampling distributions
  • Detecting impact

– significance
– effect size
– power
– baseline and covariates
– clustering
– stratification

slide-64
SLIDE 64

Baseline test scores

(Histogram of baseline test scores)

slide-65
SLIDE 65

We implement the Balsakhi Program

slide-66
SLIDE 66

Endline test scores

(Histogram of endline test scores)

After the balsakhi programs, these are the endline test scores

slide-67
SLIDE 67

Endline test scores

(Histogram of endline test scores)

The impact appears to be?

  • A. Positive
  • B. Negative
  • C. No impact
  • D. Don’t know

Baseline test scores

(Histogram of baseline test scores)

slide-68
SLIDE 68

Post-test: control & treatment

(Histograms of post-test scores: control and treatment overlaid)

Stop! That was the control group. The treatment group is red.

slide-69
SLIDE 69

Is this impact statistically significant?

  • A. Yes
  • B. No
  • C. Don’t know

(Histograms of control and treatment test scores, with the two means marked)

Average Difference = 6 points

slide-70
SLIDE 70

One experiment: 6 points

slide-71
SLIDE 71

One experiment

slide-72
SLIDE 72

Two experiments

slide-73
SLIDE 73

A few more…

slide-74
SLIDE 74

A few more…

slide-75
SLIDE 75

Many more…

slide-76
SLIDE 76

A whole lot more…

slide-77
SLIDE 77

slide-78
SLIDE 78

Running the experiment thousands of times…

By the Central Limit Theorem, these are normally distributed

slide-79
SLIDE 79

The assumption about your sample

The Central Limit Theorem and the Law of Large Numbers hold if the sample is randomly sampled from your population

slide-80
SLIDE 80

Theoretical Sampling distribution

(Chart: theoretical normal sampling distribution)

slide-81
SLIDE 81

So let’s look at hypothesis testing

  • In criminal law, most institutions follow the rule: “innocent until proven guilty”
  • In program evaluation, instead of “presumption of innocence,” the rule is: “presumption of insignificance”
  • The “null hypothesis” (H0) is that there was no (zero) impact of the program
  • The burden of proof is on the evaluator to show a significant difference
    – Think about how this relates to the discussion of ethics on Sunday.

slide-82
SLIDE 82

Hypothesis testing: conclusions

  • If it is very unlikely (less than a 5% probability) that the difference is solely due to chance:
    – We “reject our null hypothesis”
  • We may now say:
    – “Our program has a statistically significant impact”

slide-83
SLIDE 83

Hypothesis Testing: Steps

1. Determine the (size of the) sampling distribution around the null hypothesis H0 by calculating the standard error
2. Choose the confidence interval, e.g. 95% (or significance level α = 5%)
3. Identify the critical value (the boundary of the confidence interval)
4. If our observation falls in the critical region, we can reject the null hypothesis
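The four steps reduce to a few lines as a two-sided z-test. A minimal sketch; the observed difference and its standard error below are illustrative assumptions, not numbers from the study:

```python
# Step 1: the sampling distribution around H0 is summarized by the SE.
diff = 6.0        # observed treatment-control difference (assumed)
se = 2.0          # standard error of the difference (assumed)

# Steps 2-3: 95% confidence level -> critical value of 1.96 SEs.
critical = 1.96

# Step 4: is the observation in the critical region?
z = diff / se               # distance from the null of zero, in SEs
reject = abs(z) > critical  # True -> statistically significant
print(z, reject)
```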

slide-84
SLIDE 84

Remember our 95% Confidence Interval?

(Chart: normal curve around H0; the 95% confidence interval spans 1.96 SD on either side)

slide-85
SLIDE 85

Impose significance level of 5%

(Chart: H0 curve with the 5% rejection regions beyond 1.96 SD marked)

slide-86
SLIDE 86

What is the significance level?

  • Type I error: rejecting the null hypothesis even though it is true (a false positive)
  • Significance level: the probability that we will reject the null hypothesis even though it is true

slide-87
SLIDE 87

What is Power?

  • Type II error: failing to reject the null hypothesis (concluding there is no difference) when the null hypothesis is in fact false
  • Power: if there is a measurable effect of our intervention (the null hypothesis is false), the probability that we will detect an effect (reject the null hypothesis)
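Power can be estimated by simulation: run many hypothetical experiments with a known true effect and count how often the test rejects H0. A sketch with illustrative numbers (true effect 6, SD 20, 100 per arm):

```python
import random
import statistics

random.seed(2)

def one_trial(effect=6.0, sd=20.0, n=100):
    """Simulate one experiment; return True if a z-test rejects H0."""
    control = [random.gauss(0, sd) for _ in range(n)]
    treatment = [random.gauss(effect, sd) for _ in range(n)]
    diff = statistics.mean(treatment) - statistics.mean(control)
    se = sd * (2 / n) ** 0.5  # SE of the difference in means
    return abs(diff / se) > 1.96

# Power = share of simulated experiments that detect the true effect.
power = sum(one_trial() for _ in range(2_000)) / 2_000
print(round(power, 2))
```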

slide-88
SLIDE 88

Hypothesis testing: 95% confidence

                        YOU CONCLUDE: Effective          YOU CONCLUDE: No Effect
THE TRUTH: Effective    Correct                          Type II Error (low power)
THE TRUTH: No Effect    Type I Error (5% of the time)    Correct

slide-89
SLIDE 89

Before the experiment

(Chart: two normal curves, H0 around the control mean and Hβ around the treatment mean)

Assume two effects: no effect and treatment effect β

slide-90
SLIDE 90

Impose significance level of 5%

Anything between lines cannot be distinguished from 0

(Chart: H0 and Hβ with the 5% significance threshold marked; the Type I error is the H0 tail beyond the threshold)

slide-91
SLIDE 91

Can we distinguish Hβ from H0 ?

(Chart: H0 and Hβ with the power region of Hβ shaded)

The shaded area shows the percentage of the time we would detect Hβ if it were true; the unshaded remainder is the Type II error

slide-92
SLIDE 92

What influences power?

  • What are the factors that change the proportion of the research hypothesis curve that is shaded, i.e. the proportion that falls to the right (or left) of the null hypothesis curve?
  • Understanding this helps us design more powerful experiments.

slide-93
SLIDE 93

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
slide-94
SLIDE 94

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
slide-95
SLIDE 95

By increasing sample size you increase…

  • A. Accuracy
  • B. Precision
  • C. Both
  • D. Neither
  • E. Don’t know

(Chart: control and treatment sampling distributions)

slide-96
SLIDE 96

Power: Effect size = 1SE, Sample size = N

(Chart: H0 and Hβ with the significance threshold, at sample size N)

Remember, your sampling distribution becomes narrower as N↑

slide-97
SLIDE 97

Power: Sample size = 4N

(Chart: the same curves at sample size 4N; both are narrower)

slide-98
SLIDE 98

Power: 64%

(Chart: at 4N, the shaded power region covers 64% of Hβ)

slide-99
SLIDE 99

Power: Sample size = 9N

(Chart: at sample size 9N, the curves are narrower still)

slide-100
SLIDE 100

Power: 91%

(Chart: at 9N, the shaded power region covers 91% of Hβ)

slide-101
SLIDE 101

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
slide-102
SLIDE 102

Effect size = 1*SE

(Chart: H0 and Hβ separated by 1 SE, with the significance threshold)

slide-103
SLIDE 103

The Null Hypothesis would be rejected only 26% of the time

(Chart: with a true impact of 1 SE, the shaded power region covers only 26% of Hβ)

Effect size = 1*SE: Power = 26%

slide-104
SLIDE 104

Bigger hypothesized effect size: distributions farther apart

(Chart: H0 and Hβ separated by 3 SE)

Effect size = 3*SE

slide-105
SLIDE 105

Bigger Effect size means more power

(Chart: with the curves 3 SE apart, nearly all of Hβ lies past the threshold)

Effect size = 3*SE: Power = 91%

H0 Hβ

slide-106
SLIDE 106

What effect size should you use when designing your experiment?

  • A. The smallest effect size that is still cost-effective
  • B. The largest effect size you expect your program to produce
  • C. Both
  • D. Neither
slide-107
SLIDE 107

Effect size

  • What effect size should we pick when calculating the optimal sample size, assuming no other constraints?
  • Ideally, we design our experiment to detect the smallest effect size that is still interesting.
    – Interesting, as long as the value of that answer is worth the cost of the evaluation.
  • This is where “substantive significance” matters.
slide-108
SLIDE 108

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
slide-109
SLIDE 109

Variance

  • There is sometimes very little we can do to reduce the noise
  • The underlying variance is what it is: just a characteristic of the population at hand!
  • We can try to “absorb” variance:
    – using a baseline
    – controlling for other variables
  • In practice, controlling for other variables (besides the baseline outcome) buys you very little
slide-110
SLIDE 110

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
slide-111
SLIDE 111

Clustered design: intuition

  • You want to know how close the upcoming state elections will be
  • Method 1: Randomly select 50 people from the entire state (N=50)
  • Method 2: Randomly select 5 families in the state, and ask ten members of each family their opinion (N=50)
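The intuition has a standard formula: the design effect 1 + ρ(m − 1) tells you how much information a clustered sample loses. A sketch with illustrative numbers (the ρ of 0.8 is an assumption, not from the deck):

```python
def design_effect(rho, m):
    """Design effect for intra-cluster correlation rho and cluster size m."""
    return 1 + rho * (m - 1)

# Method 2 above: 5 families of 10, with opinions strongly correlated
# within a family (say rho = 0.8). The 50 responses then carry the
# information of only 50 / deff independent ones.
deff = design_effect(0.8, 10)
effective_n = 50 / deff
print(round(deff, 1), round(effective_n, 1))  # ~8.2 and ~6.1
```

With ρ near 0 the design effect approaches 1, and Method 2 would be nearly as informative as Method 1.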

slide-112
SLIDE 112

HIGH intra-cluster correlation (ICC) aka ρ (rho)

slide-113
SLIDE 113

LOW intra-cluster correlation (ICC) aka ρ (rho)

slide-114
SLIDE 114

All uneducated people live in one village. People with only primary education live in another. College grads live in a third, etc. The ICC (ρ) on education will be…

  • A. High
  • B. Low
  • C. No effect on rho
  • D. Don’t know
slide-115
SLIDE 115

If ICC (ρ) is high, what is a more efficient way of increasing power?

  • A. Include more clusters in the sample
  • B. Include more people in each cluster
  • C. Both
  • D. Don’t know
slide-116
SLIDE 116

BONUS SLIDES (TIME PERMITTING…)

slide-117
SLIDE 117

Testing multiple treatments

(Diagram: four groups, Control, Balsakhi, CAL program, and Balsakhi + CAL, with group sizes of 50, 100, and 200 and pairwise minimum detectable effects ranging from 0.05 SD to 0.25 SD)

slide-118
SLIDE 118

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
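These ingredients combine in the standard clustered minimum-detectable-effect formula. A sketch in Python; all numeric inputs below (t-values, SD, sample sizes, ICC) are illustrative assumptions, not numbers from the deck:

```python
import math

def mde(t_power, t_alpha, P, sigma, N, rho, m):
    """Minimum detectable effect for N individuals in clusters of size m,
    treatment share P, SD sigma, and intra-cluster correlation rho."""
    return ((t_power + t_alpha)
            * math.sqrt(1 / (P * (1 - P)))   # penalty for the T vs. C split
            * math.sqrt(1 + rho * (m - 1))   # design effect from clustering
            * sigma / math.sqrt(N))          # standard error of a mean

# 80% power (t ≈ 0.84), 5% significance (t ≈ 1.96), 50-50 split,
# SD of 20, 1,000 students in clusters of 20, ICC of 0.1:
print(round(mde(0.84, 1.96, 0.5, 20.0, 1000, 0.1, 20), 2))
```

Raising ρ or shrinking N increases the smallest effect the design can detect, which is the sense in which each ingredient feeds power.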
slide-119
SLIDE 119

Power!

Effect Size = (t_power + t_alpha) × √(1 / (P(1 − P))) × √(1 + ρ(m − 1)) × √(σ² / N)

Ingredients: Effect Size, Variance (σ²), Sample Size (N), Significance Level (t_alpha), Power (t_power), Proportion in Treatment (P), ICC (ρ), Average Cluster Size (m)

slide-120
SLIDE 120

Power!

Effect Size = (t_power + t_alpha) × √(1 / (P(1 − P))) × √(1 + ρ(m − 1)) × √(σ² / N)

Highlighted here: the Proportion in Treatment (P)

slide-121
SLIDE 121

Sample split: 50% C, 50% T

(Chart: control and treatment sampling distributions of equal width, H0 and Hβ)

slide-122
SLIDE 122

Power: 91%

(Chart: power region shaded: 91%)

slide-123
SLIDE 123

If it’s not 50-50 split?

  • What happens to the relative fatness of the two curves if the split is not 50-50?
  • Say 25-75?
slide-124
SLIDE 124

Sample split: 25% C, 75% T

(Chart: with 25% in control, the control curve is wider than the treatment curve)

slide-125
SLIDE 125

Power: 83%

(Chart: power region shaded: 83%)

slide-126
SLIDE 126
How was the length of this presentation?

  • A. Unbearably long
  • B. Too long
  • C. Just right
  • D. Not long enough
  • E. Too short – more time, please!

slide-127
SLIDE 127
How was the pace of this presentation?

  • A. Too fast! I couldn’t keep up.
  • B. Rushed
  • C. Just right
  • D. Slow
  • E. Too slow, I fell asleep.

slide-128
SLIDE 128
Was the content relevant to your work?

  • A. Very relevant
  • B. Quite useful
  • C. Perhaps
  • D. Not really
  • E. No – not useful at all.

slide-129
SLIDE 129
Before today, how much of this material did you already feel comfortable/proficient in?

  • A. 100%
  • B. 80%
  • C. 60%
  • D. 40%
  • E. 20%
  • F. < 20%

slide-130
SLIDE 130
After this presentation, how much of this material do you feel proficient in?

  • A. 100%
  • B. 80%
  • C. 60%
  • D. 40%
  • E. 20%
  • F. < 20%

slide-131
SLIDE 131

END!