

slide-1
SLIDE 1

Sampling and Sample Size

Rohit Naimpally J-PAL

YEF – ITCILO - JPAL

Evaluating Youth Employment Programmes: An Executive Course

22 – 26 June 2015 ǀ ITCILO Turin, Italy

slide-2
SLIDE 2

Course Overview

1. Introduction to Impact Evaluation
2. Measurement
3. Example of a Youth Evaluation Program in Uganda
4. How to Randomize
5. Sampling and Sample Size
6. Threats and Analysis
7. Example of a Youth Employment Evaluation from Kenya
8. Cost-Effectiveness Analysis and Scaling Up

slide-3
SLIDE 3

Learning Objectives

At the end of the presentation, you will:

1. Know the Central Limit Theorem and the Law of Large Numbers, and why they matter.
2. Know the difference between a Type I and a Type II error.
3. Know what the “power” of a study is and why you should care.
4. Be ready to tackle the power exercise in the next session!

slide-4
SLIDE 4

THE basic questions in statistics

  • How confident can you be in your results?

– This is given by the significance level of your results (remember the “asterisks”?)

  • How big does your sample need to be?

– This is given by the power of your design.

slide-5
SLIDE 5

Recap: Which of these is more accurate?

  • A. I.
  • B. II.
  • C. Don’t know

(Two diagrams, labeled I and II, showing scatters of estimates)

slide-6
SLIDE 6

Recap: Accuracy versus Precision

(Diagram: estimates scattered around the truth)

Precision (Sample Size)

Accuracy (Randomization)

slide-7
SLIDE 7

What’s the average result?

  • If you were to roll a die once, what is the

“expected result”? (i.e. the average)

slide-8
SLIDE 8

Possible results & probability: 1 die

(Chart: each result 1 through 6 occurs with probability 1/6)

slide-9
SLIDE 9

Rolling 1 die: possible results & average

(Chart: frequencies of the possible results from one roll, each 1/6, with the average of 3.5 marked)

slide-10
SLIDE 10

What’s the average result?

  • If you were to roll two dice once, what is the

expected average of the two dice?

slide-11
SLIDE 11

Rolling 2 dice: Possible totals & likelihood

Total:       2     3     4     5     6     7     8     9     10    11    12
Likelihood:  1/36  1/18  1/12  1/9   5/36  1/6   5/36  1/9   1/12  1/18  1/36

slide-12
SLIDE 12

Rolling 2 dice: 11 possible totals, 36 permutations

Die 1 \ Die 2:  1   2   3   4   5   6
           1    2   3   4   5   6   7
           2    3   4   5   6   7   8
           3    4   5   6   7   8   9
           4    5   6   7   8   9  10
           5    6   7   8   9  10  11
           6    7   8   9  10  11  12

slide-13
SLIDE 13

Rolling 2 dice: Average score of dice & likelihood

Average:     1     1.5   2     2.5   3     3.5   4     4.5   5     5.5   6
Likelihood:  1/36  1/18  1/12  1/9   5/36  1/6   5/36  1/9   1/12  1/18  1/36

slide-14
SLIDE 14

Outcomes and Permutations

Putting together permutations, you get:

  • 1. All possible outcomes
  • 2. The likelihood of each of those outcomes

Each block within a column represents one possible permutation (to obtain that average)

Each column represents one possible outcome (average result)


slide-15
SLIDE 15

Rolling 3 dice: 16 results, 216 permutations

(Chart: frequencies of the 16 possible averages, rising from 0% at the extremes to 13% near the middle)

slide-16
SLIDE 16

Rolling 4 dice: 21 results, 1296 permutations

(Chart: frequencies of the 21 possible averages, peaking around 3.5)

slide-17
SLIDE 17

Rolling 5 dice: 26 results, 7776 permutations

(Chart: frequencies of the 26 possible averages, peaking around 3.5)

slide-18
SLIDE 18

Rolling 10 dice: 50 results, >60 million permutations

(Chart: frequencies of the 50 possible averages)

Looks like a bell curve, or a normal distribution

slide-19
SLIDE 19

Rolling 30 dice: 150 results, 2 × 10²³ permutations

(Chart: frequencies of the possible averages)

>95% of all rolls will yield an average between 3 and 4

slide-20
SLIDE 20

Rolling 100 dice: 500 results, 6 × 10⁷⁷ permutations

(Chart: frequencies of the possible averages)

>99% of all rolls will yield an average between 3 and 4

slide-21
SLIDE 21

Rolling dice: 2 lessons

1. The more dice you roll, the closer most averages are to the true average (the distribution gets “tighter”)

  • THE LAW OF LARGE NUMBERS

2. The more dice you roll, the more the distribution of possible averages (the sampling distribution) looks like a bell curve (a normal distribution)

  • THE CENTRAL LIMIT THEOREM
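The two lessons can be checked with a quick simulation. A minimal sketch in Python (the dice counts and roll counts below are illustrative, not from the deck):

```python
import random
import statistics

random.seed(0)

def sample_means(n_dice, n_rolls=10_000):
    """Roll n_dice dice n_rolls times; return the average of each roll."""
    return [statistics.mean(random.randint(1, 6) for _ in range(n_dice))
            for _ in range(n_rolls)]

# Law of Large Numbers: the spread of the averages shrinks as the
# number of dice per roll grows.
spread_1 = statistics.stdev(sample_means(1))
spread_30 = statistics.stdev(sample_means(30))
print(spread_1, spread_30)  # the 30-dice spread is far smaller
```

Plotting a histogram of `sample_means(30)` would also show the bell shape the Central Limit Theorem predicts.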
slide-22
SLIDE 22

Accuracy versus Precision

(Diagram: three distributions of estimates around the truth, varying in accuracy and precision)

Precision (Sample Size)

Accuracy (Randomization)

slide-23
SLIDE 23

THAT WAS JUST THE INTRODUCTION

slide-24
SLIDE 24

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers/central limit theorem
– standard deviation and standard error

  • Detecting impact
slide-25
SLIDE 25

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers/central limit theorem
– standard deviation and standard error

  • Detecting impact
slide-26
SLIDE 26

Baseline test scores

(Histogram of baseline test scores)

slide-27
SLIDE 27

Mean = 26

(Histogram of test scores with the mean of 26 marked)

slide-28
SLIDE 28

Standard Deviation = 20

(Histogram of test scores with the mean of 26 and a band of 1 standard deviation marked)

slide-29
SLIDE 29

Let’s do an experiment

  • Take 1 random test score from the pile of 16,000 tests
  • Write down the value
  • Put the test back
  • Do these three steps again
  • And again
  • 8,000 times
  • This is like a random sample of 8,000 (with replacement)
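The experiment is easy to mimic in code. A hypothetical sketch in Python: the scores below are simulated stand-ins with a mean near 26 and SD near 20, since the deck's real data are not included.

```python
import random
import statistics

random.seed(1)

# A stand-in for the pile of 16,000 tests (assumed mean 26, SD 20).
population = [random.gauss(26, 20) for _ in range(16_000)]

# Draw one test at random, write down the value, put it back -- 8,000 times.
# This is a random sample of 8,000 with replacement.
sample = [random.choice(population) for _ in range(8_000)]

print(round(statistics.mean(sample), 1))  # close to the population mean of ~26
```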
slide-30
SLIDE 30

What can we say about this sample?

(Chart: frequencies from the 8,000 draws, with the mean of 26 marked)

Good, the average of the sample is about 26…

slide-31
SLIDE 31

But…

  • I remember that as my sample size goes up, isn’t the sampling distribution supposed to turn into a bell curve?

  • …(Central Limit Theorem)
  • Is it that my sample isn’t large enough?
slide-32
SLIDE 32

One limitation of statistical theory is that it assumes the population distribution is normally distributed

  • A. True
  • B. False
  • C. Depends
  • D. Don’t know
slide-33
SLIDE 33

The sampling distribution may not be normal if the population distribution is skewed

  • A. True
  • B. False
  • C. Depends
  • D. Don’t know
slide-34
SLIDE 34

Population vs. sampling distribution

(Chart: frequencies from the sample, mirroring the population distribution around the mean of 26)

This is the distribution of my sample of 8,000 students!

slide-35
SLIDE 35

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers/central limit theorem
– standard deviation and standard error

  • Detecting impact
slide-36
SLIDE 36

How do we get from here…

(Chart: skewed histogram of the population’s test scores)

To here…

This is the distribution of the population (Population Distribution)

This is the distribution of Means from all Random Samples (Sampling distribution)

slide-37
SLIDE 37

Draw 10 random students, take the average, plot it: Do this 5 & 10 times.

Inadequate sample size: no clear distribution around the population mean

(Chart: Frequency of Means with 5 Samples)

(Chart: Frequency of Means with 10 Samples)

slide-38
SLIDE 38

Draw 10 random students: 50 and 100 times

More sample means fall around the population mean, but they are still spread a good deal

(Chart: Frequency of Means with 50 Samples)

(Chart: Frequency of Means with 100 Samples)

slide-39
SLIDE 39

Draw 10 random students: 500 and 1000 times

The distribution is now significantly more normal, with clear peaks

(Chart: Frequency of Means with 500 Samples)

(Chart: Frequency of Means with 1000 Samples)

slide-40
SLIDE 40

Draw 10 Random students

  • This is like a sample size of 10
  • What happens if we take a sample size of 50?
slide-41
SLIDE 41

What happens to the sampling distribution if we draw a sample size of 50 instead of 10, and take the mean (thousands of times)?

  • A. We will approach a bell curve faster (than with a sample size of 10)
  • B. The bell curve will be narrower
  • C. Both A & B
  • D. Neither. The underlying sampling distribution does not change.

slide-42
SLIDE 42

N = 10 vs. N = 50

(Charts: Frequency of Means with 5 and with 10 Samples, shown side by side for N = 10 and N = 50)

slide-43
SLIDE 43

N = 10 vs. N = 50

(Charts: Frequency of Means with 50 and with 100 Samples, shown side by side for N = 10 and N = 50)

slide-44
SLIDE 44

N = 10 vs. N = 50

(Charts: Frequency of Means with 500 and with 1000 Samples; the N = 50 distributions are visibly narrower)

slide-45
SLIDE 45

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers/central limit theorem
– standard deviation and standard error

  • Detecting impact
slide-46
SLIDE 46

Population & sampling distribution: Draw 1 random student (from 8,000)

(Chart: population distribution and the N=1 sampling distribution, centered at 26)

slide-47
SLIDE 47

Sampling Distribution: Draw 4 random students (N=4)

(Chart: sampling distribution of means for N=4)

slide-48
SLIDE 48

Law of Large Numbers : N=9

(Chart: sampling distribution of means for N=9, tighter around 26)

slide-49
SLIDE 49

Law of Large Numbers: N =100

(Chart: sampling distribution of means for N=100, tightly clustered around 26)

slide-50
SLIDE 50

Central Limit Theorem: N=1

(Chart: N=1 frequencies with the theoretical distribution overlaid)

The white line is a theoretical distribution

slide-51
SLIDE 51

Central Limit Theorem : N=4

(Chart: N=4 frequencies with the theoretical distribution overlaid)

slide-52
SLIDE 52

Central Limit Theorem : N=9

(Chart: N=9 frequencies with the theoretical distribution overlaid)

slide-53
SLIDE 53

Central Limit Theorem : N =100

(Chart: N=100 frequencies with the theoretical distribution overlaid)

slide-54
SLIDE 54

So Why Do We Care?

  • The sampling distribution is a probability distribution
  • The sampling distribution is a bell curve (irrespective of what the underlying distribution is)
  • Why does it matter? Why do we care if the probability distribution looks like a bell curve?
  • Because we know how to calculate the area underneath!

slide-55
SLIDE 55

95% Confidence Interval

(Chart: normal curve; 95% of the area lies within 1.96 SD on either side of the mean)

slide-56
SLIDE 56

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers/central limit theorem
– standard deviation and standard error

  • Detecting impact
slide-57
SLIDE 57

Standard deviation/error

  • But wait! The regression results that I have seen typically report the standard error, not the standard deviation.
  • What’s the difference between the standard deviation and the standard error?

The standard error = the standard deviation of the sampling distribution
slide-58
SLIDE 58
Variance and Standard Deviation

  • Variance = 400

    σ² = Σ(Observation Value − Average)² / N

  • Standard Deviation = 20

    σ = √Variance

  • Standard Error = 20 / √N

    SE = σ / √N
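The three formulas are a few lines of code. A minimal sketch in Python; the two toy scores are hypothetical, chosen so the numbers come out to the slide's values:

```python
import math

scores = [6, 46]  # toy scores with mean 26, picked so the SD comes out to 20
n = len(scores)
mean = sum(scores) / n

variance = sum((x - mean) ** 2 for x in scores) / n  # sum of squared deviations / N
sd = math.sqrt(variance)                             # standard deviation
se = sd / math.sqrt(n)                               # standard error of the mean

print(variance, sd, se)  # 400.0, 20.0, and 20/sqrt(2)
```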

slide-59
SLIDE 59

Standard Deviation/ Standard Error

(Chart: sampling distribution for N=1, where the SE equals the SD)

slide-60
SLIDE 60

Sample size ↑ x4, SE ↓ ½

(Chart: sampling distribution for N=4; the SE is half the SD)

slide-61
SLIDE 61

Sample size ↑ x9, SE ↓ ?

(Chart: sampling distribution for N=9)

slide-62
SLIDE 62

Sample size ↑ x100, SE ↓?

(Chart: sampling distribution for N=100)
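The pattern across the last three slides follows directly from SE = σ/√N. A quick sketch, using the deck's SD of 20:

```python
import math

sd = 20.0  # population standard deviation, as in the slides

def standard_error(n):
    """Standard error of the mean of a sample of size n."""
    return sd / math.sqrt(n)

# Quadrupling the sample size halves the SE; multiplying it by 100
# shrinks the SE tenfold.
print(standard_error(1), standard_error(4), standard_error(9), standard_error(100))
```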

slide-63
SLIDE 63

Outline

  • Sampling distributions
  • Detecting impact

– significance
– effect size
– power
– baseline and covariates
– clustering
– stratification

slide-64
SLIDE 64

Baseline test scores

(Histogram of baseline test scores)

slide-65
SLIDE 65

We implement the Balsakhi Program

slide-66
SLIDE 66

Endline test scores

(Histogram of endline test scores)

After the balsakhi programs, these are the endline test scores

slide-67
SLIDE 67

Endline test scores

(Histogram of endline test scores)

The impact appears to be?

  • A. Positive
  • B. Negative
  • C. No impact
  • D. Don’t know

Baseline test scores

(Histogram of baseline test scores)

slide-68
SLIDE 68

Post-test: control & treatment

(Histograms of post-test scores: control and treatment overlaid)

Stop! That was the control group. The treatment group is red.

slide-69
SLIDE 69

Is this impact statistically significant?

  • A. Yes
  • B. No
  • C. Don’t know

(Histograms of control and treatment test scores, with the two means marked)

Average Difference = 6 points

slide-70
SLIDE 70

One experiment: 6 points

slide-71
SLIDE 71

One experiment

slide-72
SLIDE 72

Two experiments

slide-73
SLIDE 73

A few more…

slide-74
SLIDE 74

A few more…

slide-75
SLIDE 75

Many more…

slide-76
SLIDE 76

A whole lot more…

slide-77
SLIDE 77

slide-78
SLIDE 78

Running the experiment thousands of times…

By the Central Limit Theorem, these are normally distributed

slide-79
SLIDE 79

The assumption about your sample

The Central Limit Theorem and the Law of Large Numbers hold if the sample is randomly sampled from your population

slide-80
SLIDE 80

Theoretical Sampling distribution

(Chart: theoretical normal sampling distribution)

slide-81
SLIDE 81

So let’s look at hypothesis testing

  • In criminal law, most institutions follow the rule: “innocent until proven guilty”
  • In program evaluation, instead of “presumption of innocence,” the rule is: “presumption of insignificance”
  • The “null hypothesis” (H0) is that there was no (zero) impact of the program
  • The burden of proof is on the evaluator to show a significant difference
    – Think about how this relates to the discussion of ethics on Sunday.

slide-82
SLIDE 82

Hypothesis testing: conclusions

  • If it is very unlikely (less than a 5% probability) that the difference is solely due to chance:
    – We “reject our null hypothesis”
  • We may now say:
    – “Our program has a statistically significant impact”

slide-83
SLIDE 83

Hypothesis Testing: Steps

1. Determine the (size of the) sampling distribution around the null hypothesis H0 by calculating the standard error
2. Choose the confidence interval, e.g. 95% (or significance level α = 5%)
3. Identify the critical value (the boundary of the confidence interval)
4. If our observation falls in the critical region, we can reject the null hypothesis
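The four steps reduce to a few lines as a two-sided z-test. A minimal sketch; the observed difference and its standard error below are illustrative assumptions, not numbers from the study:

```python
# Step 1: the sampling distribution around H0 is summarized by the SE.
diff = 6.0        # observed treatment-control difference (assumed)
se = 2.0          # standard error of the difference (assumed)

# Steps 2-3: 95% confidence level -> critical value of 1.96 SEs.
critical = 1.96

# Step 4: is the observation in the critical region?
z = diff / se               # distance from the null of zero, in SEs
reject = abs(z) > critical  # True -> statistically significant
print(z, reject)
```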

slide-84
SLIDE 84

Remember our 95% Confidence Interval?

(Chart: normal curve around H0; the 95% confidence interval spans 1.96 SD on either side)

slide-85
SLIDE 85

Impose significance level of 5%

(Chart: H0 curve with the 5% rejection regions beyond 1.96 SD marked)

slide-86
SLIDE 86

What is the significance level?

  • Type I error: rejecting the null hypothesis even though it is true (a false positive)
  • Significance level: the probability that we will reject the null hypothesis even though it is true

slide-87
SLIDE 87

What is Power?

  • Type II error: failing to reject the null hypothesis (concluding there is no difference) when the null hypothesis is in fact false
  • Power: if there is a measurable effect of our intervention (the null hypothesis is false), the probability that we will detect an effect (reject the null hypothesis)
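Power can be estimated by simulation: run many hypothetical experiments with a known true effect and count how often the test rejects H0. A sketch with illustrative numbers (true effect 6, SD 20, 100 per arm):

```python
import random
import statistics

random.seed(2)

def one_trial(effect=6.0, sd=20.0, n=100):
    """Simulate one experiment; return True if a z-test rejects H0."""
    control = [random.gauss(0, sd) for _ in range(n)]
    treatment = [random.gauss(effect, sd) for _ in range(n)]
    diff = statistics.mean(treatment) - statistics.mean(control)
    se = sd * (2 / n) ** 0.5  # SE of the difference in means
    return abs(diff / se) > 1.96

# Power = share of simulated experiments that detect the true effect.
power = sum(one_trial() for _ in range(2_000)) / 2_000
print(round(power, 2))
```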

slide-88
SLIDE 88

Hypothesis testing: 95% confidence

                        YOU CONCLUDE: Effective          YOU CONCLUDE: No Effect
THE TRUTH: Effective    Correct                          Type II Error (low power)
THE TRUTH: No Effect    Type I Error (5% of the time)    Correct

slide-89
SLIDE 89

Before the experiment

(Chart: two normal curves, H0 around the control mean and Hβ around the treatment mean)

Assume two effects: no effect and treatment effect β

slide-90
SLIDE 90

Impose significance level of 5%

Anything between lines cannot be distinguished from 0

(Chart: H0 and Hβ with the 5% significance threshold marked; the Type I error is the H0 tail beyond the threshold)

slide-91
SLIDE 91

Can we distinguish Hβ from H0 ?

(Chart: H0 and Hβ with the power region of Hβ shaded)

The shaded area shows the percentage of the time we would detect Hβ if it were true; the unshaded remainder is the Type II error

slide-92
SLIDE 92

What influences power?

  • What are the factors that change the proportion of the research hypothesis curve that is shaded, i.e. the proportion that falls to the right (or left) of the null hypothesis curve?
  • Understanding this helps us design more powerful experiments.

slide-93
SLIDE 93

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
slide-94
SLIDE 94

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
slide-95
SLIDE 95

By increasing sample size you increase…

  • A. Accuracy
  • B. Precision
  • C. Both
  • D. Neither
  • E. Don’t know

(Chart: control and treatment sampling distributions)

slide-96
SLIDE 96

Power: Effect size = 1SE, Sample size = N

(Chart: H0 and Hβ with the significance threshold, at sample size N)

Remember, your sampling distribution becomes narrower as N↑

slide-97
SLIDE 97

Power: Sample size = 4N

(Chart: the same curves at sample size 4N; both are narrower)

slide-98
SLIDE 98

Power: 64%

(Chart: at 4N, the shaded power region covers 64% of Hβ)

slide-99
SLIDE 99

Power: Sample size = 9N

(Chart: at sample size 9N, the curves are narrower still)

slide-100
SLIDE 100

Power: 91%

(Chart: at 9N, the shaded power region covers 91% of Hβ)

slide-101
SLIDE 101

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
slide-102
SLIDE 102

Effect size = 1*SE

(Chart: H0 and Hβ separated by 1 SE, with the significance threshold)

slide-103
SLIDE 103

The Null Hypothesis would be rejected only 26% of the time

(Chart: with a true impact of 1 SE, the shaded power region covers only 26% of Hβ)

Effect size = 1*SE: Power = 26%

slide-104
SLIDE 104

Bigger hypothesized effect size: distributions farther apart

(Chart: H0 and Hβ separated by 3 SE)

Effect size = 3*SE

slide-105
SLIDE 105

Bigger Effect size means more power

(Chart: with the curves 3 SE apart, nearly all of Hβ lies past the threshold)

Effect size = 3*SE: Power = 91%

H0 Hβ

slide-106
SLIDE 106

What effect size should you use when designing your experiment?

  • A. The smallest effect size that is still cost-effective
  • B. The largest effect size you expect your program to produce
  • C. Both
  • D. Neither
slide-107
SLIDE 107

Effect size

  • What effect size should we pick when calculating the optimal sample size, assuming no other constraints?
  • Ideally, we design our experiment to detect the smallest effect size that is still interesting.
    – Interesting, as long as the value of that answer is worth the cost of the evaluation.
  • This is where “substantive significance” matters.
slide-108
SLIDE 108

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
slide-109
SLIDE 109

Variance

  • There is sometimes very little we can do to reduce the noise
  • The underlying variance is what it is: just a characteristic of the population at hand!
  • We can try to “absorb” variance:
    – using a baseline
    – controlling for other variables
  • In practice, controlling for other variables (besides the baseline outcome) buys you very little
slide-110
SLIDE 110

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
slide-111
SLIDE 111

Clustered design: intuition

  • You want to know how close the upcoming state elections will be
  • Method 1: Randomly select 50 people from the entire state (N=50)
  • Method 2: Randomly select 5 families in the state, and ask ten members of each family their opinion (N=50)
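The intuition has a standard formula: the design effect 1 + ρ(m − 1) tells you how much information a clustered sample loses. A sketch with illustrative numbers (the ρ of 0.8 is an assumption, not from the deck):

```python
def design_effect(rho, m):
    """Design effect for intra-cluster correlation rho and cluster size m."""
    return 1 + rho * (m - 1)

# Method 2 above: 5 families of 10, with opinions strongly correlated
# within a family (say rho = 0.8). The 50 responses then carry the
# information of only 50 / deff independent ones.
deff = design_effect(0.8, 10)
effective_n = 50 / deff
print(round(deff, 1), round(effective_n, 1))  # ~8.2 and ~6.1
```

With ρ near 0 the design effect approaches 1, and Method 2 would be nearly as informative as Method 1.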

slide-112
SLIDE 112

HIGH intra-cluster correlation (ICC) aka ρ (rho)

slide-113
SLIDE 113

LOW intra-cluster correlation (ICC) aka ρ (rho)

slide-114
SLIDE 114

All uneducated people live in one village. People with only primary education live in another. College grads live in a third, etc. The ICC (ρ) on education will be…

  • A. High
  • B. Low
  • C. No effect on rho
  • D. Don’t know
slide-115
SLIDE 115

If ICC (ρ) is high, what is a more efficient way of increasing power?

  • A. Include more clusters in the sample
  • B. Include more people in each cluster
  • C. Both
  • D. Don’t know
slide-116
SLIDE 116

BONUS SLIDES (TIME PERMITTING…)

slide-117
SLIDE 117

Testing multiple treatments

(Diagram: four groups, Control, Balsakhi, CAL program, and Balsakhi + CAL, with group sizes of 50, 100, and 200 and pairwise minimum detectable effects ranging from 0.05 SD to 0.25 SD)

slide-118
SLIDE 118

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
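These ingredients combine in the standard clustered minimum-detectable-effect formula. A sketch in Python; all numeric inputs below (t-values, SD, sample sizes, ICC) are illustrative assumptions, not numbers from the deck:

```python
import math

def mde(t_power, t_alpha, P, sigma, N, rho, m):
    """Minimum detectable effect for N individuals in clusters of size m,
    treatment share P, SD sigma, and intra-cluster correlation rho."""
    return ((t_power + t_alpha)
            * math.sqrt(1 / (P * (1 - P)))   # penalty for the T vs. C split
            * math.sqrt(1 + rho * (m - 1))   # design effect from clustering
            * sigma / math.sqrt(N))          # standard error of a mean

# 80% power (t ≈ 0.84), 5% significance (t ≈ 1.96), 50-50 split,
# SD of 20, 1,000 students in clusters of 20, ICC of 0.1:
print(round(mde(0.84, 1.96, 0.5, 20.0, 1000, 0.1, 20), 2))
```

Raising ρ or shrinking N increases the smallest effect the design can detect, which is the sense in which each ingredient feeds power.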
slide-119
SLIDE 119

Power!

Effect Size = (t_power + t_alpha) × √(1 / (P(1 − P))) × √(1 + ρ(m − 1)) × √(σ² / N)

Ingredients: Effect Size, Variance (σ²), Sample Size (N), Significance Level (t_alpha), Power (t_power), Proportion in Treatment (P), ICC (ρ), Average Cluster Size (m)

slide-120
SLIDE 120

Power!

Effect Size = (t_power + t_alpha) × √(1 / (P(1 − P))) × √(1 + ρ(m − 1)) × √(σ² / N)

Highlighted here: the Proportion in Treatment (P)

slide-121
SLIDE 121

Sample split: 50% C, 50% T

(Chart: control and treatment sampling distributions of equal width, H0 and Hβ)

slide-122
SLIDE 122

Power: 91%

(Chart: power region shaded: 91%)

slide-123
SLIDE 123

If it’s not 50-50 split?

  • What happens to the relative fatness of the two curves if the split is not 50-50?
  • Say 25-75?
slide-124
SLIDE 124

Sample split: 25% C, 75% T

(Chart: with 25% in control, the control curve is wider than the treatment curve)

slide-125
SLIDE 125

Power: 83%

(Chart: power region shaded: 83%)

slide-126
SLIDE 126
How was the length of this presentation?

  • A. Unbearably long
  • B. Too long
  • C. Just right
  • D. Not long enough
  • E. Too short – more time, please!

slide-127
SLIDE 127
How was the pace of this presentation?

  • A. Too fast! I couldn’t keep up.
  • B. Rushed
  • C. Just right
  • D. Slow
  • E. Too slow, I fell asleep.

slide-128
SLIDE 128
Was the content relevant to your work?

  • A. Very relevant
  • B. Quite useful
  • C. Perhaps
  • D. Not really
  • E. No – not useful at all.

slide-129
SLIDE 129
Before today, how much of this material did you already feel comfortable/proficient in?

  • A. 100%
  • B. 80%
  • C. 60%
  • D. 40%
  • E. 20%
  • F. < 20%

slide-130
SLIDE 130
After this presentation, how much of this material do you feel proficient in?

  • A. 100%
  • B. 80%
  • C. 60%
  • D. 40%
  • E. 20%
  • F. < 20%

slide-131
SLIDE 131

END!