Sampling and Sample Size
Rohit Naimpally J-PAL
Course Overview
1. What is Evaluation?
2. Outcomes, Impact, and Indicators
3. Why Randomize and Common Critiques
4. How to Randomize
5. Sampling and Sample Size
6. Threats and Analysis
7.
“Trevor was a painter. Indeed, few people escape that nowadays. But he was also an artist, and artists are rather rare.”
“Power is as much an art as a science.”
At the end of the presentation, you will:
1. Know the Central Limit Theorem and the Law of Large Numbers, and why they matter.
2. Know the difference between a Type I and a Type II error.
3. Know what the "power" of a study is and why you should care.
4. Be ready to tackle the power exercise in the next session!
[Diagram: estimates scattered around the truth. Precision comes from sample size; accuracy comes from randomization.]
What is the "expected result" (i.e. the average) of a single die roll?
[Chart: a single die. Each outcome 1 through 6 has probability 1/6, so the distribution is flat.]
What is the expected average of two dice?
Each cell is the sum of the two dice:

Die 1 \ Die 2    1    2    3    4    5    6
      1          2    3    4    5    6    7
      2          3    4    5    6    7    8
      3          4    5    6    7    8    9
      4          5    6    7    8    9   10
      5          6    7    8    9   10   11
      6          7    8    9   10   11   12

Distribution of the sum:
Sum        2     3     4     5    6     7    8     9    10    11    12
Frequency  1/36  1/18  1/12  1/9  5/36  1/6  5/36  1/9  1/12  1/18  1/36

Distribution of the average (sum divided by 2):
Average    1     1.5   2     2.5  3     3.5  4     4.5  5     5.5   6
Frequency  1/36  1/18  1/12  1/9  5/36  1/6  5/36  1/9  1/12  1/18  1/36
[Histogram of the two-dice average: each column is one possible outcome (average result); each block within a column is one permutation that obtains that average.]
Distribution of the average of three dice (frequencies rounded):
Average    1    1 1/3  1 2/3  2    2 1/3  2 2/3  3    3 1/3  3 2/3  4    4 1/3  4 2/3  5    5 1/3  5 2/3  6
Frequency  0%   1%     3%     5%   7%     10%    12%  13%    13%    12%  10%    7%     5%    3%     1%    0%
[Charts: distribution of the average as the number of dice grows.]
Looks like a bell curve, or a normal distribution.
With more dice, >95% of all rolls will yield an average between 3 and 4.
With even more dice, >99% of all rolls will yield an average between 3 and 4.
1. The more dice you roll, the closer most averages are to the true average (the distribution gets "tighter"). This is the Law of Large Numbers.
2. The more dice you roll, the more the distribution of possible averages (the sampling distribution) looks like a bell curve (a normal distribution). This is the Central Limit Theorem.
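Both takeaways can be checked with a short simulation. This is a minimal sketch with hypothetical roll counts; `dice_averages` is an illustrative helper, not something from the slides:

```python
import random
import statistics

random.seed(0)  # reproducible illustration

def dice_averages(n_dice, n_rolls=10_000):
    """Roll n_dice fair dice n_rolls times; return each roll's average."""
    return [sum(random.randint(1, 6) for _ in range(n_dice)) / n_dice
            for _ in range(n_rolls)]

# Law of Large Numbers: with more dice per roll, the averages cluster
# more tightly around the true average of 3.5.
print(statistics.stdev(dice_averages(1)))   # wide spread
print(statistics.stdev(dice_averages(50)))  # much tighter
```

Plotting a histogram of `dice_averages(50)` would show the bell shape the slides describe.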
Key terms: population distribution, sampling distribution, law of large numbers / central limit theorem, standard deviation and standard error.
[Histogram: test scores for the full population, with the mean (26) and one standard deviation marked.]
Good, the average of the sample is about 26…
Isn't the sampling distribution supposed to turn into a bell curve?
One candidate explanation: statistical theory assumes the population distribution is normally distributed.
The sampling distribution may not be normal if the population distribution is skewed
This is the distribution of my sample of 8,000 students!
This is the distribution of the population (Population Distribution)
This is the distribution of Means from all Random Samples (Sampling distribution)
Frequency of means from repeated samples:
– With 5 samples: inadequate sample size; no clear distribution around the population mean
– With 10 samples: more sample means around the population mean, but still spread a good deal
– With 50 samples
– With 100 samples: distribution now significantly more normal; starting to see peaks
– With 500 samples
– With 1,000 samples
What happens to the sampling distribution if we draw a larger sample each time?
– It turns into a bell curve faster (than with a sample size of 10)
– It is narrower
– The underlying sampling distribution does not change
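A simulation makes the distinction concrete. This is a sketch with made-up numbers; the skewed population of 8,000 "scores" with mean 26 is hypothetical:

```python
import random
import statistics

random.seed(1)

# A deliberately skewed, non-normal "population" of 8,000 scores
# (hypothetical; expovariate(1/26) has mean 26).
population = [random.expovariate(1 / 26) for _ in range(8_000)]

def sample_means(n, n_samples=1_000):
    """Draw n_samples random samples of size n; return each sample's mean."""
    return [statistics.mean(random.sample(population, n))
            for _ in range(n_samples)]

# Larger samples give a narrower sampling distribution of the mean,
# even though the population itself never changes.
print(statistics.stdev(sample_means(10)))   # wide
print(statistics.stdev(sample_means(100)))  # far narrower
```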
[Histograms repeated with the larger sample size: frequency of means with 5, 10, 50, 100, 500, and 1,000 samples. At every stage the distribution of means is tighter than before.]
[Histograms: the sampling distribution of the mean for sample sizes N = 1, 4, 9, and 100, each centered on the population mean of 26 and narrower as N grows.]
The white line is a theoretical distribution.
No matter what the underlying distribution is, the sampling distribution of the mean looks like a bell curve, and the theoretical distribution fits what is underneath!
[Standard normal curve: roughly 95% of the area lies within 1.96 standard deviations on either side of the mean.]
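The 1.96 figure can be checked from the standard normal CDF. A quick sketch using only the standard library:

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Area within 1.96 standard deviations of the mean.
coverage = normal_cdf(1.96) - normal_cdf(-1.96)
print(round(coverage, 4))  # 0.95
```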
What is the difference between the standard deviation and the standard error? Note that researchers typically report the standard error, not the standard deviation.
Variance: σ² = Σ (observation value − average)² / N
Standard deviation: σ = √Variance
Standard error: SE = σ / √N
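In code, the three quantities line up directly with these formulas; the eight scores below are a made-up example:

```python
import math

def variance(xs):
    """Population variance: mean squared deviation from the average."""
    avg = sum(xs) / len(xs)
    return sum((x - avg) ** 2 for x in xs) / len(xs)

def standard_error(xs):
    """SE = sigma / sqrt(N): the spread of the sampling distribution of the mean."""
    return math.sqrt(variance(xs)) / math.sqrt(len(xs))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean 5
print(math.sqrt(variance(scores)))  # standard deviation: 2.0
print(standard_error(scores))       # much smaller than the SD
```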
[Histograms for N = 1, 4, 9, and 100: as N grows, the standard error (se) shrinks while the population standard deviation (sd) stays fixed.]
Key concepts: significance, effect size, power, baseline and covariates, clustering, stratification.
After the Balsakhi program, these are the endline test scores:
[Histograms: endline test scores and baseline test scores.]
[Histogram: test scores for one group.]
Stop! That was the control group. The treatment group is red.
[Histogram: control and treatment test scores, with the control mean and treatment mean marked.]
Average difference = 6 points
By the Central Limit Theorem, these means are normally distributed.
In criminal law, the rule is "innocent until proven guilty." In statistics, the rule is a "presumption of insignificance."
The null hypothesis (H0): the program had no effect; there is no difference. Think about how this relates to the discussion of ethics on Sunday.
If we find a statistically significant difference:
– We "reject our null hypothesis"
– We say "our program has a statistically significant impact"
1. Determine the (size of the) sampling distribution around the null hypothesis H0 by calculating the standard error.
2. Choose the significance level α, e.g. α = 5% (equivalently, a 95% confidence level).
3. Identify the critical value (the boundary of the rejection region).
4. If our observation falls in the critical region, we can reject the null hypothesis.
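The four steps can be sketched as a simple two-sided z-test. The numbers are illustrative only, and `z_test` is a made-up helper, not a function from the slides:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_test(diff, se, alpha=0.05):
    """Two-sided test of H0 'no difference': reject when the observed
    difference is too many standard errors away from zero."""
    z = diff / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return p_value, p_value < alpha

# Hypothetical: a 6-point difference with a standard error of 2.5 points.
p, reject = z_test(6, 2.5)
print(p, reject)  # small p-value: reject H0
```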
[Normal curve centered on the control mean, with critical values marked at ±1.96 standard errors.]
Type I error: rejecting the null hypothesis even though it is true (a false positive).
Type II error: failing to reject the null hypothesis (concluding there is no difference) when indeed the null hypothesis is false.
Power: if there truly is an effect of the intervention (the null hypothesis is false), the probability that we will detect an effect (reject the null hypothesis).
                       YOU CONCLUDE
THE TRUTH        Effective               No effect
Effective        correct                 Type II error (low power)
No effect        Type I error            correct
                 (5% of the time)
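The "5% of the time" cell can be demonstrated by simulation: when treatment and control really come from the same distribution, a 5%-level test still flags a difference about 5% of the time. All numbers below are made up for illustration:

```python
import math
import random
import statistics

random.seed(2)

def one_null_experiment(n=100):
    """Treatment and control drawn from the SAME distribution, so H0 is true."""
    t = [random.gauss(50, 10) for _ in range(n)]
    c = [random.gauss(50, 10) for _ in range(n)]
    diff = statistics.mean(t) - statistics.mean(c)
    se = math.sqrt(statistics.variance(t) / n + statistics.variance(c) / n)
    return abs(diff / se) > 1.96  # a false positive?

trials = 2_000
false_positive_rate = sum(one_null_experiment() for _ in range(trials)) / trials
print(false_positive_rate)  # close to 0.05
```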
Assume two hypotheses: no effect, and a treatment effect β.
[Overlaid normal curves for control and treatment: anything between the critical lines cannot be distinguished from 0.]
[Same curves with the significance region shaded: the tail of the control curve beyond the critical value is the Type I error rate.]
[Same curves with the power region shaded: the shaded area shows the percentage of the time we would find Hβ true if it was; the unshaded remainder of the treatment curve is the Type II error.]
Power is the proportion of the treatment distribution that falls to the right (or left) of the critical value of the null hypothesis curve. We want powerful experiments.
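Power can also be estimated by simulation: repeat the experiment many times with a true effect built in, and count how often H0 is rejected. This sketch uses hypothetical means and standard deviations:

```python
import math
import random
import statistics

random.seed(3)

def detects_effect(effect, n):
    """One simulated trial with a TRUE effect of the given size."""
    t = [random.gauss(50 + effect, 10) for _ in range(n)]
    c = [random.gauss(50, 10) for _ in range(n)]
    diff = statistics.mean(t) - statistics.mean(c)
    se = math.sqrt(statistics.variance(t) / n + statistics.variance(c) / n)
    return abs(diff / se) > 1.96  # did we reject H0?

def power(effect, n, trials=1_000):
    """Share of trials in which the effect is detected."""
    return sum(detects_effect(effect, n) for _ in range(trials)) / trials

print(power(3, 100))  # moderate power
print(power(6, 100))  # bigger effect, much more power
```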
Remember, your sampling distribution becomes narrower as N↑.
[Charts: as the control and treatment curves narrow, less of each distribution falls between the critical values, so power rises.]
With a hypothesized effect of only 1 standard error, the null hypothesis would be rejected only 26% of the time.
[Chart: control and treatment curves 1 SE apart; the power region is small.]
A bigger hypothesized effect size means the distributions are farther apart.
[Chart: control and treatment curves 3 SE apart; most of the treatment curve now lies beyond the critical value.]
A bigger effect size means more power.
Which effect size should you use when calculating power?
– The smallest effect size that is still cost effective
– The smallest effect size that is still interesting (interesting as long as the value of that answer is worth the cost of the evaluation)
– Not the effect size you expect your program to produce
What, then, is the optimal sample size, assuming no other constraints?
Remember that effect sizes must be interpreted relative to the population at hand!
Power can be improved by:
– using a baseline
– controlling for other variables
Example: predicting how close the elections will be.
– Option A: sample one person in each state (N=50)
– Option B: sample a handful of families and ask ten members of each family their opinion (N=50)
Suppose all uneducated people live in one village, people with only primary education live in another, college grads live in a third, etc. The intra-cluster correlation (ICC, ρ) on education will be…
To gain power in a clustered design, is it better to add more clusters to the sample or more people in clusters?
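The intuition can be quantified with the design effect 1 + ρ(m − 1) that appears in the power formula. A sketch; the election ρ below is hypothetical:

```python
def design_effect(rho, m):
    """Variance inflation from sampling clusters of average size m
    with intra-cluster correlation rho."""
    return 1 + rho * (m - 1)

def effective_sample_size(n, rho, m):
    """A clustered sample of n people is 'worth' this many independent people."""
    return n / design_effect(rho, m)

# Election example (hypothetical rho): 50 independent respondents vs.
# 5 families of 10 whose opinions are highly correlated.
print(effective_sample_size(50, rho=0.8, m=1))   # 50.0
print(effective_sample_size(50, rho=0.8, m=10))  # about 6 effective respondents
```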
[Design diagram: four arms (Control, Balsakhi, CAL program, Balsakhi + CAL) with hypothesized effect sizes between arms ranging from 0.05 to 0.25 SD, and candidate sample sizes of 50, 100, and 200.]
Effect Size (MDE) = (t_(1−κ) + t_α) · √(1 / (P(1 − P))) · √(1 + ρ(m − 1)) · √(σ² / N)

where:
– Effect size: the minimum detectable effect
– σ²: variance
– N: sample size
– α: significance level (t_α its critical value)
– κ: power (t_(1−κ) its critical value)
– P: proportion in treatment
– ρ: intra-cluster correlation (ICC)
– m: average cluster size
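Under the usual defaults (two-sided α = 5%, so t_α ≈ 1.96, and 80% power, so t_(1−κ) ≈ 0.84) the formula is easy to evaluate. The score SD and sample sizes below are made up:

```python
import math

def mde(sigma, n, p=0.5, rho=0.0, m=1, t_alpha=1.96, t_power=0.84):
    """Minimum detectable effect for alpha = 5% (two-sided) and 80% power."""
    cluster_inflation = 1 + rho * (m - 1)  # the design effect
    return (t_power + t_alpha) * math.sqrt(1 / (p * (1 - p))) \
        * math.sqrt(cluster_inflation * sigma ** 2 / n)

# Hypothetical: SD of 25 points, 400 students split evenly, no clustering.
print(mde(sigma=25, n=400))
# Clustering (rho = 0.2, clusters of 40) raises the detectable effect a lot.
print(mde(sigma=25, n=400, rho=0.2, m=40))
```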