Sampling and Sample Size
Rohit Naimpally J-PAL
Course Overview
1. What is Evaluation?
2. Outcomes, Impact, and Indicators
3. Why Randomize and Common Critiques
4. How to Randomize
5. Sampling and Sample Size
6. Threats and Analysis
7.
“Trevor was a painter. Indeed, few people escape that nowadays. But he was also an artist, and artists are rather rare.”
“Power is as much an art as a science.”
At the end of the presentation, you will:
1. Know the Central Limit Theorem and the Law of Large Numbers, and why they matter.
2. Know the difference between a Type I and a Type II error.
3. Know what the "power" of a study is and why you should care.
4. Be ready to tackle the power exercise in the next session!
[Diagram: estimates scattered around the truth. Precision comes from sample size; accuracy comes from randomization.]
What is the "expected result" (i.e. the average) of a single die roll?
[Chart: a single die. Each outcome 1 through 6 has probability 1/6, so the distribution is flat.]
What is the expected average of two dice?
Each cell is the sum of the two dice:

Die 1 \ Die 2    1    2    3    4    5    6
      1          2    3    4    5    6    7
      2          3    4    5    6    7    8
      3          4    5    6    7    8    9
      4          5    6    7    8    9   10
      5          6    7    8    9   10   11
      6          7    8    9   10   11   12

Distribution of the sum:
Sum        2     3     4     5    6     7    8     9    10    11    12
Frequency  1/36  1/18  1/12  1/9  5/36  1/6  5/36  1/9  1/12  1/18  1/36

Distribution of the average (sum divided by 2):
Average    1     1.5   2     2.5  3     3.5  4     4.5  5     5.5   6
Frequency  1/36  1/18  1/12  1/9  5/36  1/6  5/36  1/9  1/12  1/18  1/36
[Histogram of the two-dice average: each column is one possible outcome (average result); each block within a column is one permutation that obtains that average.]
Distribution of the average of three dice (frequencies rounded):
Average    1    1 1/3  1 2/3  2    2 1/3  2 2/3  3    3 1/3  3 2/3  4    4 1/3  4 2/3  5    5 1/3  5 2/3  6
Frequency  0%   1%     3%     5%   7%     10%    12%  13%    13%    12%  10%    7%     5%    3%     1%    0%
[Charts: distribution of the average as the number of dice grows.]
Looks like a bell curve, or a normal distribution.
With more dice, >95% of all rolls will yield an average between 3 and 4.
With even more dice, >99% of all rolls will yield an average between 3 and 4.
1. The more dice you roll, the closer most averages are to the true average (the distribution gets "tighter"). This is the Law of Large Numbers.
2. The more dice you roll, the more the distribution of possible averages (the sampling distribution) looks like a bell curve (a normal distribution). This is the Central Limit Theorem.
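Both takeaways can be checked with a short simulation. This is a minimal sketch with hypothetical roll counts; `dice_averages` is an illustrative helper, not something from the slides:

```python
import random
import statistics

random.seed(0)  # reproducible illustration

def dice_averages(n_dice, n_rolls=10_000):
    """Roll n_dice fair dice n_rolls times; return each roll's average."""
    return [sum(random.randint(1, 6) for _ in range(n_dice)) / n_dice
            for _ in range(n_rolls)]

# Law of Large Numbers: with more dice per roll, the averages cluster
# more tightly around the true average of 3.5.
print(statistics.stdev(dice_averages(1)))   # wide spread
print(statistics.stdev(dice_averages(50)))  # much tighter
```

Plotting a histogram of `dice_averages(50)` would show the bell shape the slides describe.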
Key terms: population distribution, sampling distribution, law of large numbers / central limit theorem, standard deviation and standard error.
[Histogram: test scores for the full population, with the mean (26) and one standard deviation marked.]
Good, the average of the sample is about 26…
Isn't the sampling distribution supposed to turn into a bell curve?
One candidate explanation: statistical theory assumes the population distribution is normally distributed.
The sampling distribution may not be normal if the population distribution is skewed
This is the distribution of my sample of 8,000 students!
This is the distribution of the population (Population Distribution)
This is the distribution of Means from all Random Samples (Sampling distribution)
Frequency of means from repeated samples:
– With 5 samples: inadequate sample size; no clear distribution around the population mean
– With 10 samples: more sample means around the population mean, but still spread a good deal
– With 50 samples
– With 100 samples: distribution now significantly more normal; starting to see peaks
– With 500 samples
– With 1,000 samples
What happens to the sampling distribution if we draw a larger sample each time?
– It turns into a bell curve faster (than with a sample size of 10)
– It is narrower
– The underlying sampling distribution does not change
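A simulation makes the distinction concrete. This is a sketch with made-up numbers; the skewed population of 8,000 "scores" with mean 26 is hypothetical:

```python
import random
import statistics

random.seed(1)

# A deliberately skewed, non-normal "population" of 8,000 scores
# (hypothetical; expovariate(1/26) has mean 26).
population = [random.expovariate(1 / 26) for _ in range(8_000)]

def sample_means(n, n_samples=1_000):
    """Draw n_samples random samples of size n; return each sample's mean."""
    return [statistics.mean(random.sample(population, n))
            for _ in range(n_samples)]

# Larger samples give a narrower sampling distribution of the mean,
# even though the population itself never changes.
print(statistics.stdev(sample_means(10)))   # wide
print(statistics.stdev(sample_means(100)))  # far narrower
```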
[Histograms repeated with the larger sample size: frequency of means with 5, 10, 50, 100, 500, and 1,000 samples. At every stage the distribution of means is tighter than before.]
[Histograms: the sampling distribution of the mean for sample sizes N = 1, 4, 9, and 100, each centered on the population mean of 26 and narrower as N grows.]
The white line is a theoretical distribution.
No matter what the underlying distribution is, the sampling distribution of the mean looks like a bell curve, and the theoretical distribution fits what is underneath!
[Standard normal curve: roughly 95% of the area lies within 1.96 standard deviations on either side of the mean.]
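The 1.96 figure can be checked from the standard normal CDF. A quick sketch using only the standard library:

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Area within 1.96 standard deviations of the mean.
coverage = normal_cdf(1.96) - normal_cdf(-1.96)
print(round(coverage, 4))  # 0.95
```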
What is the difference between the standard deviation and the standard error? Note that researchers typically report the standard error, not the standard deviation.
Variance: σ² = Σ (observation value − average)² / N
Standard deviation: σ = √Variance
Standard error: SE = σ / √N
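In code, the three quantities line up directly with these formulas; the eight scores below are a made-up example:

```python
import math

def variance(xs):
    """Population variance: mean squared deviation from the average."""
    avg = sum(xs) / len(xs)
    return sum((x - avg) ** 2 for x in xs) / len(xs)

def standard_error(xs):
    """SE = sigma / sqrt(N): the spread of the sampling distribution of the mean."""
    return math.sqrt(variance(xs)) / math.sqrt(len(xs))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean 5
print(math.sqrt(variance(scores)))  # standard deviation: 2.0
print(standard_error(scores))       # much smaller than the SD
```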
[Histograms for N = 1, 4, 9, and 100: as N grows, the standard error (se) shrinks while the population standard deviation (sd) stays fixed.]
Key concepts: significance, effect size, power, baseline and covariates, clustering, stratification.
After the Balsakhi program, these are the endline test scores:
[Histograms: endline test scores and baseline test scores.]
[Histogram: test scores for one group.]
Stop! That was the control group. The treatment group is red.
[Histogram: control and treatment test scores, with the control mean and treatment mean marked.]
Average difference = 6 points
By the Central Limit Theorem, these means are normally distributed.
In criminal law, the rule is "innocent until proven guilty." In statistics, the rule is a "presumption of insignificance."
The null hypothesis (H0): the program had no effect; there is no difference. Think about how this relates to the discussion of ethics on Sunday.
If we find a statistically significant difference:
– We "reject our null hypothesis"
– We say "our program has a statistically significant impact"
1. Determine the (size of the) sampling distribution around the null hypothesis H0 by calculating the standard error.
2. Choose the significance level α, e.g. α = 5% (equivalently, a 95% confidence level).
3. Identify the critical value (the boundary of the rejection region).
4. If our observation falls in the critical region, we can reject the null hypothesis.
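The four steps can be sketched as a simple two-sided z-test. The numbers are illustrative only, and `z_test` is a made-up helper, not a function from the slides:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_test(diff, se, alpha=0.05):
    """Two-sided test of H0 'no difference': reject when the observed
    difference is too many standard errors away from zero."""
    z = diff / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return p_value, p_value < alpha

# Hypothetical: a 6-point difference with a standard error of 2.5 points.
p, reject = z_test(6, 2.5)
print(p, reject)  # small p-value: reject H0
```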
[Normal curve centered on the control mean, with critical values marked at ±1.96 standard errors.]
Type I error: rejecting the null hypothesis even though it is true (a false positive).
Type II error: failing to reject the null hypothesis (concluding there is no difference) when indeed the null hypothesis is false.
Power: if there truly is an effect of the intervention (the null hypothesis is false), the probability that we will detect an effect (reject the null hypothesis).
                       YOU CONCLUDE
THE TRUTH        Effective               No effect
Effective        correct                 Type II error (low power)
No effect        Type I error            correct
                 (5% of the time)
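The "5% of the time" cell can be demonstrated by simulation: when treatment and control really come from the same distribution, a 5%-level test still flags a difference about 5% of the time. All numbers below are made up for illustration:

```python
import math
import random
import statistics

random.seed(2)

def one_null_experiment(n=100):
    """Treatment and control drawn from the SAME distribution, so H0 is true."""
    t = [random.gauss(50, 10) for _ in range(n)]
    c = [random.gauss(50, 10) for _ in range(n)]
    diff = statistics.mean(t) - statistics.mean(c)
    se = math.sqrt(statistics.variance(t) / n + statistics.variance(c) / n)
    return abs(diff / se) > 1.96  # a false positive?

trials = 2_000
false_positive_rate = sum(one_null_experiment() for _ in range(trials)) / trials
print(false_positive_rate)  # close to 0.05
```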
Assume two hypotheses: no effect, and a treatment effect β.
[Overlaid normal curves for control and treatment: anything between the critical lines cannot be distinguished from 0.]
[Same curves with the significance region shaded: the tail of the control curve beyond the critical value is the Type I error rate.]
[Same curves with the power region shaded: the shaded area shows the percentage of the time we would find Hβ true if it was; the unshaded remainder of the treatment curve is the Type II error.]
Power is the proportion of the treatment distribution that falls to the right (or left) of the critical value of the null hypothesis curve. We want powerful experiments.
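Power can also be estimated by simulation: repeat the experiment many times with a true effect built in, and count how often H0 is rejected. This sketch uses hypothetical means and standard deviations:

```python
import math
import random
import statistics

random.seed(3)

def detects_effect(effect, n):
    """One simulated trial with a TRUE effect of the given size."""
    t = [random.gauss(50 + effect, 10) for _ in range(n)]
    c = [random.gauss(50, 10) for _ in range(n)]
    diff = statistics.mean(t) - statistics.mean(c)
    se = math.sqrt(statistics.variance(t) / n + statistics.variance(c) / n)
    return abs(diff / se) > 1.96  # did we reject H0?

def power(effect, n, trials=1_000):
    """Share of trials in which the effect is detected."""
    return sum(detects_effect(effect, n) for _ in range(trials)) / trials

print(power(3, 100))  # moderate power
print(power(6, 100))  # bigger effect, much more power
```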
Remember, your sampling distribution becomes narrower as N↑.
[Charts: as the control and treatment curves narrow, less of each distribution falls between the critical values, so power rises.]
With a hypothesized effect of only 1 standard error, the null hypothesis would be rejected only 26% of the time.
[Chart: control and treatment curves 1 SE apart; the power region is small.]
A bigger hypothesized effect size means the distributions are farther apart.
[Chart: control and treatment curves 3 SE apart; most of the treatment curve now lies beyond the critical value.]
A bigger effect size means more power.
Which effect size should you use when calculating power?
– The smallest effect size that is still cost effective
– The smallest effect size that is still interesting (interesting as long as the value of that answer is worth the cost of the evaluation)
– Not the effect size you expect your program to produce
What, then, is the optimal sample size, assuming no other constraints?
Remember that effect sizes must be interpreted relative to the population at hand!
Power can be improved by:
– using a baseline
– controlling for other variables
Example: predicting how close the elections will be.
– Option A: sample one person in each state (N=50)
– Option B: sample a handful of families and ask ten members of each family their opinion (N=50)
Suppose all uneducated people live in one village, people with only primary education live in another, college grads live in a third, etc. The intra-cluster correlation (ICC, ρ) on education will be…
To gain power in a clustered design, is it better to add more clusters to the sample or more people in clusters?
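The intuition can be quantified with the design effect 1 + ρ(m − 1) that appears in the power formula. A sketch; the election ρ below is hypothetical:

```python
def design_effect(rho, m):
    """Variance inflation from sampling clusters of average size m
    with intra-cluster correlation rho."""
    return 1 + rho * (m - 1)

def effective_sample_size(n, rho, m):
    """A clustered sample of n people is 'worth' this many independent people."""
    return n / design_effect(rho, m)

# Election example (hypothetical rho): 50 independent respondents vs.
# 5 families of 10 whose opinions are highly correlated.
print(effective_sample_size(50, rho=0.8, m=1))   # 50.0
print(effective_sample_size(50, rho=0.8, m=10))  # about 6 effective respondents
```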
[Design diagram: four arms (Control, Balsakhi, CAL program, Balsakhi + CAL) with hypothesized effect sizes between arms ranging from 0.05 to 0.25 SD, and candidate sample sizes of 50, 100, and 200.]
Effect Size (MDE) = (t_(1−κ) + t_α) · √(1 / (P(1 − P))) · √(1 + ρ(m − 1)) · √(σ² / N)

where:
– Effect size: the minimum detectable effect
– σ²: variance
– N: sample size
– α: significance level (t_α its critical value)
– κ: power (t_(1−κ) its critical value)
– P: proportion in treatment
– ρ: intra-cluster correlation (ICC)
– m: average cluster size
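Under the usual defaults (two-sided α = 5%, so t_α ≈ 1.96, and 80% power, so t_(1−κ) ≈ 0.84) the formula is easy to evaluate. The score SD and sample sizes below are made up:

```python
import math

def mde(sigma, n, p=0.5, rho=0.0, m=1, t_alpha=1.96, t_power=0.84):
    """Minimum detectable effect for alpha = 5% (two-sided) and 80% power."""
    cluster_inflation = 1 + rho * (m - 1)  # the design effect
    return (t_power + t_alpha) * math.sqrt(1 / (p * (1 - p))) \
        * math.sqrt(cluster_inflation * sigma ** 2 / n)

# Hypothetical: SD of 25 points, 400 students split evenly, no clustering.
print(mde(sigma=25, n=400))
# Clustering (rho = 0.2, clusters of 40) raises the detectable effect a lot.
print(mde(sigma=25, n=400, rho=0.2, m=40))
```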