SLIDE 1

Sampling and Sample Size

Rohit Naimpally J-PAL

SLIDE 2

Course Overview

  • 1. What is Evaluation?
  • 2. Outcomes, Impact, and Indicators
  • 3. Why Randomize and Common Critiques
  • 4. How to Randomize
  • 5. Sampling and Sample Size
  • 6. Threats and Analysis
  • 7. Project from Start to Finish
  • 8. Cost-Effectiveness Analysis and Scaling Up
SLIDE 3

Framing the discussion…

“Trevor was a painter. Indeed, few people escape that nowadays. But he was also an artist, and artists are rather rare.”

  • Oscar Wilde

“Power is as much an art as a science.”

  • Unknown (probably not Oscar Wilde)
SLIDE 4

Learning Objectives

At the end of the presentation, you will:

  • 1. Know the Central Limit Theorem and the Law of Large Numbers, and why they matter.
  • 2. Know the difference between a Type I and a Type II error.
  • 3. Know what the “power” of a study is and why you should care.
  • 4. Be ready to tackle the power exercise in the next session!

SLIDE 5

THE basic questions in statistics

  • How confident can you be in your results?

–This is given by the significance level of your results (remember the “asterisks”?)

  • How big does your sample need to be?

–This is given by the power of your design.

SLIDE 6

Recap: Which of these is more accurate?

  • A. I.
  • B. II.
  • C. Don’t know

[Two figures, labeled I and II]

SLIDE 7

Recap: Accuracy versus Precision

[Diagram: estimates scattered around the truth]

Precision (Sample Size)

Accuracy (Randomization)

SLIDE 8

What’s the average result?

  • If you were to roll a die once, what is the “expected result” (i.e. the average)?

SLIDE 9

Possible results & probability: 1 die

Each result 1-6 occurs with probability 1/6.

SLIDE 10

Rolling 1 die: possible results & average

[Chart: each result 1-6 occurs with frequency 1/6; the average result is 3.5]

SLIDE 11

What’s the average result?

  • If you were to roll two dice once, what is the expected average of the two dice?

SLIDE 12

Rolling 2 dice: Possible totals & likelihood

Total:       2     3     4     5     6     7     8     9     10    11    12
Likelihood:  1/36  1/18  1/12  1/9   5/36  1/6   5/36  1/9   1/12  1/18  1/36

SLIDE 13

Rolling 2 dice: 11 possible totals, 36 permutations

           Die 2:  1   2   3   4   5   6
Die 1:  1          2   3   4   5   6   7
        2          3   4   5   6   7   8
        3          4   5   6   7   8   9
        4          5   6   7   8   9  10
        5          6   7   8   9  10  11
        6          7   8   9  10  11  12

SLIDE 14

Rolling 2 dice: Average score of dice & likelihood

Average:     1     1.5   2     2.5   3     3.5   4     4.5   5     5.5   6
Likelihood:  1/36  1/18  1/12  1/9   5/36  1/6   5/36  1/9   1/12  1/18  1/36
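The table above can be reproduced by enumerating all 36 equally likely permutations; a minimal sketch in Python (not from the deck):

```python
from fractions import Fraction
from itertools import product

# Tally the average of the two dice across all 36 equally likely permutations.
counts = {}
for d1, d2 in product(range(1, 7), repeat=2):
    avg = Fraction(d1 + d2, 2)
    counts[avg] = counts.get(avg, 0) + 1

likelihood = {avg: Fraction(n, 36) for avg, n in counts.items()}

print(likelihood[Fraction(7, 2)])   # 1/6: an average of 3.5 is the most likely
print(likelihood[Fraction(1)])      # 1/36: both dice showing 1
```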

SLIDE 15

Outcomes and Permutations

Putting together permutations, you get:

  • 1. All possible outcomes
  • 2. The likelihood of each of those outcomes

Each block within a column represents one possible permutation (to obtain that average)

Each column represents one possible outcome (average result)


SLIDE 16

Rolling 3 dice: 16 results 318, 216 permutations

Average:    1    1 1/3  1 2/3  2    2 1/3  2 2/3  3    3 1/3  3 2/3  4    4 1/3  4 2/3  5    5 1/3  5 2/3  6
Frequency:  0%   1%     3%     5%   7%     10%    12%  13%    13%    12%  10%    7%     5%   3%     1%     0%

SLIDE 17

Rolling 4 dice: 21 results, 1296 permutations


SLIDE 18

Rolling 5 dice: 26 results, 7776 permutations


SLIDE 19

Rolling 10 dice: 50 results, >60 million permutations


Looks like a bell curve, or a normal distribution

SLIDE 20

Rolling 30 dice: 150 results, 2 × 10^23 permutations


>95% of all rolls will yield an average between 3 and 4

SLIDE 21

Rolling 100 dice: 500 results, 6 × 10^77 permutations


>99% of all rolls will yield an average between 3 and 4

SLIDE 22

Rolling dice: 2 lessons

1. The more dice you roll, the closer most averages are to the true average (the distribution gets “tighter”)

  • THE LAW OF LARGE NUMBERS

2. The more dice you roll, the more the distribution of possible averages (the sampling distribution) looks like a bell curve (a normal distribution)

  • THE CENTRAL LIMIT THEOREM
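Both lessons are easy to check by simulation; a sketch (my setup, not from the deck):

```python
import random
import statistics

random.seed(0)  # reproducible rolls

def average_of_roll(n_dice):
    """Average face value from one roll of n_dice dice."""
    return sum(random.randint(1, 6) for _ in range(n_dice)) / n_dice

spreads = {}
for n_dice in (1, 10, 100):
    averages = [average_of_roll(n_dice) for _ in range(10_000)]
    spreads[n_dice] = statistics.stdev(averages)
    print(n_dice, round(statistics.mean(averages), 2), round(spreads[n_dice], 2))
```

The mean stays near 3.5 while the spread of the averages shrinks roughly as 1/√n (Law of Large Numbers), and a histogram of `averages` looks increasingly bell-shaped as n grows (Central Limit Theorem).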
SLIDE 23

Accuracy versus Precision

[Diagram: truth vs. estimates, for each combination of accuracy and precision]

Precision (Sample Size)

Accuracy (Randomization)

SLIDE 24

THAT WAS JUST THE INTRODUCTION

SLIDE 25

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers / central limit theorem
– standard deviation and standard error

  • Detecting impact
SLIDE 26

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers / central limit theorem
– standard deviation and standard error

  • Detecting impact
SLIDE 27

Baseline test scores

[Histogram: baseline test scores (frequency vs. score)]

SLIDE 28

Mean = 26

[Histogram: test scores with the mean (26) marked]

SLIDE 29

Standard Deviation = 20

[Histogram: test scores with the mean (26) and one standard deviation (20) marked]

SLIDE 30

Let’s do an experiment

  • Take 1 random test score from the pile of 16,000 tests
  • Write down the value
  • Put the test back
  • Do these three steps again
  • And again
  • 8,000 times
  • This is like a random sample of 8,000 (with replacement)
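This resampling experiment is easy to mimic in code. A sketch with synthetic scores (the real pile of 16,000 tests is assumed, not available here):

```python
import random
import statistics

random.seed(1)

# Synthetic stand-in for the pile of 16,000 tests (roughly mean 26, sd 20).
scores = [min(100, max(0, random.gauss(26, 20))) for _ in range(16_000)]

# Draw one test at random, record its value, put it back -- 8,000 times.
sample = [random.choice(scores) for _ in range(8_000)]

# The sample mean lands close to the population mean.
print(round(statistics.mean(sample), 1), round(statistics.mean(scores), 1))
```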
SLIDE 31

What can we say about this sample?

[Histogram: the sample’s test-score distribution, with the mean marked at 26]

Good, the average of the sample is about 26…

SLIDE 32

But…

  • I remember that as my sample size goes up, isn’t the sampling distribution supposed to turn into a bell curve? (Central Limit Theorem)
  • Is it that my sample isn’t large enough?
SLIDE 33

One limitation of statistical theory is that it assumes the population distribution is normally distributed

  • A. True
  • B. False
  • C. Depends
  • D. Don’t know


SLIDE 34

The sampling distribution may not be normal if the population distribution is skewed

  • A. True
  • B. False
  • C. Depends
  • D. Don’t know


SLIDE 35

Population vs. sampling distribution

[Histogram: the sample’s test-score distribution, with the mean marked at 26]

This is the distribution of my sample of 8,000 students!

SLIDE 36

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers / central limit theorem
– standard deviation and standard error

  • Detecting impact
SLIDE 37

How do we get from here…


To here…

This is the distribution of the population (Population Distribution)

This is the distribution of Means from all Random Samples (Sampling distribution)

SLIDE 38

Draw 10 random students, take the average, plot it: Do this 5 & 10 times.

Inadequate sample size: no clear distribution around the population mean


Frequency of Means With 5 Samples


Frequency of Means With 10 Samples

SLIDE 39

More sample means fall around the population mean, but they are still spread a good deal

Draw 10 random students: 50 and 100 times


Frequency of Means With 50 Samples


Frequency of Means with 100 Samples

SLIDE 40

The distribution is now significantly more normal; peaks are starting to appear

Draw 10 random students: 500 and 1000 times


Frequency of Means With 500 Samples


Frequency of Means With 1000 Samples

SLIDE 41

Draw 10 Random students

  • This is like a sample size of 10
  • What happens if we take a sample size of 50?
SLIDE 42

What happens to the sampling distribution if we draw a sample of 50 instead of 10, and take the mean (thousands of times)?

  • A. We will approach a bell curve faster (than with a sample size of 10)
  • B. The bell curve will be narrower
  • C. Both A & B
  • D. Neither. The underlying sampling distribution does not change.

SLIDE 43

N = 10 N = 50

[Four histograms: frequency of means with 5 and 10 samples, for N = 10 (left) and N = 50 (right)]

SLIDE 44

N = 10 N = 50

[Four histograms: frequency of means with 50 and 100 samples, for N = 10 (left) and N = 50 (right)]

SLIDE 45

N = 10 N = 50

[Four histograms: frequency of means with 500 and 1000 samples, for N = 10 (left) and N = 50 (right)]

SLIDE 46

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers / central limit theorem
– standard deviation and standard error

  • Detecting impact
SLIDE 47

Population & sampling distribution: Draw 1 random student (from 8,000)

[Chart: population distribution of test scores and the N = 1 sampling frequencies; mean = 26]

SLIDE 48

Sampling Distribution: Draw 4 random students (N=4)

[Chart: sampling distribution of the mean for N = 4; mean = 26]

SLIDE 49

Law of Large Numbers : N=9

[Chart: sampling distribution of the mean for N = 9; mean = 26]

SLIDE 50

Law of Large Numbers: N =100

[Chart: sampling distribution of the mean for N = 100; mean = 26]

SLIDE 51

Central Limit Theorem: N=1

[Chart: N = 1 sampling frequencies with the theoretical distribution overlaid; mean = 26]

The white line is a theoretical distribution

SLIDE 52

Central Limit Theorem : N=4

[Chart: N = 4 sampling frequencies with the theoretical distribution overlaid]

SLIDE 53

Central Limit Theorem : N=9

[Chart: N = 9 sampling frequencies with the theoretical distribution overlaid]

SLIDE 54

Central Limit Theorem : N =100

[Chart: N = 100 sampling frequencies with the theoretical distribution overlaid]

SLIDE 55

So Why Do We Care?

  • The sampling distribution is a probability distribution
  • The sampling distribution is a bell curve (irrespective of what the underlying distribution is)
  • Why does it matter? Why do we care if the probability distribution looks like a bell curve?
  • Because we know how to calculate the area underneath!

SLIDE 56

95% Confidence Interval

[Normal curve: the area within 1.96 SD on either side of the mean]
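That area can be computed directly from the normal CDF; a quick check (standard math, not deck code):

```python
from math import erf, sqrt

def normal_cdf(z):
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Area within 1.96 standard deviations of the mean: the 95% interval.
coverage = normal_cdf(1.96) - normal_cdf(-1.96)
print(round(coverage, 3))  # 0.95
```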

SLIDE 57

Outline

  • Sampling distributions

– population distribution
– sampling distribution
– law of large numbers / central limit theorem
– standard deviation and standard error

  • Detecting impact
SLIDE 58

Standard deviation/error

  • But wait! The regression results that I have seen typically report the standard error, not the standard deviation.
  • What’s the difference between the standard deviation and the standard error?

The standard error = the standard deviation of the sampling distribution
SLIDE 59
Variance and Standard Deviation

  • Variance = 400

σ² = Σ (observation value − average)² / N

  • Standard Deviation = 20

σ = √variance

  • Standard Error = 20 / √N

SE = σ / √N
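The three quantities are easy to check numerically; a sketch using tiny hypothetical data chosen to match the slide’s numbers (mean 26, variance 400):

```python
import statistics
from math import sqrt

scores = [6, 46, 6, 46]                   # hypothetical: mean 26, deviations of +/-20

mean = statistics.mean(scores)            # 26
variance = statistics.pvariance(scores)   # sum((x - mean)^2) / N = 400
sd = sqrt(variance)                       # 20.0
se = sd / sqrt(len(scores))               # standard error of the mean = sigma / sqrt(N)

print(mean, variance, sd, se)
```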

SLIDE 60

Standard Deviation/ Standard Error

[Chart: sampling distribution for N = 1 with the standard deviation marked; mean = 26]

SLIDE 61

Sample size ↑ x4, SE ↓ ½

[Chart: sampling distribution for N = 4 with SD and SE marked]

SLIDE 62

Sample size ↑ x9, SE ↓ ?

[Chart: sampling distribution for N = 9 with SD and SE marked]

SLIDE 63

Sample size ↑ x100, SE ↓?

[Chart: sampling distribution for N = 100 with SD and SE marked]

SLIDE 64

Outline

  • Sampling distributions
  • Detecting impact

– significance
– effect size
– power
– baseline and covariates
– clustering
– stratification

SLIDE 65

Baseline test scores

[Histogram: baseline test scores]

SLIDE 66

We implement the Balsakhi Program

SLIDE 67

Endline test scores

[Histogram: endline test scores]

After the Balsakhi program, these are the endline test scores

SLIDE 68

The impact appears to be?

  • A. Positive
  • B. Negative
  • C. No impact
  • D. Don’t know

[Histograms: endline test scores and baseline test scores]

SLIDE 69

Post-test: control & treatment

[Histogram: test scores, control vs. treatment]

Stop! That was the control group. The treatment group is red.

SLIDE 70

Is this impact statistically significant?

  • A. Yes
  • B. No
  • C. Don’t know

[Histograms of test scores for control and treatment, with each group’s mean marked]

Average Difference = 6 points

SLIDE 71

One experiment: 6 points

SLIDE 72

One experiment

SLIDE 73

Two experiments

SLIDE 74

A few more…

SLIDE 75

A few more…

SLIDE 76

Many more…

SLIDE 77

A whole lot more…

SLIDE 78

SLIDE 79

Running the experiment thousands of times…

By the Central Limit Theorem, these are normally distributed

SLIDE 80

The assumption about your sample

The Central Limit Theorem and the Law of Large Numbers hold if the sample is randomly sampled from your population.

SLIDE 81

Theoretical Sampling distribution

[Theoretical sampling distribution: a normal curve]

SLIDE 82

So let’s look at hypothesis testing

  • In criminal law, most institutions follow the rule: “innocent until proven guilty”
  • In program evaluation, instead of “presumption of innocence,” the rule is: “presumption of insignificance”
  • The “Null hypothesis” (H0) is that there was no (zero) impact of the program
  • The burden of proof is on the evaluator to show a significant difference

– Think about how this relates to the discussion of ethics on Sunday.

SLIDE 83

Hypothesis testing: conclusions

  • If it is very unlikely (less than a 5% probability) that the difference is solely due to chance:

– We “reject our null hypothesis”

  • We may now say:

– “our program has a statistically significant impact”

SLIDE 84

Hypothesis Testing: Steps

1. Determine the (size of the) sampling distribution around the null hypothesis H0 by calculating the standard error
2. Choose the confidence interval, e.g. 95% (or significance level α = 5%)
3. Identify the critical value (boundary of the confidence interval)
4. If our observation falls in the critical region, we can reject the null hypothesis
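The four steps map onto a simple z-test; a sketch (the numbers are illustrative, not from the deck):

```python
from math import sqrt

def reject_null(diff, sd, n):
    """Follow the four steps above for a two-sided test at alpha = 5%."""
    se = sd / sqrt(n)         # 1. size of the sampling distribution around H0
    critical = 1.96           # 2.-3. critical value for a 95% confidence interval
    z = diff / se             # standardize the observed difference
    return abs(z) > critical  # 4. reject H0 if we land in the critical region

print(reject_null(diff=6, sd=20, n=100))   # True: 6 is 3 standard errors from 0
print(reject_null(diff=1, sd=20, n=100))   # False
```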

SLIDE 85

Remember our 95% Confidence Interval?

[Normal curve centered on H0 with 1.96 SD marked on either side]

SLIDE 86

Impose significance level of 5%

[Normal curve centered on H0 with the 5% rejection threshold marked]

SLIDE 87

What is the significance level?

  • Type I error: rejecting the null hypothesis even though it is true (false positive)
  • Significance level: the probability that we will reject the null hypothesis even though it is true

SLIDE 88

What is Power?

  • Type II Error: failing to reject the null hypothesis (concluding there is no difference) when indeed the null hypothesis is false.
  • Power: if there is a measurable effect of our intervention (the null hypothesis is false), the probability that we will detect it (reject the null hypothesis)
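Power has a direct simulation reading: run the same hypothetical experiment many times and count how often the effect is detected. A sketch with assumed numbers (not from the deck):

```python
import random
import statistics
from math import sqrt

random.seed(0)

def detects_effect(effect, sd=20, n=100):
    """One hypothetical trial: compare group means with a z-test at 5% (two-sided)."""
    control = [random.gauss(0, sd) for _ in range(n)]
    treatment = [random.gauss(effect, sd) for _ in range(n)]
    diff = statistics.mean(treatment) - statistics.mean(control)
    se = sd * sqrt(2 / n)         # SE of a difference in means
    return abs(diff / se) > 1.96  # True = null rejected

# Power = share of repeated experiments that detect the true effect.
power = statistics.mean(detects_effect(effect=6) for _ in range(2_000))
print(round(power, 2))
```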

SLIDE 89

Hypothesis testing: 95% confidence

                          YOU CONCLUDE:
                          Effective                        No Effect
THE TRUTH: Effective      (correct)                        Type II Error (low power)
THE TRUTH: No Effect      Type I Error (5% of the time)    (correct)

SLIDE 90

Before the experiment

Assume two effects: no effect (H0) and treatment effect β (Hβ)

[Two normal curves: control centered at H0, treatment centered at Hβ]

SLIDE 91

Impose significance level of 5%

Anything between lines cannot be distinguished from 0

[Curves for H0 and Hβ with the 5% significance thresholds marked; the tails of H0 beyond them are the Type I Error]

SLIDE 92

Can we distinguish Hβ from H0 ?

[Curves for H0 and Hβ with the power region shaded]

The shaded area shows the % of the time we would find Hβ true if it were true; the unshaded remainder of Hβ is the Type II Error

SLIDE 93

What influences power?

  • What are the factors that change the proportion of the research hypothesis that is shaded, i.e. the proportion that falls to the right (or left) of the null hypothesis curve?
  • Understanding this helps us design more powerful experiments.

SLIDE 94

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
SLIDE 95

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
SLIDE 96

By increasing sample size you increase…

  • A. Accuracy
  • B. Precision
  • C. Both
  • D. Neither
  • E. Don’t know

[Curves for control and treatment with the power region shaded]

SLIDE 97

Power: Effect size = 1SE, Sample size = N

[Curves for control and treatment with the significance threshold marked]

Remember, your sampling distribution becomes narrower as N↑

SLIDE 98

Power: Sample size = 4N

[Curves with sample size 4N: narrower sampling distributions]

SLIDE 99

Power: 64%

[Curves for control and treatment with the power region shaded]

SLIDE 100

Power: Sample size = 9N

[Curves with sample size 9N: still narrower sampling distributions]

SLIDE 101

Power: 91%

[Curves for control and treatment with the power region shaded]

SLIDE 102

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
SLIDE 103

Effect size = 1*SE

[Curves for H0 and Hβ separated by 1 SE, with the significance threshold marked]

SLIDE 104

The Null Hypothesis would be rejected only 26% of the time

[Curves for H0 and Hβ with the power region shaded]

Effect size = 1*SE: Power = 26%
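With normal sampling distributions, power is just a CDF evaluation. The sketch below reproduces the deck’s 26%, 64%, and 91% figures if one assumes a one-sided 5% test (critical z ≈ 1.645); that assumption is my inference from the numbers, not something the slides state:

```python
from math import erf, sqrt

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def power(effect_in_se, z_crit=1.645):
    """Probability of rejecting H0 when the true effect is effect_in_se SEs wide."""
    return normal_cdf(effect_in_se - z_crit)

print(round(power(1), 2))   # 0.26: effect of 1 SE
print(round(power(2), 2))   # 0.64: quadrupling N halves the SE, so 1 SE becomes 2
print(round(power(3), 2))   # 0.91: effect of 3 SE (or 9x the sample)
```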

SLIDE 105

A bigger hypothesized effect size puts the distributions farther apart

[Curves for H0 and Hβ separated by 3*SE]

Effect size = 3*SE

SLIDE 106

Bigger Effect size means more power

[Curves for H0 and Hβ with the power region shaded]

Effect size = 3*SE: Power = 91%

SLIDE 107

What effect size should you use when designing your experiment?

  • A. Smallest effect size that is still cost effective
  • B. Largest effect size you expect your program to produce
  • C. Both
  • D. Neither

SLIDE 108

Effect size

  • What effect size should we pick while calculating the optimal sample size, assuming no other constraints?
  • Ideally, we design our experiment to detect the smallest effect size that is still interesting.

– Interesting, as long as the value of that answer is worth the cost of the evaluation.

  • This is where “substantive significance” matters.
SLIDE 109

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
SLIDE 110

Variance

  • There is sometimes very little we can do to reduce the noise
  • The underlying variance is what it is: just a characteristic of the population at hand!
  • We can try to “absorb” variance:

– using a baseline
– controlling for other variables

  • In practice, controlling for other variables (besides the baseline outcome) buys you very little
SLIDE 111

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
SLIDE 112

Clustered design: intuition

  • You want to know how close the upcoming state elections will be
  • Method 1: Randomly select 50 people from the entire state (N=50)
  • Method 2: Randomly select 5 families in the state, and ask ten members of each family their opinion (N=50)

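This intuition is usually quantified with the design effect, 1 + (m − 1)ρ, which inflates the variance when m respondents share a cluster; a sketch (the formula is standard, the numbers are illustrative):

```python
def design_effect(m, rho):
    """Variance inflation from interviewing m people per cluster with ICC rho."""
    return 1 + (m - 1) * rho

# Method 2: 5 families (clusters) with 10 members each.
print(design_effect(10, 0.0))   # 1.0: independent opinions, no penalty
print(design_effect(10, 0.5))   # 5.5: N = 50 behaves like roughly 50 / 5.5 = 9 people
```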
SLIDE 113

HIGH intra-cluster correlation (ICC) aka ρ (rho)

SLIDE 114

LOW intra-cluster correlation (ICC) aka ρ (rho)

SLIDE 115

All uneducated people live in one village. People with only primary education live in another. College grads live in a third, etc. ICC (ρ) on education will be…

  • A. High
  • B. Low
  • C. No effect on rho
  • D. Don’t know


SLIDE 116

If ICC (ρ) is high, what is a more efficient way of increasing power?

  • A. Include more clusters in the sample
  • B. Include more people in clusters
  • C. Both
  • D. Don’t know

SLIDE 117

BONUS SLIDES (TIME PERMITTING…)

SLIDE 118

Testing multiple treatments

[Diagram: four groups (Control Group, Balsakhi, CAL program, Balsakhi + CAL) with hypothesized effect-size differences between 0.05 SD and 0.25 SD and candidate group sizes of 50, 100, and 200]

SLIDE 119

Power: main ingredients

  • 1. Sample Size (N)
  • 2. Effect Size (δ)
  • 3. Variance (σ)
  • 4. Proportion of sample in T vs. C
  • 5. Clustering (ρ)
  • 6. Non-Compliance (akin to δ↓)
SLIDE 120

Power!

Effect Size = (t(1−κ) + t(α)) × √(1 / (P(1 − P))) × √(ρ + (1 − ρ)/m) × σ/√N

Ingredients: Effect Size; Power (t(1−κ)); Significance Level (t(α)); Proportion in Treatment (P); ICC (ρ); Average Cluster Size (m); Variance (σ²); Sample Size (N)

SLIDE 121

Power!

Effect Size = (t(1−κ) + t(α)) × √(1 / (P(1 − P))) × √(ρ + (1 − ρ)/m) × σ/√N

Highlighted ingredient: Proportion in Treatment (P)

SLIDE 122

Sample split: 50% C, 50% T

[Curves for H0 and Hβ under a 50-50 split, with the significance threshold marked]

SLIDE 123

Power: 91%

[Curves for control and treatment with the power region shaded]

SLIDE 124

If it’s not 50-50 split?

  • What happens to the relative width (“fatness”) of the two curves if the split is not 50-50?
  • Say 25-75?
SLIDE 125

Sample split: 25% C, 75% T

[Curves for H0 and Hβ under a 25-75 split, with the significance threshold marked]

SLIDE 126

Power: 83%

[Curves for control and treatment with the power region shaded]
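The allocation enters the standard error of the treatment-control difference through the factor √(1/(P(1 − P))), which is smallest at P = 0.5; this is a standard result, sketched below (not deck code):

```python
from math import sqrt

def se_factor(p):
    """Relative SE of the difference in means when a share p of the sample is treated."""
    return sqrt(1 / (p * (1 - p)))

print(round(se_factor(0.50), 2))   # 2.0: the 50-50 split minimizes the SE
print(round(se_factor(0.75), 2))   # 2.31: a 25-75 split widens both curves
```

With wider curves, the same true effect spans fewer standard errors, which is consistent with the drop in power from 91% to 83% on these slides.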

SLIDE 127

END!