SLIDE 1

Course Business

• Sign up for final project presentations
  • Dec 5th: Doug, Jenny, Kelly, Kole, Rob, Zac
  • Dec 12th: Ciara, Griff, Kori, Lauren, Lin, Rebecca
• Shorter lecture on effect size & power
• sleep.csv on CourseWeb
• Scott to be out of town the rest of the week
  • E-mail responses may be slow
  • Slides posted next week
• NO CLASS NEXT WEEK

SLIDE 2

Distributed Practice

• Dr. Sánchez is interested in how much achievement motivation varies over time in high school. He plans to have high schoolers respond to a brief measure of achievement motivation every school day for 2 school weeks (10 days total). However, he does not end up with 10 ratings from every student because:
  • (a) Some less-motivated students skip class
  • (b) Some classes had a fire drill on one day, so Dr. Sánchez wasn't able to administer the survey
• Which (if any) of these sources of missingness are ignorable? Which are non-ignorable?

SLIDE 3

Week 12: Effect Size & Power

• Missing Data Solutions
  • Casewise Deletion
  • Listwise Deletion
  • Unconditional Imputation
  • Conditional Imputation
  • Multiple Imputation
• Effect Size
  • Unstandardized
  • Standardized
  • Interpreting Effect Size
  • Variance Explained
• Power
  • Recap of Null Hypothesis Significance Testing
  • Why Should We Care?
  • Estimating Effect Size
  • Doing Your Own Power Analysis
  • Influences on Power

SLIDE 4

stress.csv

• Longitudinal (5-week) study of stress
• Dependent measure: Concentration of cortisol, a stress hormone
  • Nanomoles per liter (nmol/L)
• Personality, cognitive, environmental, clinical variables

SLIDE 5

Casewise Deletion

• In each comparison, delete observations only if the missing data is relevant to this comparison
• Correlating Extraversion & Conscientiousness → delete/ignore the red rows

SLIDE 6

Casewise Deletion

• In each comparison, delete observations only if the missing data is relevant to this comparison
• Correlating Extraversion & ReadingSpan → delete/ignore the blue row

SLIDE 7

Casewise Deletion

• Avoids data loss
• But, results not completely consistent / comparable because they're based on different observations
  • e.g., possible to have A > B > C > A
• cor.test(stress$ReadingSpan, stress$Extraversion)
• cor.test(stress$Conscientiousness, stress$Extraversion)
• df = 453
• The d.f.s don't match because the tests are based on different subsets of the data

SLIDE 8

Week 12: Effect Size & Power

SLIDE 9

Listwise Deletion

• Delete any observation where data is missing anywhere
  • e.g., stress2 <- na.omit(stress)
• Default in lmer() and many other software packages

SLIDE 10

Listwise Deletion

• Avoids inconsistency
• In some cases, could result in a lot of data loss
• However, mixed effects models do well even with moderate data loss (25%; Quene & van den Bergh, 2004)
• Unlike ANOVA, MEMs properly account for some subjects or conditions having fewer observations

SLIDE 11

Listwise Deletion

• Avoids inconsistency
• In some cases, could result in a lot of data loss
• However, mixed effects models do well even with moderate data loss (25%; Quene & van den Bergh, 2004)
• Unlike ANOVA, MEMs properly account for some subjects or conditions having fewer observations
• Produces the correct parameter estimates if missingness is ignorable
  • Although some other things (R2) may be incorrect
• Estimates will be wrong if missingness is non-ignorable

SLIDE 12

Week 12: Effect Size & Power

SLIDE 13

Unconditional Imputation

• Replace missing values with the mean of the observed values
• Imputing the mean reduces the variance
  • This increases chance of detecting spurious effects
  • Also distorts the correlations with other variables
• Bad. Don't do this!
• 5, 8, 3, ?, ?
  • M = 5.33
  • S2 = 6.33
• 5, 8, 3, 5.33, 5.33
  • M = 5.33
  • S2 = 3.17
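The variance shrinkage on this slide can be checked directly. A minimal sketch in Python (the course's own code is in R; the numbers are the toy data above):

```python
# Mean imputation demo: sample variance before vs. after filling 5, 8, 3, ?, ?
observed = [5.0, 8.0, 3.0]
mean = sum(observed) / len(observed)                                   # 5.33
var_before = sum((x - mean) ** 2 for x in observed) / (len(observed) - 1)

imputed = observed + [mean, mean]                                      # fill both ?s with the mean
mean_after = sum(imputed) / len(imputed)                               # unchanged: 5.33
var_after = sum((x - mean_after) ** 2 for x in imputed) / (len(imputed) - 1)
```

The mean is unchanged, but the sample variance drops from 6.33 to 3.17: the imputed values sit exactly at the mean, so they add observations without adding spread.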
SLIDE 14

Week 12: Effect Size & Power

SLIDE 15

Conditional Imputation

• Replace missing values with the values predicted by a model using known variable(s)
• interimModel <- lmer(ReadingSpan ~ 1 + OperationSpan + (1|Subject), data=stress)
  • Create a model that predicts RSpan from OSpan
• predictedValues <- predict(interimModel, stress[is.na(stress$ReadingSpan),])
  • Use the model to predict the missing ReadingSpan values
• stress[is.na(stress$ReadingSpan),'ReadingSpan'] <- predictedValues
  • Replace the missing ReadingSpan values with the predicted values
• [Diagram: ReadingSpan ~ 1 + OperationSpan, with NAs in ReadingSpan to be filled in]

SLIDE 16

Conditional Imputation

• Replace missing values with the values predicted by a model using known variable(s)
• If ignorable missingness, get the correct parameter estimates
• And, standard errors not as distorted
  • Especially if we add some noise to the fitted values
  • predictedValues <- predictedValues + rnorm(length(predictedValues), mean=0, sd=ResidualSDFromTheModel)
• [Diagram: ReadingSpan ~ 1 + OperationSpan, with NAs in ReadingSpan to be filled in]
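The same logic as the R snippets above, as a self-contained sketch in Python (ordinary least squares stands in for lmer(); the toy x/y values are made up, and None marks a missing value):

```python
import random
random.seed(1)

# Toy data: y is missing (None) for some rows; x is always observed.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, None, 10.1, None]

# Fit the interim model on the observed rows only (simple linear regression)
obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
n = len(obs)
mx = sum(xi for xi, _ in obs) / n
my = sum(yi for _, yi in obs) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in obs) / sum((xi - mx) ** 2 for xi, _ in obs)
intercept = my - slope * mx

# Residual SD from the interim model (the sd= argument in the rnorm() call above)
resid = [yi - (intercept + slope * xi) for xi, yi in obs]
resid_sd = (sum(r ** 2 for r in resid) / (n - 2)) ** 0.5

# Impute: fitted value plus noise, so imputed values aren't artificially "perfect"
y_imputed = [yi if yi is not None
             else intercept + slope * xi + random.gauss(0, resid_sd)
             for xi, yi in zip(x, y)]
```

Adding the rnorm-style noise keeps the imputed values from lying exactly on the regression line, which would otherwise understate the residual variance.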

SLIDE 17

Conditional Imputation

• Replace missing values with the values predicted by a model using known variable(s)
• Where is this useful?
  • Many observations have a small amount of missing data, but which column it is varies
  • Listwise deletion would wipe out every row with an NA anywhere
• [Diagram: ReadingSpan ~ 1 + OperationSpan, with NAs in ReadingSpan to be filled in]

SLIDE 18

Week 12: Effect Size & Power

SLIDE 19

Multiple Imputation

• Like doing conditional imputation several times
  • Replace missing data with one possible set of values
  • Run the model
  • Repeat
• Final result averages these
• [Diagram: dataset with missing data → datasets with imputations 1, 2, 3 → model results 1, 2, 3 → final results]

SLIDE 20

Multiple Imputation

• R package mice (have to install)
• Schematic example:
  • imp <- mice(stress)
    • Creates several sets of imputed data
    • Need to set some parameters to indicate level 1 vs. level 2 variables, categorical vs. continuous variables
  • miModel <- with(imp, lmer(…))
    • Fit the model to each set of imputed data
  • result <- pool(miModel)
    • Combine the model results
  • summary(result)
• Limitations:
  • Limited to two nested levels (with current software)
  • Only gives you fixed effect estimates (not estimates of random effect variance)
  • Can be time-consuming
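The impute-fit-pool cycle that mice automates can be sketched language-agnostically. A minimal Python illustration with a deliberately trivial "model" (the sample mean); the data, the draw-from-observed imputation rule, and the number of imputations are all made up:

```python
import random
random.seed(2)

# Schematic multiple imputation: complete the data m times, fit a model to each
# completed dataset, then pool by averaging the estimates (point estimate only).
data = [5.0, 8.0, 3.0, None, None]
observed = [x for x in data if x is not None]
obs_mean = sum(observed) / len(observed)
obs_sd = (sum((x - obs_mean) ** 2 for x in observed) / (len(observed) - 1)) ** 0.5

m = 50  # number of imputed datasets
estimates = []
for _ in range(m):
    # One plausible completion: draw each missing value from the observed distribution
    completed = [x if x is not None else random.gauss(obs_mean, obs_sd) for x in data]
    estimates.append(sum(completed) / len(completed))  # "fit the model" = take the mean

pooled = sum(estimates) / m  # combine the results across imputations
```

Because each completion is a different plausible dataset, the spread of the m estimates also carries information about imputation uncertainty, which full pooling rules use for standard errors.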

SLIDE 21

Pattern-Mixture Models

• Alternative: Pattern-mixture models
  • Classify participants by the patterns of missing data
  • Then look at the effects / pattern of results within each group
• e.g., Effects of reading span for people with personality data reported & effects of reading span for people without personality data reported

SLIDE 22

Week 12: Effect Size & Power

SLIDE 23

Effect Size

• With sleep.csv, let's run a model predicting HoursSleep from fixed effects of HoursExercise and MgCaffeine, and a random intercept of Subject
• Which fixed effects significantly influence the number of hours of sleep that people get?

SLIDE 24

Effect Size

• With sleep.csv, let's run a model predicting HoursSleep from fixed effects of HoursExercise and MgCaffeine, and a random intercept of Subject
• Which fixed effects significantly influence the number of hours of sleep that people get?
• SleepModel <- lmer(HoursSleep ~ 1 + HoursExercise + MgCaffeine + (1|Subject), data=sleep)
• They both do!

SLIDE 25

Effect Size

• Remember that t statistics and p-values tell us about whether there's an effect in the population
  • Is the effect statistically reliable?
• A separate question is how big the effect is
  • Effect size
SLIDE 26

• Bigfoot: Little evidence he exists, but he'd be large if he did exist
  • LARGE EFFECT SIZE, LOW RELIABILITY: [-.20, 1.80]
• Pygmy hippo: We know it exists and it's small
  • SMALL EFFECT SIZE, HIGH RELIABILITY: [.15, .35]

SLIDE 27

• Is bacon really this bad for you??
• October 26, 2015

SLIDE 28

• Is bacon really this bad for you??
• True that we have as much evidence that bacon causes cancer as smoking causes cancer!
  • Same level of statistical reliability

SLIDE 29

• Is bacon really this bad for you??
• True that we have as much evidence that bacon causes cancer as smoking causes cancer!
  • Same level of statistical reliability
• But, effect size is much smaller for bacon

SLIDE 30

Effect Size

• Our model results tell us both:
  • Parameter estimate tells us about effect size
  • t statistic and p-value tell us about statistical reliability

SLIDE 31

Effect Size: Parameter Estimate

• Simplest measure: Parameter estimates
• Effect of 1-unit change in predictor on outcome variable
  • "Each hour of exercise the day before resulted in another 0.72 hours of sleep"
  • "Each minute of exercise increases life expectancy by about 7 minutes." (Moore et al., 2012, PLOS ONE)
  • "People with a college diploma earn around $24,000 more per year." (Bureau of Labor Statistics, 2018)
• Concrete! Good for "real-world" outcomes
SLIDE 32

Week 12: Effect Size & Power

SLIDE 33

Effect Size: Standardization

• Which is the bigger effect?
  • 1 hour of exercise = 0.72 hours of sleep
  • 1 mg of caffeine = -0.004 hours of sleep
• Problem: These are measured in different units
  • Hours of exercise vs. mg of caffeine
SLIDE 34

Effect Size: Standardization

• Which is the bigger effect?
  • 1 hour of exercise = 0.72 hours of sleep
  • 1 mg of caffeine = -0.004 hours of sleep
• Problem: These are measured in different units
  • Hours of exercise vs. mg of caffeine
• Convert to z-scores: # of standard deviations from the mean
  • This scale applies to anything!
  • Standardized scores
SLIDE 35

Effect Size: Standardization

• scale() puts things in terms of z-scores
• New z-scored version of HoursExercise:
  • sleep$HoursExercise.z <- scale(sleep$HoursExercise)[,1]
  • (# of standard deviations above/below mean hours of exercise)
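What scale() computes can be spelled out by hand: subtract the mean, divide by the sample standard deviation. A sketch in Python with made-up exercise values:

```python
# z-scoring by hand (the same computation scale() performs in R)
hours_exercise = [0.0, 1.0, 2.0, 1.0, 0.5, 1.5]   # hypothetical values
n = len(hours_exercise)
mean = sum(hours_exercise) / n
sd = (sum((x - mean) ** 2 for x in hours_exercise) / (n - 1)) ** 0.5
z = [(x - mean) / sd for x in hours_exercise]

# A z-scored variable always has mean 0 and standard deviation 1,
# which is what makes effects on different predictors comparable
z_mean = sum(z) / n
z_sd = (sum((x - z_mean) ** 2 for x in z) / (n - 1)) ** 0.5
```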
SLIDE 36

Effect Size: Standardization

• scale() puts things in terms of z-scores
• New z-scored version of HoursExercise:
  • sleep$HoursExercise.z <- scale(sleep$HoursExercise)[,1]
  • (# of standard deviations above/below mean hours of exercise)
• Then use these in a new model
  • Try z-scoring MgCaffeine, too
  • Then, run a model with the z-scored variables. Which has the largest effect?
SLIDE 37

Effect Size: Standardization

• Old results: [model output]
• New results: [model output]
  • No change in statistical reliability
  • Effect size is now estimated differently

SLIDE 38

Effect Size: Standardization

• New results:
  • 1 SD increase in exercise = 0.75 hours of sleep
  • 1 SD increase in caffeine = -0.26 hours of sleep
• Exercise effect is bigger
SLIDE 39

Effect Size: Standardization

• Standardized effects make our effect sizes somewhat more reliant on our data
  • Effect of 1 std. dev of cigarette smoking on life expectancy depends on what that std. dev is
  • Varies a lot from country to country!
  • Might get different standardized effects even if unstandardized is the same

SLIDE 40

Week 12: Effect Size & Power

SLIDE 41

Effect Size: Interpretation

• Generic heuristic for standardized effect sizes:
  • "Small" ≈ .25
  • "Medium" ≈ .50
  • "Large" ≈ .80
• But, take these with several grains of salt
  • Cohen (1988) just made them up
  • Not in context of particular domain
SLIDE 42

Effect Size: Interpretation

• Consider in context of other effect sizes in this domain:
  • Our effect: .20; Other effect 1: .30; Other effect 2: .40
  • vs: Our effect: .20; Other effect 1: .10; Other effect 2: .15
• For interventions: Consider cost, difficulty of implementation, etc.
  • Aspirin's effect in reducing heart attacks: d ≈ .06, but cheap!

SLIDE 43

Effect Size: Interpretation

• For theoretically guided research, compare to predictions of competing theories
• The lag effect in memory:
  • Is this about intervening items or time?
• [Diagram: two study schedules for RACCOON (5 sec per item, with intervening items WITCH and VIKING, 1-sec gaps, test after 1 day), one yielding POOR and one GOOD recall of RACCOON]

SLIDE 44

Effect Size: Interpretation

• Is lag effect about intervening items or time?
  • Intervening items hypothesis predicts A > B
  • Time hypothesis predicts B > A
• Goal here is to use direction of the effect to adjudicate between competing hypotheses
  • Not whether the lag effect is "small" or "large"
• [Diagram: TEST A: study RACCOON, WITCH, VIKING, RACCOON (5 sec each, 1-sec gaps), test after 1 day; TEST B: study RACCOON, WITCH, RACCOON (5 sec each, 10-sec gaps), test after 1 day]

SLIDE 45

Week 12: Effect Size & Power

SLIDE 46

Overall Variance Explained

• How well can we explain this DV?
• Test: Do predicted values match up well with the actual outcomes?
• R2: cor(fitted(SleepModel), sleep$HoursSleep)^2
• But, this includes what's predicted on basis of subjects (and other random effects)
• Compare to the R2 of a model with just the random effects & no fixed effects
• [Scatterplot: PREDICTED hours of sleep vs. ACTUAL hours of sleep]
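The R2 on this slide is just a squared correlation between predictions and outcomes. A Python sketch with hypothetical fitted and actual values:

```python
# R^2 as the squared correlation between model predictions and actual outcomes,
# the same quantity as cor(fitted(model), y)^2 in R. Values below are made up.
actual = [6.0, 7.5, 5.0, 8.0, 6.5, 7.0]
fitted = [6.2, 7.1, 5.4, 7.8, 6.6, 6.9]   # hypothetical model predictions

n = len(actual)
ma = sum(actual) / n
mf = sum(fitted) / n
cov = sum((a - ma) * (f - mf) for a, f in zip(actual, fitted))
var_a = sum((a - ma) ** 2 for a in actual)
var_f = sum((f - mf) ** 2 for f in fitted)
r = cov / (var_a * var_f) ** 0.5          # Pearson correlation
r_squared = r ** 2                        # proportion of outcome variance captured
```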

SLIDE 47

Week 12: Effect Size & Power

SLIDE 48

Recap of Null Hypothesis Significance Testing

• Does "brain training" affect general cognition?
  • H0: There is no effect of brain training on cognition
  • HA: There is an effect of brain training on cognition

SLIDE 49

Recap of Null Hypothesis Significance Testing

• Does "brain training" affect general cognition?
  • H0: There is no effect of brain training on cognition
    • γ1 = 0 in the population
  • HA: There is an effect of brain training on cognition
    • γ1 ≠ 0 in the population

SLIDE 50

Recap of Null Hypothesis Significance Testing

• Does "brain training" affect general cognition?
  • H0: There is no effect of brain training on cognition
    • γ1 = 0 in the population
  • HA: There is an effect of brain training on cognition
    • γ1 ≠ 0 in the population

SLIDE 51

Recap of Null Hypothesis Significance Testing

• Is a z score of 3.3 good evidence against H0?
• In a world where brain training has no effect on cognition (H0), the most probable z score would have been 0

SLIDE 52

Recap of Null Hypothesis Significance Testing

• Is a z score of 3.3 good evidence against H0?
• In a world where brain training has no effect on cognition (H0), the most probable z score would have been 0
• [Figure: null distribution centered at z = 0]

SLIDE 53

Recap of Null Hypothesis Significance Testing

• But even under H0, we wouldn't always expect to get exactly a z-score of 0 in our sample
• Observed effect will sometimes be higher or lower just by chance (but these values have lower probability): sampling error
• [Figure: null distribution with sample z-scores marked at z = 0, z = 1, z = -1.5]

SLIDE 54

Recap of Null Hypothesis Significance Testing

• In a world where H0 is true, the distribution of z-scores should look like this
  • The normal distribution of z-scores has mean 0 and std. dev. 1 (the standard normal)
• How plausible is it that the z-score for our sample came from this distribution?
• [Figure: standard normal curve from z = -3 to 3, with z = 0, z = 1, z = -1.5 marked]

SLIDE 55

Recap of Null Hypothesis Significance Testing

• p-value: Probability of obtaining a result this extreme under the null hypothesis of no effect
• We reject H0 when the observed t or z has < .05 probability of arising under H0
• But, still possible to get this z when H0 is true
• [Figure: total probability of a z-score this extreme under H0 = .05]

SLIDE 56

Recap of Null Hypothesis Significance Testing

• p-value: Probability of obtaining a result this extreme under the null hypothesis of no effect
• We reject H0 when the observed t or z has < .05 probability of arising under H0
• But, still possible to get this z when H0 is true

SLIDE 57

Recap of Null Hypothesis Significance Testing

• p-value: Probability of obtaining a result this extreme under the null hypothesis of no effect
• We reject H0 when the observed t or z has < .05 probability of arising under H0
• But, still possible to get this z when H0 is true
  • In that case, we'd incorrectly conclude that brain training works when it actually doesn't
  • False positive or Type I error
• [Figure: total probability of a z-score this extreme under H0 = .05]

SLIDE 58

Recap of Null Hypothesis Significance Testing

• What is our rate of Type I error?
• Even in a world where H0 is true, 5% of z values fall in the white area
  • Thus, a 5% probability
  • α = rate of Type I error = .05
• [Figure: total probability of a z-score this extreme under H0 = .05]

SLIDE 59

Recap of Null Hypothesis Significance Testing

• So, in a world where H0 is true, two outcomes possible

WHAT WE DID    ACTUAL STATE: H0 is true            ACTUAL STATE: HA is true
Retain H0      GOOD! Probability: 1-α
Reject H0      OOPS! Type I error. Probability: α

SLIDE 60

Recap of Null Hypothesis Significance Testing

• What about a world where HA is true?

SLIDE 61

Recap of Null Hypothesis Significance Testing

• Another mistake we could make: There really is an effect, but we retained H0
  • False negative / Type II error
  • Traditionally, not considered as "bad" as Type I
  • Probability: β

WHAT WE DID    ACTUAL STATE: H0 is true            ACTUAL STATE: HA is true
Retain H0      GOOD! Probability: 1-α              OOPS! Type II error. Probability: β
Reject H0      OOPS! Type I error. Probability: α

SLIDE 62

Recap of Null Hypothesis Significance Testing

SLIDE 63

Recap of Null Hypothesis Significance Testing

• POWER (1-β): Probability of correct rejection of H0: detecting the effect when it really exists
• If our hypothesis (HA) is right, what probability is there of obtaining significant evidence for it?

WHAT WE DID    ACTUAL STATE: H0 is true            ACTUAL STATE: HA is true
Retain H0      GOOD! Probability: 1-α              OOPS! Type II error. Probability: β
Reject H0      OOPS! Type I error. Probability: α  GOOD! Probability: 1-β

SLIDE 64

Recap of Null Hypothesis Significance Testing

• POWER (1-β): Probability of correct rejection of H0: detecting the effect when it really exists
• Can we find the thing we're looking for?

SLIDE 65

Recap of Null Hypothesis Significance Testing

• POWER (1-β): Probability of correct rejection of H0: detecting the effect when it really exists
• Can we find the thing we're looking for?
• If our hypothesis is true, what is the probability we'll get p < .05?
  • We compare retrieval practice to re-reading with power = .75
    • If retrieval practice is actually beneficial, there is a 75% chance we'll get a significant result
  • We compare bilinguals to monolinguals on a test of non-verbal cognition with power = .35
    • If there is a difference between monolinguals & bilinguals, there is a 35% chance we'll get p < .05
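Power statements like these can be estimated by simulation: generate many datasets in a world where HA is true and count how often p < .05. A sketch in Python for a two-group z-test; the true effect (d = 0.5) and group size (n = 50) are made-up numbers for illustration:

```python
import random
random.seed(0)

# Monte Carlo power estimate for a two-group comparison (z-test, known SD = 1).
def simulate_power(d=0.5, n=50, sims=2000, alpha_z=1.96):
    rejections = 0
    for _ in range(sims):
        group_a = [random.gauss(0, 1) for _ in range(n)]   # control group
        group_b = [random.gauss(d, 1) for _ in range(n)]   # true effect of d SDs
        diff = sum(group_b) / n - sum(group_a) / n
        se = (1 / n + 1 / n) ** 0.5                        # SE of the mean difference
        if abs(diff / se) > alpha_z:                       # i.e., "p < .05", two-sided
            rejections += 1
    return rejections / sims                               # proportion of significant results

power = simulate_power()
```

For these assumed numbers the analytic power of the two-sided z-test is about .70, and the simulated estimate lands close to that.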

SLIDE 66

Week 12: Effect Size & Power

SLIDE 67

Why Do We Care About Power?

• 1. Efficient use of resources
  • A major determinant of power is sample size (larger = more power)
  • Power analyses tell us if our planned sample size (n) is:
    • Large enough to be able to find what we're looking for
    • Not so large that we're collecting more data than necessary

SLIDE 68

Why Do We Care About Power?

• 1. Efficient use of resources
  • A major determinant of power is sample size (larger = more power)
  • Power analyses tell us if our planned sample size (n) is:
    • Large enough to be able to find what we're looking for
    • Not so large that we're collecting more data than necessary
  • This is about good use of our resources
    • Societal resources: Money, participant hours
    • Your resources: Time!!
SLIDE 69

Why Do We Care About Power?

• 1. Efficient use of resources
  • A major determinant of power is sample size (larger = more power)
  • Power analyses tell us if our planned sample size (n) is:
    • Large enough to be able to find what we're looking for
    • Not so large that we're collecting more data than necessary
  • This is about good use of our resources
SLIDE 70

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
  • Rate of false positive results increases if we keep collecting data whenever our effect is non-sig.
  • In the limit, ensures a significant result
  • Random sampling means that p-value is likely to differ in each sample
• [Figure: p-value happens to be higher in this slightly larger sample]
SLIDE 71

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
  • Rate of false positive results increases if we keep collecting data whenever our effect is non-sig.
  • In the limit, ensures a significant result
  • Random sampling means that p-value is likely to differ in each sample
• [Figure: now, the p-value happens to be lower]
SLIDE 72

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
  • Rate of false positive results increases if we keep collecting data whenever our effect is non-sig.
  • In the limit, ensures a significant result
  • Random sampling means that p-value is likely to differ in each sample
  • At some point, p < .05 by chance
• [Figure: SIGNIFICANT!! PUBLISH NOW!!]
SLIDE 73

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
  • Rate of false positive results increases if we keep collecting data whenever our effect is non-sig.
  • In the limit, ensures a significant result
  • Random sampling means that p-value is likely to differ in each sample
  • At some point, p < .05 by chance
  • Bias to get positive results if we stop if and only if p < .05
• [Figure: but not significant in this even larger sample]
SLIDE 74

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
  • Rate of false positive results increases if we keep collecting data whenever our effect is non-sig.
• [Flowchart: Collect data → Significant result? NO: "Maybe we just didn't have enough data yet" → collect more data; YES: "It's 'statistically significant,' so that means it's real. Publish it!"]
SLIDE 75

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
  • Rate of false positive results increases if we keep collecting data whenever our effect is non-sig.
  • We can avoid this if we use a power analysis to decide our sample size in advance
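The inflation from optional stopping is easy to demonstrate by simulation (a sketch; the batch sizes and checking rule are made up for illustration). Even though H0 is true here, testing after every batch and stopping as soon as p < .05 rejects far more often than 5% of the time:

```python
import random
random.seed(0)

# Optional stopping under a TRUE null: start with 20 observations per group,
# test, and add 10 more per group (up to 100) whenever the result is non-sig.
def optional_stopping_rejects(start_n=20, step=10, max_n=100, alpha_z=1.96):
    a = [random.gauss(0, 1) for _ in range(start_n)]
    b = [random.gauss(0, 1) for _ in range(start_n)]   # no true group difference
    while True:
        n = len(a)
        diff = sum(b) / n - sum(a) / n
        se = (2 / n) ** 0.5                            # SE of the difference (SD = 1)
        if abs(diff / se) > alpha_z:
            return True                                # "significant": stop & publish
        if n >= max_n:
            return False                               # give up
        a += [random.gauss(0, 1) for _ in range(step)]
        b += [random.gauss(0, 1) for _ in range(step)]

sims = 2000
false_positive_rate = sum(optional_stopping_rejects() for _ in range(sims)) / sims
```

Every individual test uses α = .05, yet because there are many chances to stop on a fluke, the overall false positive rate ends up well above 5%.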

SLIDE 76

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
• 3. Understand non-replication (Open Science Collaboration, 2015)
  • Even if an effect exists in the population, we'd expect some non-significant results
    • Power is almost never 100%
  • In fact, many common designs in psychology have low power (Etz & Vandekerckhove, 2016; Maxwell et al., 2015)
    • Small to moderate sample sizes
    • Small effect sizes
SLIDE 77

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
• 3. Understand non-replication (Open Science Collaboration, 2015)
SLIDE 78

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
• 3. Understand non-replication (Open Science Collaboration, 2015)
  • Even if an effect exists in the population, we'd expect some non-significant results
    • Power is almost never 100%
  • In fact, many common designs in psychology have low power (Etz & Vandekerckhove, 2016; Maxwell et al., 2015)
    • Small effect sizes
    • Small to moderate sample sizes
  • Failures to replicate might be a sign of low power, rather than a non-existent effect

SLIDE 79

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
• 3. Understand non-replication (Open Science Collaboration, 2015)
• 4. Understand null results
  • A non-significant result, by itself, doesn't prove an effect doesn't exist
  • We "fail to reject H0" rather than "accept H0"
SLIDE 80

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
• 3. Understand non-replication (Open Science Collaboration, 2015)
• 4. Understand null results
  • A non-significant result, by itself, doesn't prove an effect doesn't exist
  • We "fail to reject H0" rather than "accept H0"
  • "Absence of evidence is not evidence of absence."
• [Cartoon caption: "I looked around Schenley Park for 15 minutes and didn't see any giraffes. Therefore, giraffes don't exist."]

SLIDE 81

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
• 3. Understand non-replication (Open Science Collaboration, 2015)
• 4. Understand null results
  • A non-significant result, by itself, doesn't prove an effect doesn't exist
  • We "fail to reject H0" rather than "accept H0"
  • "Absence of evidence is not evidence of absence."
• "We didn't find enough evidence to conclude there is a significant effect" DOES NOT MEAN "No significant effect exists"

SLIDE 82

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
• 3. Understand non-replication (Open Science Collaboration, 2015)
• 4. Understand null results
  • A non-significant result, by itself, doesn't prove an effect doesn't exist
  • We "fail to reject H0" rather than "accept H0"
  • "Absence of evidence is not evidence of absence."
  • Major criticism of null hypothesis significance testing!
SLIDE 83

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
• 3. Understand non-replication (Open Science Collaboration, 2015)
• 4. Understand null results
  • A non-significant result, by itself, doesn't prove an effect doesn't exist
  • But, with high power, null result is more informative
SLIDE 84

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
• 3. Understand non-replication (Open Science Collaboration, 2015)
• 4. Understand null results
  • A non-significant result, by itself, doesn't prove an effect doesn't exist
  • But, with high power, null result is more informative
    • e.g., null effect of working memory training on intelligence with 20% power
      • Maybe brain training works & we just couldn't detect the effect
    • But: null effect of WM training on intelligence with 90% power
      • Unlikely that we just missed the effect!
SLIDE 85

Why Do We Care About Power?

• 1. Efficient use of resources
• 2. Avoid p-hacking (Simmons et al., 2011)
• 3. Understand non-replication (Open Science Collaboration, 2015)
• 4. Understand null results
  • A non-significant result, by itself, doesn't prove an effect doesn't exist
  • But, with high power, null result is more informative
slide-86
SLIDE 86

Why Do We Care About Power?

  • 1. Efficient use of resources
  • 2. Avoid p-hacking (Simmons et al., 2011)
  • 3. Understand non-replication (Open Science Collaboration, 2015)
  • 4. Understand null results
  • 5. Granting agencies now want to see it
  • Don’t want to fund a study with low probability of showing anything
  • e.g., Our theory predicts greater activity in Broca’s area in condition A than condition B. But our experiment has only a 16% probability of detecting the difference. Not good!

slide-87
SLIDE 87

Why Do We Care About Power?

  • 1. Efficient use of resources
  • 2. Avoid p-hacking (Simmons et al., 2011)
  • 3. Understand non-replication (Open Science Collaboration, 2015)
  • 4. Understand null results
  • 5. Granting agencies now want to see it
  • NIH:
  • IES:
slide-88
SLIDE 88

Week 12: Effect Size & Power

l Missing Data Solutions
  l Casewise Deletion
  l Listwise Deletion
  l Unconditional Imputation
  l Conditional Imputation
  l Multiple Imputation
l Effect Size
  l Unstandardized
  l Standardized
  l Interpreting Effect Size
  l Variance Explained
l Power
  l Recap of Null Hypothesis Significance Testing
  l Why Should We Care?
  l Estimating Effect Size
  l Doing Your Own Power Analysis
  l Influences on Power

slide-89
SLIDE 89

Estimating Effect Size

  • One reason we haven’t always calculated power is that it requires the effect size
  • But, several ways to estimate effect size:
  • 1. Prior literature
  • What is the effect size in other studies in this domain or with a similar manipulation?

slide-90
SLIDE 90

Estimating Effect Size

  • One reason we haven’t always calculated power is that it requires the effect size
  • But, several ways to estimate effect size:
  • 1. Prior literature
  • 2. Pilot study
  • Run a version of the study with a smaller n
  • Don’t worry about whether the effect is significant; just use the data to estimate effect size

slide-91
SLIDE 91

Estimating Effect Size

  • One reason we haven’t always calculated power is that it requires the effect size
  • But, several ways to estimate effect size:
  • 1. Prior literature
  • 2. Pilot study
  • 3. Smallest Effect Size Of Interest (SESOI)
  • Decide the smallest effect size we’d care about
  • e.g., we want our educational intervention to have an effect size of at least .05 GPA
  • Calculate power based on that effect size
  • True, if the actual effect is smaller than .05 GPA, our power would be lower, but the idea is we no longer care about the intervention if its effect is that small
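The SESOI logic can be sketched numerically. A minimal Python sketch, using a normal approximation to two-sample t-test power; the GPA standard deviation of 0.4 (needed to standardize the .05-GPA effect) and the sample size of 500 per group are illustrative assumptions, not values from the slides:

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample t-test for
    standardized effect size d, via the normal approximation."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    ncp = d * sqrt(n_per_group / 2)     # approximate noncentrality
    # Reject in either tail; the lower-tail term is usually negligible
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)

# Hypothetical SESOI: a .05-GPA effect, assuming SD(GPA) ≈ 0.4
d_sesoi = 0.05 / 0.4                    # standardized d = 0.125
print(power_two_sample(d_sesoi, n_per_group=500))
```

If the true effect is smaller than the SESOI, this calculation overstates power, but by construction we no longer care about detecting such an effect.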

slide-92
SLIDE 92

Week 12: Effect Size & Power

l Missing Data Solutions
  l Casewise Deletion
  l Listwise Deletion
  l Unconditional Imputation
  l Conditional Imputation
  l Multiple Imputation
l Effect Size
  l Unstandardized
  l Standardized
  l Interpreting Effect Size
  l Variance Explained
l Power
  l Recap of Null Hypothesis Significance Testing
  l Why Should We Care?
  l Estimating Effect Size
  l Doing Your Own Power Analysis
  l Influences on Power

slide-93
SLIDE 93

Your Own Power Analysis

  • Rationale behind power analyses:
  • Can we detect the kind & size of effect we’re interested in?
  • What sample size would we need?
  • In practice:
  • We can’t control effect size; it’s a property of nature
  • α is usually fixed (e.g., at .05) by convention
  • But, we can control our sample size n!
  • So:
  • Determine desired power (often .80)
  • Estimate the effect size(s)
  • Calculate the necessary sample size n
slide-94
SLIDE 94

Determining Power

  • Power for ANOVAs can easily be found from tables
  • Simpler design: only 1 random effect (at most)
  • More complicated for mixed-effects models
slide-95
SLIDE 95

Monte Carlo Methods

  • Remember the definition of power?
  • The probability of observing a significant effect in our sample if the effect truly exists in the population
  • What if we knew for a fact that the effect existed in a particular population?
  • Then, a measure of power is how often we get a significant result in a sample (of our intended n)
  • Observe a significant effect in 10 samples out of 20 = 50% of the time = power of .50
  • Observe a significant effect in 300 samples out of 1000 = 30% of the time = power of .30
  • Observe a significant effect in 800 samples out of 1000 = 80% of the time = power of .80

slide-96
SLIDE 96

Monte Carlo Methods

  • Remember the definition of power?
  • The probability of observing a significant effect in our sample if the effect truly exists in the population
  • What if we knew for a fact that the effect existed in a particular population?
  • Then, a measure of power is how often we get a significant result in a sample (of our intended n)

“Great, but where am I ever going to find data where I know exactly what the population parameters are?”

slide-97
SLIDE 97

Monte Carlo Methods

  • Remember the definition of power?
  • The probability of observing a significant effect in our sample if the effect truly exists in the population
  • What if we knew for a fact that the effect existed in a particular population?
  • Then, a measure of power is how often we get a significant result in a sample (of our intended n)
  • Solution: We create (“simulate”) the data.
slide-98
SLIDE 98

Data Simulation

  • Set some plausible population parameters (effect size, subject variance, item variance, etc.)
  • Since we are creating the data…
  • We can choose the population parameters
  • We know exactly what they are

Set population parameters (Mean = 723 ms, Group difference = 100 ms, Subject var = 30)

slide-99
SLIDE 99

Data Simulation

  • Create (“simulate”) a random sample drawn from this population
  • Like most samples, the sample statistics will not exactly match the population parameters
  • It’s randomly generated
  • But, the difference is we know what the population is like & that there IS an effect

Set population parameters (Mean = 723 ms, Group difference = 100 ms, Subject var = 30) → Create a random sample from these data (N subjects = 20, N items = 40)

slide-100
SLIDE 100

Data Simulation

  • Now, fit our planned mixed-effects model to this sample of simulated data to get one result
  • Might get a significant result
  • Correctly detected the effect in the population
  • Might get a non-significant result
  • Type II error – missed an effect that really exists in the population

Set population parameters (Mean = 723 ms, Group difference = 100 ms, Subject var = 30) → Create a random sample from these data (N subjects = 20, N items = 40) → Run our planned model and see if we get a significant result

slide-101
SLIDE 101

Monte Carlo Methods

  • If we do this repeatedly, we will get multiple significance tests, each on a different sample
  • Outcomes:
  • Sample 1: p < .05 (Yes)
  • Sample 2: p = .23 (No)
  • Sample 3: p < .05 (Yes)
  • Sample 4: p = .14 (No)
  • Detected the effect ½ of the time: Power = .50

Set population parameters (Mean = 723 ms, Group difference = 100 ms, Subject var = 30) → Create a random sample from these data (N subjects = 20, N items = 40) → Run our planned model and see if we get a significant result → Repeat with a new sample from the same population
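The loop in the flowchart can be sketched in code. A minimal Python sketch of Monte Carlo power, with a simple two-group design and a pooled t-test standing in for the planned mixed-effects model; the residual SD of 250 ms and the hard-coded critical value (2.101, the two-tailed .05 value for df = 18, i.e. 10 subjects per group) are illustrative assumptions:

```python
import random
from math import sqrt

random.seed(1)

def simulate_power(n_per_group, n_sims=2000,
                   mean=723.0, group_diff=100.0, sd=250.0, t_crit=2.101):
    """Monte Carlo power: repeatedly sample from a population where the
    effect truly exists, test each sample, count significant results.
    Pass a t_crit matching df = 2 * n_per_group - 2."""
    hits = 0
    for _ in range(n_sims):
        a = [random.gauss(mean, sd) for _ in range(n_per_group)]
        b = [random.gauss(mean + group_diff, sd) for _ in range(n_per_group)]
        # Pooled two-sample t statistic (equal group sizes)
        ma, mb = sum(a) / n_per_group, sum(b) / n_per_group
        va = sum((x - ma) ** 2 for x in a) / (n_per_group - 1)
        vb = sum((x - mb) ** 2 for x in b) / (n_per_group - 1)
        se = sqrt((va + vb) / n_per_group)
        if abs(mb - ma) / se > t_crit:
            hits += 1
    return hits / n_sims           # proportion significant = estimated power

print(simulate_power(10))          # 10 subjects per group: power is poor
```

A full simulation for the course’s models would instead generate subject and item random effects and refit the planned lmer-style model on each sample; the counting logic is the same.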

slide-102
SLIDE 102

Monte Carlo Methods

  • If we do this repeatedly, we will get multiple significance tests, each on a different sample
  • Hmm, that power wasn’t very good ☹

Set population parameters (Mean = 723 ms, Group difference = 100 ms, Subject var = 30) → Create a random sample from these data (N subjects = 20, N items = 40) → Run our planned model and see if we get a significant result → Repeat with a new sample from the same population

slide-103
SLIDE 103

Monte Carlo Methods

  • If we do this repeatedly, we will get multiple significance tests, each on a different sample
  • Hmm, that power wasn’t very good ☹
  • Let’s increase the number of subjects and run a new simulation to see what our power is like now

Set population parameters (Mean = 723 ms, Group difference = 100 ms, Subject var = 30) → Create a random sample from these data (N subjects = 60, N items = 40) → Run our planned model and see if we get a significant result → Repeat with a new sample from the same population

slide-104
SLIDE 104

Monte Carlo Methods

  • If we do this repeatedly, we will get multiple significance tests, each on a different sample
  • Goal: Find the sample size(s) that let you detect the effect at least 80% of the time (or whatever your desired power is)
  • Will 40 subjects in each of 5 schools suffice?
  • What about 50 subjects in each of 10 schools?

Set population parameters (Mean = 723 ms, Group difference = 100 ms, Subject var = 30) → Create a random sample from these data (N subjects = 60, N items = 40) → Run our planned model and see if we get a significant result → Repeat with a new sample from the same population
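The search for an adequate sample size can also be sketched analytically. A minimal Python sketch that steps up n until a normal-approximation of two-sample power reaches the target; it ignores the mixed-effects structure (schools, subjects within schools), so treat the numbers as illustrative only:

```python
from math import sqrt
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided two-sample test
    (upper tail only; the other tail is negligible here)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return 1 - z.cdf(z_crit - d * sqrt(n_per_group / 2))

def n_for_power(d, target=0.80):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while approx_power(d, n) < target:
        n += 1
    return n

print(n_for_power(0.5))   # a "medium" d = 0.5: roughly 63 per group
```

A simulation-based search works the same way: re-run the Monte Carlo loop at each candidate n and stop when the estimated power clears .80.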

slide-105
SLIDE 105

Week 12: Effect Size & Power

l Missing Data Solutions
  l Casewise Deletion
  l Listwise Deletion
  l Unconditional Imputation
  l Conditional Imputation
  l Multiple Imputation
l Effect Size
  l Unstandardized
  l Standardized
  l Interpreting Effect Size
  l Variance Explained
l Power
  l Recap of Null Hypothesis Significance Testing
  l Why Should We Care?
  l Estimating Effect Size
  l Doing Your Own Power Analysis
  l Influences on Power

slide-106
SLIDE 106

Influences on Power

  • So what makes for a powerful design?
  • Things that increase power:
  • Larger effect size estimates for the fixed effects
  • Bigger things are easier to find
  • Larger sample size (at any level)
  • More data = more confidence
  • This one we can control
  • Increasing sample size at a higher level (e.g., subjects rather than time points within subjects) is more effective
  • Variance of independent variables
  • Easier to see an effect of income on happiness if people vary in their income
  • Hard to test the effect of “number of fingers on your hand”
  • With a categorical variable, we’d prefer an equal # of observations in each condition (most information)
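The point about predictor variance can be made concrete. A minimal Python sketch, assuming a simple linear regression where SE(β̂) ≈ σ / (sd_x · √n); the slope, noise, and sample-size values are invented for illustration:

```python
from math import sqrt
from statistics import NormalDist

def slope_power(beta, sd_x, sigma, n, alpha=0.05):
    """Approximate power to detect a regression slope beta.
    Uses SE(beta_hat) ≈ sigma / (sd_x * sqrt(n)): more spread in the
    predictor means a smaller standard error, hence more power."""
    z = NormalDist()
    se = sigma / (sd_x * sqrt(n))
    z_crit = z.inv_cdf(1 - alpha / 2)
    return 1 - z.cdf(z_crit - abs(beta) / se)

# Same slope and same noise; only the spread of the predictor differs
print(slope_power(beta=0.5, sd_x=2.0, sigma=5.0, n=50))  # wide income range
print(slope_power(beta=0.5, sd_x=0.5, sigma=5.0, n=50))  # narrow income range
```

The wide-range design has several times the power of the narrow-range one, even though the effect and the sample size are identical.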

slide-107
SLIDE 107

Influences on Power

  • So what makes for a powerful design?
  • Things that decrease power:
  • Larger variance of random effects
  • More differences between people (noise) make it harder to see what’s consistent
  • Larger error variance
  • Again, more noise = harder to see consistent effects
  • May be able to reduce it if you can add covariates / control variables

slide-108
SLIDE 108

Week 12: Effect Size & Power

l Missing Data Solutions
  l Casewise Deletion
  l Listwise Deletion
  l Unconditional Imputation
  l Conditional Imputation
  l Multiple Imputation
l Effect Size
  l Unstandardized
  l Standardized
  l Interpreting Effect Size
  l Variance Explained
l Power
  l Recap of Null Hypothesis Significance Testing
  l Why Should We Care?
  l Estimating Effect Size
  l Doing Your Own Power Analysis
  l Influences on Power