

SLIDE 1

Experimental design (continued)

Spring 2017 Michelle Mazurek

Some content adapted from Bilge Mutlu, Vibha Sazawal, Howard Seltman

SLIDE 2

Administrative

  • No class Tuesday
  • Homework 1
  • Plug for Tanu Mitra grad student session
SLIDE 3

Today’s class

  • Finish threats to validity
  • Experimental design / choices
  • Alternatives to experiments
SLIDE 4

Quick review

  • Internal validity: causality

– Isolate variable of interest
– Randomized assignment

  • External validity

– Representative sample
– Representative environment/task/analysis

  • Valid constructs

– Measure something meaningful
– Reliable
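Randomized assignment is simple to get right in code. A minimal sketch (function and variable names are illustrative, not from the slides): shuffle participants, then deal them round-robin into conditions so group sizes stay balanced.

```python
import random

def assign(participants, conditions, seed=None):
    """Randomly assign participants to conditions, keeping group
    sizes as balanced as possible (shuffle, then deal round-robin)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {c: shuffled[i::len(conditions)] for i, c in enumerate(conditions)}

groups = assign(range(30), ["control", "treatment"], seed=1)
```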

SLIDE 5

Know what you’re measuring

  • Especially when dealing with large-scale data from the internet

– What are you missing? What is duplicated?
– What is the precision and accuracy of the data?
– Are you capturing what you think you’re capturing?
– Vantage point
– Representativeness / diversity

SLIDE 6

Calibrating constructs

  • Examine outliers and spikes
  • Check for self-consistency
  • Compare multiple measures

– Multiple datasets
– Multiple ways of calculating a value

  • Test with synthetic data
  • Check longitudinal data periodically!
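One concrete way to examine outliers and spikes is a robust rule based on the median absolute deviation (MAD). The threshold `k` below is an arbitrary illustrative choice, not a recommendation from the slides:

```python
import statistics

def flag_outliers(values, k=5):
    """Flag points more than k MADs from the median. Median-based
    rules resist 'masking', where one extreme point inflates the
    mean and standard deviation enough to hide itself."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [v for v in values if abs(v - med) > k * mad]

print(flag_outliers([1, 2, 3, 2, 1, 100]))  # only the spike is flagged
```

Note that a mean/standard-deviation rule would miss the spike in this example: the value 100 drags the standard deviation up to about 37, so it sits within 3 SDs of its own inflated mean.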
SLIDE 7

Mis-measurements, now what?

  • Discard? (Why might this be bad?)

– Discard outliers? Definition?

  • Use an explicit adjustment?
SLIDE 8

Other measurement notes

  • (Don’t really fit here, but from Paxson paper)
  • Metadata and good analysis logging are critical!
  • Be clear about unknowns and limitations
SLIDE 9

4. Power

  • Power: Likelihood that if there’s a real effect, you will find it.
  • Why might you not find it?

– Sample size
– Effect size
– Missing explanatory variables
– Variability

SLIDE 10

Promote power

  • Potential tradeoff: Generalizability for power

– E.g., limit variability between subjects

  • Covariates: Measure possible confounds, include in analysis

  • Use reliable measurements
  • Control the environment


SLIDE 11

EXPERIMENTAL DESIGN

SLIDE 12

Some important decisions

  • What is the hypothesis?
  • Between or within subjects?
  • What treatment levels / conditions?
  • What dependent variables to measure?
SLIDE 13

Good hypothesis design

  • Predicted relationship between (at least) 2 vars

– Testable, falsifiable

  • Operational

– Vars are clearly defined
– Relationship / how you measure it clearly defined

SLIDE 14

Good hypothesis design (cont.)

  • Justified

– Exploratory results
– Theory in related area
– Well-justified intuition?

  • Parsimonious
SLIDE 15

Between vs. Within

  • Between: Each participant belongs to exactly one condition
  • Within: Each participant belongs to multiple conditions
SLIDE 16

Between vs. Within

  • Between: more participants needed; cleaner / less bias
  • Within: more time per participant; more power (less subject-to-subject variability)

SLIDE 17

Improving on between-subjects

  • Matching: Get like participants for each condition

  • Pro: reduces variability
  • Con: Hard to find; what do you match on?
  • In general, be very cautious
SLIDE 18

Improving on within-subjects

  • Ordering effects can be HUGE

– Learning, fatigue
– Range effects: learn most for closest conditions

  • Mitigate via counterbalancing

– All possible orders
– Balanced Latin square

A B C D
C A D B
B D A C
D C B A
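A balanced Latin square for an even number of conditions can also be generated programmatically. This standard construction may produce a different square than the one on the slide, but with the same balance properties (each condition once per position, and each condition immediately preceding every other condition exactly once):

```python
def balanced_latin_square(conditions):
    """Balanced Latin square for an even number of conditions."""
    n = len(conditions)
    assert n % 2 == 0, "the classic construction requires an even n"
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first, lo, hi = [0], 1, n - 1
    while len(first) < n:
        first.append(lo)
        lo += 1
        if len(first) < n:
            first.append(hi)
            hi -= 1
    # Each remaining row shifts every index by 1 (mod n).
    return [[conditions[(x + s) % n] for x in first] for s in range(n)]

for row in balanced_latin_square(["A", "B", "C", "D"]):
    print(" ".join(row))
```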

SLIDE 19

Counterbalancing doesn’t fix:

  • Range effects (most average treatment)
  • Context effects (what most participants are more familiar with)

SLIDE 20

Mixed models are also possible

  • Everyone gets the same three tasks
  • Order of tasks varies
  • Tool with which to execute tasks varies
SLIDE 21

Selecting conditions

  • How many IVs?

– Password meter example

  • How many / which levels for each?

– Cannot infer anything about levels you didn’t test

SLIDE 22

Full-factorial (or not)

  • Full-factorial: All possible combinations of all IVs

– And all orderings?

  • Not: Only a subset

– Selected how?
– Recall: Vary at most one thing each time!

  • Planned comparisons!
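Enumerating a full-factorial design is mechanical: take the cross product of all IV levels. A sketch with invented IVs and levels (the actual factors of the password-meter study are not shown on the slide):

```python
from itertools import product

# Hypothetical IVs and levels, for illustration only:
ivs = {
    "meter": ["none", "standard", "strict"],
    "policy": ["8-char minimum", "16-char minimum"],
}

# Full factorial: every combination of every level of every IV.
conditions = list(product(*ivs.values()))
print(len(conditions))  # 3 x 2 = 6 cells
```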
SLIDE 23

Why multivariate?

  • What is different between running one experiment with two IVs vs. two experiments with one IV each?

  • Interaction effects!
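The point can be seen in a toy 2x2 table of cell means (numbers invented): two separate one-IV experiments would each estimate a single main effect and could never reveal that the effect of one IV depends on the level of the other.

```python
# Hypothetical mean task times for a 2x2 design
# (IVs: meter off/on, policy short/long):
means = {("off", "short"): 10, ("off", "long"): 14,
         ("on",  "short"): 12, ("on",  "long"): 22}

# Simple effect of "meter" at each level of "policy":
effect_short = means[("on", "short")] - means[("off", "short")]  # 2
effect_long  = means[("on", "long")]  - means[("off", "long")]   # 8

# Interaction: the meter's effect differs across policy levels.
interaction = effect_long - effect_short
print(interaction)  # 6 (nonzero, so the IVs interact)
```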
SLIDE 24

Dependent variables

  • What and how to measure?

– Construct validity, again!
– Performance (time, errors, FP/FN, etc.)
– Opinions/attitude
– Audio recording, screen capture, keystrokes, copy-pasting behavior, etc.
– Demographics

  • Multiple measures toward higher-level construct?
SLIDE 25

NOT JUST EXPERIMENTS

SLIDE 26

Kinds of measurement studies

  • Experimental
  • Observational/correlational
  • Quasi-experimental
SLIDE 27

Observational/correlational

  • Observe that X and Y (don’t) increase and decrease together / in opposition
  • Researcher doesn’t apply any control or treatment: just measure incidence

– Does lead exposure correlate with crime rate?

  • Directionality and third-variable both issues
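Measuring the association itself is straightforward; the hard part is interpretation. A minimal Pearson correlation, for reference:

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation: strength of linear association in [-1, 1].
    It says nothing about direction of causality or third variables."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: perfect positive association
```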
SLIDE 28

Quasi-experiments

  • Subset of observational studies
  • Can’t randomize assignment
  • But, experimenter controls something

[Diagram: Group 1 and Group 2, measured before and after a treatment]

SLIDE 29

Observational examples

  • Cohort study
  • Regression discontinuity
  • BIBIFI example
SLIDE 30

Pluses and minuses

  • Can measure things that simply can’t be done with true experiments
  • In general, association at best – causality very hard to establish

– Some statistical techniques to help exist

  • Low internal validity – can you maximize it within the available constraints?