
Experimental design (continued), Spring 2017, Michelle Mazurek



  1. Experimental design (continued) • Spring 2017 • Michelle Mazurek • Some content adapted from Bilge Mutlu, Vibha Sazawal, Howard Seltman

  2. Administrative • No class Tuesday • Homework 1 • Plug for Tanu Mitra grad student session

  3. Today’s class • Finish threats to validity • Experimental design / choices • Alternatives to experiments

  4. Quick review • Internal validity: causality – Isolate variable of interest – Randomized assignment • External validity – Representative sample – Representative environment/task/analysis • Valid constructs – Measure something meaningful – Reliable
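Randomized assignment, as named in the review above, can be sketched in a few lines. This is a minimal illustration, not the course's procedure: the function name `assign_conditions`, the participant IDs, and the two condition labels are all hypothetical.

```python
import random
from collections import Counter


def assign_conditions(participants, conditions, seed=None):
    """Randomly assign each participant to exactly one condition.

    Participants are shuffled, then dealt round-robin, so group sizes
    differ by at most one. (Hypothetical helper, not from the slides.)
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}


# Hypothetical participant IDs and condition names.
participants = [f"P{i:02d}" for i in range(1, 13)]
assignment = assign_conditions(participants, ["control", "treatment"], seed=2017)
print(Counter(assignment.values()))
```

Shuffling before dealing keeps the assignment independent of recruitment order, which is the point of randomization: participant characteristics should not line up with condition membership.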

  5. Know what you’re measuring • Especially when dealing with large-scale data from the internet – What are you missing? What is duplicated? – What is the precision and accuracy of the data? – Are you capturing what you think you’re capturing? – *Vantage point* – Representativeness / diversity

  6. Calibrating constructs • Examine outliers and spikes • Check for self-consistency • Compare multiple measures – Multiple datasets – Multiple ways of calculating a value • Test with synthetic data • Check longitudinal data periodically!

  7. Mis-measurements, now what? • Discard? (Why might this be bad?) – Discard outliers? Definition? • Use an explicit adjustment?
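The slide asks how "outlier" should be defined before anything is discarded. One common convention, which the slides do not necessarily endorse, is the 1.5 × IQR rule. The sketch below applies it to hypothetical task times and reports the mean with and without the flagged points, so the effect of discarding is visible rather than silent.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles.

    The 1.5 * IQR rule is one conventional definition of an outlier;
    it is an assumption here, not a definition from the slides.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical task-completion times in seconds.
times = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9, 12.7, 48.3]
flagged = iqr_outliers(times)
kept = [t for t in times if t not in flagged]

print("flagged:", flagged)
print("mean with all data:", round(statistics.mean(times), 2))
print("mean without flagged:", round(statistics.mean(kept), 2))
```

Reporting both numbers, along with the rule used, keeps the decision auditable whichever way it goes.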

  8. Other measurement notes • (Don’t really fit here, but from Paxson paper) • Metadata and good analysis logging are critical! • Be clear about unknowns and limitations

  9. 4. Power • Power: Likelihood that if there’s a real effect, you will find it. • Why might you not find it? – Sample size – Effect size – Missing explanatory variables – Variability
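How sample size, effect size, and variability combine into power can be illustrated with a normal approximation. The sketch below assumes a two-sided, two-sample z-test with equal group sizes and a standardized effect size (mean difference divided by the standard deviation, so higher variability means a smaller standardized effect). The function name `approx_power` and the example numbers are hypothetical, and a real analysis would normally use a dedicated power-analysis tool.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (difference / SD).
    This is a normal approximation, assumed here for illustration only.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


for n in (10, 20, 64, 100):
    print(f"n per group = {n:3d}, d = 0.5 -> power ~ {approx_power(0.5, n):.2f}")
```

With no true effect (`effect_size=0`) the function returns `alpha`, the false-positive rate, which is a quick sanity check on the formula.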

  10. Promote power • Covariates: Measure possible confounds, include in analysis • Use reliable measurements • Control the environment • Potential tradeoff: Generalizability for power – E.g., limit variability between subjects

  11. EXPERIMENTAL DESIGN

  12. Some important decisions • What is the hypothesis? • Between or within subjects? • What treatment levels / conditions? • What dependent variables to measure?

  13. Good hypothesis design • Predicted relationship between (at least) 2 vars – Testable, falsifiable • Operational – Vars are clearly defined – Relationship / how you measure it clearly defined

  14. Good hypothesis design (cont.) • Justified – Exploratory results – Theory in related area – Well-justified intuition? • Parsimonious

  15. Between vs. Within • Between: Each participant belongs to exactly one condition • Within: Each participant belongs to multiple conditions

  16. Between vs. Within • Between: More participants; cleaner/less bias • Within: More time each; more power (less variability subj-subj)

  17. Improving on between-subjects • Matching: Get like participants for each condition • Pro: reduces variability • Con: Hard to find; what do you match on? • In general, be very cautious

  18. Improving on within-subjects • Ordering effects can be HUGE – Learning, fatigue – Range effects: learn most for closest conditions • Mitigate via counterbalancing – All possible orders – Balanced Latin square (one order per row): A B C D / C A D B / B D A C / D C B A
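Both counterbalancing options can be generated and checked mechanically. The sketch below enumerates all orders with `itertools.permutations` and builds a balanced Latin square using a standard construction for an even number of conditions. That construction yields a different square from the one on the slide, so the helper `is_balanced` (a hypothetical name) is used to confirm that both squares have the balance property. Reading the slide's letters as four rows of four is an assumption about its layout.

```python
from itertools import permutations


def balanced_latin_square(conditions):
    """Build a balanced Latin square for an even number of conditions.

    Uses the standard first-row pattern 0, 1, n-1, 2, n-2, ... and
    shifts it by one for each later row.
    """
    n = len(conditions)
    if n < 2 or n % 2:
        raise ValueError("this construction assumes an even number of conditions")
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    return [[conditions[(i + shift) % n] for i in first] for shift in range(n)]


def is_balanced(square):
    """Check the Latin property plus: each ordered adjacent pair occurs once."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    pairs = [(row[i], row[i + 1]) for row in square for i in range(n - 1)]
    pairs_ok = len(set(pairs)) == len(pairs) == n * (n - 1)
    return len(symbols) == n and rows_ok and cols_ok and pairs_ok


conditions = ["A", "B", "C", "D"]
all_orders = list(permutations(conditions))      # 4! = 24 orders
generated = balanced_latin_square(conditions)    # only 4 orders
slide_square = ["ABCD", "CADB", "BDAC", "DCBA"]  # rows as read from the slide

print(len(all_orders), "possible orders")
for row in generated:
    print(" ".join(row))
print("generated square balanced:", is_balanced(generated))
print("slide square balanced:", is_balanced(slide_square))
```

In use, participant k would receive the order in row `k % 4`, so every block of four participants covers each position and each immediate predecessor once.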

  19. Counterbalancing doesn’t fix: • Range effects (most average treatment) • Context effects (what most participants are more familiar with)

  20. Mixed models are also possible • Everyone gets the same three tasks • Order of tasks varies • Tool with which to execute tasks varies

  21. Selecting conditions • How many IVs? – Password meter example • How many / which levels for each? – Cannot infer anything about levels you didn’t test

  22. Full-factorial (or not) • Full-factorial: All possible combinations of all IVs – And all orderings? • Not: Only a subset – Selected how? – Recall: Vary at most one thing each time! • Planned comparisons!
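Enumerating a full-factorial design is a Cartesian product over the levels of each IV. The sketch below uses two hypothetical IVs loosely inspired by the password-meter example. The IV names and levels are invented for illustration and do not come from the slides.

```python
from itertools import product

# Hypothetical independent variables and levels (not from the slides).
ivs = {
    "meter_style": ["none", "bar", "text"],
    "scoring": ["lenient", "strict"],
}

# Full factorial: one condition per combination of levels.
conditions = [dict(zip(ivs, combo)) for combo in product(*ivs.values())]

for number, condition in enumerate(conditions, start=1):
    print(number, condition)
print("total conditions:", len(conditions))
```

The count is the product of the level counts (here 3 × 2 = 6), which is why adding IVs or levels quickly becomes expensive and a planned subset may be chosen instead.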

  23. Why multivariate? • What is different between running one experiment with two IVs vs. two experiments with one IV each? • Interaction effects!
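An interaction effect means the effect of one IV depends on the level of the other, which two separate one-IV experiments cannot reveal. The sketch below computes it from the cell means of a 2×2 design. The IVs (tool and expertise) and all the numbers are hypothetical.

```python
# Hypothetical mean task times (seconds) for a 2x2 design.
cell_means = {
    ("old_tool", "novice"): 60.0,
    ("new_tool", "novice"): 40.0,
    ("old_tool", "expert"): 30.0,
    ("new_tool", "expert"): 28.0,
}


def tool_effect(means, expertise):
    """Simple effect of switching tools at one level of expertise."""
    return means[("new_tool", expertise)] - means[("old_tool", expertise)]


effect_novice = tool_effect(cell_means, "novice")
effect_expert = tool_effect(cell_means, "expert")
# Interaction contrast: how much the tool effect changes with expertise.
interaction = effect_expert - effect_novice

print("tool effect for novices:", effect_novice)
print("tool effect for experts:", effect_expert)
print("interaction contrast:", interaction)
```

A nonzero contrast in these made-up means says the new tool helps novices far more than experts. Whether such a difference is statistically reliable is a separate question that needs the underlying observations, not just the means.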

  24. Dependent variables • What and how to measure? – Construct validity, again! – Performance (time, errors, FP/FN, etc.) – Opinions/attitude – Audio recording, screen capture, keystrokes, copy-pasting behavior, etc. – Demographics • Multiple measures toward higher-level construct?

  25. NOT JUST EXPERIMENTS

  26. Kinds of measurement studies • Experimental • Observational/correlational • Quasi-experimental

  27. Observational/correlational • Observe that X and Y (don’t) increase and decrease together / in opposition • Researcher doesn’t apply any control or treatment: just measures incidence – Does lead exposure correlate with crime rate? • Directionality and third-variable problems are both issues
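Whether X and Y move together is usually summarized with a correlation coefficient. The sketch below computes Pearson's r by hand on hypothetical paired observations. The helper name `pearson_r` and the data are assumptions for illustration, not material from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical paired observations of two measured quantities.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

print("r(x, y) =", round(pearson_r(x, y), 3))
print("r(y, x) =", round(pearson_r(y, x), 3))
```

The coefficient is symmetric in its arguments, so it carries no information about direction, and it is computed without reference to any third variable. Both of the issues named on the slide remain open no matter how large r is.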

  28. Quasi-experiments • Subset of observational studies • Can’t randomize assignment • But, experimenter controls something • [Diagram: Group 1 → Treatment → Group 1; Group 2 → Group 2]

  29. Observational examples • Cohort study • Regression discontinuity • BIBIFI example

  30. Pluses and minuses • Can measure things that simply can’t be done with true experiments • In general, association at best – Causality very hard to establish – Some statistical techniques exist to help • Low internal validity – Can you maximize it within the available constraints?
