Hypothesis testing, part 2

  1. Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal

  2. CATEGORICAL IV, NUMERIC DV

  3. Independent samples, one IV
  # Conditions | Normal/Parametric | Non-parametric
  Exactly 2    | T-test            | Mann-Whitney U, bootstrap
  2+           | One-way ANOVA     | Kruskal-Wallis, bootstrap

  4. Is your data normal?
  • Skewness: asymmetry
  • Kurtosis: “peakedness” relative to normal
    – Both: within ±2 SE of the statistic is OK
  • Or use Shapiro-Wilk (null = normal)
  • Or look at a Q-Q plot
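
A minimal sketch of these checks in Python with scipy; the sample `scores` and its values are made up for illustration, not from the slides.

```python
# Sketch only: normality checks on a hypothetical sample `scores`.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.normal(loc=50, scale=10, size=40)     # placeholder data

print("skewness:", stats.skew(scores))             # asymmetry
print("excess kurtosis:", stats.kurtosis(scores))  # "peakedness" relative to normal
w, p = stats.shapiro(scores)                       # Shapiro-Wilk: null = data are normal
print("Shapiro-Wilk p =", p)                       # small p -> evidence against normality

# Q-Q plot: points should hug the reference line if the data are roughly normal
import matplotlib.pyplot as plt
stats.probplot(scores, dist="norm", plot=plt)
plt.show()
```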

  5. T-test
  • Already talked about
  • Assumptions: normality, equal variances, independent samples
    – Can use Levene’s test to check the equal-variance assumption
  • Post-test: check residuals for assumption fit
    – For a t-test this is the same pre or post
    – For other tests you check residual vs. fit post
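
A hedged sketch of the independent-samples t-test plus a Levene check in scipy; the two groups and the 0.05 rule of thumb below are illustrative assumptions.

```python
# Sketch: independent-samples t-test with a Levene check on hypothetical groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(5.0, 1.0, size=30)   # placeholder condition A
group_b = rng.normal(5.5, 1.0, size=30)   # placeholder condition B

lev_stat, lev_p = stats.levene(group_a, group_b)   # null: equal variances
equal_var = lev_p > 0.05                           # crude rule of thumb
t, p = stats.ttest_ind(group_a, group_b, equal_var=equal_var)
print(f"Levene p={lev_p:.3f}, t={t:.2f}, p={p:.3f}")
```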

  6. One-way ANOVA
  • H0: μ1 = μ2 = μ3
  • H1: at least one mean differs from the others
  • NOT H1: μ1 ≠ μ2 ≠ μ3
  • Assumptions: normality, common variance, independent errors
  • Intuition: F statistic
    – Variance between / variance within
    – Under the (exact) null, F should be near 1; F >> 1 rejects the null

  7. One-way ANOVA
  • F = MS_b / MS_w
  • MS_w = sum over conditions [ sum within a condition [ (difference from the condition mean)² ] ] / df_w
    – df_w = N − k, where k = number of conditions
    – Sum per condition; then sum over all conditions
  • MS_b = sum [ (difference of the condition mean from the grand mean)² ] / df_b
    – df_b = k − 1
    – Every observation goes in the sum
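
A small sketch (made-up data) that computes F from the sums of squares above and cross-checks it against scipy's one-way ANOVA.

```python
# Sketch: MS_between / MS_within by hand, cross-checked with scipy.
import numpy as np
from scipy import stats

groups = [np.array([3.1, 2.9, 3.4, 3.0]),   # hypothetical condition data
          np.array([3.8, 4.1, 3.9, 4.2]),
          np.array([2.5, 2.8, 2.6, 2.7])]

k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
F = (ss_between / (k - 1)) / (ss_within / (N - k))

print("F by hand:", F)
print("scipy:", stats.f_oneway(*groups))   # same F, plus a p-value
```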

  8. (example from Vibha Sazawal)

  10. [figure: F distribution with rejection region]

  11. Now what? (Contrasts)
  • So we rejected the null. What did we learn?
    – What *didn’t* we learn?
    – At least one is different ... Which? All?
    – This is called an “omnibus test”
  • To answer our actual research question, we usually need pairwise contrasts

  12. The trouble with contrasts
  • Contrasts mess with your Type I bounds
    – One test: 95% confident
    – Three tests: 85.7% confident
    – 5 conditions, all pairs: 4 + 3 + 2 + 1 = 10 tests: 59.9%
    – UH OH
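
The confidence numbers above come from (1 − α)^m; a quick arithmetic check, assuming independent tests at α = 0.05.

```python
# Sketch: familywise confidence after m independent tests at alpha = 0.05.
for m in (1, 3, 10):
    print(m, "tests ->", round(0.95 ** m, 3), "confidence")
# 1 -> 0.95, 3 -> 0.857, 10 -> 0.599, matching the slide
```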

  13. Planned vs. post hoc
  • Planned: You have a theory.
    – Really, no cheating
    – You get n−1 pairwise comparisons for free
    – In theory, should not be control vs. all, but prob. OK
    – NO COMPARISONS unless the omnibus test passes
  • Post hoc
    – Anything unplanned
    – More than n−1
    – Requires correction!
    – Doesn’t necessarily require the omnibus test first

  14. Correction
  • Adjust {p-values, alpha} to compensate for multiple testing post hoc
  • Bonferroni (most conservative)
    – Assume all possible pairs: m = k(k−1)/2 (combinations)
    – alpha_c = alpha / m
    – Once you have looked, the implication is that you did all the comparisons implicitly!
  • Holm-Bonferroni is less conservative
    – Stepwise: adjust alpha as you go
  • Dunnett specifically for all vs. control; others exist
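
A sketch of p-value adjustment with statsmodels' multipletests; the pairwise p-values below are hypothetical.

```python
# Sketch: Bonferroni vs. Holm adjustment of hypothetical pairwise p-values.
from statsmodels.stats.multitest import multipletests

pvals = [0.01, 0.04, 0.03, 0.20]   # made-up pairwise comparison p-values
for method in ("bonferroni", "holm"):
    reject, adjusted, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(method, adjusted.round(3), reject)
```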

  15. Independent samples, one IV
  # Conditions | Normal/Parametric | Non-parametric
  Exactly 2    | T-test            | Mann-Whitney U, bootstrap
  2+           | One-way ANOVA     | Kruskal-Wallis, bootstrap

  16. Non-parametrics: MWU and K-W
  • Good for non-normal data, Likert data (ordinal, not actually numeric)
  • Assumptions: independent, at least ordinal
  • Null: P(X > Y) = P(Y > X), where X, Y are observations from the two distributions (MWU)
    – If we assume the distributions have the same shape and are continuous, this can be seen as comparing medians

  17. MWU and K-W continued
  • Essentially: rank-order all data (both conditions)
    – Total the ranks for condition 1, compare to the “expected” total
    – Various procedures exist to correct for ties
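
A quick sketch of both tests in scipy on made-up ordinal ratings; condition names and values are illustrative.

```python
# Sketch: Mann-Whitney U (2 conditions) and Kruskal-Wallis (3+) on made-up ratings.
from scipy import stats

cond1 = [3, 4, 5, 4, 2, 5, 4]   # hypothetical Likert responses
cond2 = [2, 3, 3, 2, 4, 3, 2]
cond3 = [5, 4, 5, 5, 4, 3, 5]

u, p = stats.mannwhitneyu(cond1, cond2, alternative="two-sided")
print("MWU:", u, p)

h, p = stats.kruskal(cond1, cond2, cond3)
print("Kruskal-Wallis:", h, p)
```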

  18. Bootstrap
  • Resampling technique(s)
  • Intuition:
    – Create a “null” distribution, e.g. by subtracting means so m_A = m_B = 0
      • Now you have shifted samples A-hat and B-hat
    – Combine these to make a null distribution
    – Draw a sample of size N, with replacement
      • Do it 1000 (or 10k) times
    – Use this to determine the critical value (alpha = 0.05)
    – Compare this critical value to your real data for the test
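
A minimal sketch of the shift-and-resample idea described above (one of several ways to bootstrap a test); the data and the 10,000 resamples are illustrative assumptions.

```python
# Sketch: bootstrap null distribution for a difference in means.
import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(5.0, 1.0, 30)          # hypothetical condition A
b = rng.normal(5.6, 1.0, 30)          # hypothetical condition B
observed = a.mean() - b.mean()

# Shift both samples so they share a mean of 0, then pool them (the "null" world).
pooled = np.concatenate([a - a.mean(), b - b.mean()])

diffs = []
for _ in range(10_000):
    a_star = rng.choice(pooled, size=len(a), replace=True)
    b_star = rng.choice(pooled, size=len(b), replace=True)
    diffs.append(a_star.mean() - b_star.mean())
diffs = np.array(diffs)

crit = np.quantile(np.abs(diffs), 0.95)   # two-sided critical value at alpha = 0.05
print("reject null:", abs(observed) > crit)
```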

  19. Paired samples, one IV
  # Conditions | Normal/Parametric                                              | Non-parametric
  Exactly 2    | Paired T-test                                                  | Wilcoxon signed-rank
  2+           | 2-way ANOVA w/ subject as random factor; mixed models (later)  | Friedman

  20. Paired T-test
  • Two samples per participant/item
  • Test subtracts them
  • Then uses a one-sample T-test with H0: μ = 0 and H1: μ ≠ 0
  • Regular T-test assumptions, plus: does subtraction make sense here?
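
A sketch showing the paired test as a one-sample test on the differences, with scipy's built-in paired test alongside; the before/after measurements are hypothetical.

```python
# Sketch: paired t-test == one-sample t-test on the per-participant differences.
import numpy as np
from scipy import stats

before = np.array([12.1, 10.3, 11.8, 13.0, 9.9, 12.5])   # hypothetical measurements
after  = np.array([11.4,  9.8, 11.1, 12.2, 10.0, 11.6])

print(stats.ttest_1samp(before - after, 0.0))   # H0: mean difference = 0
print(stats.ttest_rel(before, after))           # same result, built in
```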

  21. Wilcoxon S.R. / Friedman
  • H0: the difference between pairs is symmetric around 0
  • H1: ... or not
  • Excludes no-change items
  • Essentially: rank by absolute difference; compare signs * ranks
  • (Friedman = generalization to 3+ conditions)
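
The non-parametric counterparts in scipy, again on made-up paired ratings.

```python
# Sketch: Wilcoxon signed-rank (2 paired conditions) and Friedman (3+).
from scipy import stats

c1 = [3, 4, 2, 5, 4, 3, 4]   # hypothetical paired ratings
c2 = [2, 3, 2, 4, 3, 2, 3]
c3 = [4, 5, 3, 5, 5, 4, 4]

print(stats.wilcoxon(c1, c2))               # zero-difference pairs dropped by default
print(stats.friedmanchisquare(c1, c2, c3))  # 3+ related conditions
```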

  22. One numeric IV, numeric DV: SIMPLE LINEAR REGRESSION

  23. Simple linear regression
  • E(Y|x) = b0 + b1·x ... looks at populations
    – Population mean at this value of x
  • Key H0: b1 = 0 (we test whether the slope differs from 0)
    – b0 usually not important for significance (obviously important in model fit)
  • b1: slope → change in Y per unit X
  • Best fit: least squares, or maximum likelihood
    – LSq: minimize the sum of squares of the residuals
    – ML: maximize the probability of seeing this data with this model
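
A sketch of fitting a simple linear regression and reading off b1, its p-value, and R² with scipy; x and y here are made up.

```python
# Sketch: least-squares simple linear regression on made-up data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.7 * x + rng.normal(0, 1.0, size=50)   # hypothetical roughly linear data

fit = stats.linregress(x, y)
print("b1 =", fit.slope, "p =", fit.pvalue)   # test of H0: b1 = 0
print("b0 =", fit.intercept)
print("R^2 =", fit.rvalue ** 2)               # fraction of variance accounted for
```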

  24. Assumptions, caveats
  • Assumes:
    – linearity in Y ~ X
    – normally distributed error for each x, with constant variance at all x
    – error in measuring X is small compared to the variance of Y (fixed X)
  • Independent errors!
    – Serial correlation, grouped data, etc. (later)
  • Don’t interpret widely outside the available x values
  • Can transform for linearity!
    – log(Y), sqrt(Y), 1/Y, Y²

  25. Assumption/residual checking
  • Before: use a scatterplot to check for plausible linearity
  • After: residual vs. fit
    – Residual on the Y-axis vs. predicted value on the X-axis
    – Should be distributed relatively evenly around 0 (linearity)
    – Should have relatively even vertical spread (equal variance)
  • After: quantile-normal plot of the residuals

  26. Model interpretation
  • Interpret b1, interpret the p-value
  • CI: if it crosses 0, it’s not significant
  • R²: fraction of total variation accounted for
    – Intuitively: explained variance / total variance
    – Explained = var(Y) − residual errors
  • f² = R² / (1 − R²); small/medium/large effect: 0.02, 0.15, 0.35 (Cohen)

  27. Robustness
  • Brittle to violations of linearity and independent errors
  • Somewhat brittle to violations of fixed-X
  • Fairly robust to violations of equal variance
  • Quite robust to violations of normality

  28. CATEGORICAL OUTCOMES

  29. One Cat. IV, Cat. DV, independent
  • Contingency tables: how many people fall in each combination of categories

  30. Chi-square test of independence
  • H0: the distribution of Var1 is the same at every level of Var2 (and vice versa)
    – The null distribution approaches χ² as the sample size grows
    – Heuristic: no expected cell counts < 5
    – Can use Fisher’s exact test (FET) instead
  • Intuition:
    – Sum over rows/columns: (observed − expected)² / expected
    – Expected: marginal % * count in the other margin
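
A sketch with scipy's chi2_contingency on a hypothetical 2x3 table of counts.

```python
# Sketch: chi-square test of independence on a made-up contingency table.
import numpy as np
from scipy.stats import chi2_contingency

# rows = levels of Var1, columns = levels of Var2 (hypothetical counts)
table = np.array([[30, 10, 20],
                  [20, 25, 15]])

chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p, dof)
print(expected)   # check the "no expected cell < 5" heuristic here
```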

  31. Paired 2x2 tables
  • Use McNemar’s test
    – Contingency table: matches and mismatches for each option
  • H0: the marginals are the same

                   Cond 1: Yes   Cond 1: No
      Cond 2: Yes  a             b            a + b
      Cond 2: No   c             d            c + d
                   a + c         b + d        N

  • Essentially a χ² test on the disagreement cells (b and c)
    – Test statistic: (b − c)² / (b + c)
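
A sketch of McNemar's test, both via the (b − c)²/(b + c) formula above and via statsmodels; the counts are hypothetical.

```python
# Sketch: McNemar's test on a hypothetical paired 2x2 table [[a, b], [c, d]].
import numpy as np
from scipy.stats import chi2
from statsmodels.stats.contingency_tables import mcnemar

table = np.array([[40, 15],    # a, b
                  [ 5, 40]])   # c, d
b, c = table[0, 1], table[1, 0]

stat = (b - c) ** 2 / (b + c)                 # slide's test statistic
print(stat, chi2.sf(stat, df=1))              # p-value from chi-square with 1 df

print(mcnemar(table, exact=False, correction=False))   # same test via statsmodels
```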

  32. Paired, continued
  • Cochran’s Q: extension to more than two conditions
  • Other similar extensions exist for related tasks

  33. Critiques
  • Choose a paper that has one (or more) empirical experiments as a central contribution
    – Doesn’t have to be human subjects, but can be
    – Does have to have enough description of the experiment
  • 10-12 minute presentation
  • Briefly: research questions, necessary background
  • Main: describe and critique the methods
    – Experimental design, data collection, analysis
    – Good, bad, ugly, missing
  • Briefly, results?

  34. Logistic regression (logit)
  • Numeric IV, binary DV (or ordinal)
  • log( E(Y) / (1 − E(Y)) ) = log( Pr(Y=1) / Pr(Y=0) ) = b0 + b1·x
  • Log odds of success = linear function
    – Odds: 0 to infinity, with 1 in the middle
    – e.g.: odds = 5 = 5:1 ... five successes for every one failure
    – Log odds: −inf to inf, with 0 in the middle: good for regression
  • Modeled with a binomial distribution

  35. Interpreting logistic regression
  • Take exp(coef) to get interpretable odds
  • For each unit increase in x, the odds are multiplied by exp(b1)
    – Note that this can make small coefficients important!
  • Use e.g. the Hosmer-Lemeshow test for goodness of fit
    – null == data fit the model
    – But not a lot of power!
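
A sketch of fitting a logit model and exponentiating its coefficient with statsmodels' formula API; the data frame and variable names are made up.

```python
# Sketch: logistic regression on made-up data, then exp(coef) for odds ratios.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
x = rng.normal(0, 1, 200)
p_success = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))   # true log odds are linear in x
df = pd.DataFrame({"x": x, "y": rng.binomial(1, p_success)})

model = smf.logit("y ~ x", data=df).fit()
print(model.summary())
print(np.exp(model.params))   # per-unit-of-x multiplicative change in the odds
```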

  36. MULTIVARIATE

  37. Multiple regression
  • Linear/logistic regression with more variables!
    – At least one numeric, 0+ categorical
  • Still: fixed x, normal errors with equal variance, independent errors (linear)
  • Linear relationship between E(Y) and each x, when the other inputs are held constant
    – Effects of each x are independent!
  • Still check the quantile-normal plot of the residuals, residual vs. fit

  38. Model selection
  • Which covariates to keep? (more on this in a bit)

  39. Adding categorical vars
  • Indicator variables (everything is 0 or 1)
  • Need one fewer indicator than conditions
    – One condition is true; or none are true (baseline)
    – Coefs are *relative to baseline*!
  • Model selection: keep all or none of the indicators for one factor
  • Called “ANCOVA” when there is at least one numeric and one categorical predictor

  40. Interaction
  • What if your covariates *aren’t* independent?
  • E(Y) = b0 + b1·x1 + b2·x2 + b12·x1·x2
    – The slope for x1 is different for each value of x2
  • Superadditive: all in the same direction; interaction makes effects stronger
  • Subadditive: interaction is in the opposite direction
  • For indicator vars, all or none
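
A sketch of adding an interaction term with statsmodels' formula API; the variables and data are hypothetical.

```python
# Sketch: multiple regression with an x1:x2 interaction on made-up data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1 + 2 * df.x1 + 0.5 * df.x2 + 1.5 * df.x1 * df.x2 + rng.normal(size=n)

# "x1 * x2" expands to x1 + x2 + x1:x2, i.e. both main effects plus the interaction
model = smf.ols("y ~ x1 * x2", data=df).fit()
print(model.params)   # b0, b1, b2, and b12 for the x1:x2 term
```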
