Hypothesis testing, part 2
With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal

CATEGORICAL IV, NUMERIC DV

Independent samples, one IV
# Conditions | Normal/Parametric | Non-parametric
Exactly 2    | T-test            | Mann-Whitney U, bootstrap
2+           | One-way ANOVA     | Kruskal-Wallis, bootstrap

Is your data normal?
• Skewness: asymmetry
• Kurtosis: “peakedness” relative to a normal distribution
  – Both: a value within ±2 standard errors (SE of skewness or of kurtosis, respectively) is OK
• Or use the Shapiro-Wilk test (null = normal)
• Or look at a Q-Q plot
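The ±2 SE rule of thumb can be sketched in plain Python. This assumes the usual bias-adjusted estimators (G1 for skewness, G2 for excess kurtosis) and their common standard-error formulas. The function name `skew_kurtosis_z` is hypothetical, and the code has no guard for constant data, where the variance is zero.

```python
import math


def skew_kurtosis_z(data):
    """Return (skewness / SE, excess kurtosis / SE) using the adjusted G1/G2
    estimators. Values within -2..+2 pass the rough normality heuristic."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5          # raw moment skewness
    g2 = m4 / m2 ** 2 - 3        # raw excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


print(skew_kurtosis_z([1, 2, 3, 4, 5]))  # symmetric: skew z is 0, kurtosis z about -0.6
```

For the Shapiro-Wilk test itself, a library routine such as `scipy.stats.shapiro` is the practical choice rather than a hand-rolled version.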

T-test
• Already talked about
• Assumptions: normality, equal variances, independent samples
  – Can use Levene's test to check the equal-variance assumption
• After the test: check residuals for assumption fit
  – For a t-test, this is the same before or after
  – For other tests, you check residuals vs. fitted values afterward
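Levene's test is essentially a one-way ANOVA on each observation's absolute deviation from its own group mean. Below is a minimal sketch of the mean-centered statistic W. The function name `levene_w` is hypothetical, and the sketch stops at the statistic: a p-value needs an F(k-1, N-k) distribution, which the standard library does not provide.

```python
def levene_w(*groups):
    """Levene's W (mean-centered) for the equal-variance assumption.
    Compare W to an F(k - 1, N - k) critical value; large W suggests unequal variances."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean.
    z = []
    for g in groups:
        m = sum(g) / len(g)
        z.append([abs(x - m) for x in g])
    z_means = [sum(zg) / len(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within


print(levene_w([1, 2, 3], [2, 4, 6]))
```

Centering on group medians instead of means gives the Brown-Forsythe variant, which is less sensitive to skewed data.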

One-way ANOVA
• H0: $\mu_1 = \mu_2 = \mu_3$
• H1: at least one mean doesn't match
• NOT H1: $\mu_1 \neq \mu_2 \neq \mu_3$ (i.e., H1 is not "all means differ")
• Assumptions: normality, common variance, independent errors
• Intuition: the F statistic
  – Variance between groups / variance within groups
  – Under the exact null, F ≈ 1; F >> 1 rejects the null

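One-way ANOVA
• $F = MS_b / MS_w$
• $MS_w = \frac{1}{df_w}\sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_j)^2$
  – $df_w = N - k$, where k = number of conditions and N = total number of observations
  – Outer sum over all conditions; inner sum within each condition
• $MS_b = \frac{1}{df_b}\sum_{j=1}^{k}\sum_{i=1}^{n_j}(\bar{y}_j - \bar{y})^2 = \frac{1}{df_b}\sum_{j=1}^{k} n_j(\bar{y}_j - \bar{y})^2$
  – $df_b = k - 1$
  – Every observation goes in the sum (each contributes its group mean's deviation from the grand mean $\bar{y}$)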
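The $MS_b$ and $MS_w$ formulas translate directly into code. This is a minimal sketch that computes F and its degrees of freedom; the name `one_way_anova_f` is hypothetical. Turning F into a p-value needs the F distribution, for example through a stats library.

```python
def one_way_anova_f(*groups):
    """Return (F, df_b, df_w) for a one-way ANOVA over k independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between: every observation contributes (its group mean - grand mean)^2.
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within: each observation's squared deviation from its own group mean.
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n_total - k
    return (ss_b / df_b) / (ss_w / df_w), df_b, df_w


print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```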

(example from Vibha Sazawal)

[Figure: F-distribution; null rejected]

Now what? (Contrasts)
• So we rejected the null. What did we learn?
  – What *didn't* we learn?
  – At least one is different ... Which? All?
  – This is called an “omnibus test”
• To answer our actual research question, we usually need pairwise contrasts

The trouble with contrasts
• Contrasts mess with your Type I error bounds (treating the tests as independent):
  – One test: 95% confident
  – Three tests: $0.95^3$ = 85.7% confident
  – 5 conditions, all pairs: 4 + 3 + 2 + 1 = 10 tests: $0.95^{10}$ = 59.9%
  – UH OH
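The percentages above come from $(1 - \alpha)^m$. That is the chance of no false positive across m tests, each at level α, assuming the tests are independent. A quick check follows; the helper name is hypothetical.

```python
def familywise_confidence(alpha, m):
    """Probability of zero Type I errors across m independent tests at level alpha."""
    return (1 - alpha) ** m


for m in (1, 3, 10):
    print(m, round(familywise_confidence(0.05, m), 3))
# 1 0.95
# 3 0.857
# 10 0.599
```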

Planned vs. post hoc
• Planned: You have a theory.
  – Really, no cheating
  – You get k-1 pairwise comparisons for free (k = number of conditions)
  – In theory, these should not be control vs. all, but that is probably OK
  – NO COMPARISONS unless the omnibus test passes
• Post hoc
  – Anything unplanned
  – More than k-1 comparisons
  – Requires correction!
  – Doesn't necessarily require an omnibus test first

Correction
• Adjust {p-values, alpha} to compensate for multiple post-hoc tests
• Bonferroni (most conservative)
  – Assume all possible pairs: m = k(k-1)/2 (combinations)
  – $\alpha_c = \alpha / m$
  – Once you have looked, the implication is that you implicitly did all the comparisons!
• Holm-Bonferroni is less conservative
  – Stepwise: adjust alpha as you go
• Dunnett specifically for all vs. control; others exist
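Here is a minimal sketch of both corrections, with hypothetical function names. Bonferroni compares every p-value to α/m. Holm sorts the p-values and compares the smallest to α/m, the next to α/(m-1), and so on, stopping at the first failure. That is why Holm can reject more hypotheses than Bonferroni while still controlling the familywise error rate.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Holm-Bonferroni step-down: smallest p vs alpha/m, next vs alpha/(m-1), ...
    Stop at the first non-rejection; all larger p-values are retained."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for step, i in enumerate(order):
        if pvals[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break
    return reject


p = [0.01, 0.02, 0.03, 0.005]
print(bonferroni(p))  # [True, False, False, True]
print(holm(p))        # [True, True, True, True]
```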

Independent samples, one IV
# Conditions | Normal/Parametric | Non-parametric
Exactly 2    | T-test            | Mann-Whitney U, bootstrap
2+           | One-way ANOVA     | Kruskal-Wallis, bootstrap

Non-parametrics: MWU and K-W
• Good for non-normal data and Likert data (ordinal, not actually numeric)
• Assumptions: independent observations, at least ordinal scale
• Null (MWU): P(X > Y) = P(Y > X), where X, Y are observations from the two distributions
  – If you assume the same distribution shape and continuous data, this can be seen as comparing medians

MWU and K-W continued
• Essentially: rank-order all data (both conditions together)
  – Total the ranks for condition 1; compare to the “expected” total
  – Various procedures to correct for ties
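The rank-sum idea can be sketched as follows: pool and rank everything, sum condition 1's ranks, and convert that sum to U. Ties get the average of the ranks they span. The function names are hypothetical. The z value uses the plain normal approximation without the tie correction to the variance, so treat it as rough when there are many ties.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for condition 1, normal-approximation z)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])                 # total rank of condition 1
    u1 = r1 - n1 * (n1 + 1) / 2          # subtract the minimum possible rank sum
    expected = n1 * n2 / 2               # E[U] under the null
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, (u1 - expected) / sd


print(mann_whitney_u([1, 2, 2], [2, 3, 4]))
```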

Bootstrap
• Resampling technique(s)
• Intuition:
  – Create a “null” distribution, e.g. by subtracting means so that $\mu_A = \mu_B = 0$
    • Now you have shifted samples $\hat{A}$ and $\hat{B}$
  – Combine these to make a null distribution
  – Draw a sample of size N, with replacement
    • Do it 1000 (or 10k) times
  – Use this to determine the critical value (alpha = 0.05)
  – Compare this critical value to your real data for the test
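The steps above can be sketched as a two-sided test on the difference in means. Some details here are assumptions rather than prescriptions from the slides: each group is redrawn at its original size from the pooled shifted data, |difference in means| is the test statistic, and the function name and fixed seed are hypothetical.

```python
import random


def bootstrap_mean_diff_test(a, b, n_boot=1000, alpha=0.05, seed=0):
    """Two-sided bootstrap test of H0: mean(a) == mean(b).
    Returns (observed difference, critical value, reject?)."""
    rng = random.Random(seed)
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    observed = mean_a - mean_b
    # Shift both samples to mean 0 and pool them: a population where H0 holds.
    pooled = [x - mean_a for x in a] + [x - mean_b for x in b]
    null_stats = []
    for _ in range(n_boot):
        ra = rng.choices(pooled, k=len(a))  # sampling with replacement
        rb = rng.choices(pooled, k=len(b))
        null_stats.append(abs(sum(ra) / len(a) - sum(rb) / len(b)))
    null_stats.sort()
    # (1 - alpha) quantile of the null |difference| is the critical value.
    critical = null_stats[min(int((1 - alpha) * n_boot), n_boot - 1)]
    return observed, critical, abs(observed) > critical


print(bootstrap_mean_diff_test([1, 2, 3, 4, 5], [101, 102, 103, 104, 105]))
```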

Paired samples, one IV
# Conditions | Normal/Parametric                    | Non-parametric
Exactly 2    | Paired T-test                        | Wilcoxon signed-rank
2+           | 2-way ANOVA w/ subject random factor | Friedman
             | Mixed models (later)                 |

Paired T-test
• Two samples per participant/item
• The test subtracts them
• Then uses a one-sample T-test with H0: $\mu = 0$ and H1: $\mu \neq 0$
• Regular T-test assumptions, plus: does subtraction make sense here?
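As described, the paired test is a one-sample t-test on the per-item differences. A minimal sketch follows (the function name is hypothetical). It returns the t statistic and degrees of freedom; looking up the p-value is left to a t table or stats library.

```python
import math
import statistics


def paired_t(before, after):
    """One-sample t statistic on per-item differences, H0: mean difference = 0.
    Returns (t, df)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(before, after)]  # after minus before, per item
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)     # stdev uses the n - 1 denominator
    return statistics.mean(diffs) / se, n - 1


print(paired_t([1, 2, 3, 4], [2, 4, 5, 7]))
```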

Wilcoxon S.R. / Friedman
• H0: the difference between pairs is symmetric around 0
• H1: … or not
• Excludes no-change items
• Essentially: rank by absolute difference; compare the sums of sign × rank
• (Friedman = generalization to 3+ conditions)
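The signed-rank recipe works like this: drop zero differences, rank the absolute differences (ties get average ranks), then total the ranks separately for positive and negative differences. Below is a minimal sketch with a hypothetical function name. It stops at W+ and W-, because exact p-values need tables or a library.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W+, W-) for paired samples x, y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # exclude no-change items
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # average rank for tied |differences|
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


print(wilcoxon_signed_rank([5, 3, 8, 6], [3, 4, 8, 2]))  # (5.0, 1.0)
```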

SIMPLE LINEAR REGRESSION: one numeric IV, numeric DV

Simple linear regression
• $E(Y \mid x) = b_0 + b_1 x$ … looks at populations
  – Population mean at this value of x
• Key H0: $b_1 = 0$ (H1: $b_1 \neq 0$)
  – $b_0$ usually not important for significance (obviously important in model fit)
• $b_1$: slope → change in Y per unit X
• Best fit: least squares, or maximum likelihood
  – LSq: minimize the sum of squared residuals
  – ML: maximize the probability of seeing this data under this model
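The least-squares fit has a closed form: $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$. Here is a minimal sketch with a hypothetical function name. With normally distributed errors, maximum likelihood gives the same $b_0$ and $b_1$.

```python
def simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for E(Y|x) = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx          # change in Y per unit X
    b0 = my - b1 * mx       # line passes through (mean x, mean y)
    return b0, b1


print(simple_ols([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```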

Assumptions, caveats
• Assumes:
  – Linearity in Y ~ X
  – Normally distributed error at each x, with constant variance at all x
  – Error in measuring X is small compared to var(Y) (fixed X)
• Independent errors!
  – Serial correlation, grouped data, etc. (later)
• Don't interpret far outside the available x values
• Can transform for linearity!
  – log(Y), sqrt(Y), 1/Y, Y^2

Assumption/residual checking
• Before: use a scatterplot to check for plausible linearity
• After: residuals vs. fitted values
  – Residuals on the y-axis vs. predicted values on the x-axis
  – Should be relatively evenly distributed around 0 (linearity)
  – Should have relatively even vertical spread (equal variance)
• After: normal quantile (Q-Q) plot of residuals

Model interpretation
• Interpret $b_1$; interpret its p-value
• CI: if the confidence interval for $b_1$ includes 0, it's not significant
• $R^2$: fraction of total variation accounted for
  – Intuitively: explained variance / total variance
  – Explained = total variation in Y minus residual variation
• Cohen's $f^2 = R^2 / (1 - R^2)$; small/medium/large: 0.02, 0.15, 0.35
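$R^2$ and Cohen's $f^2$ follow directly from the residual and total sums of squares. This is a minimal sketch for the one-predictor case; the function name is hypothetical, and it assumes $R^2 < 1$ so that $f^2$ is finite.

```python
def r_squared_and_f2(x, y):
    """Fit y = b0 + b1*x by least squares; return (R^2, Cohen's f^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained
    ss_tot = sum((b - my) ** 2 for b in y)                         # total variation in Y
    r2 = 1 - ss_res / ss_tot   # explained / total
    return r2, r2 / (1 - r2)


print(r_squared_and_f2([1, 2, 3, 4], [2, 4, 5, 4]))
```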

Robustness
• Brittle to violations of linearity and independent errors
• Somewhat brittle to violations of fixed X
• Fairly robust to unequal variance
• Quite robust to non-normality

CATEGORICAL OUTCOMES

One categorical IV, categorical DV, independent
• Contingency tables: how many people fall in each combination of categories

Chi-square test of independence
• H0: the distribution of Var1 is the same at every level of Var2 (and vice versa)
  – The statistic's null distribution approaches $\chi^2$ as sample size grows
  – Heuristic: no expected cell counts < 5
  – Otherwise, can use Fisher's exact test (FET) instead
• Intuition:
  – Sum over all cells (rows × columns): $(\text{observed} - \text{expected})^2 / \text{expected}$
  – Expected: marginal % × count in the other margin (row total × column total / N)
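Here is a minimal sketch of the statistic described above; the function name is hypothetical. Expected counts come from the margins, and the degrees of freedom are (rows - 1)(cols - 1). The returned expected counts make the "no expected cell < 5" heuristic easy to check.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, df, expected counts) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


stat, df, expected = chi_square_independence([[10, 20], [30, 40]])
print(stat, df, min(min(row) for row in expected))  # min expected count is 12.0 here
```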

Paired 2x2 tables
• Use McNemar's test
  – Contingency table: matches and mismatches for each option
• H0: the marginals are the same

              Cond1: Yes   Cond1: No
Cond2: Yes    a            b            a + b
Cond2: No     c            d            c + d
              a + c        b + d        N

• Essentially a $\chi^2$ test on the mismatches (discordant cells b and c)
  – Test statistic: $(b - c)^2 / (b + c)$
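A minimal sketch, assuming one (Cond1, Cond2) yes/no pair per subject. It counts the discordant cells b and c as laid out in the table, then applies the test statistic, which is compared to $\chi^2$ with 1 df. Both function names are hypothetical, and no continuity correction is applied.

```python
def discordant_counts(cond1, cond2):
    """b: Cond1 No & Cond2 Yes; c: Cond1 Yes & Cond2 No (booleans, one pair per subject)."""
    b = sum(1 for r1, r2 in zip(cond1, cond2) if not r1 and r2)
    c = sum(1 for r1, r2 in zip(cond1, cond2) if r1 and not r2)
    return b, c


def mcnemar_statistic(b, c):
    """McNemar chi-square statistic (df = 1); matched cells a and d do not enter."""
    if b + c == 0:
        raise ValueError("no discordant pairs")
    return (b - c) ** 2 / (b + c)


print(mcnemar_statistic(10, 5))
```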

Paired, continued
• Cochran's Q: extends McNemar's test to more than two conditions
• Other similar extensions for related tasks
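Cochran's Q works on binary outcomes arranged as one row per subject and one column per condition. Here is a minimal sketch using the standard formula; the function name is hypothetical. Under H0, Q is approximately $\chi^2$ with k - 1 df. With k = 2 it reduces algebraically to the McNemar statistic $(b - c)^2/(b + c)$, and the second print checks that.

```python
def cochrans_q(rows):
    """Cochran's Q for 0/1 outcomes: rows = subjects, columns = k conditions.
    Undefined (zero denominator) if every subject answers the same in all conditions."""
    k = len(rows[0])
    col_totals = [sum(col) for col in zip(*rows)]
    row_totals = [sum(r) for r in rows]
    n = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    return numerator / denominator


print(cochrans_q([[1, 1, 0], [1, 0, 0], [1, 1, 1]]))   # 3.0
print(cochrans_q([[1, 1], [1, 0], [0, 1], [0, 1]]))    # b=2, c=1: (2-1)^2/3
```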

Critiques
• Choose a paper that has one (or more) empirical experiments as a central contribution
  – Doesn't have to involve human subjects, but can
  – Does have to describe the experiment in enough detail
• 10-12 minute presentation
• Briefly: research questions, necessary background
• Main: describe and critique the methods
  – Experimental design, data collection, analysis
  – Good, bad, ugly, missing
• Briefly: results?

Logistic regression (logit)
• Numeric IV, binary DV (or ordinal)
• $\log\left(\frac{E(Y)}{1 - E(Y)}\right) = \log\left(\frac{\Pr(Y=1)}{\Pr(Y=0)}\right) = b_0 + b_1 x$
• Log odds of success = linear function
  – Odds: 0 to ∞, with 1 in the middle
  – e.g. odds = 5 = 5:1 … five successes for every failure
  – Log odds: -∞ to ∞, with 0 in the middle: good for regression
• Modeled with a binomial distribution
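Here are the conversions between probability, odds, and log odds used above, as small hypothetical helpers. `inv_logit` maps a linear predictor $b_0 + b_1 x$ back to $\Pr(Y=1)$.

```python
import math


def prob_to_odds(p):
    return p / (1 - p)


def odds_to_prob(odds):
    return odds / (1 + odds)


def logit(p):
    """Log odds: maps (0, 1) onto (-inf, inf), with p = 0.5 at 0."""
    return math.log(prob_to_odds(p))


def inv_logit(z):
    """Pr(Y = 1) for a linear predictor z = b0 + b1 * x."""
    return 1 / (1 + math.exp(-z))


print(odds_to_prob(5))  # odds 5:1 means Pr(success) = 5/6
print(logit(0.5))       # 0.0
```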

Interpreting logistic regression
• Take exp(coef) to get interpretable odds ratios
• For each unit increase in x, the odds are multiplied by $e^{b_1}$
  – Note that this can make small coefficients important!
• Use e.g. the Hosmer-Lemeshow test for goodness of fit
  – Null: the data fit the model
  – But not a lot of power!
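This check shows why exp(coef) is the per-unit odds multiplier: the odds ratio between x + 1 and x is $e^{b_1}$, whatever the value of x. The coefficients below are hypothetical values chosen only for illustration.

```python
import math


def odds_at(x, b0, b1):
    """Model odds Pr(Y=1)/Pr(Y=0) = exp(b0 + b1 * x)."""
    return math.exp(b0 + b1 * x)


b0, b1 = -1.0, 0.5  # hypothetical fitted coefficients
for x in (0, 1, 2):
    # Same ratio every time: exp(0.5), roughly 1.6487
    print(x, odds_at(x + 1, b0, b1) / odds_at(x, b0, b1))

# A "small" coefficient of 0.01 per unit still multiplies the odds by e over 100 units.
print(odds_at(100, 0.0, 0.01) / odds_at(0, 0.0, 0.01))
```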

MULTIVARIATE

Multiple regression
• Linear/logistic regression with more variables!
  – At least one numeric, 0+ categorical
• Still (linear): fixed x, normal errors with equal variance, independent errors
• Linear relationship between E(Y) and each x when the other inputs are held constant
  – Effects of each x are independent!
• Still check the normal Q-Q plot of residuals and residuals vs. fitted
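Here is a minimal sketch of multiple linear regression by least squares. It builds the normal equations $(X^\top X)\,b = X^\top y$ with an intercept column and solves them by Gauss-Jordan elimination, so only the standard library is needed. The function names are hypothetical. In practice a library routine such as `numpy.linalg.lstsq` is numerically safer.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_ols(rows, y):
    """Least-squares [b0, b1, ..., bp] for E(Y) = b0 + sum_j b_j * x_j.
    rows: one list of predictor values per observation."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear(xtx, xty)


# y = 1 + 2*x1 + 3*x2 exactly, so the fit should recover [1, 2, 3].
print(multiple_ols([[0, 0], [1, 0], [2, 1], [0, 1], [1, 2]], [1, 3, 8, 4, 9]))
```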

Model selection
• Which covariates to keep? (more on this in a bit)

Adding categorical vars
• Indicator variables (everything is 0 or 1)
• Need one fewer indicator than conditions
  – Either exactly one indicator is 1, or none are (the baseline condition)
  – Coefficients are *relative to the baseline*!
• Model selection: keep all or none of the indicators for one factor
• Called “ANCOVA” when there is at least one numeric and one categorical covariate
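Here is a sketch of indicator (dummy) coding, using a hypothetical helper. A factor with k levels becomes k - 1 columns of 0/1, and a row of all zeros denotes the baseline level. This is why each coefficient reads as "difference from baseline".

```python
def indicator_columns(values, levels, baseline):
    """One 0/1 column per non-baseline level; all zeros means the baseline."""
    others = [lv for lv in levels if lv != baseline]
    return [[1 if v == lv else 0 for lv in others] for v in values]


print(indicator_columns(["A", "B", "C", "B"], ["A", "B", "C"], baseline="A"))
# [[0, 0], [1, 0], [0, 1], [1, 0]]
```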

Interaction
• What if your covariates *aren't* independent?
• $E(Y) = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2$
  – The slope for $x_1$ differs for each value of $x_2$: it is $b_1 + b_{12} x_2$
• Superadditive: all effects in the same direction, and the interaction makes them stronger
• Subadditive: the interaction is in the opposite direction
• For indicator vars, all or none
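This sketch shows what the interaction term does, using hypothetical coefficient values. Holding $x_2$ fixed, a one-unit change in $x_1$ changes E(Y) by $b_1 + b_{12} x_2$, so the $x_1$ slope depends on $x_2$.

```python
def expected_y(x1, x2, b0, b1, b2, b12):
    """E(Y) = b0 + b1*x1 + b2*x2 + b12*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2


def slope_x1(x2, b1, b12):
    """Change in E(Y) per unit of x1 at a fixed x2."""
    return b1 + b12 * x2


coefs = dict(b0=1.0, b1=2.0, b2=3.0, b12=0.5)  # hypothetical values
for x2 in (0, 4):
    step = expected_y(1, x2, **coefs) - expected_y(0, x2, **coefs)
    print(x2, step, slope_x1(x2, coefs["b1"], coefs["b12"]))  # 0: 2.0, 4: 4.0
```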
