Chapter 6 Hypothesis Testing What is Hypothesis Testing? the use - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Chapter 6 Hypothesis Testing

slide-2
SLIDE 2

What is Hypothesis Testing?

  • … the use of statistical procedures to answer research questions
  • Typical research question (generic):
  • For hypothesis testing, research questions are statements:
  • This is the null hypothesis (assumption of “no difference”)
  • Statistical procedures seek to reject, or fail to reject, the null hypothesis (details to follow)

slide-3
SLIDE 3
  • Thus far:

– You have generated a hypothesis (e.g., the mean of group A is different from the mean of group B)
– You have collected some data (samples in group A, samples in group B)
– Now you want to know if these data support your hypothesis
– Formally:
– H0 (null hypothesis): there is no difference in the mean values of group A and group B
– H1 (experimental hypothesis): there is a difference in the mean values of group A and group B

slide-4
SLIDE 4

A practitioner’s point of view

  • Test statistic

– Inferential statistics tell us how likely the observed data would be if the null hypothesis were true → by computing a test statistic
– Typically, if the probability of obtaining a test statistic at least as extreme as the one observed (assuming the null hypothesis is true) is < 0.05, then we can reject the null hypothesis
– “…significant effect of …”

  • Non-significant results

– Does not mean that the null hypothesis is true
– Interpreted to mean that the results you are getting could be a chance finding

  • Significant result

– Means that data this extreme would be highly unlikely if the null hypothesis were true

slide-5
SLIDE 5
A practitioner’s point of view

  • Errors:

– Type 1 error (false positive): we believe that there is an effect when there isn’t one
– Type 2 error (false negative): we believe that there isn’t an effect when there is one
– If we reject the null hypothesis only when p < 0.05, then the probability of a Type 1 error when the null hypothesis is true is at most 5% (alpha level)

  • Typically, we deal with two types of hypotheses

– The mean of group A is different from the mean of group B (two-tailed test)
– The mean of group A is larger than the mean of group B (one-tailed test)
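The alpha level and the one-tailed vs. two-tailed distinction can be checked with a small simulation. This is only a sketch under stated assumptions: it uses a z-test with a known standard deviation (a simplification, not the t-test covered later in the deck), and every function name, sample size, and seed below is invented for the illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)

def z_statistic(a, b, sigma):
    """z for the difference in means, assuming a known population SD (sigma)."""
    return (mean(a) - mean(b)) / (sigma * sqrt(1 / len(a) + 1 / len(b)))

def two_tailed_p(z):
    """p-value for H1: the mean of A is different from the mean of B."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def one_tailed_p(z):
    """p-value for H1: the mean of A is larger than the mean of B."""
    return 1 - STD_NORMAL.cdf(z)

def type1_error_rate(trials=2000, n=20, alpha=0.05, seed=1):
    """Fraction of simulated experiments that reject H0 although H0 is true."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        # Both groups come from the same population, so there is no real effect.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if two_tailed_p(z_statistic(a, b, sigma=1.0)) < alpha:
            false_positives += 1
    return false_positives / trials

if __name__ == "__main__":
    print("Simulated Type 1 error rate:", type1_error_rate())
```

The simulated rate should land close to the alpha level of 0.05. For a positive z, the one-tailed p is half the two-tailed p, which is why the choice of hypothesis type has to be made before looking at the data.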

slide-6
SLIDE 6

Statistical Procedures

  • Two types:

– Parametric

  • Data are assumed to come from a distribution, such as the normal distribution, t-distribution, etc.

– Non-parametric

  • Data are not assumed to come from a particular distribution

– Lots of debate on assumptions testing and what to do if assumptions are not met (avoided here, for the most part)
– A reasonable basis for deciding on the most appropriate test is to match the type of test with the measurement scale of the data (next slide)

slide-7
SLIDE 7

Measurement Scales vs. Statistical Tests

  • Parametric tests most appropriate for…

– Ratio data, interval data

  • Non-parametric tests most appropriate for…

– Ordinal data, nominal data (although limited use for ratio and interval data)

  • Examples

– Nominal: M = Male, F = Female
– Ordinal: preference ranking
– Interval: Likert scale responses
– Ratio: task completion time

slide-8
SLIDE 8

Tests Presented Here

  • Parametric

– T-test
– Analysis of variance (ANOVA)
– Most common statistical procedures in HCI research

slide-9
SLIDE 9

T-test

  • Goal: To ascertain if the difference in the means of two groups is significant
  • Assumptions

– Data are normally distributed (you checked for this by looking at the histograms, reporting the mean/median/standard deviation, and by running the Shapiro-Wilk test)
– If data come from different groups of people → independent t-test (assumes scores are independent and variances in the populations are roughly equal … check your table of descriptive statistics)
– If data come from the same group of people → dependent t-test

  • Practitioner’s point of view: When in doubt, consult a book! Let’s do an example in R
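The R example itself is not part of this transcript. As a stand-in, here is a minimal Python sketch of the two t statistics described above. The function names and the sample data are hypothetical, and the independent version uses the pooled variance, so it relies on the equal-variance assumption listed on the slide.

```python
from math import sqrt
from statistics import mean, stdev, variance

def independent_t(group_a, group_b):
    """Independent t-test statistic (different people in each group).

    Uses the pooled variance, so it assumes roughly equal population variances.
    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df

def dependent_t(first, second):
    """Dependent (paired) t-test statistic (same people measured twice).

    Returns (t, degrees_of_freedom).
    """
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

if __name__ == "__main__":
    # Hypothetical task completion times in seconds, invented for illustration.
    method_a = [12.1, 10.4, 11.8, 13.0, 9.9, 12.5, 11.2, 10.8, 12.9, 11.4]
    method_b = [10.2, 9.8, 10.9, 11.1, 9.5, 10.7, 10.1, 9.9, 11.3, 10.4]
    print("independent:", independent_t(method_a, method_b))
    print("dependent:  ", dependent_t(method_a, method_b))
```

The t value is then compared against a t-distribution with the returned degrees of freedom. In R, the corresponding calls would be `t.test(a, b, var.equal = TRUE)` for the pooled independent test and `t.test(a, b, paired = TRUE)` for the dependent test.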

slide-10
SLIDE 10

Tests Presented Here

  • Parametric

– Analysis of variance (ANOVA)

  • Used for ratio data and interval data
  • Most common statistical procedure in HCI research

  • Non-parametric

– Chi-square test

  • Used for nominal data

– Mann-Whitney U, Wilcoxon Signed-Rank, Kruskal-Wallis, and Friedman tests

  • Used for ordinal data
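The pairing of measurement scale and test on this slide can be written down as a simple lookup. The sketch below only restates the slide; the names `TESTS_BY_SCALE` and `suggest_tests` are made up for the illustration.

```python
# Measurement scale -> (family of procedure, tests named on the slide).
TESTS_BY_SCALE = {
    "ratio": ("parametric", ["ANOVA"]),
    "interval": ("parametric", ["ANOVA"]),
    "ordinal": (
        "non-parametric",
        ["Mann-Whitney U", "Wilcoxon Signed-Rank", "Kruskal-Wallis", "Friedman"],
    ),
    "nominal": ("non-parametric", ["Chi-square"]),
}

def suggest_tests(scale):
    """Return (family, tests) for a measurement scale name, ignoring case."""
    return TESTS_BY_SCALE[scale.lower()]

if __name__ == "__main__":
    for scale in TESTS_BY_SCALE:
        family, tests = suggest_tests(scale)
        print(f"{scale}: {family} -> {', '.join(tests)}")
```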

slide-11
SLIDE 11

Analysis of Variance

  • The analysis of variance (ANOVA) is the most widely used statistical test for hypothesis testing in factorial experiments
  • Goal → determine if an independent variable has a significant effect on a dependent variable
  • Remember, an independent variable has at least two levels (test conditions)
  • Goal (put another way) → determine if the test conditions yield different outcomes on the dependent variable (e.g., one of the test conditions is faster/slower than the other)

slide-12
SLIDE 12

Why Analyse the Variance?

  • Seems odd that we analyse the variance, but the research question is concerned with the overall means:
  • Let’s explain through two simple examples (next slide)

slide-13
SLIDE 13


Example #1 Example #2

“Significant” implies that in all likelihood the difference observed is due to the test conditions (Method A vs. Method B).

“Not significant” implies that the difference observed is likely due to chance.

File: 06-AnovaDemo.xlsx
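The data in 06-AnovaDemo.xlsx are not reproduced in this transcript, so the sketch below uses invented numbers to make the same point: two data sets with the same difference in means can give very different F statistics, because F weighs the effect against the variability in the data. The function is a hand-rolled single-factor, within-subjects ANOVA; its name and all sample values are assumptions made for the illustration.

```python
from statistics import mean

def within_subjects_anova(scores):
    """Single-factor, within-subjects ANOVA.

    scores: one row per participant, one column per test condition.
    Returns (F, df_effect, df_residual).
    """
    m = len(scores)      # number of participants
    n = len(scores[0])   # number of test conditions
    grand = mean(x for row in scores for x in row)
    condition_means = [mean(row[j] for row in scores) for j in range(n)]
    participant_means = [mean(row) for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_effect = m * sum((c - grand) ** 2 for c in condition_means)
    ss_participants = n * sum((p - grand) ** 2 for p in participant_means)
    # What is left after removing the effect and participant differences.
    ss_residual = ss_total - ss_effect - ss_participants

    df_effect = n - 1
    df_residual = (n - 1) * (m - 1)
    f = (ss_effect / df_effect) / (ss_residual / df_residual)
    return f, df_effect, df_residual

def paired_rows(method_a, differences):
    """Build [Method A, Method B] rows, with B = A + per-participant difference."""
    return [[a, a + d] for a, d in zip(method_a, differences)]

# Hypothetical scores for 10 participants; both difference lists average 1.0.
METHOD_A = [5.3, 4.8, 5.1, 4.6, 5.5, 4.9, 5.2, 4.7, 5.0, 5.4]
CONSISTENT = [0.9, 1.1, 1.0, 0.8, 1.2, 1.0, 0.9, 1.1, 1.0, 1.0]
ERRATIC = [3.0, -1.5, 2.5, -2.0, 4.0, -1.0, 3.5, -0.5, 0.0, 2.0]

if __name__ == "__main__":
    print("consistent:", within_subjects_anova(paired_rows(METHOD_A, CONSISTENT)))
    print("erratic:   ", within_subjects_anova(paired_rows(METHOD_A, ERRATIC)))
```

With 1 and 9 degrees of freedom, the tabled critical value of F at the .05 level is about 5.12. The consistent data set gives an F far above that value and the erratic one an F below it, even though Method B is 1.0 higher on average in both.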

slide-14
SLIDE 14

Example #1 - Details

Error bars show ±1 standard deviation
Note: SD is the square root of the variance
Note: Within-subjects design

slide-15
SLIDE 15

Example #1 – ANOVA¹

¹ ANOVA table created by StatView (now marketed as JMP, a product of SAS; www.sas.com)

Probability of obtaining the observed data if the null hypothesis is true

Reported as… F₁,₉ = 9.80, p < .05

Thresholds for “p”

  • .05
  • .01
  • .005
  • .001
  • .0005
  • .0001
slide-16
SLIDE 16

How to Report an F-statistic

  • Notice in the parentheses

– Uppercase for F
– Lowercase for p
– Italics for F and p
– Space both sides of equal sign
– Space after comma
– Space on both sides of less-than sign
– Degrees of freedom are subscript, plain, smaller font
– Three significant figures for F statistic
– No zero before the decimal point in the p statistic (except in Europe)

slide-17
SLIDE 17

Example #2 - Details

Error bars show ±1 standard deviation

slide-18
SLIDE 18

Example #2 – ANOVA

Reported as… F₁,₉ = 0.626, ns

Probability of obtaining the observed data if the null hypothesis is true

Note: For non-significant effects, use “ns” if F < 1.0, or “p > .05” if F > 1.0.
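The reporting rules (thresholds for p, three significant figures for F, no leading zero, and the "ns" convention) can be collected into a small formatter. This is a sketch, not a standard tool: `report_f` is a hypothetical helper, plain text cannot carry italics, and the p-values in the usage lines are placeholders chosen to fall in the right bracket rather than values taken from the slides.

```python
SUBSCRIPT = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
# Thresholds for p from the slides, smallest first, written without a leading zero.
THRESHOLDS = [".0001", ".0005", ".001", ".005", ".01", ".05"]

def report_f(df_effect, df_residual, f_value, p_value):
    """Format an F statistic following the reporting conventions above."""
    dof = f"{df_effect},{df_residual}".translate(SUBSCRIPT)
    # "#.3g" keeps three significant figures, including trailing zeros.
    f_text = format(f_value, "#.3g").rstrip(".")
    for threshold in THRESHOLDS:
        if p_value < float(threshold):
            return f"F{dof} = {f_text}, p < {threshold}"
    # Non-significant: "ns" when F is below 1.0, otherwise "p > .05".
    tail = "ns" if f_value < 1.0 else "p > .05"
    return f"F{dof} = {f_text}, {tail}"

if __name__ == "__main__":
    print(report_f(1, 9, 9.80, 0.012))    # placeholder p below .05
    print(report_f(1, 9, 0.626, 0.45))    # placeholder p above .05
    print(report_f(1, 9, 2.5, 0.15))      # hypothetical non-significant F above 1.0
```

The formatter always reports the strictest threshold that p clears, so a p of .004 is reported as p < .005 rather than p < .05.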
slide-19
SLIDE 19

Example #2 - Reporting


slide-20
SLIDE 20

More Than Two Test Conditions


slide-21
SLIDE 21

ANOVA

  • There was a significant effect of Test Condition on the dependent variable (F₃,₄₅ = 4.95, p < .005)
  • Degrees of freedom

– If n is the number of test conditions and m is the number of participants, the degrees of freedom are…
– Effect → (n – 1)
– Residual → (n – 1)(m – 1)
– Note: single-factor, within-subjects design
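The degrees-of-freedom rule can be checked with a few lines of arithmetic. The helper name below is hypothetical, and the participant counts in the usage lines are inferred from the reported degrees of freedom; the slides do not state them.

```python
def within_subjects_df(n_conditions, m_participants):
    """Degrees of freedom for a single-factor, within-subjects ANOVA.

    Returns (effect, residual) = (n - 1, (n - 1)(m - 1)).
    """
    effect = n_conditions - 1
    residual = (n_conditions - 1) * (m_participants - 1)
    return effect, residual

if __name__ == "__main__":
    # 4 test conditions and 16 participants would give the 3 and 45 reported above.
    print(within_subjects_df(4, 16))
    # 2 test conditions and 10 participants would give the 1 and 9 of the earlier examples.
    print(within_subjects_df(2, 10))
```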

slide-22
SLIDE 22

Post Hoc Comparisons Tests

  • A significant F-test means that at least one of the test conditions differed significantly from at least one other test condition
  • Does not indicate which test conditions differed significantly from one another
  • To determine which pairs differ significantly, a post hoc comparisons test is used
  • Examples:

– Fisher PLSD, Bonferroni/Dunn, Dunnett, Tukey/Kramer, Games/Howell, Student-Newman-Keuls, orthogonal contrasts, Scheffé
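As one concrete case, the Bonferroni adjustment divides the alpha level by the number of pairwise comparisons, so that the chance of any false positive across all pairs stays near the original alpha. The sketch below shows only that adjustment; the other tests listed use different procedures, and the function name and condition labels are invented for the illustration.

```python
from itertools import combinations

def bonferroni_pairs(conditions, alpha=0.05):
    """List every pair of test conditions and the Bonferroni-adjusted alpha.

    Each pairwise comparison is judged against alpha / (number of pairs).
    """
    pairs = list(combinations(conditions, 2))
    return pairs, alpha / len(pairs)

if __name__ == "__main__":
    pairs, adjusted_alpha = bonferroni_pairs(["A", "B", "C", "D"])
    print(len(pairs), "pairs:", pairs)
    print("per-comparison alpha:", adjusted_alpha)
```

With four test conditions there are six pairs, so each pairwise comparison would need p below 0.05 / 6 to count as significant under this adjustment.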