SLIDE 1

Woefully Inadequate Intro to Stats for HCI

Griffin Dietz CS 197 HCI Section

Adapted with permission from slides by Michael Bernstein and Tobi Gerstenberg

SLIDE 2

But first…administrivia

• Feedback == more guidance needed → "ambiguity challenge" and making the best use of office hours/section
• Link to materials in project reports
• Evaluation assignment early release

SLIDE 3

Null Hypothesis

If your change/intervention had no effect, what would the world look like? This is called the null hypothesis.
• No slope in the relationship
• No difference in means

SLIDE 4

Null Hypothesis Significance Testing

Given the data you collected/difference you observed, how likely is it to have occurred by chance?

• Probability of seeing a mean difference at least this large, by chance
• Probability of seeing a slope at least this large, by chance

SLIDE 5

Enter, p-values

The p-value is the probability of seeing the observed data by chance (or, the probability of a Type I error).

Generally, p < .05 is accepted as "statistically significant" support for a condition difference.

SLIDE 6

Types of Data

• Continuous (e.g., duration)
• Interval (e.g., exam scores)
• Ordinal (e.g., Likert scales)
• Binary (e.g., success/failure)
• Categorical (e.g., ethnicity)

Type of data will change which statistical tests are appropriate.

SLIDE 7

A non-ideal method

SLIDE 8

A non-ideal method

SLIDE 9

Pearson’s Chi-Square

For Comparing Two Population Counts (Binary Data)

SLIDE 10

Calculate Chi-Square

“Five people completed the trial with the control interface, and twenty two completed it with the augmented interface.”

           control   augmented
success    5         22
failure    35        18

SLIDE 11

Calculate Chi-Square

Determine the expected number of outcomes for each cell. Expected is (row total) * (column total) / overall total.

Upper left: expected is 27*40/80 = 13.5

           control   augmented   total
success    5         22          27
failure    35        18          53
total      40        40          80

SLIDE 12

Calculate Chi-Square

Expected values = (row total)*(column total) / overall total:

           control   augmented   total
success    13.5      13.5        27
failure    26.5      26.5        53
total      40        40          80

SLIDE 13

Calculate Chi-Square

Calculate a chi-square statistic for each cell and sum over all cells.


$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

           control   augmented
success    5.35      5.35
failure    2.73      2.73

5.35 + 5.35 + 2.73 + 2.73 = 16.16
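A minimal sketch of this hand calculation in R, using the observed counts from the table above (the variable names observed and expected are chosen here for illustration):

observed <- matrix(c(5, 35, 22, 18), nrow = 2,
                   dimnames = list(c("success", "failure"),
                                   c("control", "augmented")))
# expected count for each cell: (row total) * (column total) / overall total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
# chi-square statistic: sum of (observed - expected)^2 / expected over all cells
sum((observed - expected)^2 / expected)   # ≈ 16.16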

SLIDE 14

Calculate Degrees of Freedom

If we know there are 40 participants in each condition, then once one cell is filled in the rest are determined. We get (rows − 1) * (columns − 1) degrees of freedom. So, if it's a two-by-two design, one degree of freedom.

           control   augmented
success    5         ???
failure    ???       18

SLIDE 15

Result: Chi-Square Distribution

[Plot: probability of a chi-square statistic with one degree of freedom. χ² = 1.8 would be very likely by chance; the observed χ² = 16.16 is very unlikely.]

SLIDE 16

Pearson’s Chi-Square in R

chisq.test (HCI R tutorial at http://yatani.jp/HCIstats/ChiSquare)
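A minimal sketch of the same test with chisq.test, using the counts from the earlier slides; correct = FALSE turns off Yates' continuity correction so the statistic matches the hand calculation (≈ 16.16 on one degree of freedom):

counts <- matrix(c(5, 35, 22, 18), nrow = 2,
                 dimnames = list(c("success", "failure"),
                                 c("control", "augmented")))
chisq.test(counts, correct = FALSE)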

SLIDE 17

T-Test

For Comparing Two Population Means (Continuous, Normally Distributed Data)

SLIDE 18

Normally Distributed Data

[Plot: normal distribution with mean µ and standard deviation σ.]
SLIDE 19

T-test: Do two samples have the same mean?

[Plot: two pairs of sample distributions. Samples whose means µ1 and µ2 are far apart likely have different means; overlapping samples likely have the same mean (the null hypothesis).]

SLIDE 20

Calculate the t-statistic

$$t = \frac{\mu_1 - \mu_2}{\sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}}$$

Numbers that matter:
• Difference in means: larger means more significant
• Variance in each group: larger means less significant
• Number of samples: larger means more significant
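A minimal sketch of this formula as an R function; group1 and group2 stand in for two vectors of per-participant measurements and are names chosen here for illustration:

t_stat <- function(group1, group2) {
  # difference in means, divided by the standard error of that difference
  (mean(group1) - mean(group2)) /
    sqrt(var(group1) / length(group1) + var(group2) / length(group2))
}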

SLIDE 21

Calculate Degrees of Freedom

If we know the mean of N numbers, then only N-1 of those numbers can change. Example: pick three numbers with a mean of ten (e.g., 8, 10, 12). Once you’ve picked the first two, the third is set. We have two means, so a t-test has N-2 degrees of freedom.

SLIDE 22

Result: t-distribution

[Plot: probability of a t statistic with 18 degrees of freedom, with "very likely" and "very unlikely" regions marked; the observed t = .92 falls well inside the likely region.]

SLIDE 23

T-test in R

t.test (HCI R tutorial at http://yatani.jp/HCIstats/TTest)
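A minimal sketch with simulated data, since the slide's dataset isn't reproduced here; the group sizes, means, and variable names are assumptions made for illustration:

set.seed(197)
control   <- rnorm(10, mean = 12, sd = 2)   # e.g., task time in the control condition
augmented <- rnorm(10, mean = 10, sd = 2)   # e.g., task time in the augmented condition

# two-sample t-test (Welch's by default); reports t, df, and the p-value
t.test(control, augmented)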

SLIDE 24

Paired t-test for within-subjects design

It can be easier to statistically detect a difference if the participants try both alternatives. Why?

A paired test controls for individual-level differences.

Take each participant's difference between the two conditions. Is the mean of that difference significantly different from zero?

$$t = \frac{\mu - 0}{\sqrt{\frac{\sigma^2}{N}}}$$

SLIDE 25

Paired t-test in R

Why is it no longer significant? (Hint: look at the degrees of freedom, "df".) Ten participants. If we had twenty participants like before, significance would be much more likely.
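A minimal sketch of the paired version, assuming each of ten participants tried both interfaces; the simulated values and variable names are for illustration only:

set.seed(197)
baseline  <- rnorm(10, mean = 12, sd = 2)
augmented <- baseline - rnorm(10, mean = 1, sd = 1)   # same participants, second condition

# paired = TRUE tests whether the mean per-participant difference is zero,
# using N - 1 = 9 degrees of freedom
t.test(baseline, augmented, paired = TRUE)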

SLIDE 26

ANOVA

For Comparing N>2 Population Means (Continuous, Normally Distributed Data)

SLIDE 27

ANOVA: ANalysis Of VAriance

Use instead of a t-test when you have > 2 factor levels/conditions and a continuous DV.

Example: the effect of phone vs. tablet vs. laptop on number of searches successfully performed

Very nice property: an ANOVA is just a regression with one predictor under the hood!
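A minimal sketch of a one-way ANOVA in R for the phone vs. tablet vs. laptop example; the data frame, column names, and simulated values are assumptions made for illustration:

set.seed(197)
searches <- data.frame(
  device    = rep(c("phone", "tablet", "laptop"), each = 10),
  completed = c(rnorm(10, 5, 1.5), rnorm(10, 6, 1.5), rnorm(10, 8, 1.5))
)

# aov() fits the one-way ANOVA; summary() reports the F statistic and p-value
summary(aov(completed ~ device, data = searches))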

SLIDE 28

Linear Regression

For Comparing N>2 Population Means (Continuous, Normally Distributed Data)

SLIDE 29

Linear Regression

Data = Model + Error

Model is a linear combination of predictors that minimizes error

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \quad \text{(data)}$$
$$Y_i = \beta_0 + \beta_1 X_i \quad \text{(model)}$$

SLIDE 30

Is there a relationship between chocolate and happiness?

SLIDE 31

Create a model with chocolate as a predictor

SLIDE 32

Is the model a better fit?

Or, does the model decrease error? Proportional Reduction in Error (PRE):

$$\text{PRE} = 1 - \frac{SSE(A)}{SSE(C)} = 1 - \frac{2396.946}{5215.016} \approx 0.54$$

The model with chocolate as a predictor decreases error by about 54%.
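The same arithmetic as a quick check in R, using the SSE values from this slide:

SSE_C <- 5215.016   # error of the model without chocolate as a predictor
SSE_A <- 2396.946   # error of the model with chocolate as a predictor
1 - SSE_A / SSE_C   # PRE ≈ 0.54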

SLIDE 33

Compute an F statistic

PRE = proportional reduction in error
PC, PA = number of parameters in Model C and Model A
n = number of observations

$$F = \frac{\text{PRE}/(PA - PC)}{(1 - \text{PRE})/(n - PA)} = \frac{0.54/(2 - 1)}{(1 - 0.54)/(10 - 2)} \approx 9.4$$
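And the F-statistic arithmetic from this slide in R:

PRE <- 0.54; PC <- 1; PA <- 2; n <- 10
(PRE / (PA - PC)) / ((1 - PRE) / (n - PA))   # ≈ 9.4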

SLIDE 34

Result: F-distribution

[Plot: probability of an F statistic with eight degrees of freedom; the observed F = 9.4 falls in the "very unlikely" region.]

SLIDE 35


Linear model in R

[R output callouts: overall model fit; impact of chocolate in the model.]

When chocolate goes up by one, happiness goes up by .56 (p = .015).
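A minimal sketch of fitting this model in R; the chocolate and happiness values are simulated here for illustration, so the exact coefficients will differ from the slide:

set.seed(197)
chocolate <- runif(10, 0, 10)
happiness <- 2 + 0.5 * chocolate + rnorm(10, sd = 1.5)

fit <- lm(happiness ~ chocolate)
# summary() reports the overall model fit (F statistic) and the
# chocolate coefficient with its p-value
summary(fit)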