CS 102 Human Computer Interaction Lecture 17: Statistics for HCI



slide-1
SLIDE 1

CS 102 Human-Computer Interaction

Lecture 17: Statistics for HCI Part III

slide-2
SLIDE 2

Course updates

  • Idea log marks sent
  • Attendance and overall performance grades to follow
  • Guest lectures
      ◦ Ashish Goel (Thursday, Dec 3)

slide-3
SLIDE 3

Recap: R

  • Data Types
      ◦ Vectors: x <- c(10.1, 6.2, 3.1, 6.0, 21.9)
      ◦ Matrices: y <- matrix(1:20, nrow = 5, ncol = 4)
      ◦ Dataframes:
          d <- c(1, 2, 3, 4)
          e <- c("red", "white", "red", NA)
          mydata <- data.frame(d, e)
      ◦ Factors

slide-4
SLIDE 4

Recap: R

  • Importing Data
      ◦ CSV file: mydata <- read.csv("chisq.csv", header = TRUE, row.names = 1)
      ◦ Excel file:
          library(xlsx)
          mydata <- read.xlsx("c:/myexcel.xlsx", 1)

  • Descriptive Statistics
      ◦ summary(df)
      ◦ mean(data)
      ◦ median(data)

slide-5
SLIDE 5

Recap: Hypothesis Testing

  • H0: Null Hypothesis
      ◦ The difference observed is due to sampling error
  • H1: Alternative Hypothesis
      ◦ The difference observed is a “significant” difference, due to the independent variable

slide-6
SLIDE 6

Recap: Hypothesis Testing

  • p-value: how likely the obtained sample is if the null hypothesis holds true
  • A threshold of significance = 0.05 (typically)
  • Example: Does the time taken to complete a transaction decrease when a design element is modified?

slide-7
SLIDE 7

Recap: Hypothesis Testing

[Figure: sampling distributions before and after the design modification]

If the null hypothesis is true, the mean of the sampling distribution (the curve) before modification should equal the mean after modification. So, regardless of the design modification, the mean should be the same if the null hypothesis holds.

slide-8
SLIDE 8

Recap: Hypothesis Testing

Run a one-tailed t-test using the file “design.csv”

[Figure: sampling distribution with the mean and 1σ to 4σ marked; the sample lies 4.8 standard deviations away from the mean]

t(14) = 4.8, p = 0.0001

The p-value is the probability of getting a sample like yours, that is 4.8 standard deviations away from the mean, IF the null hypothesis is true. Since the chance is very low (0.01%), we reject the null hypothesis. Typical threshold = 0.05 (less than 5% chance).
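The reported p-value can be recovered from the t statistic alone. The sketch below is a minimal check in base R, assuming only the values quoted on the slide (t = 4.8 with 14 degrees of freedom, one-tailed). It does not read design.csv, whose contents are not shown here, and the variable names are illustrative.

```r
# Values quoted on the slide: t(14) = 4.8, one-tailed test
t_value  <- 4.8
deg_free <- 14

# Upper-tail probability of the t distribution:
# the chance of a t at least this large if the null hypothesis is true
p_one_tailed <- pt(t_value, df = deg_free, lower.tail = FALSE)

# Compare against the usual significance threshold
alpha <- 0.05
reject_null <- p_one_tailed < alpha

print(p_one_tailed)
print(reject_null)
```

The result is on the order of 0.0001, far below the 0.05 threshold, so the null hypothesis is rejected.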

slide-9
SLIDE 9

Recap: Hypothesis Testing

Accept the alternative hypothesis: the time decreases after design modification.

Errors possible:
  • Type I error: You wrongly rejected the null hypothesis
  • Type II error: You wrongly accepted the null hypothesis!

[Figure: overlapping null and alternative hypothesis distributions with Mean (After), Mean (Before), the threshold, and the Type I and Type II error regions marked]

slide-10
SLIDE 10

Recap: Which Test When

Group Type | Quantitative Data (Normality assumed) | Ordinal Data or Quantitative (Normality not assumed) | Nominal Data
Two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test
Two paired groups | Paired t test | Wilcoxon test | McNemar's test
More than two unmatched groups | ANOVA | Kruskal-Wallis test | Chi-square test

slide-11
SLIDE 11

Rank Sum Tests

  • Mann Whitney’s U Test
      ◦ When: Dependent variable is ordinal AND/OR normality cannot be assumed
      ◦ Compares: Medians of two independent groups
      ◦ Example 1: Do men and women rate a product’s functionality differently?
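A minimal sketch of how the men/women example could be run in base R. The ratings below are hypothetical, invented purely for illustration; they are not course data. Calling wilcox.test without paired = TRUE performs the Mann-Whitney U test.

```r
# Hypothetical 7-point ratings, invented for illustration only
men   <- c(5, 6, 4, 7, 5, 6, 3, 5)
women <- c(3, 4, 2, 5, 4, 3, 4, 6)

# Unpaired wilcox.test is the Mann-Whitney U test.
# exact = FALSE requests the normal approximation (the ratings contain ties).
res <- wilcox.test(men, women, exact = FALSE)

# U counts the (man, woman) pairs in which the man's rating is higher,
# with tied pairs counted as one half
u_manual <- sum(outer(men, women, ">")) + 0.5 * sum(outer(men, women, "=="))

print(unname(res$statistic))  # statistic reported by R (labelled W)
print(u_manual)               # same value, counted by hand
print(res$p.value)
```

The hand count shows what the statistic measures: how often a rating from one group beats a rating from the other.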

slide-12
SLIDE 12

Rank Sum Tests

  • Wilcoxon Signed Rank Test
      ◦ When: Dependent variable is ordinal AND/OR normality cannot be assumed
      ◦ Compares: Medians of two dependent groups (paired data)
      ◦ Example 1: Is there a difference in ratings for an interface before and after design modification?

slide-13
SLIDE 13

Wilcoxon Signed Rank Test

  • Example 1: Is there a difference in ratings for an interface before and after design modification? (7-point Likert scale)

Before  4  3  2  4  3  5  4  1  6  2
After   7  5  6  6  4  4  7  6  5  5

What does your intuition say?
  • Step 1: Calculate differences
  • Step 2: Rank the differences
  • Step 3: Sum up positive and negative ranks, choose lower W value

slide-14
SLIDE 14

Wilcoxon Signed Rank Test

  • Step 1: Difference Calculation

Before                 4  3  2  4  3  5  4  1  6  2
After                  7  5  6  6  4  4  7  6  5  5
Sign (Before - After)  -  -  -  -  -  +  -  -  +  -
Abs(Diff)              3  2  4  2  1  1  3  5  1  3

slide-15
SLIDE 15

Wilcoxon Signed Rank Test

  • Step 2: Ranking the differences

Before                     4     3     2     4     3     5     4     1     6     2
After                      7     5     6     6     4     4     7     6     5     5
Sign (Before - After)      -     -     -     -     -     +     -     -     +     -
Abs(Diff)                  3     2     4     2     1     1     3     5     1     3
Rank                       7   4.5     9   4.5     2     2     7    10     2     7
Signed Rank               -7  -4.5    -9  -4.5    -2    +2    -7   -10    +2    -7
slide-16
SLIDE 16

Wilcoxon Signed Rank Test

  • Step 3: Summing positive and negative ranks
      ◦ Sum of positive ranks = 4
      ◦ Sum of negative ranks = 51
      ◦ Check that W+ + W- = n(n+1)/2
      ◦ Choose the lower W value. Why?

Signed Rank  -7  -4.5  -9  -4.5  -2  +2  -7  -10  +2  -7
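The three steps can be traced in a few lines of base R. This is a sketch of the hand calculation, not a replacement for wilcox.test. It uses the Before and After ratings from Example 1 and takes differences as Before - After, the sign convention the signed ranks on the slides imply. Variable names are illustrative.

```r
before <- c(4, 3, 2, 4, 3, 5, 4, 1, 6, 2)
after  <- c(7, 5, 6, 6, 4, 4, 7, 6, 5, 5)

# Step 1: signed differences (Before - After); zero differences are dropped
diffs <- before - after
diffs <- diffs[diffs != 0]

# Step 2: rank the absolute differences; tied values share the average rank
ranks <- rank(abs(diffs))

# Step 3: sum the ranks by sign and keep the smaller sum
w_plus  <- sum(ranks[diffs > 0])
w_minus <- sum(ranks[diffs < 0])

# Sanity check from the slide: W+ + W- = n(n+1)/2
n <- length(diffs)
stopifnot(w_plus + w_minus == n * (n + 1) / 2)

w <- min(w_plus, w_minus)
print(c(w_plus = w_plus, w_minus = w_minus, W = w))
```

This gives W+ = 4 and W- = 51, so W = 4, matching the sums above.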
slide-17
SLIDE 17

Wilcoxon Signed Rank Test in R

  • X <- c(4,3,2,4,3,5,4,1,6,2)
  • Y <- c(7,5,6,6,4,4,7,6,5,5)
  • wilcox.test(X, Y, paired = TRUE)
slide-18
SLIDE 18

Wilcoxon Signed Rank Test in R

  • library(coin)
  • wilcoxsign_test(X ~ Y, distribution = "exact")
slide-19
SLIDE 19

Wilcoxon Signed Rank Test in R

  • Effect size
      ◦ How powerful is your result?
      ◦ p-value is an indicator of significance
      ◦ Effect size indicates strength
      ◦ Measured by Cohen’s d or Pearson’s r
  • Effect size for rank tests
      ◦ r = Z-value / sqrt(total samples)
      ◦ 2.4095 / sqrt(20) = 0.538
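The Z-value and the effect size can both be derived in base R. The sketch below standardises the positive rank sum with a tie-corrected normal approximation; this is an assumption about how the quoted Z was obtained, but for the Example 1 data it reproduces that value. Variable names are illustrative.

```r
before <- c(4, 3, 2, 4, 3, 5, 4, 1, 6, 2)
after  <- c(7, 5, 6, 6, 4, 4, 7, 6, 5, 5)

# Signed-rank ingredients (differences taken as Before - After)
diffs  <- before - after
diffs  <- diffs[diffs != 0]
ranks  <- rank(abs(diffs))
w_plus <- sum(ranks[diffs > 0])

# Standardise W+ : under H0 its mean is sum(ranks)/2 and its
# variance is sum(ranks^2)/4 (this form accounts for tied ranks)
z <- (w_plus - sum(ranks) / 2) / sqrt(sum(ranks^2) / 4)

# Effect size: r = |Z| / sqrt(total samples); 10 Before + 10 After = 20
n_total  <- length(before) + length(after)
effect_r <- abs(z) / sqrt(n_total)

print(z)
print(effect_r)
```

Z comes out at about -2.41 and r at about 0.54, which is usually read as a large effect.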

slide-20
SLIDE 20

Wilcoxon Signed Rank Test in R

  • How to report your results:
      ◦ The medians of Before ratings and After ratings were 3.5 and 5.5, respectively. A Wilcoxon Signed-rank test showed that there is a significant effect of design modification (W = 4, Z = -2.4095, p = 0.0136, r = .538)
  • p-value < 0.05: H0 rejected
slide-21
SLIDE 21

Wilcoxon Signed Rank Test

  • Example 2: Do people visit different pages on your site at different times of the day? (5-point Likert scale)
  • What does your intuition say?
  • Calculate the W value!

Time     P1  P2  P3  P4  P5  P6  P7  P8  P9  P10
Morning   2   4   1   1   2   3   2   1   3    4
Evening   3   3   2   5   1   4   4   3   5    3

slide-22
SLIDE 22

Wilcoxon Signed Rank Test

  • Step 1: Difference Calculation

Morning                   2  4  1  1  2  3  2  1  3  4
Evening                   3  3  2  5  1  4  4  3  5  3
Sign (Morning - Evening)  -  +  -  -  +  -  -  -  -  +
Abs(Diff)                 1  1  1  4  1  1  2  2  2  1

slide-23
SLIDE 23

Wilcoxon Signed Rank Test

  • Step 2: Ranking the differences

Morning                       2     4     1     1     2     3     2     1     3     4
Evening                       3     3     2     5     1     4     4     3     5     3
Sign (Morning - Evening)      -     +     -     -     +     -     -     -     -     +
Abs(Diff)                     1     1     1     4     1     1     2     2     2     1
Rank                        3.5   3.5   3.5    10   3.5   3.5     8     8     8   3.5
Signed Rank                -3.5  +3.5  -3.5   -10  +3.5  -3.5    -8    -8    -8  +3.5

slide-24
SLIDE 24

Wilcoxon Signed Rank Test

  • Step 3: Summing positive and negative ranks
      ◦ Sum of positive ranks = 10.5
      ◦ Sum of negative ranks = 44.5
      ◦ Check that W+ + W- = n(n+1)/2
      ◦ W = 10.5

Signed Rank  -3.5  +3.5  -3.5  -10  +3.5  -3.5  -8  -8  -8  +3.5

slide-25
SLIDE 25

Wilcoxon Signed Rank Test in R

  • GroupA <- c(2,4,1,1,2,3,2,1,3,4)
  • GroupB <- c(3,3,2,5,1,4,4,3,5,3)
  • wilcox.test(GroupA, GroupB, paired = TRUE): not significant
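To check the "not significant" verdict, store the test result and inspect its fields. The sketch below is one way to do this in base R; exact = FALSE is an added choice that requests the normal approximation because the ratings contain ties, and the lower-case variable names are illustrative.

```r
group_a <- c(2, 4, 1, 1, 2, 3, 2, 1, 3, 4)  # Morning
group_b <- c(3, 3, 2, 5, 1, 4, 4, 3, 5, 3)  # Evening

# Paired test; exact = FALSE uses the normal approximation (data contain ties)
res <- wilcox.test(group_a, group_b, paired = TRUE, exact = FALSE)

# The statistic R reports (V) is the sum of positive ranks of group_a - group_b
v <- unname(res$statistic)

# Compare the p-value with the usual 0.05 threshold
significant <- res$p.value < 0.05

print(v)
print(res$p.value)
print(significant)
```

V equals the hand-computed positive rank sum of 10.5, and the p-value is above 0.05, so H0 is not rejected.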
slide-26
SLIDE 26

Question!

  • If you have the grades of students from 2 sections of a class, can you tell if one class is better than the other?
  • What about extreme cases?
  • What about interleaved cases?
  • Download student.csv and studentinter.csv from course webpage

slide-27
SLIDE 27

Question!

  • If you have the grades of students from 2 sections of a class, can you tell if one class is better than the other?
  • Student.csv:
      library(coin)
      wilcoxsign_test(Section1 ~ Section2, data = mydf, distribution = "exact")
      W = 0, Z = -2.814, p-value = 0.001953
  • Studentinter.csv:
      wilcoxsign_test(Section1 ~ Section2, data = inter, distribution = "exact")
      W = 46, Z = -0.30779, p-value = 0.7871

slide-28
SLIDE 28

Rank Sum Tests

  • Kruskal Wallis
      ◦ When: Dependent variable is ordinal AND/OR normality cannot be assumed
      ◦ Compares: More than two independent groups (unpaired data)
      ◦ Example 1: Is there a difference in ratings for 3 versions of interface?

slide-29
SLIDE 29

Kruskal Wallis Test

  • Is there a difference in ratings for 3 versions of interface?
      Rating <- c(1,2,5,3,2,1,1,3,2,1,4,3,6,5,2,6,1,6,5,4,9,6,7,7,5,1,8,9,6,5)
      Interface <- factor(c(rep(1,10), rep(2,10), rep(3,10)))
      data <- data.frame(Interface, Rating)
      kruskal.test(Rating ~ Interface, data = data)
  • p-value = 0.001073
  • Report: Significant effect of Interface was found (χ²(2) = 13.7, p < 0.01).

slide-30
SLIDE 30

Recap: Fisher’s Test

  • What: Like Chi-square: nominal/categorical data
  • When: small sample size (cell counts < 10)
      ◦ A/B Testing for 2 website versions (click-rate)
  • Compares: Proportions of two or more independent groups
  • Assumptions: Independent samples

slide-31
SLIDE 31

Fisher’s Test in R

  • Do men and women differ in their preference for online surveys and personal interviews?
      f <- read.csv("f.csv")
      fisher.test(f)
  • p-value = 1: no significant difference

        PI   Online Surveys   Total
Men      6         2             8
Women    8         4            12
Total   14         6            20
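The same test can be run without the CSV file by typing the four cell counts into a matrix. This is a minimal sketch assuming the table above (totals excluded); the object and dimension names are illustrative.

```r
# 2 x 2 table of counts from the slide (row and column totals left out)
prefs <- matrix(c(6, 2,
                  8, 4),
                nrow = 2, byrow = TRUE,
                dimnames = list(Gender = c("Men", "Women"),
                                Method = c("PI", "Online Surveys")))

# Fisher's exact test on the contingency table
res <- fisher.test(prefs)

print(prefs)
print(res$p.value)
```

The p-value is 1 because the observed table is the single most likely table given its margins, so no other table counts as more extreme.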

slide-32
SLIDE 32

McNemar’s Test

  • What: Paired chi-square
  • When: Data is nominal/categorical AND paired
  • Compares: Proportions of two dependent (paired) groups
  • Example: Ease of use before and after interface change

            Before Change   After Change
Easy             16              29
Difficult        14               1

slide-33
SLIDE 33

McNemar’s Test in R

      data <- matrix(c(16, 29, 14, 1), ncol = 2, byrow = TRUE)
      mcnemar.test(data)
  • p-value = 0.03276
  • The change in interface was significant

            Before Change   After Change
Easy             16              29
Difficult        14               1
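The p-value can be reproduced by hand. mcnemar.test treats the matrix as a paired cross-tabulation and uses only its two off-diagonal cells (29 and 14 here), applying a continuity correction by default. The sketch below assumes the matrix exactly as entered on the slide; variable names are illustrative.

```r
ease <- matrix(c(16, 29, 14, 1), ncol = 2, byrow = TRUE)

# Off-diagonal cells: the only counts McNemar's test uses
b     <- ease[1, 2]
c_off <- ease[2, 1]

# Continuity-corrected statistic: (|b - c| - 1)^2 / (b + c), 1 degree of freedom
chi_sq   <- (abs(b - c_off) - 1)^2 / (b + c_off)
p_manual <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

# Built-in test for comparison (continuity correction is on by default)
res <- mcnemar.test(ease)

print(chi_sq)
print(p_manual)
print(res$p.value)
```

Both routes give p of about 0.0328, below the 0.05 threshold.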

slide-34
SLIDE 34

Cheat Sheet: Which Test When

Group Type | Quantitative Data (Normality assumed) | Ordinal Data or Quantitative (Normality not assumed) | Nominal Data
One or two unpaired samples (independent) | Unpaired t-test (one-tailed and two-tailed) | Mann-Whitney test (U-values) | Fisher's test (Chi-square for smaller samples)
Paired samples (dependent) | Paired t-test (one-tailed and two-tailed) | Wilcoxon test (W-values) | McNemar's test (paired categorical data)
More than two unpaired samples | ANOVA (not covered) | Kruskal-Wallis test (e.g. 3 interfaces) | Chi-square test

slide-35
SLIDE 35

Resources for Statistics

  • Statistics in HCI

http://yatani.jp/teaching/doku.php?id=hcistats:start

  • Biostatistics Handbook

http://www.biostathandbook.com/index.html

  • Statistics for Dummies

https://www.khanacademy.org/math/probability