CS 102 Human – Computer Interaction
Lecture 17: Statistics for HCI Part III
Course updates
Idea log marks sent
Attendance and overall performance grades to follow
Guest lectures
Ashish Goel (Dec 3, Thursday)
Recap: R Data
Vectors: x <- c(10.1, 6.2, 3.1, 6.0, 21.9)
Matrices: y <- matrix(1:20, nrow=5, ncol=4)
Dataframes:
d <- c(1,2,3,4)
e <- c("red", "white", "red", NA)
mydata <- data.frame(d, e)
Factors
CSV file: mydata <- read.csv("chisq.csv", header = TRUE, row.names = 1)
Excel file: library(xlsx)
mydata <- read.xlsx("c:/myexcel.xlsx", 1)
summary(df), mean(data), median(data)
Null hypothesis: The difference observed is due to a sampling error
Alternative hypothesis: The difference observed is a “significant” difference, due to the independent variable
p-value: How likely is the sample obtained, if the null hypothesis holds true?
Example: Does the transaction time decrease when a design element is modified?
If the null hypothesis is true, then the mean of your sampling distribution (the curve) before modification should be equal to the mean after modification.
So, regardless of the design modification, the mean should be the same if the null hypothesis holds.
Run a one-tailed t-test using the file “design.csv”
[Figure: sampling distribution with the mean and 1σ to 4σ marked; the observed sample lies 4.8 standard deviations away from the mean]
t(14) = 4.8, p = 0.0001
The p-value is the probability of obtaining a sample that is 4.8 standard deviations away from the mean, if the null hypothesis is true.
Since the chance is very low (0.01%), we reject the null hypothesis.
α = 0.05 (less than 5% chance)
Accept the alternative hypothesis: the time decreases after design modification.
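The decision rule above can be sketched in R. This is only an illustration: the task times below are made-up values, not the contents of design.csv, and the variable names are assumptions. A paired design with 15 users is assumed because it gives 14 degrees of freedom, as in the reported t(14); the numbers will not reproduce t = 4.8.

```r
# Hypothetical task times in seconds for 15 users (NOT the data in design.csv).
before <- c(32, 28, 35, 30, 31, 29, 34, 33, 27, 36, 30, 32, 31, 29, 33)
after  <- c(27, 25, 30, 28, 26, 27, 29, 30, 25, 31, 27, 28, 29, 26, 28)

# One-tailed paired t-test.
# H0: mean(before) equals mean(after); H1: mean(before) is greater (time decreases).
result <- t.test(before, after, paired = TRUE, alternative = "greater")

alpha <- 0.05
cat("t(", result$parameter, ") = ", round(result$statistic, 2),
    ", p = ", signif(result$p.value, 3), "\n", sep = "")

# Reject the null hypothesis only when p falls below the chosen alpha.
reject_null <- result$p.value < alpha
print(reject_null)
```

With real data, replace the two vectors with the corresponding columns read from the CSV file.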
Errors possible:
Type I error: You wrongly rejected the null hypothesis
Type II error: You wrongly accepted (failed to reject) the null hypothesis
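A Type I error can be made concrete with a small simulation. This is a sketch under stated assumptions: both groups are drawn from the same normal distribution, so the null hypothesis is true by construction, and the group size of 15 and the 2000 repetitions are arbitrary choices. Every "significant" result here is a false alarm, and the share of false alarms should land near α.

```r
set.seed(1)            # fixed seed so the run is repeatable
alpha <- 0.05
n_experiments <- 2000  # arbitrary number of simulated studies

# Each simulated study compares two groups drawn from the SAME distribution,
# so any rejection of the null hypothesis is a Type I error.
p_values <- replicate(n_experiments, {
  group_a <- rnorm(15)
  group_b <- rnorm(15)
  t.test(group_a, group_b)$p.value
})

type1_rate <- mean(p_values < alpha)
print(type1_rate)      # expected to be close to alpha
```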
[Figure: null and alternative hypothesis distributions around Mean (After) and Mean (Before), with the threshold and the Type I and Type II error regions marked]
Group type | Quantitative data (normality assumed) | Ordinal data or quantitative (normality not assumed) | Nominal data
Two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test
Two paired groups | Paired t test | Wilcoxon test | McNemar's test
More than two unmatched groups | ANOVA | Kruskal-Wallis test | Chi-square test
Mann-Whitney test
When: Dependent variable is ordinal AND/OR normality cannot be assumed
Compares: Medians of two independent groups
Example 1: Do men and women rate a product’s functionality differently?
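A minimal sketch of the Mann-Whitney test for this kind of question, using base R's wilcox.test (with paired = FALSE, the default, it runs the Mann-Whitney test). The ratings are hypothetical, and exact = FALSE is set because tied ratings rule out an exact p-value.

```r
# Hypothetical 7-point ratings from two independent groups.
men   <- c(5, 4, 6, 3, 5, 4, 6, 5)
women <- c(4, 3, 5, 2, 4, 3, 4, 3)

# Unpaired two-sample test; normal approximation because of ties.
res <- wilcox.test(men, women, exact = FALSE)
print(res$statistic)   # U value for the first group (R labels it W)
print(res$p.value)

# The same U value by hand: rank all ratings together, sum the ranks of the
# first group, and subtract the smallest possible rank sum for that group.
ranks <- rank(c(men, women))
u_men <- sum(ranks[seq_along(men)]) - length(men) * (length(men) + 1) / 2
print(u_men)
```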
Wilcoxon signed-rank test
When: Dependent variable is ordinal AND/OR normality cannot be assumed
Compares: Medians of two dependent groups (paired data)
Example 1: Is there a difference in ratings for an interface before and after design modification? (7-point Likert scale)
Before  4 3 2 4 3 5 4 1 6 2
After   7 5 6 6 4 4 7 6 5 5
What does your intuition say?
Step 1: Calculate differences
Step 2: Rank the differences
Step 3: Sum up positive and negative ranks, choose the lower W value
Step 1: Difference calculation (Before - After)
Before     4 3 2 4 3 5 4 1 6 2
After      7 5 6 6 4 4 7 6 5 5
Sign       - - - - - + - - + -
Abs(Diff)  3 2 4 2 1 1 3 5 1 3
Step 2: Ranking the differences
Before       4    3    2    4    3    5    4    1    6    2
After        7    5    6    6    4    4    7    6    5    5
Sign         -    -    -    -    -    +    -    -    +    -
Abs(Diff)    3    2    4    2    1    1    3    5    1    3
Rank         7    4.5  9    4.5  2    2    7    10   2    7
Signed rank  -7   -4.5 -9   -4.5 -2   +2   -7   -10  +2   -7
Step 3: Summing positive and negative ranks
Sum of positive ranks = 4
Sum of negative ranks = 51
Check that W+ + W- = n(n+1)/2
Choose the lower W value. Why?
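The three steps can be traced in a few lines of base R. This sketch takes differences as Before - After, which is the direction that gives the sums reported above; the variable names are illustrative.

```r
before <- c(4, 3, 2, 4, 3, 5, 4, 1, 6, 2)
after  <- c(7, 5, 6, 6, 4, 4, 7, 6, 5, 5)

# Step 1: differences (zero differences, if any, are dropped by convention).
diffs <- before - after
diffs <- diffs[diffs != 0]

# Step 2: rank the absolute differences; ties receive the average rank.
ranks <- rank(abs(diffs))

# Step 3: sum the ranks separately for positive and negative differences.
w_plus  <- sum(ranks[diffs > 0])
w_minus <- sum(ranks[diffs < 0])

# Sanity check: the two sums must add up to n(n+1)/2.
n <- length(diffs)
stopifnot(w_plus + w_minus == n * (n + 1) / 2)

w <- min(w_plus, w_minus)
cat("W+ =", w_plus, " W- =", w_minus, " W =", w, "\n")
```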
How powerful is your result?
p-value is an indicator of significance
Effect size indicates strength
Measured by Cohen’s d or Pearson’s r
r = |Z| / sqrt(total samples) = 2.4095 / sqrt(20) = 0.538
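A sketch of where Z and r can come from for Example 1. It assumes Z is the normal approximation of W with a correction for tied ranks; that assumption reproduces the reported magnitude of 2.4095, but the slides do not state how Z was obtained. The effect size works out to roughly 0.54.

```r
before <- c(4, 3, 2, 4, 3, 5, 4, 1, 6, 2)
after  <- c(7, 5, 6, 6, 4, 4, 7, 6, 5, 5)

d <- before - after
n <- length(d)                 # number of pairs (no zero differences here)
ranks <- rank(abs(d))
w <- min(sum(ranks[d > 0]), sum(ranks[d < 0]))

# Normal approximation of W under the null hypothesis.
mu <- n * (n + 1) / 4
# Tie correction: each group of t tied absolute differences lowers the
# variance by (t^3 - t) / 48; groups of size 1 contribute nothing.
tie_sizes <- as.numeric(table(abs(d)))
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(tie_sizes^3 - tie_sizes) / 48)
z <- (w - mu) / sigma

# Effect size: |Z| divided by the square root of the total number of
# observations (10 Before + 10 After = 20).
r <- abs(z) / sqrt(2 * n)
cat("Z =", round(z, 4), " r =", round(r, 2), "\n")
```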
The medians of the Before ratings and After ratings were 3.5 and 5.5, respectively. A Wilcoxon signed-rank test showed that there is a significant effect of design modification (W = 4, Z = -2.4095, p = 0.0136, r = .538).
Example 2: Do people visit different pages on your site at different times of the day? (5-point Likert scale)
What does your intuition say?
Calculate the W value!
Time     P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
Morning  2  4  1  1  2  3  2  1  3  4
Evening  3  3  2  5  1  4  4  3  5  3
Step 1: Difference calculation (Morning - Evening)
Morning    2 4 1 1 2 3 2 1 3 4
Evening    3 3 2 5 1 4 4 3 5 3
Sign       - + - - + - - - - +
Abs(Diff)  1 1 1 4 1 1 2 2 2 1
Step 2: Ranking the differences
Morning      2    4    1    1    2    3    2    1    3    4
Evening      3    3    2    5    1    4    4    3    5    3
Sign         -    +    -    -    +    -    -    -    -    +
Abs(Diff)    1    1    1    4    1    1    2    2    2    1
Rank         3.5  3.5  3.5  10   3.5  3.5  8    8    8    3.5
Signed rank  -3.5 +3.5 -3.5 -10  +3.5 -3.5 -8   -8   -8   +3.5
Step 3: Summing positive and negative ranks
Sum of positive ranks = 10.5
Sum of negative ranks = 44.5
Check that W+ + W- = n(n+1)/2
W = 10.5
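The hand calculation for Example 2 can be cross-checked with base R's wilcox.test. Note that this is a different function from coin's wilcoxsign_test used elsewhere in the lecture; it is used here only because it ships with R. Its statistic V is the sum of the ranks of the positive differences (first argument minus second), and exact = FALSE is set because of the tied differences.

```r
morning <- c(2, 4, 1, 1, 2, 3, 2, 1, 3, 4)
evening <- c(3, 3, 2, 5, 1, 4, 4, 3, 5, 3)

# Paired test on Morning - Evening; normal approximation because of ties.
res <- wilcox.test(morning, evening, paired = TRUE, exact = FALSE)

# V is the sum of positive ranks; the negative sum follows from n(n+1)/2.
n <- length(morning)
w_plus  <- unname(res$statistic)
w_minus <- n * (n + 1) / 2 - w_plus

cat("W+ =", w_plus, " W- =", w_minus, " W =", min(w_plus, w_minus), "\n")
print(res$p.value)
```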
If you have the grades of students from 2 sections of a class, can you tell if one class is better than the other?
What about extreme cases?
What about interleaved cases?
Download student.csv and studentinter.csv from the course webpage
student.csv:
library(coin)
mydf <- read.csv("student.csv")
wilcoxsign_test(Section1 ~ Section2, data = mydf, distribution = "exact")
W = 0, Z = -2.814, p-value = 0.001953
studentinter.csv:
inter <- read.csv("studentinter.csv")
wilcoxsign_test(Section1 ~ Section2, data = inter, distribution = "exact")
W = 46, Z = -0.30779, p-value = 0.7871
Kruskal-Wallis test
When: Dependent variable is ordinal AND/OR normality cannot be assumed
Compares: More than two independent groups (unpaired data)
Example 1: Is there a difference in ratings for 3 versions of interface?
Rating <- c(1,2,5,3,2,1,1,3,2,1,4,3,6,5,2,6,1,6,5,4,9,6,7,7,5,1,8,9,6,5)
Interface <- factor(c(rep(1,10), rep(2,10), rep(3,10)))
data <- data.frame(Interface, Rating)
kruskal.test(Rating ~ Interface, data = data)
p-value = 0.001073
Report: A significant effect of Interface was found (χ²(2) = 13.7, p < 0.01).
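To see what kruskal.test computes, the statistic can be rebuilt by hand from rank sums. This is a sketch using the textbook H formula with the usual tie correction; the variable names are illustrative.

```r
Rating <- c(1,2,5,3,2,1,1,3,2,1,4,3,6,5,2,6,1,6,5,4,9,6,7,7,5,1,8,9,6,5)
Interface <- factor(rep(1:3, each = 10))

N <- length(Rating)
r <- rank(Rating)                        # rank all ratings together
rank_sums <- tapply(r, Interface, sum)   # rank sum per interface
n_i <- tapply(r, Interface, length)      # group sizes

# H statistic before tie correction.
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over tied groups.
tie_sizes <- as.numeric(table(Rating))
H <- H / (1 - sum(tie_sizes^3 - tie_sizes) / (N^3 - N))

# H is compared against a chi-square distribution with (groups - 1) df.
p <- pchisq(H, df = nlevels(Interface) - 1, lower.tail = FALSE)
cat("H =", round(H, 1), " df =", nlevels(Interface) - 1,
    " p =", signif(p, 4), "\n")
```

The rank sums grow from interface 1 to interface 3, which is what drives the large H value.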
Fisher's test
What: Like Chi-square: nominal/categorical data
When: Small sample size (cell counts < 10)
Example: A/B testing for 2 website versions (click-rate)
Compares: Proportions of two or more independent groups
Assumptions: Independent samples
Do men and women differ in their preference for online surveys and personal interviews?
f <- read.csv("f.csv")
fisher.test(f)
p-value = 1: no significant difference
        PI   Online Surveys   Total
Men     6    2                8
Women   8    4                12
Total   14   6                20
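If f.csv is not at hand, the same test can be run by typing the four cell counts into a matrix. This is a sketch; the dimnames are illustrative labels (PI stands for personal interviews), and the totals are left out because fisher.test expects only the cell counts.

```r
# 2 x 2 contingency table: rows are groups, columns are preferences.
prefs <- matrix(c(6, 2,
                  8, 4),
                nrow = 2, byrow = TRUE,
                dimnames = list(Gender = c("Men", "Women"),
                                Method = c("PI", "OnlineSurveys")))

res <- fisher.test(prefs)
print(res$p.value)

# A p-value above 0.05 means no significant difference in preference.
significant <- res$p.value < 0.05
print(significant)
```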
McNemar's test
What: Paired chi-square
When: Data is nominal/categorical AND paired
Compares: Proportions of two dependent (paired) groups
Example: Ease of use before and after interface change
            Before Change   After Change
Easy        16              29
Difficult   14              1
data <- matrix(c(16, 29, 14, 1), ncol = 2, byrow = TRUE)
mcnemar.test(data)
p-value = 0.03276
The change in interface was significant
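What mcnemar.test does with this matrix can be checked by hand. The sketch below uses the standard continuity-corrected statistic, which depends only on the two off-diagonal cells; the names n12 and n21 are illustrative.

```r
data <- matrix(c(16, 29, 14, 1), ncol = 2, byrow = TRUE)

# Only the off-diagonal cells enter the statistic.
n12 <- data[1, 2]
n21 <- data[2, 1]

# Continuity-corrected McNemar statistic, chi-square with 1 degree of freedom.
stat <- (abs(n12 - n21) - 1)^2 / (n12 + n21)
p <- pchisq(stat, df = 1, lower.tail = FALSE)
cat("chi-squared =", round(stat, 3), " p =", signif(p, 4), "\n")

# Cross-check against the built-in test (correct = TRUE is its default).
builtin <- mcnemar.test(data)
print(builtin$p.value)
```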
Group type | Quantitative data (normality assumed) | Ordinal data or quantitative (normality not assumed) | Nominal data
One or two unpaired samples (independent) | Unpaired t-test (one-tailed and two-tailed) | Mann-Whitney test (U-values) | Fisher's test (Chi-square for smaller samples)
Paired samples (dependent) | Paired t-test (one-tailed and two-tailed) | Wilcoxon test (W-values) | McNemar's test (paired categorical data)
More than two unpaired samples | ANOVA (not covered) | Kruskal-Wallis test (e.g. 3 interfaces) | Chi-square test
http://yatani.jp/teaching/doku.php?id=hcistats:start
https://www.khanacademy.org/math/probability