
CS 102 Human Computer Interaction Lecture 17: Statistics for HCI



  1. CS 102 Human-Computer Interaction Lecture 17: Statistics for HCI Part III

  2. Course updates
     • Idea log marks sent
     • Attendance and overall performance grades to follow
     • Guest lectures
       ◦ Ashish Goel (Dec 3, Thursday)

  3. Recap: R
     • Data Types
       ◦ Vectors: x <- c(10.1, 6.2, 3.1, 6.0, 21.9)
       ◦ Matrices: y <- matrix(1:20, nrow = 5, ncol = 4)
       ◦ Dataframes:
         d <- c(1, 2, 3, 4)
         e <- c("red", "white", "red", NA)
         mydata <- data.frame(d, e)
       ◦ Factors
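
The slide lists factors without an example. A minimal sketch, reusing the vector `e` from the dataframe bullet; the name `colour` is an assumption chosen for illustration:

```r
# Values from the dataframe example; NA marks a missing entry.
e <- c("red", "white", "red", NA)

# factor() stores each distinct non-missing value as a level.
colour <- factor(e)

levels(colour)  # the distinct categories
table(colour)   # counts per category (NA is excluded by default)
```

Factors matter later in the lecture because grouping variables passed to tests such as `kruskal.test` are treated as categories rather than numbers.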

  4. Recap: R
     • Importing Data
       ◦ CSV file: mydata <- read.csv("chisq.csv", header = TRUE, row.names = 1)
       ◦ Excel file:
         library(xlsx)
         mydata <- read.xlsx("c:/myexcel.xlsx", 1)
     • Descriptive Statistics
       ◦ summary(df)
       ◦ mean(data)
       ◦ median(data)
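
As a small illustration of the descriptive statistics calls, the sketch below applies them to the vector `x` from the data-types recap rather than to an imported file (the names `x_mean` and `x_median` are assumptions):

```r
# Vector from the data-types recap.
x <- c(10.1, 6.2, 3.1, 6.0, 21.9)

x_mean   <- mean(x)    # arithmetic mean
x_median <- median(x)  # middle value after sorting

# summary() reports min, quartiles, median, mean and max in one call.
summary(x)
```

The single large value 21.9 pulls the mean above the median, which is one reason the rank-based tests later in the lecture compare medians.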

  5. Recap: Hypothesis Testing
     • H0: Null Hypothesis
       ◦ The difference observed is due to sampling error
     • H1: Alternative Hypothesis
       ◦ The difference observed is a "significant" difference, due to the independent variable

  6. Recap: Hypothesis Testing
     • p-value: How likely a sample at least as extreme as the one obtained would be, if the null hypothesis holds true.
     • Threshold of significance = 0.05 (typically)
     • Example: Does the time taken to complete a transaction decrease when a design element is modified?

  7. Recap: Hypothesis Testing
     If the null hypothesis is true, the mean of the sampling distribution (the curve) before modification should equal the mean after modification. So, regardless of the design modification, the mean should be the same if the null hypothesis holds: μ(before) = μ(after)

  8. Recap: Hypothesis Testing
     Run a one-tailed t-test using the file "design.csv": t(14) = 4.8, p = 0.0001
     The p-value is the probability of getting a sample like yours, that is, 4.8 standard deviations away from the mean, IF the null hypothesis is true.
     Since the chance is very low (0.01%), we reject the null hypothesis. Typical threshold = 0.05 (less than 5% chance).
     [Figure: sampling distribution curve, x-axis marked at the mean and at 1σ to 4σ]
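
The contents of design.csv are not shown, so the following is only a sketch of how such a test might be run. The data are HYPOTHETICAL, and treating the design as paired (15 users measured before and after, which would give 14 degrees of freedom) is an assumption:

```r
# HYPOTHETICAL transaction times in seconds; not the contents of design.csv.
before <- c(31, 28, 35, 30, 33, 29, 36, 32, 34, 30, 27, 33, 31, 35, 29)
after  <- c(27, 26, 30, 29, 28, 27, 31, 28, 30, 27, 26, 29, 28, 30, 27)

# Paired, one-tailed test of H1: mean(before) > mean(after).
res <- t.test(before, after, paired = TRUE, alternative = "greater")

res$statistic  # t value
res$parameter  # degrees of freedom (n - 1 for paired data)
res$p.value    # compare against the 0.05 threshold
```

Setting `alternative = "greater"` is what makes the test one-tailed; the default `"two.sided"` would also count an increase in time as evidence against H0.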

  9. Recap: Hypothesis Testing
     Accept the alternative hypothesis: the time decreases after design modification.
     Errors possible:
     • Type I error: You wrongly rejected the null hypothesis
     • Type II error: You wrongly accepted the null hypothesis!
     [Figure: null and alternative hypothesis distributions centred on Mean (Before) and Mean (After), with the threshold separating the Type II and Type I error regions]

  10. Recap: Which Test When

      | Group Type | Quantitative Data (Normality assumed) | Ordinal Data or Quantitative (Normality not assumed) | Nominal Data |
      |---|---|---|---|
      | Two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test |
      | Two paired groups | Paired t test | Wilcoxon test | McNemar's test |
      | More than two unmatched groups | ANOVA | Kruskal-Wallis test | Chi-square test |

  11. Rank Sum Tests
      • Mann-Whitney U Test
        ◦ When: Dependent variable is ordinal AND/OR normality cannot be assumed
        ◦ Compares: Medians of two independent groups
        ◦ Example 1: Do men and women rate a product's functionality differently?
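
The slides give no data for this example, so the sketch below uses HYPOTHETICAL ratings to show how an unpaired rank-sum test might be run with base R's `wilcox.test`:

```r
# HYPOTHETICAL 7-point ratings from two independent groups.
men   <- c(5, 6, 4, 7, 5, 6, 3, 5)
women <- c(4, 3, 5, 2, 4, 3, 4, 2)

# Unpaired rank-sum test. exact = FALSE requests the normal approximation,
# which R falls back to anyway when the data contain ties.
res <- wilcox.test(men, women, paired = FALSE, exact = FALSE)

# W counts the (man, woman) pairs in which the man's rating is higher,
# with tied pairs counting one half.
res$statistic
res$p.value
```

Ordinal rating scales almost always produce ties, so asking for the approximation explicitly avoids the warning R would otherwise print.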

  12. Rank Sum Tests
      • Wilcoxon Signed Rank Test
        ◦ When: Dependent variable is ordinal AND/OR normality cannot be assumed
        ◦ Compares: Medians of two dependent groups (paired data)
        ◦ Example 1: Is there a difference in ratings for an interface before and after design modification?

  13. Wilcoxon Signed Rank Test
      ◦ Example 1: Is there a difference in ratings for an interface before and after design modification? (7-point Likert scale)
        Before  4  3  2  4  3  5  4  1  6  2
        After   7  5  6  6  4  4  7  6  5  5
      What does your intuition say?
      Step 1: Calculate differences
      Step 2: Rank the differences
      Step 3: Sum up positive and negative ranks, choose lower W value

  14. Wilcoxon Signed Rank Test
      ◦ Step 1: Difference Calculation
        Before  4  3  2  4  3  5  4  1  6  2
        After   7  5  6  6  4  4  7  6  5  5
        Sign    -  -  -  -  -  +  -  -  +  -
        Diff    3  2  4  2  1  1  3  5  1  3

  15. Wilcoxon Signed Rank Test
      ◦ Step 2: Ranking the differences:
        Before         4     3     2     4     3     5     4     1     6     2
        After          7     5     6     6     4     4     7     6     5     5
        Sign           -     -     -     -     -     +     -     -     +     -
        Diff           3     2     4     2     1     1     3     5     1     3
        Rank           7   4.5     9   4.5     2     2     7    10     2     7
        Signed Rank   -7  -4.5    -9  -4.5    -2    +2    -7   -10    +2    -7

  16. Wilcoxon Signed Rank Test
      ◦ Step 3: Summing positive and negative ranks:
        Signed Rank  -7  -4.5  -9  -4.5  -2  +2  -7  -10  +2  -7
      ◦ Sum of positive ranks = 4
      ◦ Sum of negative ranks = 51
      ◦ Check that W+ + W- = n(n+1)/2
      ◦ Choose the lower W value. Why?
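
The three manual steps can be sketched in a few lines of base R. This is an illustration of the hand calculation, not the internals of any test function; the variable names are assumptions:

```r
before <- c(4, 3, 2, 4, 3, 5, 4, 1, 6, 2)
after  <- c(7, 5, 6, 6, 4, 4, 7, 6, 5, 5)

# Step 1: differences (Before - After), matching the sign row on the slides.
d <- before - after

# Step 2: rank the absolute differences; tied values share the average rank.
r <- rank(abs(d))
signed_rank <- sign(d) * r

# Step 3: sum the positive and negative ranks separately, keep the smaller.
w_plus  <- sum(r[d > 0])
w_minus <- sum(r[d < 0])
w <- min(w_plus, w_minus)

# Sanity check from the slide: W+ + W- = n(n+1)/2.
n <- length(d)
stopifnot(w_plus + w_minus == n * (n + 1) / 2)
```

None of the ten pairs has a zero difference here. If some did, the usual convention is to drop those pairs before ranking, which this sketch does not handle.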

  17. Wilcoxon Signed Rank Test in R
      • X <- c(4, 3, 2, 4, 3, 5, 4, 1, 6, 2)
      • Y <- c(7, 5, 6, 6, 4, 4, 7, 6, 5, 5)
      • wilcox.test(X, Y, paired = TRUE)

  18. Wilcoxon Signed Rank Test in R • library(coin) • wilcoxsign_test(X ~ Y, distribution = "exact")

  19. Wilcoxon Signed Rank Test in R
      • Effect size
        ◦ How powerful is your result?
        ◦ p-value is an indicator of significance
        ◦ Effect size indicates strength
        ◦ Measured by Cohen's d or Pearson's r
      • Effect size for rank tests
        ◦ r = Z-value / sqrt(total samples)
        ◦ 2.4095 / sqrt(20) = 0.538
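
The slide quotes Z = 2.4095 without showing where it comes from. One way to arrive at a Z of that size is the normal approximation to the signed-rank statistic without continuity correction; the sketch below assumes that is the source of the value and then applies the slide's formula for r:

```r
before <- c(4, 3, 2, 4, 3, 5, 4, 1, 6, 2)
after  <- c(7, 5, 6, 6, 4, 4, 7, 6, 5, 5)

d <- before - after
r <- rank(abs(d))               # average ranks for ties
n <- length(d)

w_plus <- sum(r[d > 0])         # sum of positive ranks
mu     <- n * (n + 1) / 4       # expected W+ under H0
sigma  <- sqrt(sum(r^2) / 4)    # SD of W+ under H0 (accounts for ties)

z <- (w_plus - mu) / sigma

# Effect size: |Z| over the square root of the total number of observations.
effect_r <- abs(z) / sqrt(2 * n)
```

Here "total samples" is taken to mean all 20 observations (10 before plus 10 after), which is what the slide's sqrt(20) implies.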

  20. Wilcoxon Signed Rank Test in R
      • How to report your results: The medians of Before ratings and After ratings were 3.5 and 5.5, respectively. A Wilcoxon signed-rank test showed a significant effect of design modification (W = 4, Z = -2.4095, p = 0.0136, r = .538).
      • p-value < 0.05: H0 rejected

  21. Wilcoxon Signed Rank Test
      ◦ Example 2: Do people visit different pages on your site at different times of the day? (5-point Likert scale)
        Time     P1  P2  P3  P4  P5  P6  P7  P8  P9  P10
        Morning   2   4   1   1   2   3   2   1   3    4
        Evening   3   3   2   5   1   4   4   3   5    3
      ◦ What does your intuition say?
      ◦ Calculate the W value!

  22. Wilcoxon Signed Rank Test
      ◦ Step 1: Difference Calculation
        Morning    2  4  1  1  2  3  2  1  3  4
        Evening    3  3  2  5  1  4  4  3  5  3
        Sign       -  +  -  -  +  -  -  -  -  +
        Abs(Diff)  1  1  1  4  1  1  2  2  2  1

  23. Wilcoxon Signed Rank Test
      ◦ Ranking the differences:
        Morning         2     4     1     1     2     3     2     1     3     4
        Evening         3     3     2     5     1     4     4     3     5     3
        Sign            -     +     -     -     +     -     -     -     -     +
        Abs(Diff)       1     1     1     4     1     1     2     2     2     1
        Rank          3.5   3.5   3.5    10   3.5   3.5     8     8     8   3.5
        Signed Rank  -3.5  +3.5  -3.5   -10  +3.5  -3.5    -8    -8    -8  +3.5

  24. Wilcoxon Signed Rank Test
      ◦ Summing positive and negative ranks:
        Signed Rank  -3.5  +3.5  -3.5  -10  +3.5  -3.5  -8  -8  -8  +3.5
      ◦ Sum of positive ranks = 10.5
      ◦ Sum of negative ranks = 44.5
      ◦ Check that W+ + W- = n(n+1)/2
      ◦ W = 10.5

  25. Wilcoxon Signed Rank Test in R
      • GroupA <- c(2, 4, 1, 1, 2, 3, 2, 1, 3, 4)
      • GroupB <- c(3, 3, 2, 5, 1, 4, 4, 3, 5, 3)
      • wilcox.test(GroupA, GroupB, paired = TRUE)
      • Result: not significant
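
To see how the hand-computed W relates to what R prints, the result object can be inspected directly. A minimal sketch; the names `morning` and `evening` stand in for GroupA and GroupB and are assumptions, as is the choice of `exact = FALSE` to silence the tie warning:

```r
morning <- c(2, 4, 1, 1, 2, 3, 2, 1, 3, 4)
evening <- c(3, 3, 2, 5, 1, 4, 4, 3, 5, 3)

# Ties rule out an exact p-value, so request the normal approximation.
res <- wilcox.test(morning, evening, paired = TRUE, exact = FALSE)

# R reports V, the sum of the positive ranks of (morning - evening).
v <- unname(res$statistic)

# Decision at the usual 0.05 threshold.
significant <- res$p.value < 0.05
```

R's V is always the positive rank sum of the first argument minus the second, whereas the manual procedure picks the smaller of the two sums; on these data the two coincide.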

  26. Question!
      ◦ If you have the grades of students from 2 sections of a class, can you tell if one class is better than the other?
      ◦ What about extreme cases?
      ◦ What about interleaved cases?
      ◦ Download student.csv and studentinter.csv from the course webpage

  27. Question!
      ◦ If you have the grades of students from 2 sections of a class, can you tell if one class is better than the other?
      ◦ student.csv:
        library(coin)
        wilcoxsign_test(Section1 ~ Section2, data = mydf, distribution = "exact")
        W = 0, Z = -2.814, p-value = 0.001953
      ◦ studentinter.csv:
        wilcoxsign_test(Section1 ~ Section2, data = inter, distribution = "exact")
        W = 46, Z = -0.30779, p-value = 0.7871
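
The extreme-case p-value of 0.001953 equals 2/2^10, which is what an exact signed-rank test gives when ten pairs all differ in the same direction. The sketch below reproduces that with HYPOTHETICAL grades; the assumption that student.csv holds ten pairs is an inference from the p-value, not something the slides state:

```r
# HYPOTHETICAL grades; not the contents of student.csv.
section1 <- c(50, 52, 54, 56, 58, 60, 62, 64, 66, 68)
section2 <- section1 + 1:10   # every Section 2 grade is higher, by a distinct amount

# No ties and n < 50, so wilcox.test computes an exact p-value by default.
res <- wilcox.test(section1, section2, paired = TRUE)

v <- unname(res$statistic)  # sum of positive ranks
p <- res$p.value            # two-sided exact p-value
```

With W = 0 only one of the 2^10 equally likely sign patterns is as extreme in each tail, so no paired sample of ten can give a smaller two-sided p-value.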

  28. Rank Sum Tests
      • Kruskal-Wallis
        ◦ When: Dependent variable is ordinal AND/OR normality cannot be assumed
        ◦ Compares: More than two independent groups (unpaired data)
        ◦ Example 1: Is there a difference in ratings for 3 versions of interface?

  29. Kruskal Wallis Test
      ◦ Is there a difference in ratings for 3 versions of interface?
      ◦ Rating <- c(1,2,5,3,2,1,1,3,2,1,4,3,6,5,2,6,1,6,5,4,9,6,7,7,5,1,8,9,6,5)
      ◦ Interface <- factor(c(rep(1,10), rep(2,10), rep(3,10)))
      ◦ data <- data.frame(Interface, Rating)
      ◦ kruskal.test(Rating ~ Interface, data = data)
      ◦ p-value = 0.001073
      ◦ Report: A significant effect of Interface was found (χ²(2) = 13.7, p < 0.01).
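
The numbers in the report line can be pulled from the test object rather than copied by hand. A sketch using the slide's data; the data frame name `ratings` and the `sprintf` format are assumptions made for illustration:

```r
Rating <- c(1,2,5,3,2,1,1,3,2,1,
            4,3,6,5,2,6,1,6,5,4,
            9,6,7,7,5,1,8,9,6,5)
Interface <- factor(rep(1:3, each = 10))   # 10 ratings per interface version
ratings <- data.frame(Interface, Rating)

res <- kruskal.test(Rating ~ Interface, data = ratings)

# Assemble the statistic, degrees of freedom and p-value for reporting.
report <- sprintf("chi-squared(%d) = %.1f, p = %.4f",
                  as.integer(res$parameter),
                  unname(res$statistic),
                  res$p.value)
report
```

The degrees of freedom are the number of groups minus one, which is why three interface versions give 2.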

  30. Recap: Fisher's Test
      ◦ What: Like Chi-square: nominal/categorical data
      ◦ When: small sample size (cell counts <10)
      ◦ A/B Testing for 2 website versions (click-rate)
      ◦ Compares: Proportions (category frequencies) of two or more independent groups
      ◦ Assumptions: Independent samples

  31. Fisher's Test in R
      ◦ Do men and women differ in their preference for online surveys and personal interviews (PI)?
                 PI   Online Surveys   Total
        Men       6                2       8
        Women     8                4      12
        Total    14                6      20
        f <- read.csv("f.csv")
        fisher.test(f)
      ◦ p-value = 1: no significant difference
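
The layout of f.csv is not shown, so an alternative is to enter the four cell counts directly as a matrix. A minimal sketch; the object name `prefs` and the dimension labels are assumptions:

```r
# Cell counts from the slide's table (totals are not part of the input).
prefs <- matrix(c(6, 2,
                  8, 4),
                nrow = 2, byrow = TRUE,
                dimnames = list(Gender = c("Men", "Women"),
                                Method = c("PI", "Online Surveys")))

res <- fisher.test(prefs)
res$p.value
```

Only the inner 2 x 2 counts go into the test; including the Total row or column would make R treat the totals as extra categories.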

  32. McNemar's Test
      ◦ What: Paired chi-square
      ◦ When: Data is nominal/categorical AND paired
      ◦ Compares: Proportions of two dependent (paired) groups
      ◦ Example: Ease of use before and after interface change
                    Before Change   After Change
        Easy                   16             29
        Difficult              14              1

  33. McNemar's Test in R
      ◦ data <- matrix(c(16, 29, 14, 1), ncol = 2, byrow = TRUE)
      ◦ mcnemar.test(data)
                    Before Change   After Change
        Easy                   16             29
        Difficult              14              1
      ◦ p-value = 0.03276
      ◦ The change in interface was significant
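
The reported p-value can be checked by hand, since McNemar's statistic depends only on the two off-diagonal cells of the matrix. A sketch assuming R's default continuity correction; the names `tab`, `upper` and `lower` are chosen for illustration:

```r
tab <- matrix(c(16, 29,
                14,  1), ncol = 2, byrow = TRUE)

# Off-diagonal cells of the matrix as entered on the slide.
upper <- tab[1, 2]   # 29
lower <- tab[2, 1]   # 14

# Chi-squared statistic with continuity correction, 1 degree of freedom.
stat <- (abs(upper - lower) - 1)^2 / (upper + lower)
p    <- pchisq(stat, df = 1, lower.tail = FALSE)

# Cross-check against the built-in test.
res <- mcnemar.test(tab)
```

The diagonal cells (16 and 1) do not enter the statistic at all, so the result depends entirely on how the counts are arranged in the matrix.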
